Audio mastering is the collective name for the final steps used in making an audio recording, prior to mass replication. Mastering consists of physical mastering and pre-mastering. For vinyl and pressed CDs, physical mastering is the process of making the master used for pressing. Pre-mastering is described in the rest of this article.
In the digital era, the main mastering steps to consider are putting the tracks in the correct order, TOC creation, compression, volume leveling and sub-code insertion. Frequency equalisation can also be employed, to balance the relative volumes of each of the main frequency bands. Different equalisation curves may be used when generating different pre-masters for different delivery media, such as FM radio or MP3-like downloads.
A common mastering situation, which we cover here, is to take a 24-bit stereo pre-master and convert it to a 16-bit stereo final master for CD pressing. The input and output sample rates may both be 44.1 kHz, although increasingly the input sample rate is higher when SACD masters are also to be produced.
It is also common to start with a number of separate pre-masters called separations. These can be mono tracks, each provided with an extra pan parameter, but more commonly they are stereo tracks, including stereo reverb returns. Separations allow more freedom at the mastering stage, but separation mastering is not covered further by this web page.
Pre-masters should contain a silent leader of a few seconds. In practice, the silent leader contains low-level background noise. Frequently this noise arises from limit cycles in mixdown effects units, but it can also be ambient live noise from a studio. The mastering process can employ an adaptive noise canceller, trained on the leader, to eliminate the background noise. The noise canceller should be faded out once the music level is ten or so dB above the noise.
Note that any precursor of noise from effects returns is best removed at source by fading up the effects returns as the piece starts: a simple step on an automated mixer.
As well as the 1.4 Mbps audio data stream, a CD carries a pair of much lower-rate data streams known as the P and Q subcodes. These carry a time code consisting of track/index/minute/second/frame point as well as track titles, SCMS copy protection and record label ISRC catalog/recording codes. During mastering, this information is frequently held in a cue sheet file (.cue). The subcode can be inserted by the mastering software, but a better approach can be for the mastering software to produce a wave (.wav) file for each track and an overall cue sheet. These are then combined by the CD writing software (e.g. the Linux cdrecord program).
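As an illustration of the one-wave-file-per-track arrangement, the following sketch writes a minimal cue sheet using the commonly seen FILE/TRACK/TITLE/INDEX keywords. The function name, file names and titles are hypothetical, and a real sheet would normally carry further fields (ISRC codes, pre-gaps and so on) that are omitted here.

```python
def make_cue_sheet(tracks):
    """Build cue sheet text from a list of (wav_filename, title) pairs.

    Assumes one wave file per track, so each track starts at the
    beginning of its own file (INDEX 01 at 00:00:00).
    """
    lines = []
    for number, (filename, title) in enumerate(tracks, start=1):
        lines.append(f'FILE "{filename}" WAVE')
        lines.append(f"  TRACK {number:02d} AUDIO")
        lines.append(f'    TITLE "{title}"')
        lines.append("    INDEX 01 00:00:00")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical two-track album.
    print(make_cue_sheet([("track01.wav", "Opening"),
                          ("track02.wav", "Second Piece")]))
```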
The pre-master will generally have more dynamic range than desired. Although the pre-master may sound fine on recording studio monitors or a good-quality HiFi, it is generally not suitable for everyday listening where other sounds are present. For instance, when the music is played on the road in a vehicle, the listener might find themselves frequently adjusting the volume so that the loud parts are not too loud, yet the quiet parts can still be heard above the engine and road noise. A baseline mastering compression would have a 3 dB range, no pre-lookup (look-ahead) and a 10 second release/decay time.
Before mixing, the peak to average ratio (PAR) of any individual track is relatively high. It can be reduced by soft clipping the tops of a handful of transients. Hard clipping introduces harmonics that alter the tone and can be unpleasant. When mastering, the soft clipping technique should preferably be used on each individual separation. Applying this technique to a mixed track introduces intermodulation distortion between the components, and so it must be applied far more carefully.
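A soft clipper can be sketched as below. This is only one possible shape, not a prescribed one: samples are assumed to be floats normalised to the range -1.0 to 1.0, the `threshold` parameter is our own invention, and a tanh curve is used above the threshold so that the output joins the untouched region smoothly and never quite reaches full scale.

```python
import math


def soft_clip(samples, threshold=0.8):
    """Round off sample values whose magnitude exceeds threshold.

    Samples at or below the threshold pass through unchanged.
    Above it, a tanh knee squeezes the excess into the remaining
    headroom, so the output magnitude always stays below 1.0.
    threshold must be less than 1.0.
    """
    knee = 1.0 - threshold
    out = []
    for x in samples:
        mag = abs(x)
        if mag <= threshold:
            out.append(x)
        else:
            shaped = threshold + knee * math.tanh((mag - threshold) / knee)
            out.append(math.copysign(shaped, x))
    return out
```

Because the slope of the curve is one at the threshold, only the handful of transients that exceed it are reshaped; everything else is left bit-for-bit alone.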
Automated clip detection and removal is the converse function. Sometimes there is no chance of re-recording a track (e.g. a live event) and a clipped pre-master must be used. An engineer can manually edit each hard clip into a smoother shape, but software can help by detecting clip events and then offering the engineer a first-draft rounded version for audition and as a basis for any manual editing. Again, clip removal should preferably be done on a single track before mixdown and effects return, or, if need be, on a separation master.
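The detection half of this can be sketched very simply: a hard clip shows up as a run of consecutive samples stuck at full scale. The function below only reports where such runs occur; producing the rounded first draft is left out. The `min_run` value of three samples is an arbitrary assumption, as is the normalised full-scale value of 1.0.

```python
def find_clips(samples, full_scale=1.0, min_run=3):
    """Return (start, end) index pairs, end exclusive, for every run of
    at least min_run consecutive samples at or beyond full scale."""
    clips = []
    start = None
    for i, x in enumerate(samples):
        if abs(x) >= full_scale:
            if start is None:
                start = i          # a new run begins here
        else:
            if start is not None and i - start >= min_run:
                clips.append((start, i))
            start = None
    # A run may extend to the very end of the data.
    if start is not None and len(samples) - start >= min_run:
        clips.append((start, len(samples)))
    return clips
```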
Volume leveling, also known as sound pressure level (SPL) maximizing, is essentially the process of altering the gain such that the track is as loud as possible without clipping. The simplest form of volume leveling calculates and applies a single gain value for the duration of a piece. Increasingly advanced systems of leveling use progressively greater numbers of volume settings and phase adjustments.
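The single-gain form is short enough to show in full. This sketch assumes float samples with full scale at 1.0; the function name and the `target_peak` parameter are ours.

```python
def normalise(samples, target_peak=1.0):
    """Apply one gain to the whole piece so its largest peak
    lands exactly on target_peak."""
    peak = max((abs(x) for x in samples), default=0.0)
    if peak == 0.0:
        return list(samples)       # silence: nothing to scale
    gain = target_peak / peak
    return [x * gain for x in samples]
```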
The above plot shows the volume levels occurring in a 170 second piece of music. The wiggly curve was produced by taking the maximum absolute value occurring in each second. This is called the raw volume. The smoother curve is the convex hull of the raw volume where the first and last volumes have been replaced with the average for the piece.
The reciprocal of the convex hull forms a suitable gain profile for volume leveling. The following plot shows the same track after being processed in this way. The original recording had some 3 dB of headroom whereas the processed version has zero headroom and the convex hull is characteristically flat throughout the body.
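The raw volume, its convex hull and the resulting gain profile can be sketched as follows. All names are hypothetical. Two details are our own assumptions rather than part of the description above: the end values are replaced by the average only when that average is higher than the measured value (so that the hull still encloses every peak), and a silent piece is given unity gain.

```python
def raw_volume(samples, rate):
    """Maximum absolute sample value in each second of the piece."""
    return [max(abs(x) for x in samples[i:i + rate])
            for i in range(0, len(samples), rate)]


def upper_hull(volumes):
    """Upper convex hull of the points (i, volumes[i]), evaluated at
    every integer i by straight-line interpolation between vertices."""
    if not volumes:
        return []
    hull = []                      # indices of hull vertices, left to right
    for i, v in enumerate(volumes):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = ((b - a) * (v - volumes[a])
                     - (volumes[b] - volumes[a]) * (i - a))
            if cross >= 0:         # b lies on or below the line a -> i
                hull.pop()
            else:
                break
        hull.append(i)
    out = []
    for a, b in zip(hull, hull[1:]):
        for i in range(a, b):
            t = (i - a) / (b - a)
            out.append(volumes[a] + t * (volumes[b] - volumes[a]))
    out.append(volumes[hull[-1]])
    return out


def leveling_gains(volumes):
    """One gain value per second: the reciprocal of the hull."""
    if not volumes:
        return []
    mean = sum(volumes) / len(volumes)
    pinned = list(volumes)
    pinned[0] = max(pinned[0], mean)     # assumption: never go below the peak
    pinned[-1] = max(pinned[-1], mean)
    return [1.0 / h if h > 0 else 1.0 for h in upper_hull(pinned)]
```

The gains are one per second, so in use they would be interpolated to a value per sample before being multiplied into the audio.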
Since many tracks are intended to begin and end quietly, the choice of starting and ending gain can make quite a difference to the overall artistic effect. In the above plots, the average level was used, but if a lower value is used, such as the measured value in the plots, the effect of fades in and out is diminished. We use the term `artistic distortion' to describe variations in gain that alter the mix in undesirable ways.
Since the pre-master may not be evenly balanced in volume between left and right channels, the convex hull method can be applied to each channel in turn, but, again for preservation of artistic content, the effect of re-balancing should be limited to a few decibels by default.
The straightforward convex hull method is not suitable for very long tracks, since the volumes at certain points in the pre-master can end up altering the volumes a long way earlier and/or later in the master. However, the basic hull algorithm is readily altered to pay less attention to high values that are a long way off. An ideal algorithm uses something between the convex hull and a bi-directional running average, but must ensure that it correctly encloses every peak that would be enclosed by the hull, otherwise there will be negative headroom and hard clipping will result.
The following plot shows the mastering hull created for the same piece of music, when the horizon is limited to a 25 second range.
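One simple way to get an envelope with a limited horizon that is still guaranteed to enclose every peak is sketched below. It is not the modified hull algorithm itself, which is not spelled out here, but a stand-in: a sliding maximum followed by a running average over the same window. Every value being averaged is at least as large as the raw volume at the centre, so the result cannot dip below it. We assume the horizon is counted in seconds either side of the current point.

```python
def limited_envelope(volumes, horizon):
    """Smooth envelope of per-second volumes that only looks
    horizon seconds backwards and forwards."""
    n = len(volumes)
    # Stage 1: sliding maximum over +/- horizon seconds.
    peak = [max(volumes[max(0, i - horizon):i + horizon + 1])
            for i in range(n)]
    # Stage 2: running average of that maximum over the same window.
    env = []
    for i in range(n):
        window = peak[max(0, i - horizon):i + horizon + 1]
        mean = sum(window) / len(window)
        # The max() only guards against floating-point rounding.
        env.append(max(mean, volumes[i]))
    return env
```

As with the full hull, the reciprocal of this envelope would serve as the gain profile.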
Another common form of volume leveling operates by splitting each channel into three frequency bands and altering the volume level of each band in each channel using the convex hull. Hence a total of six hulls are required. Again, constraints should be imposed on the relative action in each band, so that artistic distortion is limited to a few decibels.
The human ear is largely insensitive to phase, whereas varying the relative phases of signals that are summed can alter the peak value of that sum. A phase shifter is an all-pass filter that has unity gain at every frequency, but whose phase response is a function of frequency. Phase shifters are easy to implement. They alter the waveform considerably, but have no audible effect provided their internal parameters are adjusted slowly and both left and right channels receive identical treatment. Each individual channel of the pre-master is a sum of many mixdown channels and each of these is the sum of various harmonics, hence phase shifting can be usefully applied without having a multi-channel pre-master.
It is hard to predict the effect of a phase shifter on the waveform and hence the easiest way to use a phase shifter during mastering is to select a part of the track where a peak is present and perform a number of trials with different phase-shifter parameters. The result that gives the lowest peak-to-peak waveform at that point is selected as the desired setting for that point in the track. This is then repeated for each peak in the section, resulting in a number of desired phase shifter parameter sets. A slowly-changing waveform is then created that passes through each of the parameter sets at the appropriate time in the piece. Selection is based on the lowest peak-to-peak range rather than on the lowest peak amplitude on the assumption that a stage of low-frequency insertion/removal will then be applied.
Phase shifters can be created from FIR filters or bi-quads. When implemented using analog components, phase shifters were formed with a chain of four to six sections, each with transfer function (1-st)/(1+st), where `t' was the time constant of an RC filter. Parameterisation is achieved by changing this time constant. The phase shift of each section changes by 180 degrees in total as the frequency sweeps from well below to well above wCR=1, passing through 90 degrees at that point. The bi-quad approach to digital implementation places one of these sections in each bi-quad.
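A digital version of one such section, together with the trial procedure for choosing its setting, might look like the sketch below. The first-order recurrence is what the bilinear transform gives for an analog section of the form (1-st)/(1+st), without frequency pre-warping; that choice, the function names and the use of a single shared time constant for all sections are our assumptions.

```python
import math


def tau_for(freq_hz):
    """Time constant whose 90 degree point (w*t = 1) falls at freq_hz."""
    return 1.0 / (2.0 * math.pi * freq_hz)


def allpass_section(samples, tau, rate):
    """First-order all-pass: unity gain, frequency-dependent phase."""
    k = 2.0 * rate * tau
    a = (1.0 - k) / (1.0 + k)          # |a| < 1, so the filter is stable
    out = []
    x_prev = y_prev = 0.0
    for x in samples:
        y = a * x + x_prev - a * y_prev
        out.append(y)
        x_prev, y_prev = x, y
    return out


def phase_shift(samples, taus, rate):
    """Chain of all-pass sections, one per entry in taus."""
    for tau in taus:
        samples = allpass_section(samples, tau, rate)
    return samples


def best_tau(segment, candidates, rate, sections=4):
    """Try each candidate time constant on a segment containing a peak
    and return the one giving the smallest peak-to-peak range."""
    def peak_to_peak(tau):
        shifted = phase_shift(segment, [tau] * sections, rate)
        return max(shifted) - min(shifted)
    return min(candidates, key=peak_to_peak)
```

Running `best_tau` around each peak gives the set of desired settings; the parameter is then moved slowly from one setting to the next over the course of the piece.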
The constraint on phase shifting the left and right channels identically can be rephrased if the signal is converted to M/S form before phase shifting and restored to L/R form afterwards. M/S form consists of two channels, one formed by L+R and the other by L-R. The L+R channel is a mono representation of the original content and L-R is the stereo difference signal. M/S form is widely used in audio engineering where a pair of channels is available but their relative phases may vary uncontrollably (e.g. when using a pair of ISDN lines to carry a stereo outside broadcast). When using this for mastering, each trial needs to be converted back to L/R form before looking at the peaks.
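The conversion in each direction is a one-liner per channel. In this sketch the halving is done on the way back to L/R, which is one of several equivalent scaling conventions; the function names are ours.

```python
def to_ms(left, right):
    """L/R to M/S: mid is L+R, side is L-R."""
    mid = [l + r for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side


def to_lr(mid, side):
    """M/S back to L/R, undoing to_ms exactly."""
    left = [(m + s) / 2.0 for m, s in zip(mid, side)]
    right = [(m - s) / 2.0 for m, s in zip(mid, side)]
    return left, right
```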
The human ear cannot hear DC or very low frequencies. Most CD players will not reproduce any frequencies below 10 Hz and hence we have freedom to add or remove low-frequency waveforms. Such addition and removal can greatly alter the peak excursion of the master. This form of processing will also compensate for any DC offset present in the pre-masters.
A mastering technique based on low-frequency compensation operates by first finding a set of points where peaks occur in the track and then measuring the DC offset around each peak. For instance, at a point where all the peaks happen to be positive, adding a negative offset to the waveform will reduce the peak without an audible effect. Again, we must produce a slowly-changing waveform that passes through each of the measured DC offsets at the appropriate time in the piece. Ideally, it should also track the low-frequency components elsewhere in the piece, to avoid introducing new peaks in the final step. The final step is to subtract this waveform from the original, removing some low-frequency components and adding new ones.
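The steps above can be sketched as follows, under some loudly stated simplifications. The offset around a peak is taken as the midpoint between the largest and smallest sample in a window, and the compensation waveform is a straight-line interpolation between the measured offsets. A real implementation would need a smoother curve so that nothing audible is introduced; the names and the default window size are hypothetical.

```python
def local_offset(samples, centre, half_width):
    """Midpoint of the excursion in a window around one peak."""
    window = samples[max(0, centre - half_width):centre + half_width + 1]
    return (max(window) + min(window)) / 2.0


def compensation_wave(samples, peak_indices, half_width):
    """Waveform passing through the measured offset at each peak,
    held constant before the first peak and after the last."""
    points = [(i, local_offset(samples, i, half_width))
              for i in sorted(set(peak_indices))]
    if not points:
        return [0.0] * len(samples)
    wave = []
    j = 0
    for n in range(len(samples)):
        # Advance to the last measured point at or before n.
        while j + 1 < len(points) and points[j + 1][0] <= n:
            j += 1
        i0, v0 = points[j]
        if n <= i0 or j + 1 == len(points):
            wave.append(v0)
        else:
            i1, v1 = points[j + 1]
            wave.append(v0 + (v1 - v0) * (n - i0) / (i1 - i0))
    return wave


def compensate(samples, peak_indices, half_width=256):
    """Subtract the compensation waveform from the original."""
    wave = compensation_wave(samples, peak_indices, half_width)
    return [x - w for x, w in zip(samples, wave)]
```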
The MP3-encoded waveform of a piece of music tends to have little visible resemblance to the original, even when it is a high-bit-rate MP3 that is audibly indistinguishable from it. Clearly, it will have different-shaped peaks in its waveform and hence might be audibly louder when played with the same peak amplitude. Hence, it might be a better candidate for mastering than the original.
Two aspects of an MP3 codec have an effect here: phase shifting and psycho-acoustic masking. MP3 is a so-called busgang (sub-band) system, which means the sound is converted into a parallel set of frequency bands. Quadrature mirror filters can be used for this since they have a trapezoidal roll-off. Any other filter bank system will serve, provided that it has flat gain after output summation. Each filter can have a different phase delay, and hence the overall waveform is changed, as with the all-pass phase shifters. The more significant part of MP3 encoding is the elimination of power in frequency bands that are quiet compared with their neighbours. This is called psycho-acoustic masking. Any procedure that approximates to the MP3 system will change the peak waveform in an unpredictable way.
To adapt a busgang approach for mastering, it is best to think of the overall output as the sum of a large number of wavelets, each with its own place in the piece. We are free to invert any wavelet, or otherwise phase shift it, such that its contribution to the output peaks is minimised. This gives a complicated knapsack problem, where wavelets with the most energy are considered the most important.
Again, busgang techniques are best implemented in M/S pair form, to avoid undesirable phasing effects in the resultant audio.
The length of audio masters for CD pressing should be a multiple of 588 samples (2352 bytes), the CD frame size, but most physical mastering (CD writing) software will round up to the next boundary as needed.
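Should the padding need to be done by hand, it amounts to appending silence up to the next multiple of 588. This sketch assumes the two channels are held as separate, equal-length lists of integer samples; the names are ours.

```python
CD_FRAME_SAMPLES = 588   # stereo sample pairs per CD frame (2352 bytes)


def pad_to_cd_frame(left, right):
    """Append zero samples so the length is a multiple of 588."""
    shortfall = -len(left) % CD_FRAME_SAMPLES
    return left + [0] * shortfall, right + [0] * shortfall
```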
Remember to listen to your mastered tracks on as many different systems as possible to be sure they all sound good. Include a car system, a ghetto blaster, headphones, a laptop and something that is mono.
Try the Newmew Compostor CD mastering program: Newmew Composter.
(C) 2003-6 DJ Greaves. Mixerton ST/University of Cambridge.