Digital Audio Clock Accuracy When Recording Audio for Video with Long Takes
Digital audio is recorded by sampling the level of the analog audio waveform at regular intervals. The rate at which the audio is sampled is known as the "sampling rate," and the standard for film and video production is generally 48 kHz. At a sampling rate of 48 kHz, the audio must be sampled 48 times every millisecond (ms). Inaccuracies in the sample rate clock result in audio drift. When audio alone is recorded, minor drift is virtually undetectable. When audio must be synchronized against picture, however, even minor drift can create chaos in the editing room: for any given shot there will be more or less audio than picture, depending on the direction of the drift.
When you multiply the amount of audio and video recorded by the number of cameras and number of tracks being recorded in a given reality production, even small drifts become enormously costly (at least in terms of time) to constantly correct. Of course, other things go wrong during production that can cause time code issues as well, and in my opinion that makes it all the more imperative to start with a completely stable timebase for the audio recording.
Let’s look at how drift plays out with a high-end prosumer mixer/FireWire audio interface, such as a Mackie Onyx 1640 or PreSonus StudioLive. Both of these products offer excellent value, combining good sound quality and ease of use with a low price. However, they lack one important feature: the ability to sync to an external clock such as wordclock. The end user is therefore required to use the device’s internal sample rate clock. The component that drives the sample rate in any digital audio interface is a crystal-controlled oscillator, and its accuracy is measured in parts per million, or ppm. Every manufacturer has to make compromises to get their product to market, and since these interfaces were designed mainly for music recording, the crystal typically chosen is spec’d with an accuracy of 50 ppm, which is good enough even for high-end music recording. But when using such a product to record sound for picture, that number tells a very different story:
An oscillator with an accuracy of 50 ppm translates to a timebase drift of up to 0.05 ms per second, or 2.4 samples per second at 48 kHz. Remember, MetaCorder rigs are often left recording for a few hours at a time, but for the sake of simplicity, let’s say that the rig is making a one-hour recording. Multiplied out, 2.4 samples per second becomes 8,640 samples per hour. Since there are about 1,600 samples per video frame, this equates to 5.4 frames per hour of drift, in both audio and timecode. Of course, some individual units may be more accurate than 50 ppm (the spec indicates the maximum oscillator drift), but without the ability to sync to an external source, the Mackie and PreSonus mixers tie the customer’s hands.
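The arithmetic above can be reproduced with a short sketch. The function name and its defaults are assumptions for illustration; 30 fps is used because it gives the round figure of 1,600 samples per frame quoted above (29.97 fps would give about 1,601.6).

```python
def drift_per_hour(ppm, sample_rate=48_000, fps=30.0):
    """Worst-case drift over one hour for a crystal rated at `ppm`.

    Returns (samples per second, ms per second, samples per hour, frames per hour).
    """
    samples_per_second = sample_rate * ppm / 1_000_000  # fraction of the rate lost or gained
    ms_per_second = ppm / 1_000                         # 1 ppm = 1 microsecond per second
    samples_per_hour = samples_per_second * 3600
    samples_per_frame = sample_rate / fps               # 1,600 at 48 kHz and 30 fps
    frames_per_hour = samples_per_hour / samples_per_frame
    return samples_per_second, ms_per_second, samples_per_hour, frames_per_hour


sps, ms, sph, fph = drift_per_hour(50)
print(f"{sps:.1f} samples/s, {ms:.2f} ms/s, {sph:.0f} samples/h, {fph:.1f} frames/h")
```

Changing the `ppm` argument shows how quickly the picture improves with a better crystal, which is the point of the options that follow.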
There are a few ways to ensure that audio recordings will be accurate enough for use with picture:
1) Use a master wordclock generator with high accuracy and low jitter. Two examples are the Rosendahl Nanosync HD and the Brainstorm Electronics DCD-8. Both devices have the added benefit of being able to sync not just to external wordclock but to external video sources as well. The Nanosync can natively generate video sync signals and timecode (ensuring perfect phase accuracy between video, timecode and wordclock), while the DCD-8 can optionally generate video sync. These features are well suited to multicamera video shoots. The Rosendahl Nanosync HD specifies a crystal accuracy of 0.5 parts per million, which translates to a drift of 0.054 frames per hour, or 1 frame in about 18 hours. That is 100 times more accurate than the typical mixer with a built-in FireWire interface. The Rosendahl or Brainstorm would then supply wordclock to the audio interface. Just remember to set the device to external sync!
2) Along the lines of option number one, you can use a professional audio recorder designed for film and television production to supply wordclock. The Sound Devices 788T, for example, is specified with a crystal capable of being tuned to 0.2 ppm.
3) Use an audio interface designed with film and television applications in mind. The Metric Halo 2882 and ULN-2 2d interfaces, for example, are specified with an accuracy of 5 ppm (and are often more accurate in practice). The RME Fireface 800 with the TCO (video and timecode) option is another example, and is the only interface that can natively generate sampling rates of 47,952 Hz and 48,048 Hz (48 kHz pulled down or up by 0.1%), which is useful in some film workflows.
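To compare the options above, the quoted crystal specs can be converted into the recording time it takes for worst-case drift to reach one video frame. This is a rough sketch: the function name, the dictionary labels, and the 30 fps frame rate are assumptions for illustration, and the ppm values are the ones quoted in this article.

```python
# Crystal accuracy figures as quoted in the article, in parts per million.
SPECS_PPM = {
    "typical mixer with built-in FireWire interface": 50.0,
    "Metric Halo 2882 / ULN-2": 5.0,
    "Rosendahl Nanosync HD": 0.5,
    "Sound Devices 788T (tuned)": 0.2,
}


def hours_per_frame(ppm, fps=30.0):
    """Hours of recording before worst-case drift reaches one video frame."""
    frame_seconds = 1.0 / fps
    drift_seconds_per_hour = ppm * 1e-6 * 3600
    return frame_seconds / drift_seconds_per_hour


for name, ppm in SPECS_PPM.items():
    print(f"{name}: {ppm} ppm, about {hours_per_frame(ppm):.1f} h per frame of drift")
```

Because drift scales linearly with ppm, each tenfold improvement in the crystal spec buys ten times as much recording time before the first frame of drift.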
Remember, no matter what your workflow is, the most important element is to test it. For some productions, any audio drift can be dealt with by simply varispeeding the audio to match the picture. Other productions may find that solution intolerable, and require frame accurate audio and timecode.
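For productions that do choose to varispeed, the correction boils down to a single speed ratio derived from the measured drift. A minimal sketch, assuming the drift has been measured in frames over a take of known length; the function name and the sign convention (positive drift means the audio runs longer than the picture) are hypothetical, not taken from any particular editing system.

```python
def varispeed_ratio(duration_s, drift_frames, fps=30.0):
    """Playback speed factor that makes the audio match the picture length.

    duration_s   -- picture duration in seconds
    drift_frames -- measured drift; positive means the audio is longer than picture
    """
    audio_length_s = duration_s + drift_frames / fps
    return audio_length_s / duration_s


# One hour of picture with the 5.4 frames of drift worked out earlier.
ratio = varispeed_ratio(3600, 5.4)
print(f"play audio at {ratio:.6f}x ({(ratio - 1) * 1e6:.0f} ppm correction)")
```

A ratio above 1 speeds the audio up slightly so that it ends with the picture; a ratio below 1 slows it down.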