Podcast Perspective #5: Audio Processing

Update: 2011-05-27

Description

The “dark art” of audio processing is a powerful tool that you can use to help your podcast sound its best. Don’t be “that guy” who sounds like he’s recording with tin cans and string in a cave. No matter your equipment, a little magic can go a long way. Learn the concepts behind compression, limiting, expansion, gating, and equalization, and how to use them to your advantage.

Audio Processing

Audio processing refers to any intentional alteration of sound. There are many types of processing, but the ones we cover here are dynamics processing (compression, expansion, limiting, and gating) and equalization.

Remember: Garbage In/Garbage Out–make sure your raw audio is the best quality it can be. For more information, go back to Podcast Perspective Episode #2. Additionally, some elements covered build upon fundamental audio concepts from Podcast Perspective #1.

The biggest pitfall is that it’s easy to go nuts; processing is a useful tool, but don’t overdo it. No amount of processing can turn you into James Earl Jones. Audio processing can be a very complex subject; it’s an art, not a science. There are no one-size-fits-all settings; it all depends on your environment, equipment, personal taste, and goals. The aim here is to teach you the concepts and give you the tools to go out and find what works best for you.

Dynamic Range

Compression, limiting, expansion, and gating are all forms of the same fundamental concept, collectively referred to as dynamic range processing.

In audio, dynamics, or more specifically, dynamic range, means the difference between the loudest and softest sounds. The human voice is extremely dynamic: not only can we talk extremely softly or extremely loudly, but there are also rhythmic variations in volume from word to word and from each syllable to the next.

Dynamics processing helps even out these changes in volume to keep your podcast at a comfortable listening level.

If you’ve ever been listening to something at a comfortable volume, had to turn it up to hear a quiet passage, and then been blasted when it suddenly became super loud, you’ve experienced dynamics firsthand. Even when the volume setting never changes, the perceived volume can be quite different.

You may have peak normalized your podcast to 100%, or 0 dB, so that you can’t increase the gain any further without clipping and introducing distortion. Peak level isn’t the same as actual loudness; it’s simply the highest single peak. Average level (RMS) has more to do with perceived loudness: think density.
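To make the peak versus average distinction concrete, here is a minimal Python sketch. It is not from the episode, and the function names and the two test signals are made up for illustration. It measures two signals that both peak at full scale: a steady sine wave, and near-silence with a single loud click.

```python
import math

def peak_dbfs(samples):
    """Highest absolute sample value, in dB relative to full scale (1.0)."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")

def rms_dbfs(samples):
    """Root-mean-square (average) level, in dB relative to full scale (1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square) if mean_square > 0 else float("-inf")

# Illustrative signals, one second each at a 44.1 kHz sample rate.
sine = [math.sin(2 * math.pi * 440 * n / 44100) for n in range(44100)]
click = [0.01] * 44100   # very quiet background
click[100] = 1.0         # one full-scale click

for name, signal in (("sine", sine), ("click", click)):
    print(name, "peak:", round(peak_dbfs(signal), 2), "dB",
          "RMS:", round(rms_dbfs(signal), 2), "dB")
```

Both signals report a peak of roughly 0 dB, but the sine's RMS level sits about 3 dB below its peak while the click's sits tens of dB lower, which is why it would sound far quieter.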

If all of this sounds a little complicated, or like too much work, you’re in luck. There’s a free tool called The Levelator, from the Conversations Network, that runs on Windows and Mac and will get your audio to a consistent, comfortable level. There are no confusing options or settings: give it your raw audio, and it outputs a “levelated” version.

Levelator is designed for voice, so if you use it, be sure to use it only on your vocal track, before you add any production elements such as music; running it on music may result in unpredictable, undesirable artifacts.

I do recommend Levelator; it works well for what it does. But “one size fits all” won’t be perfect. You may get better results by delving into processing the file yourself.

Compression is kind of like automatic volume control: when the volume goes above the threshold, the gain is turned down by an amount determined by the ratio.

If my threshold is set to -9 dB, and I’m talking but get excited and briefly go up to -6 dB, I went over the threshold by 3 dB. The ratio determines how much the signal is turned down in relation to how far it went over the threshold; it expresses the relationship between the level coming in and the level going out. If my ratio is 3:1 and I went over the threshold by 3 dB, the compressor turns the signal down so that it is only 1 dB over, so instead of peaking at -6 dB, I peak at -8 dB.

A compressor and a limiter are effectively the same thing: a limiter is a compressor with a high ratio, such as 10:1 or higher. For example, a brick wall limiter has an effective ratio of infinity to one, all but preventing levels from going above the threshold, but it doesn’t sound as natural and can introduce distortion.
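The threshold and ratio arithmetic can be sketched in a few lines of Python. This is only an illustration of the math for a steady level; it ignores attack and release, and the function name and defaults are my own, not taken from any particular compressor.

```python
def compress_level(input_db, threshold_db=-9.0, ratio=3.0):
    """Output level in dB for a steady input level (no attack/release)."""
    if input_db <= threshold_db:
        # At or below the threshold the signal is left alone.
        return input_db
    # Above the threshold, the overshoot is divided by the ratio.
    overshoot_db = input_db - threshold_db
    return threshold_db + overshoot_db / ratio

print(compress_level(-6.0))                      # 3 dB over at 3:1 becomes 1 dB over: -8.0
print(compress_level(-12.0))                     # below the threshold, unchanged: -12.0
print(compress_level(-6.0, ratio=float("inf")))  # brick wall limiting: -9.0
```

With the ratio set to infinity the overshoot is divided away to nothing, which is the brick wall limiter case: the output never rises above the threshold.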

Attack and release control how quickly the compressor acts. Attack time is how long the compressor takes to start working once the threshold has been reached, and release time is how long it takes to stop working after the signal has fallen back below the threshold. Both are measured in milliseconds (ms), where 1000 ms equals 1 second. If your attack and release are too fast, the compressor will constantly be switching on and off.
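Many compressors apply their gain change gradually rather than switching it on and off. Here is a rough Python sketch of one common way to do that, a simple one-pole smoother with separate attack and release times. It is an assumption for illustration, not how any specific product works, and all of the names are hypothetical.

```python
import math

def smoothing_coefficient(time_ms, sample_rate=44100):
    """Per-sample coefficient for a one-pole smoother with the given time constant."""
    return math.exp(-1.0 / (time_ms / 1000.0 * sample_rate))

def smooth_gain_reduction(target_db, attack_ms=100.0, release_ms=300.0,
                          sample_rate=44100):
    """Ease toward the requested gain reduction (in dB, positive = more reduction)."""
    attack = smoothing_coefficient(attack_ms, sample_rate)
    release = smoothing_coefficient(release_ms, sample_rate)
    current = 0.0
    smoothed = []
    for target in target_db:
        # Use the attack time when more reduction is needed, release otherwise.
        coeff = attack if target > current else release
        current = coeff * current + (1.0 - coeff) * target
        smoothed.append(current)
    return smoothed

# One second asking for 2 dB of reduction, then one second asking for none.
target = [2.0] * 44100 + [0.0] * 44100
smoothed = smooth_gain_reduction(target)
print("after 100 ms of attack:", round(smoothed[4409], 2), "dB")
print("after 300 ms of release:", round(smoothed[44100 + 13229], 2), "dB")
```

With this kind of smoother, the gain reduction gets about 63% of the way to its target after one attack time, and has fallen back by about 63% after one release time. Very short times make the gain chase every syllable, and longer times give a gentler response.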

An expander and a gate work on the same concept, but in reverse: they turn down the audio when it goes below the threshold, such as background noise when you pause speaking. A gate is to an expander what a limiter is to a compressor: a gate is an expander with a very high ratio. Using a gate may result in an audible click as it switches on and off, especially if the threshold is set too high; using a modest expander instead may give a more transparent sound.
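The expander math is the compressor math turned upside down. As a minimal sketch (the function name and default values are illustrative assumptions, and attack and release are ignored again):

```python
def expand_level(input_db, threshold_db=-48.0, ratio=2.0):
    """Output level in dB for a steady input level through a downward expander."""
    if input_db >= threshold_db:
        # At or above the threshold the signal is left alone.
        return input_db
    # Below the threshold, the shortfall is multiplied by the ratio.
    shortfall_db = threshold_db - input_db
    return threshold_db - shortfall_db * ratio

print(expand_level(-30.0))             # speech, above the threshold: -30.0
print(expand_level(-54.0))             # noise 6 dB under becomes 12 dB under: -60.0
print(expand_level(-54.0, ratio=100))  # gate-like ratio pushes it far lower: -648.0
```

A very high ratio pushes anything below the threshold down so far that it is effectively silent, which is the gate case.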

Example Settings:

Compressor

Threshold: -9 dB

Ratio: 2:1

Attack: 100 ms

Release: 300 ms

Expander

Threshold: -48 dB

Ratio: 2:1

Attack: 100 ms

Release: 300 ms

The maximum ratio I’d recommend using is 4:1, which is a substantial amount; higher values will sound unnatural. You can probably also use decimal numbers, such as a ratio of 2.5:1 or 1.5:1, for more subtle control. Remember that the key is subtlety: if used too aggressively, compression can be distracting and make things sound worse.

Equalization

Equalization is the process of adjusting frequencies in an electronic signal for balance, or in our case, adjusting audio frequencies for aesthetic reasons or to reduce unwanted sounds. Equalization can give fine, surgical control over individual frequencies.

Pitch is determined by frequency–the higher the frequency the higher the pitch, and the lower the frequency the lower the pitch.

The range of human hearing is approximately 20 Hz to 20 kHz, though we feel the lowest frequencies more than we hear them, and we lose the ability to hear the highest frequencies as we age, so this is a best case. The range of the human voice is approximately 60 Hz to 16 kHz, depending on your individual voice; male voices are usually lower and deeper than female voices.

If you are using a 44.1 kHz sample rate, as you should be, the frequency range goes up to 22.05 kHz. The frequency range is not linear but logarithmic: for example, the “mids” are not around 10 kHz, but around 3.5 kHz, and highs are above 10 kHz.
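A couple of lines of Python show both ideas: the top of the frequency range is half the sample rate, and pitch is better counted in octaves (doublings of frequency) than in Hz. The function names are just for illustration.

```python
import math

def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2

def octaves_between(low_hz, high_hz):
    """Number of doublings in frequency from low_hz to high_hz."""
    return math.log2(high_hz / low_hz)

print(nyquist_hz(44100))                         # 22050.0
print(round(octaves_between(20, 20000), 2))      # whole hearing range, just under 10 octaves
print(round(octaves_between(10000, 20000), 2))   # 10 kHz to 20 kHz is a single octave
```

Everything from 10 kHz to 20 kHz is half of the range in Hz, but only one octave out of nearly ten, which is why the middle of what we hear sits so much lower than 10 kHz.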

The simplest types of equalization are the high pass/low cut and low pass/high cut filters. A high pass/low cut filter passes frequencies above its cutoff and turns down frequencies below it. “High pass filter” and “low cut filter” are two different terms for the same thing. A low cut filter is useful for reducing bass frequencies below the human voice, where unwanted noise such as microphone rumble is common; for example, below 80 Hz. Similarly, a low pass/high cut filter is useful for reducing frequencies above the human voice, such as anything above 16 kHz.
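For the curious, here is a minimal sketch of a low cut filter in plain Python. It is a very gentle first-order design chosen because it fits in a few lines; the filters in your audio editor are usually steeper and more refined. The function name and the test tones are assumptions for illustration.

```python
import math

def high_pass(samples, cutoff_hz=80.0, sample_rate=44100):
    """Gentle first-order high pass (low cut) filter, about 6 dB per octave."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        # Keeps fast changes in the signal and lets slow ones fade away.
        prev_out = alpha * (prev_out + x - prev_in)
        prev_in = x
        out.append(prev_out)
    return out

def tone(freq_hz, sample_rate=44100):
    """One second of a full-scale sine wave."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(sample_rate)]

rumble = high_pass(tone(20))     # well below the 80 Hz cutoff
voice = high_pass(tone(1000))    # well above the 80 Hz cutoff

# Measure the second half of each result, after the filter has settled.
print("20 Hz peak after filter:", round(max(abs(s) for s in rumble[22050:]), 2))
print("1 kHz peak after filter:", round(max(abs(s) for s in voice[22050:]), 2))
```

The 20 Hz rumble comes out at roughly a quarter of its original level, while the 1 kHz tone passes through almost untouched.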

Equalization is truly an art form that requires experimentation and practice, and a discerning ear, but here are some possible “trouble frequencies”:

Sub-Bass/Rumble: Below 60 Hz

Boominess: 200-400 Hz–reducing may increase intelligibility

Sibilance (harsh “s” sounds): 3-5 kHz

Plosives (popped “p” and “b”): can be fixed by selecting just the pop and applying a low cut, for example at 150 Hz

Presence: 2-4 kHz–subtle boost can make the voice sound more “forward”

(All frequencies approximate)

It might sound counter-intuitive, but sometimes, instead of turning up certain frequencies (where audio may clip), it might be better to turn other ones down. For example, if you want to give it that little extra punch, instead of boosting the lows and the highs, cut some of the mids.

Equalization is a powerful tool that is often misused. (Ever hear a podcast where it sounds like someone turned the “bass boost” up to 11?) It’s important to use it subtly.

Remember that unless you have professional studio monitors, your speakers or headphones are designed to sound aesthetically pleasing, not necessarily accurate. What sounds good on your speakers might sound horrible on someone else’s headphones, so listen critically. If there aren’t any apparent problems, no fine adjustment may be necessary.

Real-Time (Hardware) vs. Post-Production (Software)

There are two different philosophies for how to apply audio processing, both with their own inherent strengths and weaknesses: real-time, using external hardware, or in post-production, in software.

Hardware is quicker, because you don’t have to do any processing in post-production, and is especially great for “live to hard drive” productions or if you stream your show over the Inte