The idea behind perceptual CODECs – Legato Communications llc

Question: If you were to drop a rock and a tiny pin onto a table, and both land at the same time…do they both make a sound?

Yes they do.

Can you hear them both? Probably not…

Why?

Logic would dictate that the sound of the rock is so loud that it “covers” the sound of the tiny pin. This concept is known as masking: the really loud sound of the rock masked the tiny sound that the pin made.

(Image: The Rock And The Pin)

This is a nice introduction to how perceptual audio coding works. In order to make things like Internet Radio and iPods possible, we need some technology to make normally huge digital audio files smaller.

How is this done? It’s pretty complicated, but really interesting once you get the hang of it. Don’t worry. I’ll just explain it in really simple form. For those wanting to know exactly what’s happening, you’ll have to do some searching, or wait for me to get around to writing about it! <evil grin>

A linear audio file, say the raw data from a CD, is quite huge: CD audio runs at 44,100 samples per second, 16 bits per sample, in two channels, which works out to about 1,411 kilobits per second (roughly 10 MB per minute). It contains a LOT of audio information. The premise behind encoders such as AAC and MP3 is that we don’t even notice most of those sounds, because we are so focused on what’s going on in the foreground (the loudest sounds in the recording). So these “unnoticeable” sounds are removed, and the “hole” left behind is partially covered over by the loud sounds in the recording and by the CODEC’s (enCOder/DECoder) ability to smooth over what you might otherwise hear when masking is insufficient.
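To make the idea concrete, here is a minimal Python sketch of masking-based removal. It is a toy, not how MP3 or AAC actually work: the single-masker model, the 10 dB offset and the 25 dB-per-Bark slope are illustrative assumptions, and real encoders use far more detailed psychoacoustic models. The Bark (critical-band) conversion is the Zwicker and Terhardt approximation. The “rock” is a loud 1 kHz tone, the “pin” is a quiet tone just above it, and a third quiet sound sits far away at 8 kHz.

```python
import math


def hz_to_bark(f_hz: float) -> float:
    """Zwicker & Terhardt approximation of the Bark (critical-band) scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def masked_threshold_db(masker_hz: float, masker_db: float, probe_hz: float,
                        offset_db: float = 10.0,
                        slope_db_per_bark: float = 25.0) -> float:
    """Toy masking threshold that falls off linearly (in dB) with Bark distance.

    offset_db and slope_db_per_bark are illustrative assumptions, not values
    taken from any real codec.
    """
    distance_bark = abs(hz_to_bark(probe_hz) - hz_to_bark(masker_hz))
    return masker_db - offset_db - slope_db_per_bark * distance_bark


def drop_masked(components: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Keep only the (freq_hz, level_db) components the loudest one does not mask."""
    masker_hz, masker_db = max(components, key=lambda c: c[1])
    kept = []
    for freq_hz, level_db in components:
        # With a non-negative offset the masker sits above its own threshold,
        # so it is always kept.
        if level_db >= masked_threshold_db(masker_hz, masker_db, freq_hz):
            kept.append((freq_hz, level_db))
    return kept


rock, pin, high_sound = (1000.0, 90.0), (1200.0, 35.0), (8000.0, 40.0)
print(drop_masked([rock, pin, high_sound]))  # [(1000.0, 90.0), (8000.0, 40.0)]
```

In this toy model the pin falls below the rock’s masking threshold (about 50 dB) and is dropped, while the 8 kHz sound survives because masking weakens as the frequency distance grows.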

The techniques used by MP3 and AAC are quite good, and the vast majority of people run around totally unaware that they are listening to only 10-20% of the data that was originally in the linear digital sound files!
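The 10-20% figure refers to how much of the data survives encoding. A quick Python sketch compares a few encoded bitrates against the CD data rate; 128, 192 and 256 kbps are assumed here purely as typical examples.

```python
CD_SAMPLE_RATE_HZ = 44_100  # samples per second, per channel
CD_BITS_PER_SAMPLE = 16
CD_CHANNELS = 2  # stereo


def cd_bitrate_kbps() -> float:
    """Uncompressed CD audio data rate in kilobits per second."""
    return CD_SAMPLE_RATE_HZ * CD_BITS_PER_SAMPLE * CD_CHANNELS / 1000


def fraction_of_cd(encoded_kbps: float) -> float:
    """Share of the CD data rate that an encoded stream keeps."""
    return encoded_kbps / cd_bitrate_kbps()


for kbps in (128, 192, 256):  # example bitrates, chosen for illustration
    print(f"{kbps} kbps keeps {fraction_of_cd(kbps):.1%} of {cd_bitrate_kbps():.1f} kbps")
# 128 kbps keeps 9.1% of 1411.2 kbps
# 192 kbps keeps 13.6% of 1411.2 kbps
# 256 kbps keeps 18.1% of 1411.2 kbps
```

That is roughly 9-18%, in line with the 10-20% figure. Keep in mind this measures data, not how good the result sounds.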

Unlike gzip or WinZip archives, which give back every single bit when you unpack them, the audio encoding process used by AAC and MP3 is lossy (destructive) and permanent! There is no way to get back to the original full quality from the encoded copy alone.
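The difference between “zip-style” and perceptual compression can be sketched with Python’s standard library. `zlib` implements DEFLATE, the compression method used by gzip and ZIP, so it stands in for lossless compression here. For the lossy side, throwing away the low-order bits of each sample is a crude, hypothetical stand-in for a real encoder’s quantization; the 440 Hz test tone and the 8 dropped bits are arbitrary choices.

```python
import math
import struct
import zlib


def make_samples(n: int = 1000) -> list[int]:
    """Hypothetical test signal: a 440 Hz sine as 16-bit samples at 44.1 kHz."""
    return [int(10000 * math.sin(2 * math.pi * 440 * i / 44100)) for i in range(n)]


def lossless_roundtrip(samples: list[int]) -> list[int]:
    """Pack as little-endian int16, compress with zlib (DEFLATE), then undo it."""
    fmt = f"<{len(samples)}h"
    restored = zlib.decompress(zlib.compress(struct.pack(fmt, *samples)))
    return list(struct.unpack(fmt, restored))


def crude_lossy(samples: list[int], dropped_bits: int = 8) -> list[int]:
    """Crude stand-in for lossy coding: zero out the low-order bits of each sample."""
    step = 1 << dropped_bits
    return [(s // step) * step for s in samples]


samples = make_samples()
print(lossless_roundtrip(samples) == samples)  # True: every bit comes back
print(crude_lossy(samples) == samples)         # False: the low bits are gone
# Different inputs collapse to the same output, so there is nothing to undo:
print(crude_lossy([600, 700]))                 # [512, 512]
```

The last line shows why the loss is permanent: once two different inputs produce the same encoded output, the encoded copy alone cannot tell you which one you started with.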

There are encoding methods out there (such as WavPack in its hybrid mode) that create an additional “correction” file describing what was removed to get the file size down. If the end user chooses to go that route, the decoder combines the encoded copy with this file to reconstruct the audio in its full glory. That assumes, of course, that the end user still has the correction file handy! (FLAC, by contrast, is simply lossless: it makes the file smaller without throwing anything away, so there is nothing to reconstruct.)
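The correction-file idea can be sketched in a few lines of Python. This shows only the principle, not any particular product’s format or algorithm: throwing away the low-order bits of each sample is a crude, hypothetical stand-in for a lossy encoder, and the “correction” is simply the sample-by-sample difference between the original and the lossy copy.

```python
def crude_lossy(samples: list[int], dropped_bits: int = 8) -> list[int]:
    """Hypothetical lossy step: zero out the low-order bits of each sample."""
    step = 1 << dropped_bits
    return [(s // step) * step for s in samples]


def make_correction(original: list[int], lossy: list[int]) -> list[int]:
    """The 'correction file': exactly what the lossy step threw away."""
    return [orig - q for orig, q in zip(original, lossy)]


def reconstruct(lossy: list[int], correction: list[int]) -> list[int]:
    """Lossy copy plus correction gives back the original, bit for bit."""
    return [q + c for q, c in zip(lossy, correction)]


original = [626, -5, 1000, 32767, -32768]
lossy = crude_lossy(original)
correction = make_correction(original, lossy)
print(lossy)                                       # [512, -256, 768, 32512, -32768]
print(correction)                                  # [114, 251, 232, 255, 0]
print(reconstruct(lossy, correction) == original)  # True
```

Each correction value here fits in 8 bits, so the correction data is smaller than the original samples. Without it, though, the lossy copy cannot be restored, which is exactly why that file needs to stay handy.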

While the perceptual codec method works fairly well, it isn’t perfect by any means. Destructive perceptual coding creates unintended side effects in the decoded audio, commonly called “coding artifacts”. This is why radio stations whose music libraries are made up mostly of MP2, MP3 and even AAC source files have a hard time sounding their best on the dial.

For audible examples of these artifacts, check out the “Listening for coding artifacts” article.

(Photo by Barry Mishkind)