Spoon's Audio Guide


Spoon's Audio Guide: In the Know

Digital audio technology is fast moving and constantly innovating; this introduction brings you up to speed in no time.

Audio Representation

Digital audio signals are represented by three different parameters: channels, sample rate and bit depth. Each of these has an effect on audio quality. For best quality, match the encoder settings to the source. For example, when compressing an audio CD, encode to 2 channels, 16 bit, 44.1 kHz.
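As a rough illustration of how the three parameters combine, the sketch below multiplies them together to get the data rate of uncompressed audio. The function name `pcm_bit_rate` is made up for this example, and it assumes plain uncompressed (PCM) audio with no container overhead.

```python
def pcm_bit_rate(channels: int, bit_depth: int, sample_rate_hz: int) -> int:
    """Bits per second of uncompressed audio (hypothetical helper)."""
    return channels * bit_depth * sample_rate_hz


# Audio CD settings: 2 channels, 16 bit, 44.1 kHz
cd_bps = pcm_bit_rate(channels=2, bit_depth=16, sample_rate_hz=44_100)
print(cd_bps)         # 1411200 bits per second
print(cd_bps / 1000)  # 1411.2 kbps
```

Raising any one of the three parameters raises the data rate in proportion, which is why matching the source, rather than exceeding it, avoids wasted space.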

Channels

Audio CDs contain 2 channels of audio, that is, 2 independent audio signals. The idea is that your Hi-Fi has two speakers and the listener sits in the middle facing them. The two ears detect differences between the speakers (created during mastering), which gives depth to the audio reproduction, called stereo separation, as well as placing the vocalist in the center between the two speakers.

Movies benefit more than music from extra speakers. Effects sometimes need to appear from behind, and this is easier to achieve when there are actual speakers at the rear. DVDs have 5.1 sound: 5 speakers, with the .1 being the low frequency sub-woofer.

Why is music not 5.1? Traditionally, if you attended a concert, all sound would appear to come from the front and nothing from behind, whereas in a film car chase the police sirens would be behind. That is not to say music cannot improve with more speakers; certain tracks might try to place the listener in the middle of the audio. If I had the choice of 2 very good speakers or 5 average ones, I would choose the two good ones for music.

Channel Count    Common Name
1                Mono
2                Stereo
4                Quadraphonic
6                5.1
8                7.1

Frequency (Sample Rate, or Samples Per Second)

Sound is made up of pressure waves. A single constant wave has its frequency measured in Hz (oscillations per second). Human ears can, on average, hear from a lowest frequency of tens of Hz up to higher frequencies just below 20,000 Hz, or 20 kHz.

When talking about digital audio, frequency has a different meaning: it is the rate at which sound samples are recorded. Imagine you were told the temperature outside once a day and your friend was told the temperature four times a day. Who would have the more accurate picture? Your friend. The higher the sample rate, the more accurate the representation, up to a point: human hearing cannot, on average, hear above 20 kHz, so reproducing 50 kHz would be a waste of space (each sample takes up space). Nyquist's theorem states that to reproduce a 22 kHz sound signal, it must be sampled (recorded) at more than 2x that frequency, so a sample rate of 44.1 kHz can reproduce a 22 kHz signal.
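To see why the sample rate must be more than twice the highest frequency, the sketch below samples a 30 kHz tone at 44.1 kHz. The tone is above the 22.05 kHz limit, and its samples come out identical to those of a 14.1 kHz tone, so once sampled the two cannot be told apart. The helper `sample_cosine` and the chosen tone frequencies are assumptions picked for illustration only.

```python
import math


def sample_cosine(tone_hz: float, sample_rate_hz: float, count: int) -> list:
    """Return `count` samples of a cosine tone (hypothetical helper)."""
    return [
        math.cos(2 * math.pi * tone_hz * n / sample_rate_hz)
        for n in range(count)
    ]


SAMPLE_RATE = 44_100
nyquist_limit = SAMPLE_RATE / 2  # highest frequency that can be captured

# A tone above the limit, and the lower tone it is mistaken for
above_limit = sample_cosine(30_000, SAMPLE_RATE, 64)
alias = sample_cosine(SAMPLE_RATE - 30_000, SAMPLE_RATE, 64)  # 14100 Hz

max_diff = max(abs(a - b) for a, b in zip(above_limit, alias))
print(nyquist_limit)    # 22050.0
print(max_diff < 1e-9)  # True: the two sets of samples match
```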

It just so happens that audio CDs have a sample rate of 44.1 kHz, so why is DVD audio 96 kHz or 192 kHz? Is it a marketing ploy? Yes and no. Yes, it is a ploy in that more appears to be better, when an audio CD can already reproduce sounds at higher frequencies than people can hear. No, because it is easier (cheaper) to create a piece of audio equipment that plays back an 18 kHz signal without distortion when it is fed a 192 kHz signal rather than a 44.1 kHz signal. High-end gear would not have much distortion, so it gains little from 96 or 192 kHz audio; it is the cheaper consumer gear which improves.

Bit Depth (and Amplitude)

Consider these two audio sine waves, A and B:

B has a higher amplitude (2x) than A, so it is louder, but B is not twice as loud as A: perceived loudness works on a logarithmic scale. The human ear was designed this way so that the quietest mouse can be heard whilst the loudest jet is tolerated (there are many orders of magnitude difference between the two).
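The logarithmic scale normally used for this is the decibel (dB), which the text above does not name but which is the standard measure. A minimal sketch, with a made-up helper name, shows that doubling the amplitude adds only about 6 dB:

```python
import math


def amplitude_ratio_to_db(ratio: float) -> float:
    """Convert an amplitude ratio to decibels (hypothetical helper)."""
    return 20 * math.log10(ratio)


print(round(amplitude_ratio_to_db(2), 2))   # 6.02 (B compared to A)
print(round(amplitude_ratio_to_db(10), 2))  # 20.0 (ten times the amplitude)
```

Each further doubling adds the same 6 dB again, which is how a huge range of amplitudes fits into a manageable range of numbers.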

Bit depth is the resolution with which audio samples can be stored. Consider these crude representations of bit depth:

[Figure: 8 bit, 16 bit and 24 bit representations]

8 bit has the worst detail. Whilst it is shown here as 'blocky', audio is not like that with sine waves: 8 bit is still smooth, it just has less precision, resulting in more distortion. There is not too much difference between 16 bit and 24 bit; they are both reaching the limits of perception.

Audio CDs are 16 bit, whereas DVDs are 24 bit. Again, is it a marketing ploy? Yes and no. Yes, as most people cannot hear the difference between the two. No, as 16 bit audio CDs have been spoilt by the loudness race: CDs produced now are volume compressed, that is, the quiet parts are pushed up louder so that the track sounds louder when played on the radio or TV (a 1980s CD would sound quiet in comparison to one from 2000). The downside is that 16 bit CDs are no longer effectively 16 bit, as the full dynamic range is not being used. 24 bit helps, but in the long run the same fate (loudness war) might befall 24 bit tracks.
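To put numbers on the bit depths discussed above, the sketch below counts how many distinct levels each depth can store and the theoretical range between quietest and loudest in decibels. It also rounds a sine wave to each depth and reports the worst rounding error. All function names are made up for this example, and the `quantize` helper is simplified: it works on samples between -1.0 and 1.0 and ignores clipping at full scale.

```python
import math


def quantization_levels(bit_depth: int) -> int:
    """Number of distinct values a sample can take."""
    return 2 ** bit_depth


def dynamic_range_db(bit_depth: int) -> float:
    """Theoretical ratio of largest to smallest step, in decibels."""
    return 20 * math.log10(quantization_levels(bit_depth))


def quantize(sample: float, bit_depth: int) -> float:
    """Round a sample in [-1.0, 1.0] to the nearest step (simplified)."""
    steps = 2 ** (bit_depth - 1)
    return round(sample * steps) / steps


def max_error(bit_depth: int) -> float:
    """Worst rounding error over one cycle of a sine wave."""
    wave = [math.sin(2 * math.pi * n / 100) for n in range(100)]
    return max(abs(s - quantize(s, bit_depth)) for s in wave)


for bits in (8, 16, 24):
    print(bits, quantization_levels(bits), round(dynamic_range_db(bits), 1))
# 8 256 48.2
# 16 65536 96.3
# 24 16777216 144.5

for bits in (8, 16, 24):
    print(bits, max_error(bits))  # error shrinks as bit depth grows
```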

Compression

When talking about audio, compression can have two meanings: volume compression, where the volume levels are 'compressed' to make the overall piece louder, and audio compression, used to reduce the file size. We are discussing audio compression, of which there are two types:

Lossy: the majority of compressed audio files are lossy. When encoding, audio quality is sacrificed to achieve higher rates of compression. How much quality is lost depends on the encoder and the settings used for compression. Bit rate plays the biggest role in determining final quality: higher bit rate files have better quality than lower bit rate files. Bit rate is normally given in kbps (kilobits per second).

Bit rate can be fixed at the same value throughout the file, known as Constant Bit Rate, or CBR. Alternatively, the bit rate can vary on demand: an audio track might have quiet parts, and it stands to reason that a lower bit rate could be used for these, whilst a higher bit rate could be used for complex parts. When the bit rate is allowed to change it is called Variable Bit Rate, or VBR. Finally there is Average Bit Rate (ABR), which is basically VBR with a constraint: the whole file must come out at a set average bit rate, so the final file size can be roughly known in advance (with VBR it could be any size).
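The difference between the three modes can be pictured with a toy example. The per-second bit rates below are invented for illustration (real encoders choose a rate per short frame, not per second), and `file_size_bytes` is a made-up helper. The point is only that a varying stream which averages 160 kbps ends up the same size as a constant 160 kbps stream, which is the property ABR aims for.

```python
def file_size_bytes(bit_rates_kbps: list) -> float:
    """Size of a stream given the bit rate (kbps) used for each second."""
    return sum(rate * 1000 for rate in bit_rates_kbps) / 8


# Six seconds of audio, hypothetical values
cbr = [160] * 6                      # same rate throughout
vbr = [64, 96, 224, 256, 192, 128]   # low for quiet parts, high for complex

print(file_size_bytes(cbr))          # 120000.0
print(sum(vbr) / len(vbr))           # 160.0 (average bit rate)
print(file_size_bytes(vbr))          # 120000.0
```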

Typically a lossy 3 minute audio track might be 3 MB in size, around 10 to 1 compression (at 160 kbps), or 10% of its uncompressed size. Common lossy formats are: MP3, Ogg Vorbis, Windows Media Audio (WMA) and Advanced Audio Coding (AAC, typically stored in a .m4a container).
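The figures above can be checked with some simple arithmetic. This sketch assumes CD-quality source audio (2 channels, 16 bit, 44.1 kHz), counts 1 MB as 1,000,000 bytes, and ignores any tag or container overhead; `size_bytes` is a made-up helper.

```python
def size_bytes(bit_rate_bps: float, seconds: float) -> float:
    """File size for a given bit rate and duration (hypothetical helper)."""
    return bit_rate_bps * seconds / 8


DURATION = 3 * 60  # a 3 minute track, in seconds

uncompressed = size_bytes(2 * 16 * 44_100, DURATION)
lossy = size_bytes(160_000, DURATION)  # 160 kbps

print(uncompressed / 1_000_000)        # 31.752 MB
print(lossy / 1_000_000)               # 3.6 MB
print(round(uncompressed / lossy, 2))  # 8.82, i.e. roughly 9 to 1
```

The exact result is a little under 10 to 1, in line with the rounded figures quoted above.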

Lossless: audio which is compressed with a lossless encoder can be uncompressed to exactly the same data (bit for bit) as the source file; it is without loss. Lossless is slowly gaining ground on lossy, the main advantage being that once your CD collection is ripped into lossless, that is it: no more re-ripping, unlike lossy, where the need to re-rip might present itself if a newer encoder is released. Lossless can be converted to any other lossless format without loss, and lossless converted to any lossy format has the same quality as though ripped from the audio CD.

The main reason lossless is held back is the final compression rates, which are nowhere near as good as lossy. A typical 3 minute audio track might be around 30 MB uncompressed; lossless could compress it down to 15 MB, around 2 to 1 compression, or 50% of its uncompressed size.
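The "bit for bit" property of lossless compression can be demonstrated with a small round trip. Note the assumption: this sketch uses zlib, a general-purpose compressor from the Python standard library, as a stand-in for a real lossless audio encoder, and the test signal is a synthetic 440 Hz tone rather than music. The compressed size it reports says nothing about what a dedicated audio encoder would achieve.

```python
import math
import struct
import zlib

SAMPLE_RATE = 44_100

# One second of a 440 Hz tone as 16 bit signed samples (synthetic test data)
samples = [
    round(20_000 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE))
    for n in range(SAMPLE_RATE)
]
pcm = struct.pack(f"<{len(samples)}h", *samples)

compressed = zlib.compress(pcm, 9)     # stand-in for a lossless audio encoder
restored = zlib.decompress(compressed)

print(restored == pcm)                 # True: identical, bit for bit
print(len(pcm), len(compressed))       # original and compressed sizes in bytes
```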
