Tuesday, January 01, 2008

Why today's CDs sound like crap

Leaving aside your judgment of the "quality" of music produced by Britney Spears, Justin Timberlake, and their ilk; most of today's CDs, even from otherwise great artists, unquestionably sound like crap to anyone who appreciates music as something other than a beat to bob your head to.

The reason is often referred to in the business as "the loudness war", and this is about the best simple but clear explanation of why it's ruining music:

Oh and here's a REALLY great explanation of it, that someone who linked to this piece pointed out (better than mine I think, but I didn't know about it 'til after I wrote this): Imperfect Sound Forever

Music is sound waves, just like any other sound. It has peaks and valleys in the wave form corresponding to frequency and amplitude. The higher the amplitude, the higher the peak, and the louder it is.

Dynamic range is the difference between the quietest audible sounds in a recording and the loudest (the big peaks and valleys I just mentioned above). Live music has a maximum dynamic range of around 120db. Vinyl records only have a maximum dynamic range of around 70db. CDs, which in comparison to vinyl are very "wide open" so to speak, still have far less available range than live music, at about 96db.

Now, for any recording, there is a maximum possible level; above which the sound will drop out or "clip", which causes distortion.
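To make clipping concrete, here's a minimal sketch in Python. It assumes samples are floats where 1.0 is the maximum possible level; the helper names `sine_wave` and `hard_clip` are mine, purely for illustration.

```python
import math

def sine_wave(freq_hz, num_samples, rate=44100, amplitude=1.0):
    """Generate num_samples of a sine tone as floats (1.0 = the clipping level)."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / rate)
            for i in range(num_samples)]

def hard_clip(samples, limit=1.0):
    """Flatten anything beyond +/-limit, the way an overloaded digital medium does."""
    return [max(-limit, min(limit, s)) for s in samples]

# One full cycle of a 100 Hz tone, recorded 1.5x too hot for the medium.
too_hot = sine_wave(100, 441, amplitude=1.5)
clipped = hard_clip(too_hot)

flattened = sum(1 for s in clipped if abs(s) == 1.0)
print(f"{flattened} of {len(clipped)} samples got flattened at the limit")
```

Every flattened sample is a piece of the wave's top or bottom that has been chopped off square, and that squared-off shape is the distortion you hear.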

When music was mastered for vinyl records, that clipping level was relatively low; because the available dynamic range was relatively low (at 50db less than live music, and 26db less than a CD). Because of this, instead of trying for "loudness", recording engineers tried to get the most dynamic range out of a recording that they possibly could (excepting pop singles intended for dance and radio airplay, where loudness was considered more important).

Also, records were generally listened to in relatively quiet, stationary environments (again, except for 45rpm singles) on hi-fi stereo systems. These listening environments and equipment strongly emphasized detail (or the lack of it) and clarity over loudness; because when you have a medium working against you (the vinyl LP, which has an inherently low dynamic range compared to a CD, never mind a 9 or 14 track master audio tape), you do everything you can to compensate for it.

When the music business transitioned from vinyl to CDs in the late 80s, recording engineers suddenly had a (relatively) HUGE amount of headroom available to them for the music. This allowed for much greater dynamic range to be reproduced.

Greater dynamic range means more detail, space, and expressiveness in music. It lets you reproduce the sound of live music better.

Unfortunately for us listeners 20 years later, it also makes recordings less loud on average; because the loudest sounds in a rock or pop recording (usually kick drums) are typically 5 or more times as loud as the average sounds; and may be 50 times as loud as the quietest. In classical music (which in general is noted for its dynamic range) the loudest sounds can be 30 or 40 times as loud as the average, or more.

Given the basic characteristics of the music, when the kick drums (or brass, or kettle drums in classical) are set to just below the clipping level, the vocals (or the woodwinds), are a little bit quiet in comparison.

Which is as it should be; because human hearing is far less sensitive in the very low bass, and very high treble ranges. So those sounds NEED to be louder, to be heard properly. Also, they are INTENDED to be louder by the artists and performers. If the performers wanted the vocals to be as loud as the drums, they would have sung that way in the first place.

If the vocals are quiet, turn up the volume (not the loudness), and you'll hear BOTH clearly, and without distortion (well... the overly loud brass in today's classical is another discussion entirely).

The average loudness of a pop recording in the pre-loudness war era was -18db (18db below the clipping level), while the loudest sounds in a recording were mastered at just below the clipping level (well... -3 to -6db, but still); indicating a very large variation in sound levels.
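If you want to see where numbers like "-18db average, -3db peak" come from, here's a rough sketch. It assumes float samples with 1.0 as the clipping level, and uses plain RMS as the "average" (one common convention; real loudness meters are fancier). The made-up track is hypothetical.

```python
import math

def peak_db(samples):
    """Level of the single loudest sample, in db relative to the clipping level (1.0)."""
    return 20 * math.log10(max(abs(s) for s in samples))

def rms_db(samples):
    """Average (RMS) level of the whole signal, in db relative to the clipping level.
    Note: pure silence is not handled here (log of zero)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square)

# Hypothetical track: a second of quiet material followed by a short, loud drum hit.
quiet = [0.1 * math.sin(2 * math.pi * 440 * i / 44100) for i in range(44100)]
hits = [0.7 * math.sin(2 * math.pi * 60 * i / 44100) for i in range(2205)]
track = quiet + hits

print(f"peak:    {peak_db(track):6.1f} db")
print(f"average: {rms_db(track):6.1f} db")
```

The gap between those two numbers is the thing the loudness war has been eating.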

Now if you love music, this is how it SHOULD be. If you want it louder, you just turn up the volume knob.

Unfortunately, record company executives don't love music, they love record sales; and quiet music doesn't grab people's attention as much as louder music. If your track is louder on the radio without the listener having to turn their volume knob up, people are more likely to notice it; and the thinking goes, more likely to buy it.

Also, if your music is being played in a very noisy environment, having the vocals be as loud as the drums means people can hear the vocals more easily without turning up the volume. If you're in a very loud environment, like a club, or in your car in heavy traffic, or listening to your mp3 player on a crowded train; again, people are more likely to notice, and possibly buy.

So starting in about 1989, when digital tools to do so with less distortion became available; record companies started having engineers pump the loudness up.

Prior to this time, the analog compression filters used to hot master those old 45 dance singles would produce very poor results if you attempted to hot master a CD with them. With a record, you couldn't hear just how bad it was; with a CD, well, clarity works both ways. Garbage in, Garbage out.

Also, the nature of analog vs. digital signal was working in favor of the records; because when you slightly overdrive analog amplifiers, with an analog signal, they distort in ways that are warm, and organic sounding. It's a gentle fuzz, with interesting overtones. In fact, it's something that was (and is) often done intentionally in rock music. When you overdrive a discrete solid state amp (the kind most of us have now) with a digital signal however, the results are distinctly unpleasant. You get loud hissing, pops, dropout, and squeals.

Up until very recently, the dominant mode of recording in a studio was on a 1/2", 3/4", 1" or even 2" master tape (9 track is common at the bottom end, all the way up to 28 track tape). A 2" master tape has the dynamic range to capture just about everything you can get out of a microphone, or an amplifier. Recently digital recording onto hard disk has taken over; and again, you have enough dynamic range in the medium to capture the full expressiveness of a live performance... or at least as much as the mikes can get anyway (microphones don't capture harmonics and overtones very well; in fact they deliberately suppress some of them because they can cause odd sounds in the recording. Recording directly off a pre-amp output for an instrument can't capture them at all).

With the original high dynamic range master recordings of the artist playing live in the studio, the level of the loudest sound (like the kick drums) is set to a few decibels below the clipping point. If you do that, then you can't increase the overall volume of the recording without losing and distorting (clipping) a lot of that drum sound.

Unfortunately, record executives want louder music; and in order to get it, they use full track audio compression.

Now, audio compression isn't like file compression, where you reduce the size of the data. With audio compression you are reducing, or "compressing" the differences in amplitude between the loudest, and the quietest sounds. This makes the average loudness of the recording higher; and thus overall louder; but you lose dynamic range.
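Here's a bare-bones sketch of what a compressor does to levels. This is NOT how any particular studio tool works: real compressors react over time (attack and release), while this one just applies a static curve to each sample. The threshold, ratio, and makeup gain values are hypothetical, picked to make the arithmetic easy to follow.

```python
import math

def compress(samples, threshold_db=-18.0, ratio=4.0, makeup_db=0.0):
    """Static compressor: levels above the threshold are scaled down by `ratio`,
    then the whole signal is turned up by `makeup_db`."""
    makeup = 10 ** (makeup_db / 20)
    out = []
    for s in samples:
        if s == 0:
            out.append(0.0)
            continue
        level_db = 20 * math.log10(abs(s))
        if level_db > threshold_db:
            # Only 1/ratio of the overshoot above the threshold survives.
            level_db = threshold_db + (level_db - threshold_db) / ratio
        out.append(math.copysign(10 ** (level_db / 20) * makeup, s))
    return out

def to_db(x):
    """Level of a single sample in db relative to the clipping level (1.0)."""
    return 20 * math.log10(abs(x))

loud, quiet = 1.0, 0.1  # 0 db and -20 db: a 20 db spread
squashed = compress([loud, quiet], threshold_db=-18.0, ratio=4.0, makeup_db=13.5)

print(f"before: {to_db(loud) - to_db(quiet):.1f} db between loud and quiet")
print(f"after:  {to_db(squashed[0]) - to_db(squashed[1]):.1f} db between loud and quiet")
```

With these settings the loud sample ends up right back at the clipping level, but the quiet one has been dragged up to within about 6.5db of it. Same peak, much louder average, a lot less range.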

Losing dynamic range means losing that detail, space, and "air" that made CDs better than vinyl in the first place (ok you analog heads, back off).

Not only that, but increasing the volume like that doesn't increase the music, it just kind of adds dimensionless, detailless noise. It increases the sound pressure level without increasing the content. It sounds like noise, because it IS noise.

Compression has been around forever, and in some cases the effect is good, and not harmful. Analog compression has been used in guitar and vocal recordings since the 50's to make them sound "fuller", and stand out more in a mix for example. But those are relatively small variations, focused on the mid and mid high frequencies.

Full track audio compression (dynamic range compression) is a little bit different. Instead of using it to fill out the sound of a vocal, they're taking ALL the sound and making it stand out more; and if everything is "standing out more", obviously nothing is. When everyone in a conversation starts shouting, you don't hear anything more clearly... in fact, everything gets muddier and harder to pick out.

Importantly, we aren't talking about just increasing the volume; because that would make everything louder, but keep the range of variation. With dynamic range compression, you're just making the stuff that should be quiet, louder. You are increasing loudness, not volume.

For those of you who understand classical music, or musical notation, what they are doing is effectively taking whatever the loudest sound of the recording is, setting that at fortissimo (even if it should only be forte); and taking the absolute quietest sound which should be pianissimo, and playing it mezzo forte.

It's as if you were taking Mozart's "Agnus Dei", from Requiem; and playing it with as much intensity and loudness as the allegro sections of "Eine Kleine Nachtmusik", the whole time, with no variation.

Or for those of you not classical fans, imagine if Queen's "Who Wants to Live Forever", which has a HUGE dynamic range from Pianissimo to Fortissimo in the same song, were simply all sung and played at the level of "We Will Rock You".

This practice is called "hot mastering", and as I said, it's been with us since the days of 45rpm dance records; but it was never widespread for all types of popular music, until the '90s.

The pre-hot master days (as I mentioned above) had an average recording loudness of -18db. Today's hot mastered recordings have an average loudness of -9db.

That's a HUGE difference, because the db scale is logarithmic, not linear. Every 3db of increase is actually a doubling of the sound power, so going from -18 to -9 doesn't just double the power, it's actually 8 times as much.

Human hearing isn't calibrated the same way electronics are, so we don't quite hear it that way. To us, a 3db change in loudness is about the smallest clearly audible step (yes, really, it takes a doubling or halving of sound energy for us to notice). We perceive a change of roughly 10db as a doubling or halving of loudness.
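If you want to check the decibel arithmetic yourself, it's two one-liners. A decibel figure is ten times the base-10 log of a power ratio, or equivalently twenty times the log of an amplitude (pressure) ratio. The function names here are just mine.

```python
def power_ratio(db):
    """How many times more power a given db increase represents."""
    return 10 ** (db / 10)

def amplitude_ratio(db):
    """How many times more amplitude (sound pressure) a given db increase represents."""
    return 10 ** (db / 20)

for step in (3, 6, 9, 10):
    print(f"{step:>2} db -> {power_ratio(step):5.2f}x power, "
          f"{amplitude_ratio(step):4.2f}x amplitude")
```

So 3db is very nearly double the power, 6db is very nearly double the amplitude, and the 9db jump from -18 to -9 is close to 8 times the power.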

What that means, is that compared to a 1987 recording, a 2007 recording will sound roughly twice as loud for a given volume setting on your stereo; or if you don't WANT to blow your eardrums out, you'll turn the volume down; but when you turn the volume down on a low dynamic range recording, it sounds even worse. It's muddy and lacks definition; and it completely negates the purpose of having hot mastered the thing in the first place.

Which means of course that record company executives are idiots; but we all knew that.

If proper mastering is Pavarotti singing "Nessun Dorma", then hot mastering is Ben Stein reading Ayn Rand VERY LOUDLY.

Here's another graphic illustration of what I'm talking about. These two images are both of the same recording; the track "One of Us" off the very first commercially released popular music CD, ABBA's "The Visitors".

The first image (left and right waveforms are shown in each) is the original CD release from 1981, and the second is a re-release, remastered in 2005 under today's practices:

You can easily see here what I'm talking about; where the loudest sound in the track is set to be just a bit below the clipping level, and the quietest audible sound is at perhaps 1/30th that level, with the average being about 1/4th or 1/5th the level.

Now the hot mastered track...

Remember, these are the EXACT SAME SONG, from the exact same original master recordings. Looking at those wave forms however, you can tell just how different they would sound.

In the second recording, instead of the loudest sound being set slightly below the clipping level, almost all of the recording is set AT the clipping level. The differences are gone; and clearly, so is the detail. The quietest sound in this track is at 1/8th or maybe 1/10th the level of the loudest, instead of 1/30th. Worse, with so much of the track at the clipping level (the average on this track is -8db)... well, it just sounds like noise to your ears.

When your ear hears continuous sound at the exact same amplitude level, your brain tends to disregard frequency variations, and it all sort of blends together into mud and noise.


An aside: To anyone who would mock my choice of ABBA as an example above; first, let me say they are a PERFECT choice, because they are the exact type of music that would have been compressed even in the 70s (pop/dance records).

The fact that they maintained a wide dynamic range, clarity, and definition in their recordings; even though they were producing for a "wall of sound" effect, for dance music, is quite impressive. In fact ABBA have always been noted for producing very high quality recordings.

Not only that, but say what you will about the content of their songs (yes, it's mostly silly or sappy bouncy pop. That's what they were trying for, and that's what they got), they produced perfect pop music, they wrote all their own lyrics and music, and their singing is uniformly excellent. They sold 400 million records in a time when the only artists to have broken 100 million were Elvis, the Beatles, and Frank Sinatra; and they are still today, the second best selling recording group of all time (second only to the Beatles).

So, though you may not like them; you have to respect their achievement.

Well, some artists are finally doing something about it; because they want their music to sound like, oh, I dunno... Music.

Bob Dylan, who has always kept tight control over his music (though the uses for which he licenses it are a different story), has been very vocal in speaking out against this practice; going so far as to say he can't listen to most new CDs without pain. He has always prevented record companies from hot mastering his music; and is actively trying to get other artists to do the same.

Notably, classical and Jazz (with the exception of pop pseudo jazz idiots like Kenny G) artists have also strongly resisted such adulteration of their music; and for the most part record companies have wisely stayed away; because they know that Jazz and classical consumers will not stand for their music being destroyed by stupidity.

Destroyed? Absolutely. A hot mastered Jazz recording isn't music, it's noise.

I have listened to original CD remasters of "Kind of Blue" (actually, I listen to it frequently), and a hot mastered re-release of it produced after Miles' death (not the '97 remaster; a compilation set remaster from I think 2004)... The re-release is absolutely unlistenable. I forced myself to listen through the whole thing, at the same volume I used for the original; and at the end of it, I felt like I had been listening to jackhammers for 45 minutes.

I haven't listened to any classical that has been hot mastered; but if you want the same effect, turn up a TV or radio commercial with classical music in it really loud. See, hot mastering was originally (and still is) a technique used to make commercials louder than the music they were played between.

Would you really want to listen to that for an hour?

But wait, it gets worse...

The last few years, CD sales have been heading downward, as digital download sales have been soaring; never mind the effects of digital piracy etc...

Even physical CDs that are being purchased are more than likely going to end up on a computer, or a personal media player like the iPod; because almost everyone under 35 (and that's almost the entire demographic market for new music purchases) listens to most of their music either in their car, on their computer, or on their PMP.

Unfortunately, digital downloads, as commonly used today, have a few weaknesses when it comes to music reproduction.

A typical CD audio track is mastered at 44,100 samples per second, or 44.1khz, and 16 bits per sample. That's about 706 kilobits of data per second per channel; 86.1 kilobytes per second, or about 5 megabytes per minute.

It's that 16 bits per sample by the way that gives us the 96db of dynamic range mentioned above; because you get approximately 6db of dynamic range per sampling bit.

Just as a comparison, DVD-Audio discs use 24 bits per sample (by default anyway, DVD-A also supports increasing sampling rates up to 192khz). This is the current standard for professional audio recording as well; because it provides for approximately 144db of dynamic range (which is about 24db more than almost all live music, and about 4db more than the difference between silence, and a jet engine at full throttle).
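The "6db per bit" rule of thumb falls straight out of the math: each extra bit doubles the number of amplitude steps, and a doubling of amplitude is about 6db. A quick sketch (the function name is mine, and this ignores dither and other real-world converter details):

```python
import math

def dynamic_range_db(bits):
    """Ratio between the largest and smallest step of an N-bit sample, in db."""
    return 20 * math.log10(2 ** bits)

for bits in (16, 24):
    print(f"{bits} bits -> {dynamic_range_db(bits):.1f} db")
```

That works out to a hair over 96db for 16 bits and about 144.5db for 24 bits.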

A 24 bit stream at 44.1khz would come to just under 130 kilobytes per second; or about 7.5 megabytes per minute.

If you could write a raw single channel bitstream, that would only be 550 megabytes for a full 74 minute CD (or 377mb for standard 16 bit).
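For anyone who wants to run the raw numbers, here's a sketch of the arithmetic. I'm assuming binary kilobytes and megabytes (1024 based), a single channel, and a 74 minute disc; the function name is mine. The results land in the same ballpark as the rounded figures above.

```python
KIB = 1024
MIB = 1024 * 1024

def pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels=1):
    """Raw PCM data rate: samples per second x bytes per sample x channels."""
    return sample_rate_hz * bits_per_sample * channels // 8

for bits in (16, 24):
    per_second = pcm_bytes_per_second(44100, bits)
    per_minute = per_second * 60
    per_disc = per_minute * 74  # one channel, 74 minutes
    print(f"{bits} bit: {per_second / KIB:6.1f} KB/s, "
          f"{per_minute / MIB:4.1f} MB/min, {per_disc / MIB:5.0f} MB per disc")
```

Pass `channels=2` to get the stereo rate instead.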

Of course, you can't just write the raw bitstream to the disc, and have anything other than the original recording equipment read it; and of course there are multiple channels in a sound mix (in CDs two discrete PCM channels, with the potential for superencoded pseudo channels like Dolby Pro Logic). There has to be some encoding standard that recorders and readers can all use, to maintain compatibility.

So, CDs do have some encoding involved (it's called Red Book encoding, which uses specific PCM and filesystem parameters), as well as additional data for error correction etc... Unfortunately, it's rather bandwidth inefficient (it produces relatively large, sparse datastreams that take up more space than some other formats); so a full 74 minute CD has about 650 megabytes of info, about 9 megabytes per minute, or about 150 kilobytes per second (1200 kilobits). Given that an average pop song is 3:05, you're looking at about 27 megabytes for a song.

Given that most people in the US have a 1.5 megabit per second or so internet connection (6 or so seconds per megabyte in theory; usually it's more like about 20 seconds minimum), a single track download would take 10 minutes, and a full album download would take about 2-4 hours.
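Here's the back-of-the-envelope math as a sketch. The 20 seconds per megabyte "real world" figure is my own rough estimate from above, not a measurement, and both function names are made up for illustration.

```python
def theoretical_seconds(size_megabytes, link_megabits_per_second):
    """Best case: every bit of the link's rated speed goes to the download."""
    return size_megabytes * 8 / link_megabits_per_second

def real_world_seconds(size_megabytes, seconds_per_megabyte=20):
    """Rough real-world estimate using an assumed seconds-per-megabyte figure."""
    return size_megabytes * seconds_per_megabyte

for label, size_mb in (("one 27 MB song", 27), ("one 650 MB album", 650)):
    best = theoretical_seconds(size_mb, 1.5) / 60
    likely = real_world_seconds(size_mb) / 60
    print(f"{label}: {best:.0f} min in theory, {likely:.0f} min more realistically")
```

Swap in your own connection speed and per-megabyte estimate to see how your line would have fared.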

Clearly, that wouldn't fly commercially.

What was needed was compression, in the other sense; file compression, or data compression.

In the late 80s, the Fraunhofer Institute used something called psychoacoustic analysis to determine how you could best remove data from an audio recording, and still have it sound acceptable to the human ear. They submitted the CODEC (coding/decoding algorithm) that resulted to the Moving Picture Experts Group (MPEG - sounds familiar don't it), and it became the MP3 format.

It was well known that we are most sensitive to upper mid range frequencies (those around the range of the human voice), less sensitive to low bass (below 80hz, and especially below 30hz), and most of us have little sensitivity above 12khz and almost none at all above 16khz (the conventional "limits of human hearing" are regarded as 20hz to 20khz, though some can hear as low as 16hz, or as high as 22khz). They took those sensitivities, and then mapped the actual response of the ear and the brain exactly.

Once they figured out what our brain mostly ignores, they just dropped it. The rest, they compressed, folded, spindled, and mutilated so that our brain and ears "fill in" the missing information. In so doing, they took raw (well... encoded) data that was 1200 kilobits per second; and they reduced it to the point where your brain can't tell the difference, but the data rate is only 320 kilobits per second.

That means that they compressed the file from about 9 megs per minute PCM, to about 2.3 megs per minute average; without any perceptible loss of quality (it takes an oscilloscope to tell the difference. Double blind studies have proven it repeatedly; no matter what audiophile snobs like to think).

The length of time it takes to download a full 74 minute album on a 1.5Mb internet connection just went from 4 hours, to an hour and a bit; slightly better than real time download speed... and most albums aren't 74 minutes long.

That's amazing. It's really really impressive. It's also meaningless, because almost no-one uses the near perfect 320kbit encoding rate.

When MP3 first became popular in the mid 90s, DSL and Cablemodems were just becoming widely available, and you were lucky if you could get 256k over DSL, and 512k over your cablemodem. Most people were still on 56k dialup, which in the real world meant 32k (or less); and remember all these numbers are in bits not bytes.

Real-world download speeds for a single 320k music track on 56k dialup were something on the order of 90 minutes or so for a three minute song; or about a day and a half for a full album. Even on 512k cablemodem, you were talking about 6-10 minutes per song, and two hours per album.

Also, portable music players of the day only had 32-512 megabytes of storage space, not the 2-80 gigabytes of today's players, so 8 megabyte songs were just too big.

Obviously this wasn't commercially viable either.

The MP3 standard however doesn't require you to use a specific encoding bit rate. You can encode sound at lower bit rates, for smaller sizes. Of course doing so reduces quality greatly, but some music at lower quality is better than no music right?

You can encode a human voice all the way down to 32k and still have it be intelligible; but it sounds pretty bad. Music encoded at 128k though has about the same quality level as FM radio, which most people find acceptable; and the file sizes are only about 40% that of 320k encoded files. It's also good for internet streaming, because your connection can usually download it faster than your computer can play it (in fact all the YouTube clips I've linked here, probably the ultimate expression of streaming media in today's internet, are probably encoded at 96k or 128k, and thus subject to the same issues).
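The file size math for a constant bit rate encode is simple enough to sketch. I'm assuming "k" means 1000 bits per second and decimal megabytes here, and using the 3:05 average song length from earlier; the function name is mine.

```python
def encoded_megabytes(bitrate_kbps, seconds):
    """Size of a constant bit rate file: bits per second x duration, in megabytes."""
    return bitrate_kbps * 1000 / 8 * seconds / 1_000_000

song_seconds = 3 * 60 + 5  # the 3:05 "average pop song"

for rate in (320, 192, 128, 32):
    print(f"{rate:>3}k: {encoded_megabytes(rate, song_seconds):4.2f} MB per song")

ratio = encoded_megabytes(128, song_seconds) / encoded_megabytes(320, song_seconds)
print(f"128k is {ratio:.0%} the size of 320k")
```

Size scales directly with bit rate, which is why 128k comes out at exactly 40% of 320k.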

2.5 to 3.2 megabytes a song was doable. You could fit a whole Ramones album on the small 32mb internal memories of the first generation music players; or a single "Yes" song...

Okay Yes fans, back away from the keyboard slowly. I love them too; but seriously, "Dance of the Dawn" ? It's an incredible song (off an incredible album. You really have to listen to "Tales from Topographic Oceans"), but it's over 23 minutes long.

At any rate, 128k became the dominant encoding rate for MP3s, and today it still is; though bandwidth and storage increases have been slowly changing that. Most newly encoded tracks are at least available at 192k or 256k (though still most often downloaded at 128k); and most people, on most equipment, can't hear the difference between 256k and a CD...


If the equalization is adjusted properly for MP3, that is.

Because MP3 uses that psychoacoustic algorithm, it emphasizes certain frequencies, and deemphasizes others. The lower the bit rate, the more it deemphasizes.

Well, at 128k, and even 256k, this results in somewhat overloaded midrange with response dips in odd places, and a lack of bass and treble response.

Unfortunately, most people have no idea how to use an equalizer, if they even have one. Most portable music players have a very limited EQ capability, with a bass and treble slider; if they have any at all.

To compensate for that, the labels have in the last two years, started changing the music even more. In addition to boosting loudness; they are now equalizing for mp3.

They're applying what's called "smile curve" equalization, named because if you plot the frequency EQ curve it looks kind of like a smile, with upturned corners (bass and treble), and a dip in the middle (midrange).
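To give a feel for the shape, here's a toy smile curve. This is purely illustrative: the 6db boost at the edges, the 3db cut in the middle, and the parabola itself are all numbers and choices I made up, not anything a label actually uses.

```python
import math

LOW_HZ, HIGH_HZ = 20.0, 20000.0
CENTER_HZ = math.sqrt(LOW_HZ * HIGH_HZ)  # about 632 Hz, midway on a log scale

def smile_gain_db(freq_hz, edge_boost_db=6.0, mid_cut_db=3.0):
    """Hypothetical smile curve: a parabola over log-frequency that cuts the
    midrange and boosts the bass and treble ends."""
    half_width = math.log10(HIGH_HZ / CENTER_HZ)       # 1.5 decades
    x = math.log10(freq_hz / CENTER_HZ) / half_width   # -1 at 20 Hz, +1 at 20 kHz
    return -mid_cut_db + (edge_boost_db + mid_cut_db) * x * x

for freq in (20, 100, 632, 5000, 20000):
    print(f"{freq:>6} Hz: {smile_gain_db(freq):+5.1f} db")
```

Plot those gains against frequency and you get the upturned corners and the dip in the middle.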

Of course if you happen to be listening to a CD on a good stereo system, what you get is loud, noisy, boomy, hollow, tinny, and harsh sounding "music".

... and yet, the record companies wonder why sales are down?