I’ll omit what I wrote about bit depth, as this seems to have been covered already in better detail than I did.
Obviously, over-compression and the loudness wars are evil. But adding a few extra bits of depth resolution isn’t going to help that problem - it just raises the ceiling that commercial producers will feel obliged to drive their waves into. People will have to turn their devices down - amusing, because the average volume control potentiometer has better channel balance at higher volumes, so turning down lands you in the pot’s worst region.
Next up: sampling theory. It’s a bit harder to nail down, but the essential point is that a digital system is considered completely transparent if it samples at a rate more than twice the highest frequency you wish to reproduce. This is why a standard CD is sampled at 44.1kHz. Whilst in theory this process turns a sine wave at the highest reproducible frequency into something that looks like a triangle wave (if you naively join the dots between samples), in practice the reconstruction filtering in the rest of the system means it comes back out as a sine wave. A 96kHz file can reproduce a sound up to 48kHz, well beyond the hearing of anyone who’s not a dog or a bat. There is a claim that, much in the way infrasound (sub 10Hz) is not perceived as a tone and yet adds to music, ultrasound can do the same - despite the lack of even a proposed physical mechanism for how we would notice it or, y’know, any kind of scientific evidence. The argument made by the proponents is that even if it turns out they’re wrong, there’s no harm in it, right? And producers use high sample rates! The fact is that there is harm in it, and production is a different kettle of fish altogether.
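As a quick illustration of why you need more than twice the highest frequency, here’s a pure-Python sketch (the 25kHz tone is an arbitrary choice for the example): a tone above the Nyquist limit sampled at CD rate produces exactly the same samples as a lower-frequency "alias", so the sampler literally cannot tell them apart.

```python
import math

FS = 44_100              # CD sample rate (Hz)
F_ULTRA = 25_000         # tone above the 22.05 kHz Nyquist limit
F_ALIAS = FS - F_ULTRA   # 19,100 Hz -- where the tone folds back to

# Sample both tones at 44.1 kHz; the sample sequences come out identical.
ultra = [math.cos(2 * math.pi * F_ULTRA * n / FS) for n in range(100)]
alias = [math.cos(2 * math.pi * F_ALIAS * n / FS) for n in range(100)]

max_diff = max(abs(a - b) for a, b in zip(ultra, alias))
print(f"25 kHz and {F_ALIAS/1000} kHz tones differ by at most {max_diff:.2e}")
```

This is exactly why an anti-aliasing filter has to remove everything above half the sample rate before conversion.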
That is not to say that there are no benefits to a higher sample rate. A higher sample rate can relax the requirements on analogue anti-aliasing filters, increase effective bit depth (virtually, by averaging the extra samples) and reduce noise. We now use digital anti-aliasing filters in digital systems, and I’ve dealt with bit depth earlier - the noise issue, however, is valid for consumer audio. I’ll come back to this later, because now it’s time to talk about the drawbacks of oversampling.
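As a rough sketch of the "virtual bit depth" trick (toy parameters throughout - this is not a real converter design): quantise a dithered signal at 4x the rate, then average each group of four samples. The quantisation noise drops by roughly half, which is worth about one extra bit.

```python
import math, random

random.seed(0)
OSR = 4               # 4x oversampling ratio (assumed for illustration)
STEP = 2 / 2**8       # quantiser step for an 8-bit signal in [-1, 1]

def quantise(x):
    # Add rectangular dither, then round to the nearest 8-bit level.
    return STEP * round((x + random.uniform(-0.5, 0.5) * STEP) / STEP)

# A low-frequency test tone sampled at the oversampled rate.
N = 4000
signal = [0.5 * math.sin(2 * math.pi * 50 * n / (44_100 * OSR)) for n in range(N)]
quant = [quantise(s) for s in signal]

# Decimate: average each group of OSR samples, which averages the
# quantisation noise down (~0.5 extra bits per doubling of the rate).
dec_sig = [sum(signal[i:i+OSR]) / OSR for i in range(0, N, OSR)]
dec_q = [sum(quant[i:i+OSR]) / OSR for i in range(0, N, OSR)]

def rms_err(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

print(f"noise at 1x rate: {rms_err(signal, quant):.2e}")
print(f"noise after 4x average: {rms_err(dec_sig, dec_q):.2e}")
```

Real converters use noise shaping rather than plain averaging, but the principle - trade surplus sample rate for resolution - is the same.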
In order to understand this, we should first clear up some issues around distortion - the video gets as far as distortion = bad, which, whilst somewhat true, is less than helpful. Distortion can be thought of as any deviation from the desired signal. There are two kinds of distortion: linear and non-linear. In linear distortion, as the power of the wanted signal is increased, the distortion increases at the same rate (unwanted level is proportional to wanted level). In non-linear distortion, the power contained in the portion of the signal we call distortion rises more quickly than the wanted portion (unwanted level is proportional to the wanted level raised to a power). The main kind of non-linear distortion we worry about is called intermodulation distortion (harmonic distortion is a special case of the same mechanism). Two tones can interact to generate new tones either side of the original two. For instance, if you have tones at 1MHz and 2MHz, you will see shrinking intermod tones in either direction with a spacing of 1MHz. The power in these tones depends on the amount of non-linear distortion. All amps have non-linear distortion - it’s merely a question of how much.
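A toy demonstration of intermodulation (the cubic non-linearity and tone frequencies here are made up purely for illustration): push two tones through a mildly non-linear "amp" and probe the output spectrum - new tones appear at 2f1−f2 and 2f2−f1, frequencies where the input had nothing at all.

```python
import math

N, FS = 1000, 1000            # 1-second window at 1 kHz (toy numbers)
F1, F2 = 50, 80               # two "wanted" tones

x = [math.sin(2*math.pi*F1*n/FS) + math.sin(2*math.pi*F2*n/FS) for n in range(N)]
# Mildly non-linear "amplifier": a small cubic term on top of unity gain.
y = [s + 0.1 * s**3 for s in x]

def amplitude(sig, f):
    # Single-bin DFT: correlate with sine and cosine at frequency f.
    re = sum(s * math.cos(2*math.pi*f*n/FS) for n, s in enumerate(sig))
    im = sum(s * math.sin(2*math.pi*f*n/FS) for n, s in enumerate(sig))
    return 2 * math.hypot(re, im) / len(sig)

for f in (F1, F2, 2*F1 - F2, 2*F2 - F1):   # wanted tones + intermod products
    print(f"{f:4d} Hz: amplitude {amplitude(y, f):.3f}")
```

The products at 20Hz and 110Hz fall well away from either input tone - which is exactly why intermod, unlike simple harmonics, can land anywhere in the spectrum.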
There are a few issues with this extra representable frequency range. The first is that if it is populated at all, intermod noise will be scattered throughout the audible spectrum - amps tend to have worse non-linear characteristics the further from their optimal zone they get, and it should go without saying that no-one optimises an audio amp for ultrasonic frequencies. Tony Andrews mentions that he only listens to distortion-free music (an obvious lie if you’re familiar with what distortion is), so we’ll assume he means clipping. Producers will recognise clipping as “overdrive” distortion, present in pretty much all rock music for a start. Clipping is equivalent to squaring off the top of a wave. A square wave is made by adding successively higher odd harmonics of a fundamental sine wave together, and an infinite number of these harmonics summed gives a perfect square wave - therefore, squaring off a wave is equivalent to adding high-frequency harmonics. The second issue is similar: ultrasonic tones will be generated by the intermodulation of the audible tones.
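Here’s a sketch of that with toy numbers: hard-clip a sine wave and probe its spectrum - odd harmonics appear out of nowhere, while the even ones stay at zero, exactly as the symmetry of the clipped wave predicts.

```python
import math

N, FS, F0 = 2000, 2000, 40     # 1-second toy window, 40 Hz fundamental
CLIP = 0.5                      # hard-clip threshold

clean = [math.sin(2*math.pi*F0*n/FS) for n in range(N)]
clipped = [max(-CLIP, min(CLIP, s)) for s in clean]  # "squared-off" wave

def amplitude(sig, f):
    # Single-bin DFT probe at frequency f.
    re = sum(s*math.cos(2*math.pi*f*n/FS) for n, s in enumerate(sig))
    im = sum(s*math.sin(2*math.pi*f*n/FS) for n, s in enumerate(sig))
    return 2*math.hypot(re, im)/len(sig)

for h in (1, 2, 3, 5, 7):       # clipping a sine creates only odd harmonics
    print(f"harmonic {h} ({h*F0} Hz): {amplitude(clipped, h*F0):.4f}")
```

Clip hard enough and those harmonics march right up past the audible band - which is the "populating the ultrasonic range" problem in miniature.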
Thus, if that frequency range is available, it will be filled with noise - how much depends on the amp quality amongst other things (it’s worth noting that audio amplifiers tend to be optimised for low distortion around 4kHz; the further from that frequency you go, the worse the distortion becomes). Whilst it is true that limiting the sample rate detracts from square wave reproduction for the reasons in the above paragraph, only a few harmonics are needed to make a better square wave than the speaker drivers can actually reproduce. In fact, it is current driver technology that acts as the main limitation.
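To put a number on "only a few harmonics are needed", here’s a sketch summing the square wave’s Fourier series (the term counts are arbitrary): the approximation error falls off quickly as harmonics are added.

```python
import math

def square_partial(t, n_terms):
    """First n_terms odd harmonics of a unit square wave's Fourier series."""
    return (4 / math.pi) * sum(
        math.sin(2 * math.pi * (2*k + 1) * t) / (2*k + 1) for k in range(n_terms)
    )

# Compare partial sums against an ideal unit square wave over one period.
ts = [i / 1000 for i in range(1000)]
target = [1.0 if t < 0.5 else -1.0 for t in ts]

for n in (1, 3, 10):
    err = sum(abs(square_partial(t, n) - y) for t, y in zip(ts, target)) / len(ts)
    print(f"{n:2d} harmonics: mean abs error {err:.3f}")
```

By ten harmonics the remaining error is dominated by the ringing right at the edges (the Gibbs effect) - detail that a real tweeter can’t trace anyway.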
Any given driver is going to have a frequency response curve - and the better its characteristics in some parts of the curve, the worse they will be in others. The traditional way round this is to optimise a driver for a certain frequency band and then only feed it those frequencies using a crossover. The average tweeter, for obvious reasons, doesn’t even extend that well into the region above 18kHz - just like the average woofer won’t touch 20Hz. When a driver is fed a wanted signal plus an “unwanted” signal outside of its operating range, one of two things happens:
- The driver is able to reproduce the unwanted signal, but not well. The signal comes out distorted, making the speaker sound rubbish.
- The driver is not able to reproduce the unwanted signal in any significant quantity relative to the wanted signal because of its response curve. You can probably imagine what happens to the wanted signal when the voice coil is constantly trying and failing to move the driver faster than it can physically move! “Degradation of wanted signal quality” would be putting it lightly.
The second case is what will happen when you try to reproduce ultrasound with a standard tweeter. So clearly, we need to filter this sound away from the tweeter. However, apparently this ultrasound increases our enjoyment - so we actually want a crossover and a full-on ultrasonic tweeter. Good luck finding any speakers with that built in. You also have to bear in mind that any filtering (and that includes crossovers) introduces aberrations of its own - even an ideal filter alters the amplitude and phase of the signal around its corner frequency, and the real-world reactive components it’s built from add some non-linear distortion on top.
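To see that even a perfectly linear filter leaves fingerprints, here’s the textbook response of an ideal first-order low-pass (the 20kHz corner is an assumed figure for illustration, not from any real crossover): at the corner the level is already 3dB down and the phase has shifted by 45 degrees.

```python
import cmath, math

FC = 20_000  # assumed crossover corner frequency (Hz)

def rc_lowpass(f, fc=FC):
    """Frequency response H(f) of an ideal first-order RC low-pass."""
    return 1 / (1 + 1j * f / fc)

for f in (1_000, 10_000, 20_000, 40_000):
    h = rc_lowpass(f)
    print(f"{f:6d} Hz: {20*math.log10(abs(h)):7.2f} dB, "
          f"phase {math.degrees(cmath.phase(h)):7.2f} deg")
```

Steeper (higher-order) crossovers buy a sharper cutoff at the cost of even more phase rotation - the trade-off never goes away, it just moves.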
Thus, whilst oversampling does have its uses, in consumer audio the sample rate decision is a trade-off between noise (linear distortion) and intermodulation (non-linear distortion). So essentially, for a standard system, as soon as you start sampling well above audible frequencies, you’re looking at spending a lot of money on non-consumer components just to make your sound only a little shitter.
Something the presentation very much lacked was peer-reviewed, double-blind studies of how audible the effects he describes are, so I’ll try to provide that myself. This one: http://www.aes.org/e-lib/browse.cfm?elib=14195 is quite good. They spent a year inserting a 16-bit, 44.1kHz ADC-DAC process into various signal chains and seeing if people could notice the difference compared to the “full quality” signal. It turns out that no matter what the system, no matter who the listener (including self-professed golden-eared audiophiles and professional mastering engineers), no-one can spot the difference (unless you play silence with the volume turned right up - apparently some noise was then discernible). Frankly, if you think that background noise 50dB down from the signal makes a difference, you’d be better off spending your money on an anechoic chamber rather than the next sound system up.
The issues raised are real - but as soon as the digital technology can reproduce the desired signal more accurately than the rest of the system can render it, no further benefit comes from a higher sample rate. Even the highest-end current systems cannot render the difference between a 16-bit, 44.1kHz scheme and a full production-quality 24-bit, 192kHz scheme in a way that is noticeable (without foreknowledge of which scheme is being used).
I’ve seen people say that interactions involving ultrasound generate subtle, audible tones. This is quite true; however, it doesn’t mean we have to record and reproduce the inaudible component sounds in order to hear their audible results. If the ultrasound interaction is “genuine” and desired, then its audible results will have been recorded from the source or generated in software (production should be done at higher sample rates, largely for this reason). If its results weren’t recorded, and the component sounds need to be combined in a non-linear device in order to hear the result, that’s distortion.
One definite benefit that was mentioned earlier in the thread is the ability to use a gentler low-pass filter slope - as I mentioned earlier, filtering introduces distortion of its own. As with everything in engineering, it’s a trade-off. An improvement in one area will spoil another - which is why this sort of thing has to be constantly re-referenced against the actual capabilities of the human ear, as demonstrated in double-blind studies.