How Can It Do That?

Roger Skoff answers a question that many audiophiles have, but never ask...

In one of the audiophile groups on Facebook a few days ago, I saw a question from a member that I had wrestled with myself many years ago, when I first got into our hobby. Here it is:

I'm very impressed and confused. Impressed that we can reproduce a symphony in our own houses. But confused at how it all works. Over fifty (and more—including vocals) instruments playing all at once and coming out of just a pair of speakers. I don't get it. How can each instrument be reproduced all at the same time and often times from one driver?

I understand why he's asking. As long as we think of the sound we hear as coming from its source, it's perfectly reasonable to think that the best approach to sound reproduction would be to have a separate speaker for each source. In fact, though, that's exactly backwards.

To understand how multiple sounds can come from a single speaker, we need to change our thinking about what sound is, and realize that "sound," as we think of it, is different from, and perceived differently than, what sound actually is as a physical phenomenon.

We think of sound as something in itself, like water coming from a spout in a fountain; and, as with a beautiful and complex fountain (think of the ones in the squares of our great cities), what we perceive is the pattern produced by all of the streams at any given moment. That's true as far as it goes, but because the nature of sound is different from that of water, and because we have only two ears, the process is entirely different.

What we think of as sound is nothing more than a pattern of pressure waves reaching us through the air (or any other intervening medium) as a result of something vibrating: a vocal cord, a drum skin, the reed or string or internal column of air of a musical instrument, a spoon banging on a teakettle, or anything else (even an explosion or a car crash) that makes it.

Those vibratory motions are transferred to the air by the simple fact that the air is everywhere and in contact with everything, so that, if something vibrates, the air vibrates, too. And because all of the air is in physical contact with all the rest of it (except where it's separated by some barrier, a wall, for example), anything vibrating at any location within a body of air communicates that vibration to all of it. The intensity of that vibration falls off with distance in accordance with the Inverse Square Law, and it keeps diminishing until intermolecular friction ultimately converts the motion to heat.
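The Inverse Square Law can be put into numbers. The sketch below is a minimal Python illustration, assuming an idealized point source in open air (no reflections and no friction losses); the function names are hypothetical. Intensity falls with the square of the distance, which works out to roughly 6 dB quieter for every doubling of distance.

```python
import math


def intensity_ratio(r_ref: float, r: float) -> float:
    """Intensity at distance r relative to the intensity at r_ref.

    Assumes an ideal point source in free air: intensity is proportional to 1/r^2.
    """
    return (r_ref / r) ** 2


def level_change_db(r_ref: float, r: float) -> float:
    """Change in sound level, in decibels, when moving from r_ref to r (negative = quieter)."""
    return 10.0 * math.log10(intensity_ratio(r_ref, r))


if __name__ == "__main__":
    # Distances in metres, all compared with the level at 1 m.
    for r in (1.0, 2.0, 4.0, 8.0):
        print(f"{r:4.1f} m: intensity x{intensity_ratio(1.0, r):.4f}, "
              f"{level_change_db(1.0, r):+.1f} dB")
```

Each doubling of distance quarters the intensity, so the level drops by the same number of decibels each time.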

In short, all of the air is vibrating with what may be an infinite number of sounds all the time. So how is it that we don't hear all of them? It all has to do with how sound is picked up—by our ears, by microphones, or by anything else.

In every case, the air in contact with something causes that thing to move, too. In the case of an ear or a microphone, the thing set in motion is a thin diaphragm: the pick-up diaphragm of a microphone or the eardrum (tympanum) of an ear. Either one is made so that it can move along only one axis (back and forth, or in and out) in response to variations in air pressure. As the pressure increases (and to the degree that it increases), the diaphragm is pushed forward. When the pressure reverses (remember that sound is a series of alternating increases and decreases in pressure), the diaphragm moves in the opposite direction, back to its rest (neutral) position and, if the pressure reversal is sufficient, continues backward either as far as it can go or until the pressure reverses again.

These movements occur over time, and apart from the actual position at any moment, the most important things about them are that the diaphragm can be in only one position at any given instant, and that this position is determined by the net pressure (positive or negative) on the diaphragm: the algebraic sum of all of the pressures present.

For an easy example of how this works, think of a tug-of-war with a constantly varying number of people of varying sizes and strengths on each side of a single rope, all pulling as hard as they can to move the rope to their side of the line. The rope is the diaphragm, the people are the positive and negative pressures caused at any given instant by all of the sounds present, and the position of the rope is the position of the diaphragm. That's how multiple sounds result in just one position of the rope (or the diaphragm) at any one time.
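To make the tug-of-war concrete, here is a minimal Python sketch that adds several simultaneous tones into the single net pressure acting on a diaphragm at each instant, then "records" that one value over time, much like the pen described next. The three sources, their amplitudes, frequencies, and phases, and the 48 kHz sample rate are all made-up illustrative values, and the sketch assumes an ideal diaphragm whose position simply follows the net pressure.

```python
import math

# Hypothetical sound sources: (amplitude in pascals, frequency in Hz, phase in radians).
# Illustrative values only; real music has far more components.
SOURCES = [
    (0.020, 110.0, 0.0),           # a low, cello-like tone
    (0.010, 440.0, math.pi / 4),   # a violin-like tone
    (0.005, 1760.0, math.pi),      # a high, piccolo-like tone
]


def net_pressure(t: float, sources=SOURCES) -> float:
    """Algebraic sum of every source's pressure at the diaphragm at instant t (seconds)."""
    return sum(a * math.sin(2 * math.pi * f * t + phi) for a, f, phi in sources)


def diaphragm_trace(duration: float, sample_rate: int = 48_000, sources=SOURCES) -> list[float]:
    """Record the diaphragm's one net value at each sample instant over `duration` seconds."""
    n = round(duration * sample_rate)
    return [net_pressure(i / sample_rate, sources) for i in range(n)]


if __name__ == "__main__":
    trace = diaphragm_trace(0.002)
    print(len(trace), "samples; first five:", [round(p, 5) for p in trace[:5]])
```

However many entries SOURCES holds, the trace contains exactly one number per instant; that single number is all a diaphragm, a record groove, or a recording ever has to carry.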

Now, suppose you had a pen that would record the positions of the rope or the diaphragm as they changed over time. Or, better yet, think of a phono cartridge, its stylus moving in a groove and producing greater or lesser positive or negative output voltages as it follows the swings of the groove.

That's how your ear works. Pressure waves, directed by your outer ears (the pinnae) through your ear canals, arrive at diaphragms (your eardrums), causing them to move back and forth in response to the changing pressures. These movements wiggle a set of three tiny bones in each ear, the malleus, incus, and stapes (hammer, anvil, and stirrup), which pass correspondingly varying pressures to the fluid-filled cochlea. Inside the cochlea, sensory hair cells convert those pressure variations into nerve impulses, which the auditory nerve carries to your brain and you hear as sound.

The stereo effect comes from the fact that your two ears are at two different locations, so the sound (the air pressure variations) reaches them at slightly different times and at slightly different levels, which means that each sound source arrives at the two ears with a slightly different phase relationship. From those time and amplitude differences, our brain is able to work out, by a process akin to triangulation, how many sound sources there are and the direction of, and distance to, each one relative to our listening position.
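The time-difference cue can be estimated with simple geometry. The sketch below uses a common far-field approximation (the extra path to the farther ear equals the ear spacing times the sine of the source angle) and assumes an ear spacing of 0.18 m and a speed of sound of 343 m/s; both are illustrative values, and the level differences caused by the head's shadow are not modeled.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, in air at about 20 degrees C
EAR_SPACING = 0.18      # m, an assumed, illustrative distance between the ears


def itd_seconds(azimuth_deg: float, d: float = EAR_SPACING, c: float = SPEED_OF_SOUND) -> float:
    """Interaural time difference for a distant source, using the model d * sin(theta) / c.

    Azimuth 0 is straight ahead; positive angles are to the right, so a positive
    result means the sound reaches the right ear first.
    """
    return d * math.sin(math.radians(azimuth_deg)) / c


def azimuth_from_itd(itd: float, d: float = EAR_SPACING, c: float = SPEED_OF_SOUND) -> float:
    """Invert the same model: estimate the source angle (degrees) from a time difference."""
    x = max(-1.0, min(1.0, itd * c / d))  # clamp so rounding can't leave asin's domain
    return math.degrees(math.asin(x))


if __name__ == "__main__":
    for angle in (0.0, 15.0, 45.0, 90.0):
        itd = itd_seconds(angle)
        print(f"{angle:5.1f} deg -> {itd * 1e6:6.1f} microseconds "
              f"-> back to {azimuth_from_itd(itd):5.1f} deg")
```

At 90 degrees the difference comes to roughly half a millisecond. Because sin(30 deg) equals sin(150 deg), a time difference alone cannot distinguish a source in front from one behind; this model only recovers the angle to one side.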

A microphone does exactly the same thing as your eardrum: it has a diaphragm that is moved by the changing air pressures of the sound that impinges on it. That movement (again, the net result of all of the pressures on it) creates a voltage or current in the microphone's transducer mechanism. That signal is recorded and then, when played back and amplified, becomes an electrical signal that moves a speaker diaphragm (or all of the diaphragms in your two- or three-way speakers) backward and forward exactly in correspondence with the movements of the microphone diaphragm that originally picked up the sound (or with the net signal of all of the microphones, if more than one was used).
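One driver can reproduce everything the microphones captured because, ideally, every stage of the chain is linear: amplifying a mix gives the same result as mixing the amplified parts. Here is a minimal Python sketch of that idea, assuming equal-length tracks, a plain unity-gain mix, and a hypothetical fixed gain standing in for the whole amplifier and speaker chain.

```python
def mix(*tracks: list[float]) -> list[float]:
    """Sum microphone tracks sample by sample into one signal (a unity-gain mix).

    Assumes all tracks have the same length; zip() would silently truncate otherwise.
    """
    return [sum(samples) for samples in zip(*tracks)]


def playback(signal: list[float], gain: float = 2.0) -> list[float]:
    """Idealized linear amplifier plus driver: output is the input times a constant gain.

    The gain value is hypothetical and purely illustrative.
    """
    return [gain * s for s in signal]


if __name__ == "__main__":
    mic_a = [0.10, -0.20, 0.30]  # made-up samples from one microphone
    mic_b = [0.05, 0.05, -0.10]  # made-up samples from another microphone
    together = playback(mix(mic_a, mic_b))            # mix first, then play
    separately = mix(playback(mic_a), playback(mic_b))  # play each, then add
    print(together)
    print(separately)
```

The two printed lists match apart from floating-point rounding. Real amplifiers and drivers are only approximately linear, and distortion is, in effect, a departure from this equality.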

Because in every case (a microphone, a speaker, or an eardrum) there can be only one net pressure at the diaphragm at any given time (the algebraic total of all of the positive and negative pressures, at all of the phase angles, of all of the sounds present at that instant at that specific location), there is only one position to which the diaphragm can move. That one position, and the one pressure change it picks up or creates, embodies all of the sounds heard, regardless of how many there may be, whether you're listening to the buzzing of a fly or a full tutti of Mahler's "Symphony of a Thousand."

And that's how it does it.