
Positive Feedback ISSUE 22
november/december 2005


Computer-Driven Audio - Is it Superior to Optical-Based CD Playback?
by Steve Nugent, Empirical Audio


Steve Nugent is the President of Empirical Audio in Bend, OR. As a former engineer at Intel and holder of many patents in electrical engineering (see my PFO interview with Steve and Janet in Issue 7), Steve is well qualified to comment on the subject of PCM audio and the drawbacks of delivering it via optical disc. From time to time he contributes articles to PFO; we share these with our readers in order to advance the discussion of alternatives to the traditional delivery of Red Book PCM. Pro bono publico, then…. Ye Olde Editor

I am convinced that computers are the future of high-performance audio, and not optical discs. Besides having a number of ergonomic advantages, such as the ability to build play-lists of favorite tracks, organize music like a file system and download tracks and artist information from the web, there are a number of technical reasons why this technology is inherently superior in performance, particularly when compared to the current generation of optical players. The current generation of CD and DVD players cannot compete with computer systems that are well implemented, but future generations may as they start to resemble computer-driven systems. The primary audio quality improvement that is possible with computer driven audio is a significant reduction in jitter over that of optical disk systems.

Jitter is not a digital error.


Jitter is an attribute of digital signals, particularly clocks that impart the timing to the data that is being moved from point A to point B in any synchronous digital system. Clocks in digital systems are like real-time clocks in that each clock occurrence or "tick" is designed to occur at a particular time cadence, like the seconds ticking on a clock. At each clock "tick," the digital system either performs processing on some digital data or moves it one step further to get it from point A to point B. Jitter can be described as the time variability of a single clock event, where this event can occur either before or after the exact point in time where it is expected to occur. Jitter is analogous to seconds ticking away on a clock, but with individual ticks not occurring at exact one-second intervals. Some are slightly shorter than a second and some are slightly longer, the average of all the seconds being exactly a second, so no actual time is gained or lost over a large number of seconds. The jitter is the difference between the shortest and the longest second. Jitter in digital audio systems is measured in nanoseconds, and even picoseconds.
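The clock analogy can be made concrete with a small simulation. This is only a sketch under stated assumptions: the function names are hypothetical, the deviations are drawn from a uniform random distribution (real jitter is rarely that simple), and the 2 ns bound is an arbitrary illustrative figure.

```python
import random

def jittered_periods(nominal_ns, max_dev_ns, count, seed=0):
    """Return `count` clock periods, each the nominal period plus a random deviation.

    Assumption: deviations are independent and uniform in [-max_dev_ns, +max_dev_ns].
    """
    rng = random.Random(seed)
    return [nominal_ns + rng.uniform(-max_dev_ns, max_dev_ns) for _ in range(count)]

def peak_to_peak_jitter(periods):
    """Difference between the longest and the shortest period."""
    return max(periods) - min(periods)

def mean_period(periods):
    """Average period; close to nominal when no time is gained or lost overall."""
    return sum(periods) / len(periods)

# A 44.1 kHz sample clock has a nominal period of roughly 22675.7 ns.
NOMINAL_NS = 1e9 / 44_100
periods = jittered_periods(NOMINAL_NS, max_dev_ns=2.0, count=100_000)
print(f"nominal period:      {NOMINAL_NS:.3f} ns")
print(f"mean period:         {mean_period(periods):.3f} ns")
print(f"peak-to-peak jitter: {peak_to_peak_jitter(periods):.3f} ns")
```

The mean stays very close to the nominal period while individual ticks wander, which is the behavior described above: no time is gained or lost on average, yet no single tick can be trusted to land exactly on time.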

All synchronous digital systems have both data and timing attributes. In certain systems, such as computers where most transactions require only that the data arrive intact, jitter in the clock is actually not important, assuming that the timing requirements of the chips are met. The clock is only used to move the data from point A to point B, and the arrival time of each individual data word can vary to some extent without impacting the function. This is described as non-real-time. Actually most modern personal computers intentionally insert jitter into the master clock of the computer in order to limit the RF emissions from the computer, and thereby pass FCC testing. This is desirable if you want to sell any computers.

Digital audio systems however are different because they use both the data and the timing of the clock to reproduce the original recording. The data stream is transferred "real-time." The timing must match the original sample-rate used when the recording was made to accurately re-create the analog signal. The data words are clocked into the D/A converter at this constant rate. Both the frequency and the jitter of the clock can affect the accuracy of the reproduction. The frequency, if not accurate, can cause pitch and speed of the music to change, and in some systems cause drop-outs if there is no data available when it is expected. Jitter manifests itself as frequency modulation, which can be audible as well. Several studies have been published that measured jitter and tried to correlate it to audibility. Like most audiophile studies, these are not without controversy. Like all listening tests, the accuracy of these tests is very dependent on the system attributes and the recording used for the testing as well as the listeners' hearing acuity.

In any case, these tests have shown some jitter to be audible, in the tens of nanoseconds anyway. I believe it is much more insidious than this result suggests. I believe that the spectrum (frequency content) of the jitter has a lot to do with its audibility. In my own reference system I have made improvements that I know for a fact did not reduce the jitter by more than one or two nanoseconds, and yet the improvement was clearly audible. There is a growing body of anecdotal evidence indicating that some jitter spectra may be audible well below 1 nanosecond.
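To put nanosecond figures in context, a common first-order estimate relates a timing error to the amplitude error it causes when a sine wave is converted at the wrong instant. The sketch below is not taken from the studies mentioned above and says nothing about audibility; the function names are hypothetical, and the estimate assumes a full-scale sine and a timing error that is small compared with the signal period.

```python
import math

def worst_case_error(freq_hz, amplitude, timing_error_s):
    """Largest amplitude error when a sine is sampled `timing_error_s` too early or late.

    The slope of A*sin(2*pi*f*t) peaks at 2*pi*f*A, so for a small timing
    error the amplitude error is approximately slope * timing error.
    """
    return 2 * math.pi * freq_hz * amplitude * timing_error_s

def lsb_size(bits, full_scale_amplitude=1.0):
    """Size of one quantization step for a signal spanning -A..+A."""
    return 2 * full_scale_amplitude / (2 ** bits)

# Assumed example: 20 kHz full-scale sine, 1 ns timing error, 16-bit samples.
err = worst_case_error(20_000, 1.0, 1e-9)
lsb = lsb_size(16)
print(f"worst-case error: {err:.3e} of full scale")
print(f"16-bit step size: {lsb:.3e} of full scale")
print(f"error in steps:   {err / lsb:.2f}")
```

Under these assumptions a 1 ns error at 20 kHz is already larger than one 16-bit step, and the error scales linearly with both signal frequency and timing error.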

Contributors to Jitter

There are a number of key components in typical digital audio systems where contributions to jitter are significant, including:

  1. The pits in the CD

  2. Reading of the pits by the optical read-head of the transport

  3. Jitter in the master clock in the transport and jitter in the asynchronous re-clocking in an upsampling DAC

  4. Transmission of the S/PDIF signal

  5. Dispersion of the signal in the S/PDIF or AES/EBU cable

  6. The electrical-optical-electrical conversions in a Toslink interface

  7. Conversion of S/PDIF to extract clock and data

  8. Noisy power supplies and ground-loop noise

1. Pits in the CD

The pits in a CD have two attributes, the data and the timing. The data is encoded in the pattern of pits and the flat areas (lands) between them. The timing information is the physical placement of the pits. When CDs are created, the master has jitter in the pits, and the manufacturing process that fabricates the duplicates creates even more jitter. This is easy to verify. If you copy the CD onto a CD-R with a good CD writer, the duplicate will usually sound significantly better than the original. This was demonstrated to me by Mark Hampton of Zcable, who created some excellent CD-Rs and shared some of them at the 2005 CES in Las Vegas. The improvement was not subtle. Several other manufacturers are duplicating CDs on CD-Rs for their own use, and they even insist on using special CD-R media.

2. Optical read head

The laser read head in a CD transport detects the pits by reflection and must differentiate between a "one" and a "zero". The exact point in time at which each pit is detected by the read head varies due to vibration, differences in the depth of the pits, electrical noise, optics, dirt on the disk and other factors.

3. Master clock in the Transport

The master clock provides the synchronization means for the disc to rotate at the correct speed to maintain the cadence/sample rate of the original recording. It also moves the data through the various stages of logic and buffering until it is sent out the digital output. This is a primary source of jitter in transports. Replacement of oscillators with low-jitter clocks can make a significant improvement in most transports. Some more sophisticated systems have a "word-clock," which is an oscillator that originates either in the DAC or external to the CD player and clocks the whole system. The word-clock can help to reduce jitter, but it also suffers from many of the jitter contributors. Some DACs resample the data at a higher sample rate asynchronously, using a local oscillator. This clock can add to or even reduce jitter, depending on implementation.

4. Transmission of S/PDIF format

The typical CD or DVD player has several chips and buffers that the data must pass through before exiting the output connector as S/PDIF. These stages can add significantly to the jitter if the circuit design or PC board design is not well executed, or the power is noisy, etc.

5. Digital cabling

If the digital cable is an S/PDIF "coax" or AES/EBU cable, there are at least two contributors to jitter, including reflections due to impedance mismatches, and dispersion due to losses and changing data pattern. Reflections can occur because the cable, the output driver, or the input is not matched to the correct impedance. Losses on the cable can cause the pulses to flatten out which causes edges to move depending on the data pattern. Cable length can also cause reflections on the cable to cause jitter.

For more on this subject, see my white paper on S/PDIF cable length in Issue 14 of Positive Feedback Online.
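The size and timing of a reflection can be estimated with standard transmission-line formulas. The following is a rough sketch, not a model of any particular cable: the function names are hypothetical, the 50 ohm input impedance is an assumed example of a mismatch against a 75 ohm S/PDIF line, and the velocity factor of 0.66 is an assumed value that varies with cable construction.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458

def reflection_coefficient(z_load_ohms, z_line_ohms):
    """Fraction of the incident voltage reflected where the line meets the load."""
    return (z_load_ohms - z_line_ohms) / (z_load_ohms + z_line_ohms)

def round_trip_ns(cable_length_m, velocity_factor=0.66):
    """Time for an edge to travel down the cable and back, in nanoseconds.

    Assumption: velocity_factor=0.66 is only a typical figure for coax.
    """
    speed = velocity_factor * SPEED_OF_LIGHT_M_PER_S
    return 2 * cable_length_m / speed * 1e9

# Matched 75 ohm input: nothing is reflected.
print(reflection_coefficient(75, 75))
# Assumed mismatch: a 50 ohm input on a 75 ohm cable reflects part of each edge.
print(reflection_coefficient(50, 75))
# When the reflection arrives back at the source end of a 1.5 m cable.
print(f"{round_trip_ns(1.5):.2f} ns")
```

A reflection that returns while an edge is still being detected shifts the apparent crossing time, which is one way cable length and impedance matching interact to produce jitter.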

6. Optical conversion

Toslink is the worst of the interfaces because the electrical to optical and optical to electrical conversion adds to the jitter. Toslink creates additional stages that the clock must pass through, picking up jitter due to power/ground noise and uncertainty of when the edge (logic change) transitions get detected.

7. Recovery of the clock from the S/PDIF signal

This occurs at the DAC, usually in a "receiver" chip. The first contributor to jitter is the detection of the S/PDIF signal. If each edge transition is not detected at exactly the same voltage, then jitter will be added. Power and ground noise and temperature changes in the receiver chip can dynamically move this detection voltage. If the edge-rates are very slow (typical from a stock transport is 20-25 nsec), then the variability of the edge detection is even greater, which means more jitter is added. The function of extracting the clock from the data stream, because it requires some logic stages and a Phase-Locked-Loop, will add jitter as well.
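The link between slow edges and jitter can be sketched with simple arithmetic: if the detection threshold moves by some voltage, the crossing time moves by that voltage divided by the slew rate of the edge. The figures below are assumptions for illustration (a linear edge, a 0.5 V signal swing and 10 mV of threshold noise); the function name is hypothetical.

```python
def edge_jitter_ns(threshold_noise_v, swing_v, rise_time_ns):
    """Timing shift caused by a shift in detection threshold on a linear edge.

    Assumption: the edge rises linearly over `rise_time_ns`, so the slew
    rate is simply swing / rise time.
    """
    slew_v_per_ns = swing_v / rise_time_ns
    return threshold_noise_v / slew_v_per_ns

# Assumed 10 mV of threshold movement on a 0.5 V signal.
for rise_time in (25.0, 5.0, 1.0):
    jitter = edge_jitter_ns(0.010, 0.5, rise_time)
    print(f"rise time {rise_time:4.1f} ns -> about {jitter:.3f} ns of added jitter")
```

Under these assumptions a 25 ns edge turns 10 mV of noise into about half a nanosecond of timing uncertainty, and faster edges reduce it proportionally.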

8. Noisy power and ground loops

Noisy power supply and ground noise caused by ground-loops and imperfect voltage regulation and delivery in a system can also add to the jitter.

Computer-driven audio usually eliminates jitter contributors 1, 2 and 3. In some cases it also eliminates or minimizes jitter contributors 4, 5, 6, 7, and even 8.

Computer audio methods

Computer-driven audio is currently generated using one of three methods, each of which has advantages and disadvantages. The original method used PCI add-in cards, which contain digital-to-analog converters and sometimes have S/PDIF digital outputs as well. External "box" converters are now available, either wired with USB or Firewire, or wireless using 802.11g Wi-Fi.

PCI add-in cards have the disadvantage that they utilize the power supply of the PCI bus of the computer. This power supply is typically quite noisy, so it is difficult to get high-quality output from these cards. They are also limited in form-factor and cooling capacity. There are severe interconnect length limitations when either S/PDIF or analog outputs are used, so the computer must be placed very close to the audio system. AC power of the computer can also easily cause hum in an audio system. I generally advise against using add-in audio cards in favor of external converters. External converters generally interface with the computer using Wi-Fi, USB, or Firewire, and have their own power supplies.

Wireless converters typically use Ethernet wireless protocol. This has several disadvantages as well. The wireless network can be shared by other devices, such as printers and other computers on the network. Since the Ethernet protocol allows ANY device to demand service and bandwidth, other devices can "hog" the network and cause drop-outs in the audio stream. The current wireless devices are also limited to 44.1kHz sampling rate, so higher rates of data are not supported, such as the common 24-bit/96kHz upsampled data. The advantage of wireless devices is that they are completely isolated from the computer and can be located virtually any distance from both the listener and computer. They can clock the data with ultra-stable clocks and be powered from separate power supplies, even batteries.

USB and Firewire converters have the disadvantage that they are obviously wired and the wires have length limitations. The advantages include dedicated bandwidth, 24-bit/96 kHz data support, and isolation from the computer and separate power supplies, including batteries. These converters currently allow the highest quality playback.

Computer audio "servers" that are "turn-key" use one of the previously described methods, but eliminate the software setup step, making it easier to get started in computer driven audio, particularly for those that are not so skilled on the computer.

Playing from the Hard-Disk

The hard-disk does not read the data like a CD player. When the computer sends the audio stream to an output port, such as USB, the CPU first reads the data in a "burst" fashion from the hard disk and caches blocks of the data in memory. It is then spooled from memory to the output port in a continuous stream. Successive disk accesses are made to keep the memory cache buffer full and the stream running without interruption. The player and driver software often have options to select the cache buffer sizes. The data that is transferred out the USB port is just data, with no precise timing information; however, it must keep up with the demand or drop-outs will occur. The average data rate must match the recorded sample rate. With some of the protocols that are used on USB, drop-outs can happen more easily, particularly if the CPU is utilized to a high percentage during music output by other applications. This is a good reason to dedicate the computer to this purpose.
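The burst-read, steady-drain behavior described above can be sketched as a toy simulation. This is not how any particular player or driver is written; the function and parameter names are hypothetical, time is counted in abstract "ticks", and a disk read is modeled as a fixed delay followed by a block of data arriving all at once.

```python
def simulate_playback(capacity, burst, read_latency, drain, ticks, low_water=None):
    """Count the ticks on which the output port found too little data (underruns).

    capacity:     size of the memory cache, in frames
    burst:        frames delivered by one disk read
    read_latency: ticks between requesting a read and the data arriving
    drain:        frames the output port consumes every tick
    low_water:    cache level at which a new read is requested (default: half full)
    """
    if low_water is None:
        low_water = capacity // 2
    level = capacity      # start with a full cache
    pending = None        # ticks left until the in-flight read lands, if any
    underruns = 0
    for _ in range(ticks):
        # Complete the in-flight disk read once its latency has elapsed.
        if pending is not None:
            pending -= 1
            if pending <= 0:
                level = min(capacity, level + burst)
                pending = None
        # Request another burst when the cache is running low.
        if pending is None and level <= low_water:
            pending = read_latency
        # The output port drains at a steady rate, data or no data.
        if level >= drain:
            level -= drain
        else:
            underruns += 1
            level = 0
    return underruns

# A large cache with fast reads keeps up; a small cache with slow reads does not.
print(simulate_playback(capacity=1000, burst=500, read_latency=2, drain=10, ticks=10_000))
print(simulate_playback(capacity=100, burst=50, read_latency=20, drain=10, ticks=100))
```

The second case stands in for a busy or slow machine: the cache empties before the next burst arrives, which is the condition that produces drop-outs. Larger buffers and shorter read delays both move the system toward the first case.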

How external computer converters minimize jitter

In an outboard USB converter, the data is received from the sending computer and precise timing information is added. The jitter from the computer clock can be effectively eliminated. The interface is then translated into an interface that a DAC can understand, such as S/PDIF, AES/EBU or I2S (the native DAC chip interface). The clock that generates the timing can be very precise and does not depend on the data rate coming from a rotating optical disk, as in a CD player, or on the rate at which a hard disk is read. It does however depend on uninterrupted data flow from the computer. The power supply and even the grounding can be isolated from the computer. The power can be further improved by using a battery.

There are three possible USB protocols to ensure that the data flow is uninterrupted: synchronous, adaptive, and asynchronous. For an explanation of these, I defer to the USB expert, John Swenson, who wrote a short explanation on a web discussion group that I thought was very instructive:

From the link above, you can understand that there are several ways in which to implement a USB interface to an audio system, each with its pitfalls. Even the chip manufacturers have struggled with which way is the best. Fortunately, with enough CPU speed, memory size and fast disk accesses, the problems can be minimized.

Pops and Ticks

With the current USB converter protocols, the USB transfer is prone to "underrun," which means that the USB needs data and the computer is too late in delivering it to keep up with the real-time data stream demand. When this happens, ticks and pops can occur in the music. This happens more often in systems that upsample the music data on the computer, which requires a lot of the CPU cycles and competes with the data stream out to the USB port. Fortunately, there are a few steps that can be taken that will usually eliminate these pops, including:

  1. Dedicate the computer to the audio playback task

  2. Defrag the disk that contains the ripped music data

  3. Turn off all applications except for the player, for example, iTunes, JRiver or Foobar2000

  4. Give the player a high priority

  5. Tune the buffer sizes in the player's options

  6. Use a computer or laptop with sufficient CPU speed

  7. Ensure a large RAM size for data caching

  8. Specify a hard disk drive with a fast seek time and a fast spindle rate


I2S or I-squared-S is an interface found on some transports, such as Audio Alchemy and on some DAC's, such as the Perpetual Technologies P-3A and the Northstar 24/192, but is not a common interface. This interface replaces the "digital coax" or S/PDIF interface that is common for most CD and DVD digital audio. The I2S is a 4-signal or 4-wire interface, consisting of two clocks, a L/R channel select and a serial data signal. There are several advantages to this interface, one of which is that unlike S/PDIF, it includes a bit-clock, which eliminates jitter contributor 7. Another advantage is that it is the "native" interface for most DAC chips. This means that no translation of the data stream need take place in order to drive a DAC chip. This inherently reduces the amount of hardware that the clock and data signals must pass through. As in most audio equipment, simpler is better and this is no exception. The clock jitter is minimized, but not eliminated in systems that use the I2S interface. It is still possible to have significant jitter even with this interface, if it is not well implemented. An interface that converts USB, Firewire or Wi-Fi from a computer to I2S driving a DAC chip directly has the potential to outperform all other current techniques if implemented well.
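The clock rates involved in such an interface follow directly from the audio format. The sketch below shows the arithmetic only; the function names are hypothetical, and the 32-bit slot per channel is an assumed, commonly used framing rather than something every DAC chip requires.

```python
def i2s_bit_clock_hz(sample_rate_hz, bits_per_slot=32, channels=2):
    """Bit-clock frequency needed to shift out every bit of every frame.

    Assumption: each channel occupies a fixed slot of `bits_per_slot` bits,
    even when the audio word itself is shorter (for example 16 or 24 bits).
    """
    return sample_rate_hz * bits_per_slot * channels

def i2s_word_select_hz(sample_rate_hz):
    """The L/R select line completes one cycle per stereo frame."""
    return sample_rate_hz

for rate in (44_100, 96_000):
    print(f"{rate} Hz audio: L/R select {i2s_word_select_hz(rate)} Hz, "
          f"bit clock {i2s_bit_clock_hz(rate)} Hz")
```

Because the bit clock travels on its own wire, the receiving DAC does not have to reconstruct it from the data stream.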

Ripping and file compression

I do not claim to be an expert on ripping, but I'll share here what I think I know about it. Ripping is the process of copying the data from a music CD to the hard disk on your computer. There are very sophisticated ripping software programs available now, such as iTunes or Exact Audio Copy (EAC), which are both freeware. These programs do multiple reads and compare the data to be certain that the copy is an exact duplicate of the data on the disk. With a modern CD-RW drive or DVD+/-RW drive, ripping is fast and very accurate. If you are attached to the web at the time you do the rip, the ripping software can "look-up" all of the track information on the tracks that you are ripping and insert this information into the database of the files. This can save you a LOT of keystrokes when ripping. The ripping software queries a "freedb" database that is free and available to anyone on the web if you specify this in the ripper configuration. You can also enter the data, such as track name, artist name and genre manually.

Rippers give you the option of storing uncompressed or compressed files on the disk. Uncompressed files are typically files with .wav extensions and represent an exact copy of the data on the CD. A .wav file of a music track can average around 60Mbytes in size. Compressed files can be encoded as two types: lossless compression and lossy compression. Lossless compression is popular because none of the music samples are lost. The music file is reduced in size using mathematical techniques so that less disk space is necessary, usually 50% of the uncompressed file size. Lossless compression does not eliminate any of the music information; it only encodes it to reduce the file size. Examples of lossless compression formats include Apple Lossless, Windows Media Audio Lossless and FLAC. The player software and sometimes the converter hardware must be capable of uncompressing the lossless format, and not all players uncompress all formats.
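The file sizes quoted above follow from the CD format itself: 44,100 samples per second, 16 bits per sample, two channels. The sketch below works through the arithmetic; the function name and the 5 minute 40 second track length are assumptions chosen for illustration, and the 50% figure is the typical lossless ratio mentioned above, not a guarantee for any given track.

```python
def pcm_bytes(seconds, sample_rate=44_100, bits=16, channels=2):
    """Size of the raw PCM audio for a track, ignoring the small file header."""
    return int(seconds * sample_rate * (bits // 8) * channels)

track_seconds = 5 * 60 + 40                 # assumed track length: 5:40
uncompressed = pcm_bytes(track_seconds)
lossless_estimate = uncompressed * 0.5      # typical ratio, varies with the music

print(f"CD data rate:      {pcm_bytes(1)} bytes per second")
print(f"uncompressed:      {uncompressed / 1e6:.1f} Mbytes")
print(f"lossless (approx): {lossless_estimate / 1e6:.1f} Mbytes")
print(f"24-bit/96kHz rate: {pcm_bytes(1, 96_000, 24)} bytes per second")
```

A track of a little under six minutes lands at about 60 Mbytes uncompressed, and the same calculation shows why 24-bit/96kHz data needs more than three times the bandwidth of CD data.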

Lossy compression includes MP3, AAC and others. Lossy compression techniques can reduce the file size by more than ten to one. These are generally not interesting for an audiophile system, and mostly are useful for portable music, like iPod's, MP3 players, etc. Lossy compression always eliminates some of the music, such as the quieter passages that are coincident with loud passages. The thinking is that these quieter passages will not be missed in the average music system. In a decent audiophile system, however, lossy compression usually causes a loss of detail, image and soundstage depth and width.

Upsampling on the Computer

Upsampling is one of the BIG advantages of using a computer to drive your music. Digital music is recorded at a particular sample-rate, generally 44.1 kilosamples per second on CD's. Upsampling or resampling adds more samples that were not part of the original recording process, but that try to approximate the samples that would have been recorded had the music been recorded at the higher sample rate initially. The added samples are computed with various mathematical algorithms that examine the music waveform prior to and after the time-slots where the new samples will be inserted. These new samples not only add more detail to the music, but improve the dynamics as well. Upsampled data can be 24 bits at 88.2kHz, 96kHz or 192kHz. Most of the currently available converter chips do not support 192kHz.

The algorithms are where the magic is. I have personally found that some computer upsampling algorithms are more musical and detailed than the hardware chips that I have heard that do upsampling. There are several upsampling codes available in "plug-in" form that can be added to the Foobar2000 player for instance, including one that was written by a third party. I expect to see more of these upsampling codes in the future, as there is definitely an art to doing this well. The plug-ins are generally .dll files that the player accesses.

Another advantage of software upsampling is that the software is often configurable, including various knobs that you can adjust for personal sound preference.
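The basic idea of computing new samples from their neighbors can be shown with the simplest possible interpolator. This is only a toy sketch with a hypothetical function name: it doubles the sample rate by inserting the average of each adjacent pair. The upsampling plug-ins discussed above are assumed to use far more elaborate filters that look at many samples on each side; this is not a description of any of them.

```python
def upsample_2x_linear(samples):
    """Double the sample rate by inserting the midpoint between adjacent samples.

    Linear interpolation looks at just one sample before and one after each
    new time-slot; it is used here only because it is easy to follow.
    """
    if not samples:
        return []
    out = []
    for current, following in zip(samples, samples[1:]):
        out.append(current)                    # keep the original sample
        out.append((current + following) / 2)  # new sample in the gap
    out.append(samples[-1])                    # last original sample has no successor
    return out

original = [0, 2, 4, 2, 0]
print(upsample_2x_linear(original))
```

The original samples pass through unchanged and each new value sits between its neighbors. How those in-between values are chosen is exactly where the different algorithms, and their adjustable settings, differ.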


Computer-driven audio is adding new life to the CD format, providing new methods to get even more performance from CD tracks. Those that have discovered the performance and ergonomics advantages of this new technology have found new joy in their music listening. It is changing the way that we organize our music. Rather than remote controls and shelves of CD's, you can have playlists in each genre of only your favorite tracks that can play for hours without intervention. The database organizes your music so you can quickly find what you want to play, or build a new playlist for a party or holiday. A laptop at your side can replace the remote control. Individual tracks can be downloaded one at a time over the web, saving you money and time.

Perhaps this will eliminate that irritating problem of buying a 12-track CD of which only two are good ones. We can only hope!