Comp.DSP FAQ: Part 2

Q2: Algorithms and standards

Q2.1: Where can I get public domain algorithms for general-purpose DSP?

Updated 12/31/96

The following archives contain things such as matrix operations, FFT's and generally useful things like that, as opposed to complete applications.

Netlib serves some of this software via email. Try mail to netlib@ORNL.GOV with "send help" in the subject field.

To Obtain:
For Europe:

X.400: s=netlib; o=nac; c=no;
EUNET/uucp: nac!netlib

For more information:
See Jack J. Dongarra and Eric Grosse, "Distribution of Mathematical Software Via Electronic Mail," Comm. ACM (1987) 30,403--407.

A similar collection of statistical software is available from

The symbolic algebra system REDUCE is supported by

NSWC Library
The Naval Surface Warfare Center has a library of mathematical Fortran subroutines that may be of use. The NSWC library is a library of general-purpose Fortran subroutines that provide a basic computational capability in a variety of mathematical activities. Emphasis has been placed on the transportability of the codes. Subroutines are available in the following areas: Elementary Operations, Geometry, Special Functions, Polynomials, Vectors, Matrices, Large Dense Systems of Linear Equations, Banded Matrices, Sparse Matrices, Eigenvalues and Eigenvectors, l1 Solution of Linear Equations, Least-Squares Solution of Linear Equations, Optimization, Transforms, Approximation of Functions, Curve Fitting, Surface Fitting, Manifold Fitting, Numerical Integration, Integral Equations, Ordinary Differential Equations, Partial Differential Equations

For more information:
NSWC Library of Mathematical Subroutines
Report No.: NSWC TR 90-21, January 1990
by Alfred H. Morris, Jr.

Naval Surface Warfare Center (E43)
Dahlgren, VA 22448-5000

[Witold Waldman]

IEEE Press book "Programs For Digital Signal Processing"
You can get the Fortran source code from the IEEE Press book "Programs For Digital Signal Processing." See question 1.3.6.

Q2.2: What are CELP and LPC? Where can I get the source for CELP and LPC?

Updated 09/10/01

CELP stands for "code excited linear prediction". LPC stands for "linear predictive coding". They are compression algorithms used for low bit rate (2400 and 4800 bps) speech coding.

The U.S. DoD's Federal-Standard-1016 based 4800 bps code excited linear prediction voice coder version 3.2 (CELP 3.2) Fortran and C simulation source codes are available for worldwide distribution (on DOS diskettes, but configured to compile on Sun SPARC stations) from NTIS and DTIC. Example input and processed speech files are included. A Technical Information Bulletin (TIB), "Details to Assist in Implementation of Federal Standard 1016 CELP," and the official standard, "Federal Standard 1016, Telecommunications: Analog to Digital Conversion of Radio Voice by 4,800 bit/second Code Excited Linear Prediction (CELP)," are also available.

To obtain CELP:
Available through the National Technical Information Service:

U.S. Department of Commerce
5285 Port Royal Road
Springfield, VA 22161
(800) 553-6847

FS-1016 CELP 3.2 may also be obtained from

LPC-10 (2.4 Kbps) is available from

LPC (4.8 Kbps) can be downloaded in SpeakFreely, or in HawkVoice HawkVoice includes versions of OpenLPC, LPC-10, LPC, GSM, and Intel/DVI ADPCM. These versions have been rewritten to support multiple encoding and decoding streams, and the interfaces have been standardized. [Phil Frisbie, Jr.,]

OpenLPC (1.4 and 1.8 Kbps) can be downloaded from

MATLAB software for LPC-10 is available from Also, postscript copies of tutorials of speech coding can be found at [Andreas Spanias,]

For more information:

* The following articles describe the Federal-Standard-1016 4.8-kbps CELP coder (it's unnecessary to read more than one):

Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. Welch, The Federal Standard 1016 4800 bps CELP Voice Coder, Digital Signal Processing, Academic Press, 1991, Vol. 1, No. 3, p. 145-155.

Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. Welch, The DoD 4.8 kbps Standard (Proposed Federal Standard 1016), in Advances in Speech Coding, ed. Atal, Cuperman and Gersho, Kluwer Academic Publishers, 1991, Chapter 12, p. 121-133.

Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. Welch, The Proposed Federal Standard 1016 4800 bps Voice Coder: CELP, Speech Technology Magazine, April/May 1990, p. 58-64.

Additional information on CELP can also be found in the comp.speech FAQ.

* The voicing classifier used in the enhanced LPC-10 (LPC-10e) is described in: Campbell, Joseph P., Jr. and T. E. Tremain, Voiced/Unvoiced Classification of Speech with Applications to the U.S. Government LPC-10E Algorithm, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 1986, p. 473-6.

The U. S. Federal Standard 1015 (NATO STANAG 4198) is described in: Thomas E. Tremain, The Government Standard Linear Predictive Coding Algorithm: LPC-10, Speech Technology Magazine, April 1982, pp. 40-49.

[Most of the above from Joe Campbell,, with additions from Dan Frankowski,, and Ed Hall,]

Q2.3: What is ADPCM? Where can I get source for it?

Updated: 04/03/01

ADPCM stands for Adaptive Differential Pulse Code Modulation. It is a family of speech compression and decompression algorithms. A common implementation takes 16-bit linear PCM samples and converts them to 4-bit samples, yielding a compression rate of 4:1.

To obtain:
There is public domain C code available via anonymous ftp at written by Jack Jansen (email It is very programmer-friendly. The ADPCM code used is the Intel/DVI ADPCM code which is being recommended by the IMA Digital Audio Technical Working Group. It allows the following calls:

adpcm_coder(short inbuf[], char outbuf[], int nsample,
struct adpcm_state *state);
adpcm_decoder(char inbuf[], short outbuf[], int nsample,
struct adpcm_state *state);

Note that this is NOT a G.722 coder. The ADPCM standard is much more complicated, probably resulting in better quality sound but also in much more computational overhead.

The routines have been tested on numerous platforms, and will easily compress and decompress millions of samples per second on current hardware.

For more information:
The G.721/722/723 packages are available from ITU at

[From Dan Frankowski,; Jack Jansen,]

Q2.4: What is GSM? Where can I get source for it?

Updated 4/27/00

GSM (Global System for Mobile Communication) is a standard for digital cellular telephony used in Europe. GSM also refers to the speech coder used in GSM telephones, which is what this section of the FAQ is concerned with.

The Communications and Operating Systems Research Group (KBS) at the Technische Universitaet Berlin is currently working on a set of UNIX-based tools for computer-mediated telecooperation that will be made freely available.

As part of this effort they are publishing an implementation of the European GSM 06.10 provisional standard for full-rate speech transcoding, prI-ETS 300 036, which uses RPE/LTP (residual pulse excitation/long term prediction) coding at 13 kbit/s.

GSM 06.10 compresses frames of 160 13-bit samples (8 kHz sampling rate, i.e. a frame rate of 50 Hz) into 260 bits; for compatibility with typical UNIX applications, our implementation turns frames of 160 16-bit linear samples into 33-byte frames (1650 Bytes/s). The quality of the algorithm is good enough for reliable speaker recognition; even music often survives transcoding in recognizable form (given the bandwidth limitations of 8 kHz sampling rate).

The interfaces offered are a front end modeled after compress(1), and a library API. Compression and decompression run faster than real time on most SPARCstations. The implementation has been verified against the ETSI standard test patterns.

Jutta Degener, Carsten Bormann

Communications and Operating Systems Research Group, TU Berlin
Fax: +49.30.31425156, Phone: +49.30.31424315

To obtain: An alternative site is Try also:

[From Dan Frankowski,; Jutta Degener,]

Q2.5: How does pitch perception work, and how do I implement it on my DSP chip?

Updated 04/02/01

Pitch is officially defined as "That attribute of auditory sensation in terms of which sounds may be ordered on a musical scale." Several good examples illustrating the subtleties of pitch perception are included in the "Auditory Demonstrations CD" which is available from the Acoustical Society of America, Woodbury, NY 10797 for $20.

A good general reference about the psychology of pitch perception is the book:

B.C.J. Moore, An Introduction to the Psychology of Hearing, Academic Press, London, 1997.

This book is available in paperback and makes a good desk reference.

An algorithm implementation that matches a large body of psycho-acoustical work, but which is computationally very intensive, is presented in the paper:

Malcolm Slaney and Richard Lyon, "A Perceptual Pitch Detector," Proceedings of the International Conference of Acoustics, Speech, and Signal Processing, 1990, Albuquerque, New Mexico. Available for ftp at

The definitive papers describing the use of such a perceptual pitch detector as applied to the classical pitch literature is in:

Ray Meddis and M. J. Hewitt. "Virtual pitch and phase sensitivity of a computer model of the auditory periphery. " Journal of the Acoustical Society of America 89 (6 1991): 2866-2682. and 2883-2894.

The current work that argues for a pure spectral method starts with the work of Goldstein:

J. Goldstein, "An optimum processor theory for the central formation of the pitch of complex tones," Journal of the Acoustical Society of America 54, 1496-1516, 1973.

Two approaches are worth considering if something approximating pitch is appropriate. The people at IRCAM have proposed a harmonic analysis approach that can be implemented on a DSP:

Boris Doval and Xavier Rodet, "Estimation of Fundamental Frequency of Musical Sound Signals," Proceedings of the 1991 International Conference on Acoustics, Speech, and Signal Processing, Toronto, Volume 5, pp. 3657-3660.

The classic paper for time domain (peak picking) pitch algorithms is:

B. Gold and L. Rabiner, "Parallel processing techniques for estimating pitch periods of speech in the time domain," Journal of the Acoustical Society of America, 46, pp 441-448, 1969.

Finally, a word of caution:
Pitch is not single-valued. We can hear a sound and match it to several different pitches. Imagine the number of instruments in an orchestra, each with its own pitch. Even a single sound can have more than one pitch. See for example Demonstration 27 from the ASA Auditory Demonstrations CD.

[The above from Malcolm Slaney, Interval Research, and John Lazzaro, U.C. Berkeley.]

Information about independently changing the pitch and speed of a digital recording can be found at [Stephan M. Bernsee,]Updated!

Q2.6: What standards exist for digital audio? What is AES/EBU? What is S/PDIF?

Updates 1/8/97

Q2.6.1: Where can I get copies of ITU (formerly CCITT) standards?

Try the ITU (International Telecommunication Union) homepage at

Q2.6.2: What standards are there for digital audio?

The "AES/EBU" (Audio Engineering Society / European Broadcast Union) digital audio standard is probably the most popular digital audio standard today. Most consumer and professional digital audio devices (CD players, DAT decks, etc.) that feature digital audio I/O support AES/EBU.

AES/EBU is a bit-serial communications protocol for transmitting digital audio data through a single transmission line. It provides two channels of audio data (up to 24 bits per sample), a method for communication control and status information ("channel status bits"), and some error detection capabilities. Clocking information (i.e., sample rate) is derived from the AES/EBU bit stream, and is thus controlled by the transmitter. The standard mandates use of 32 kHz, 44.1 kHz, or 48 kHz sample rates, but some interfaces can be made to work at other sample rates.

AES/EBU provides both "professional" and "consumer" modes. The big difference is in the format of the channel status bits mentioned above. The professional mode bits include alphanumeric channel origin and destination data, time of day codes, sample number codes, word length, and other goodies. The consumer mode bits have much less information, but do include information on copy protection (naturally). Additionally, the standard provides for "user data", which is a bit stream containing user-defined (i.e., manufacturer-defined) data. According to Tim Channon, "CD user data is almost raw CD subcode; DAT is StartID and SkipID. In professional mode, there is an SDLC protocol or, if DAT, it may be the same as consumer mode."

The physical connection media are commonly used with AES/EBU: balanced (differential), using two wires and shield in three-wire microphone cable with XLR connectors; unbalanced (single-ended), using audio coax cable with RCA jacks; and optical (via fiber optics).

"S/P-DIF" (Sony/Philips Digital Interface Format) typically refers to AES/EBU operated in consumer mode over unbalanced RCA cable. Note that S/P-DIF and AES/EBU mean different things depending on how much of a purist you are in the digital audio world; see the Finger article below.

Finger, Robert, AES3-199X: The Revised Two Channel Digital Audio Interface (DRAFT), presented at the 91st Convention of the Audio Engineering Society, October 4-8, 1991. Reprints: AES, 60 East 42nd St., New York, NY, 10165.

[The above from Phil Lapsley and Tim Channon,]

Painter, E. M., and Spanias, A. S. (1997 and revised 1999). A Review of Algorithms for Perceptual Coding of Digital Audio Signals. (PostScript, 3MB)

[Andreas Spanias,]

Q2.7: What is mu-law encoding? Where can I get source for it?

Updated 9/13/99

Mu-law (also "u-law") encoding is a form of logarithmic quantization or companding. It's based on the observation that many signals are statistically more likely to be near a low signal level than a high signal level. Therefore, it makes more sense to have more quantization points near a low level than a high level. In a typical mu-law system, linear samples of 14 to 16 bits are companded to 8 bits. Most telephone quality codecs (including the Sparcstation's audio codec) use mu-law encoded samples.

Desktop Sparc machines come with routines to convert between linear and mu-law samples. On a desktop Sparc, see the man page for audio_ulaw2linear in /usr/demo/SOUND/man.

To obtain:
Craig Reese posted the source of similar routines to comp.dsp in August '92. These are archived on

ITU-T (formerly CCITT) Recommendation G.711 (very difficult to follow).

Michael Villeret, et. al, A New Digital Technique for Implementation of Any Continuous PCM Companding Law, IEEE Int. Conf. on Communications, 1973, vol. 1, pp. 11.12-11.17.

MIL-STD-188-113, Interoperability and Performance Standards for Analog-to-Digital Conversion Techniques, 17 February 1987.

TI Digital Signal Processing Applications with the TMS320 Family (TI literature number SPRA012A), pp. 169-198.

[From Joe Campbell; Craig Reese,; Sepehr Mehrabanzad,; Keith Kendall,]

Q2.8: How can I do CD <=> DAT sample rate conversion?

Updated 9/13/99

CD players use a 44.1 kHz sample rate, whereas DAT uses a 48 kHz sample rate. This means that you must do sample rate conversion before you can get data from a CD player directly into a DAT deck.

[From Ed Hall,]

For a start, look at Multirate Digital Signal Processing by Crochiere and Rabiner (see FAQ section 1.1).

Almost any technique for producing good digital low-pass filters will be adaptable to sample-rate conversion. 44.1:48 and vice-versa is pretty hairy, though, because the lowest whole-number ratio is 147:160. To do all that in one go would require a FIR with thousands of coefficients, of which only 1/147th or 1/160th are used for each sample--the real problem is memory, not CPU for most DSP chips. You could chain several interpolators and decimators, as suggested by factoring the ratio into 3*7*7:2*2*2*2*2*5. This adds complexity, but reduces the number of coefficients required by a considerable amount.

[From Lou Scheffer:]

Theory of operation: 44.1 and 48 are in the ratio 147/160. To convert from 44.1 to 48, for example, we (conceptually):

1. interpolate 159 zeros between every input sample. This raises that data rate to 7.056 MHz. Since it is equivalent to reconstructing with delta functions, it also creates images of frequency f at 44.1-f, 44.1+f, 88.2-f, 88.2+f, ...
2. We remove these with an FIR digital filter, leaving a signal containing only 0-20 KHz information, but still sampled at a rate of 7.056 MHz.
3. We discard 146 of every 147 output samples. It does not hurt to do so since we have no content above 24 KHz. In practice, of course, we never compute the values of the samples we will throw out.

So we need to design an FIR filter that is flat to 20 KHz, and down at least X db at 24 KHz. How big does X need to be? You might think about 100 db, since the max signal size is roughly +-32767, and the input quantization +- 1/2, so we know the input had a signal to broadband noise ratio of 98 db at most. However, the noise in the stopband (20KHz-3.5MHz) is all folded into the passband by the decimation in step 3, so we need another 22 db (that's 160 in db) to account for the noise folding. Thus 120 db rejection yields a broadband noise equal to the original quantizing noise. If you are a fanatic, you can shoot for 130 db to make the original quantizing errors dominate, and a 22.05 KHz cutoff to eliminate even ultrasonic aliasing. You will pay for your fanaticism with a penance of more taps, however.

To obtain:
There's a free implementation of Julius O. Smith III and someone else's "bandwidth-limited interpolation" rate conversion algorithm.

A paper available as explains the algorithm. Free source code, as well as an HTML discussion of the algorithm, is available at It all works quite well.

[From Kevin Bradley,]

There is an implementation of polyphase resampling for various rates as a part of the Sox audio toolkit at See file polyphas.c for details.

Sox also contains an implementation of bandlimited interpolation and linear interpolation, and serves as a ready vehicle for module experimentation.

[From Fritz M. Rothacher,]

You can add my Ph.D. thesis on sample-rate conversion to the FAQ:

Fritz M. Rothacher, Sample-Rate Conversion: Algorithms and VLSI Implementation, Ph.D. thesis, Integrated Systems Lab, Swiss Federal Institute of Technology, ETH Zuerich, 1995, ISBN 3-89191-873-9

It can also be downloaded from my homepage at

Q2.9: Wavelets

Updated 6/3/98

Q2.9.1 What are wavelets? Where can I get more information?

In short, wavelets are a way to analyze a signal using base functions which are localized both in time (as diracs, but unlike sine waves), and in frequency (as sine waves, but unlike diracs). They can be used for efficient numerical algorithms and many DSP or compression applications.

Sources of information on wavelets include:

* a newsletter, "Wavelet Digest". Subscriptions for Wavelet Digest: E-mail to with "subscribe" as subject. The Wavelet Digest can also be found at

Q2.9.2 What are some good books and papers on wavelets?

The best introduction to wavelet transforms is in:

Wavelets and Signal Processing- Oliver Rioul and Martin Vetterli, IEEE Signal Processing magazine, Oct. 91, pp 14-38

A good introductory book on wavelets:

Randy K. Young, Wavelet Theory and Its Applications, Kluwer Academic Publishers, ISBN 0-7923-9271-X, 1993.

A more thorough book:

Ali N. Akansu and Richard A. Haddad, Multiresolution Signal Decomposition Transforms, Subbands, Wavelets Academic Press, Inc., ISBN 0-12-047140-X

A couple more interesting papers:

Wavelets and Filter banks: Theory and Design, IEEE Transactions on Signal Processing, Vol. 40, No.9, Sept. 1992, pp 2207-2232

Mac Cody's articles in Dr. Dobb's Journal, April 1992 and April 1993

Paper by Ingrid Daubechies in IEEE Trans. on Info. theory , vol 36. No.5 , Sept 1990 and a book titled " Ten lectures on Wavelets" deal with the mathematical aspects of the WT.

Q2.9.3: Where can I get some software for wavelets?

Binaries are available for the following platforms: Sun Sparcstations running SunOS 4.1 or Solaris 2.3, NeXT machines running NeXTstep 3.0 or higher, with an X server, Silicon Graphics machines (IRIS), DEC Alpha AXP running OSF/1 1.2 or higher, i386/i486 PC compatible with Linux 0.99.

There is also a sample data directory containing interesting signals.

More information:

[From Fazal Majid]:

Rice Wavelet Tools
The Rice Wavelet Toolbox (RWT) is a collection of Matlab M-files and C MEX-files for 1D and 2D wavelet and filter bank design, analysis, and processing. The toolbox provides tools for denoising and interfaces directly with our Matlab code for wavelet domain hidden Markov models and wavelet regularized deconvolution. Also included is a simple converter to the data format used by the official Matlab wavelet toolbox.

The current distribution, Version 2.3 (Dec 1, 2000), has been streamlined and packaged for different systems, including Solaris, Linux, and Microsoft Windows. Functions omitted in Version 2.3 can be found in the Version 2.01 distribution.

To obtain:

Send mail to (or

Q2.10: How do I calculate the coefficients for a Hilbert transformer?

Updated 6/3/98

For all the gory details, I suggest the paper: Andrew Reilly and Gordon Frazer and Boualem Boashash: Analytic signal generation---tips and traps, IEEE Transactions on Signal Processing, no. 11, vol. 42, Nov. 1994, pp. 3241-3245.

For comp.dsp, the gist is:

1. Design a half-bandwidth real low-pass FIR filter using whatever optimal method you choose, with the principle design criterion being minimization of the maximum attenuation in the band f_s/4 to f_s/2.

2. Modulate this by exp(2 pi f_s/4 t), so that now your stop-band is the negative frequencies, the pass-band is the positive frequencies, and the roll-off at each end does not extend into the negative frequency band.

3. either use it as a complex FIR filter, or a pair of I/Q real filters in whatever FIR implementation you have available.

If your original filter design produced an impulse response with an even number of taps, then the filtering in 3 will introduce a spurious half-sample delay (resampling the real signal component), but that does not matter for many applications, and such filters have other features to recommend them.

Andrew Reilly []

Q2.11: Algorithm implementation: floating-point versus fixed-point

According to the WWWebster Dictionary, an algorithm is "a procedure for solving a mathematical problem (as of finding the greatest common divisor) in a finite number of steps that frequently involves repetition of an operation; broadly: a step-by-step procedure for solving a problem or accomplishing some end especially by a computer."

Typical (although by no means the only) operations are those of addition and multiplication. When expressing the algorithm with pencil and paper, these operations are commonly taken to be within an algebraically complete number system such as the integers or the reals. However, when the time comes to implement the algorithm on a computer, these "ideal" number systems must be exchanged for something realizable. The number systems available today on common processors and digital hardware are broadly categorized as floating-point and fixed-point.

In a floating-point representation, the total number of bits available are partitioned into an exponent and mantissa. Generally speaking, the mantissa stores the "significant digits" of the value while the exponent scales the significant digits to the desired magnitude. The action of the exponent is to move, or "float," the decimal point depending on the magnitude being represented; thus the term "floating-point."

Because floating-point representations are typically at least 32 bits long (IEEE-754 is a popular standard for 32-bit and 64-bit floating-point numbers), there exists simultaneously high precision and high dynamic range. These traits of floating-point numbers allow most algorithms to be ported directly to floating-point implementations with little or no change, and this is the key reason floating-point representations are highly desirable. The disadvantage of floating-point implementations is that they require a significant amount of extra hardware over fixed-point implementations, which translates to higher parts costs, higher power consumption, slower execution, larger chip area, or a combination of these.

As the term "fixed-point" implies, fixed-point representations have the binary point at a fixed location. There are two subsets of fixed-point implementations: fractional and integer. In a fractional fixed-point implementation, such as that provided on the Motorola 56K series of DSPs, the binary point is always assumed to be to the left of the most-significant digit. In an integer fixed-point implementation, such as that provided by the Texas Instruments TMS320C54xx series of DSPs, the binary point is to the right of the least-significant digit. In either case, the arithmetic operations implemented in the hardware are essentially integer, which results in a much simpler arithmetic logic unit in hardware that allows lower cost, lower power consumption, faster execution, smaller chip area, or a combination of these, over that of floating-point implementations.
Fixed-Point Arithmetic: The Basics

In essence, a fixed-point representation is a simple integer scaled (divided) by a power of two. If we denote an unscaled integer variable by upper case "X" and the scaled, fixed-point variable by lower case "x," then x = X/2^b, where b is the number of digits the binary point is shifted left. For example, if X is a 16-bit, two's complement integer, and b=4, then "X" has values ranging from -2^(15) to +2^(15)-1 and with minimum step size of 1, while the scaled value "x" ranges from -2^(11) to +2^(11) - 1/(2^4) with a minimum step size of 1/(2^4).

Note that the value of "b" is not part of the representation. You won't see it in a register or as part of the data anywhere; it is a parameter that the algorithm implementer must determine and maintain.

Fixed-point representations place some very different rules on operations than their floating-point counterparts. For example, two variables must be scaled the same in order to be added (or subtracted). Thus it may be necessary to shift one or the other operand prior to adding. Another example is that when multiplying two N-bit values with scale factors b0 and b1, the result is scaled (b0+b1) and requires 2*N bits in general in order to avoid overflow and maintain precision.

There are several other rules and considerations for fixed-point arithmetic that are commonly encountered when implementing algorithms. For more information, see

Randy Yates []