Internet: netlib@nac.no
A similar collection of statistical software is available from
statlib@temper.stat.cmu.edu.
The symbolic algebra system REDUCE is supported by reduce-netlib@rand.org.
Naval Surface Warfare Center (E43)
[Witold Waldman]
The U.S. DoD's Federal-Standard-1016 based 4800 bps code excited linear
prediction voice coder version 3.2 (CELP 3.2) Fortran and C simulation
source codes are available for worldwide distribution (on DOS
diskettes, but configured to compile on Sun SPARC stations) from NTIS
and DTIC. Example input and processed speech files are included. A
Technical Information Bulletin (TIB), "Details to Assist in
Implementation of Federal Standard 1016 CELP," and the official
standard, "Federal Standard 1016, Telecommunications: Analog to
Digital Conversion of Radio Voice by 4,800 bit/second Code Excited
Linear Prediction (CELP)," are also available.
FS-1016 CELP 3.2 may also be obtained from
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/celp_3.2a.tar.Z.
LPC-10 (2.4 Kbps) is available from
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/lpc10-1.0.tar.gz.
LPC (4.8 Kbps) can be downloaded as part of SpeakFreely
(http://www.speakfreely.org/) or HawkVoice
(http://www.hawksoft.com/hawkvoice/). HawkVoice includes versions of
OpenLPC, LPC-10, LPC, GSM, and Intel/DVI ADPCM. These versions have
been rewritten to support multiple encoding and decoding streams, and
the interfaces have been standardized. [Phil Frisbie, Jr.,
phil@hawksoft.com]
OpenLPC (1.4 and 1.8 Kbps) can be downloaded from
ftp://ftp.futuredynamics.com/OpenLPC/.
MATLAB software for LPC-10 is available from
http://www.eas.asu.edu/~spanias/srtcrs.html.
Also, PostScript copies of tutorials on speech coding can be found at
http://www.eas.asu.edu/~spanias/papers.html.
[Andreas Spanias, spanias@asu.edu]
Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. Welch,
The Federal Standard 1016 4800 bps CELP Voice Coder, Digital Signal
Processing, Academic Press, 1991, Vol. 1, No. 3, pp. 145-155.
Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. Welch,
The DoD 4.8 kbps Standard (Proposed Federal Standard 1016),
in Advances in Speech Coding, ed. Atal, Cuperman and Gersho,
Kluwer Academic Publishers, 1991, Chapter 12, pp. 121-133.
Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. Welch, The
Proposed Federal Standard 1016 4800 bps Voice Coder: CELP, Speech
Technology Magazine, April/May 1990, pp. 58-64.
Additional information on CELP can also be found in the
comp.speech FAQ.
The U. S. Federal Standard 1015 (NATO STANAG 4198) is described in:
Thomas E. Tremain, The Government Standard Linear Predictive Coding
Algorithm: LPC-10, Speech Technology Magazine, April 1982, pp. 40-49.
[Most of the above from Joe Campbell, jpcampb@afterlife.ncsc.mil, with
additions from Dan Frankowski, dfrankow@winternet.com, and Ed Hall, edhall@rand.org]
Note that this is NOT a G.722 coder. The ADPCM standard is
much more complicated, probably resulting in better quality sound but
also in much more computational overhead.
[From Dan Frankowski, dfrankow@winternet.com; Jack Jansen, Jack.Jansen@cwi.nl]
The Communications and Operating Systems Research Group (KBS) at the
Technische Universitaet Berlin is currently working on a set of
UNIX-based tools for computer-mediated telecooperation that will be
made freely available.
As part of this effort they are publishing an implementation of the
European GSM 06.10 provisional standard for full-rate speech
transcoding, prI-ETS 300 036, which uses RPE/LTP (regular pulse
excitation/long-term prediction) coding at 13 kbit/s.
GSM 06.10 compresses frames of 160 13-bit samples (8 kHz sampling
rate, i.e. a frame rate of 50 Hz) into 260 bits; for compatibility
with typical UNIX applications, our implementation turns frames of 160
16-bit linear samples into 33-byte frames (1650 Bytes/s).
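The figures in the preceding paragraph fit together as shown below. This is only a restatement of the numbers already given; the 4 extra bits per 33-byte frame follow from 33 x 8 = 264 versus 260 payload bits, and how the implementation uses them is not described here.

```latex
\begin{aligned}
\frac{8000\ \text{samples/s}}{160\ \text{samples/frame}} &= 50\ \text{frames/s} \\
260\ \text{bits/frame} \times 50\ \text{frames/s} &= 13\,000\ \text{bit/s} = 13\ \text{kbit/s} \\
33\ \text{bytes/frame} \times 50\ \text{frames/s} &= 1650\ \text{bytes/s}
  \quad (33 \times 8 = 264 = 260 + 4\ \text{bits}) \\
\frac{160 \times 2\ \text{bytes in}}{33\ \text{bytes out}} &= \frac{320}{33} \approx 9.7
\end{aligned}
```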
The quality of the algorithm is good enough for reliable speaker
recognition; even music often survives transcoding in recognizable
form (given the bandwidth limitations of 8 kHz sampling rate).
The interfaces offered are a front end modeled after compress(1), and
a library API. Compression and decompression run faster than real time
on most SPARCstations. The implementation has been verified against the
ETSI standard test patterns.
Jutta Degener (jutta@cs.tu-berlin.de), Carsten Bormann (cabo@cs.tu-berlin.de)
Communications and Operating Systems Research Group, TU Berlin
[From Dan Frankowski, dfrankow@winternet.com; Jutta Degener, jutta@cs.tu-berlin.de]
This book is available in paperback and makes a good desk reference.
An algorithm implementation that matches a large body of psycho-acoustical
work, but which is computationally very intensive, is presented in the paper:
The definitive papers describing the use of such a perceptual pitch
detector as applied to the classical pitch literature are:
The current work that argues for a pure spectral method starts with the work
of Goldstein:
Two approaches are worth considering if something approximating pitch
is appropriate. The people at IRCAM have proposed a harmonic analysis
approach that can be implemented on a DSP:
The classic paper for time domain (peak picking) pitch algorithms is:
[The above from Malcolm Slaney, Interval Research, and John Lazzaro,
U.C. Berkeley.]
AES/EBU is a bit-serial communications protocol for transmitting
digital audio data through a single transmission line. It provides two
channels of audio data (up to 24 bits per sample), a method for
communication control and status information ("channel status bits"),
and some error detection capabilities. Clocking information (i.e.,
sample rate) is derived from the AES/EBU bit stream, and is thus
controlled by the transmitter. The standard mandates use of 32 kHz,
44.1 kHz, or 48 kHz sample rates, but some interfaces can be made to
work at other sample rates.
AES/EBU provides both "professional" and "consumer" modes. The big
difference is in the format of the channel status bits mentioned above.
The professional mode bits include alphanumeric channel origin and
destination data, time of day codes, sample number codes, word length,
and other goodies. The consumer mode bits have much less information,
but do include information on copy protection (naturally). Additionally,
the standard provides for "user data", which is a bit stream containing
user-defined (i.e., manufacturer-defined) data. According to Tim
Channon, "CD user data is almost raw CD subcode; DAT is StartID and
SkipID. In professional mode, there is an SDLC protocol or, if DAT,
it may be the same as consumer mode."
Three physical connection media are commonly used with AES/EBU:
balanced (differential), using two wires and a shield in three-wire microphone
cable with XLR connectors; unbalanced (single-ended), using audio coax cable
with RCA jacks; and optical (via fiber optics).
[The above from Phil Lapsley and Tim Channon,
tchannon@black.demon.co.uk]
Painter, E. M., and Spanias, A. S. (1997 and revised 1999). A Review of
Algorithms for Perceptual Coding of Digital Audio Signals. (PostScript, 3MB)
http://www.eas.asu.edu/~spanias/papers.html
[Andreas Spanias, spanias@asu.edu]
Desktop Sparc machines come with routines to convert between linear and
mu-law samples. On a desktop Sparc, see the man page for audio_ulaw2linear
in /usr/demo/SOUND/man.
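Sun's own routines are documented on that man page and are not reproduced here. As a rough illustration of what such a conversion does, the C sketch below implements the G.711 mu-law companding rule in the style of the widely circulated public-domain g711 code (bias 0x84, clip at 32635, codes stored bit-inverted). The names linear_to_ulaw and ulaw_to_linear are hypothetical and are not the Sun library's API.

```c
#include <stdint.h>

#define ULAW_BIAS 0x84   /* 132: offset that aligns the segment boundaries */
#define ULAW_CLIP 32635  /* largest magnitude that still fits after biasing */

/* Compress a 16-bit linear sample to an 8-bit mu-law code. */
uint8_t linear_to_ulaw(int16_t pcm)
{
    int x = pcm;
    int sign = 0;
    if (x < 0) {
        sign = 0x80;
        x = -x;
    }
    if (x > ULAW_CLIP)
        x = ULAW_CLIP;
    x += ULAW_BIAS;

    /* Segment (exponent) = floor(log2(x >> 7)), a value in 0..7. */
    int seg = 0;
    for (int v = x >> 8; v > 0; v >>= 1)
        seg++;

    int mantissa = (x >> (seg + 3)) & 0x0F;  /* 4 bits below the leading 1 */
    return (uint8_t)~(sign | (seg << 4) | mantissa);  /* stored inverted */
}

/* Expand an 8-bit mu-law code back to a 16-bit linear sample. */
int16_t ulaw_to_linear(uint8_t code)
{
    unsigned u = (uint8_t)~code;
    int seg = (u >> 4) & 0x07;
    int mantissa = u & 0x0F;
    /* Rebuild the midpoint of the quantization interval, then remove the bias. */
    int x = (((mantissa << 3) + ULAW_BIAS) << seg) - ULAW_BIAS;
    return (int16_t)((u & 0x80) ? -x : x);
}
```

For instance, linear_to_ulaw(0) gives 0xFF (the mu-law code for silence), and a full-scale input comes back as +-32124, the largest mu-law magnitude. Because the decoder returns the midpoint of each interval, the round-trip error is at most half a step, i.e. 512 in the top segment.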
Michael Villeret, et al., A New Digital Technique for Implementation
of Any Continuous PCM Companding Law, IEEE Int. Conf. on Communications,
1973, vol. 1, pp. 11.12-11.17.
MIL-STD-188-113, Interoperability and Performance Standards
for Analog-to-Digital Conversion Techniques, 17 February 1987.
TI Digital Signal Processing Applications with the TMS320 Family
(TI literature number SPRA012A), pp. 169-198.
[From Joe Campbell; Craig Reese, cfreese@super.org; Sepehr Mehrabanzad,
sepehr@falstaff.dev.cdx.mot.com; Keith Kendall, KLK3%mimi@magic.itg.ti.com]
CD players use a 44.1 kHz sample rate, whereas DAT uses a 48
kHz sample rate. This means that you must do sample rate
conversion before you can get data from a CD player directly
into a DAT deck.
[From Ed Hall, edhall@rand.org:]
For a start, look at Multirate Digital Signal Processing
by Crochiere and Rabiner (see FAQ section 1.1).
Almost any technique for producing good digital low-pass
filters will be adaptable to sample-rate conversion. 44.1:48
and vice-versa is pretty hairy, though, because the lowest
whole-number ratio is 147:160. To do all that in one go
would require a FIR with thousands of coefficients, of which
only 1/147th or 1/160th are used for each sample--the real
problem is memory, not CPU for most DSP chips. You could
chain several interpolators and decimators, as suggested by
factoring the ratio into 3*7*7:2*2*2*2*2*5. This adds
complexity, but reduces the number of coefficients required
by a considerable amount.
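The ratio and factorization quoted above can be checked mechanically. The C sketch below (function names hypothetical) reduces two sample rates to lowest terms with Euclid's algorithm and lists the prime factors that suggest how a multistage chain could be split; it only does the bookkeeping, not any filtering.

```c
/* Greatest common divisor by Euclid's algorithm. */
static long gcd_long(long a, long b)
{
    while (b != 0) {
        long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Reduce in_rate:out_rate to lowest terms, e.g. 44100:48000 -> 147:160. */
void reduce_ratio(long in_rate, long out_rate, long *num, long *den)
{
    long g = gcd_long(in_rate, out_rate);
    *num = in_rate / g;
    *den = out_rate / g;
}

/* Store the prime factors of n in ascending order in f[] (at most max
   entries) and return how many were stored. */
int prime_factors(long n, long f[], int max)
{
    int count = 0;
    for (long p = 2; p * p <= n && count < max; p++) {
        while (n % p == 0 && count < max) {
            f[count++] = p;
            n /= p;
        }
    }
    if (n > 1 && count < max)
        f[count++] = n;   /* whatever remains is itself prime */
    return count;
}
```

reduce_ratio(44100, 48000, &num, &den) yields 147 and 160, and prime_factors returns 3, 7, 7 and 2, 2, 2, 2, 2, 5 for them, matching the 3*7*7:2*2*2*2*2*5 factorization. The common multiple 44100*160 = 48000*147 = 7,056,000 Hz is the rate at which a single-stage converter would conceptually run.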
[From Lou Scheffer:]
Theory of operation: 44.1 and 48 are in the ratio 147/160.
To convert from 44.1 to 48, for example, we (conceptually):
  1) insert 159 zeros between successive input samples, raising the
     sample rate to 160 x 44.1 kHz = 7.056 MHz;
  2) low-pass filter the result at that rate;
  3) keep every 147th sample, giving 7.056 MHz / 147 = 48 kHz.
So we need to design an FIR filter that is flat to 20 kHz,
and down at least X dB at 24 kHz. How big does X need to
be? You might think about 100 dB, since the max signal size
is roughly +-32767, and the input quantization +- 1/2, so we
know the input had a signal to broadband noise ratio of 98
dB at most. However, the noise in the stopband
(20 kHz-3.5 MHz) is all folded into the passband by the
decimation in step 3, so we need another 22 dB (the factor
of 160 expressed in dB) to account for the noise folding. Thus
120 dB rejection yields a broadband noise equal to the original
quantizing noise. If you are a fanatic, you can shoot for
130 dB to make the original quantizing errors dominate, and
a 22.05 kHz cutoff to eliminate even ultrasonic aliasing.
You will pay for your fanaticism with a penance of more
taps, however.
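The numbers in this argument can be checked with a few lines of arithmetic for upsampling by 160 and decimating by 147. The 98 dB figure matches the standard 6.02N + 1.76 dB rule for an N-bit quantizer driven by a full-scale sine; that rule is a textbook approximation brought in here to show where the figure comes from, not something stated in the note above.

```latex
\begin{aligned}
f_{\text{up}} &= 160 \times 44.1\ \text{kHz} = 7.056\ \text{MHz},
  \qquad f_{\text{up}}/2 = 3.528\ \text{MHz} \\
f_{\text{out}} &= 7.056\ \text{MHz} / 147 = 48\ \text{kHz} \\
\text{SNR}_{16\ \text{bit}} &\approx 6.02 \times 16 + 1.76 \approx 98.1\ \text{dB} \\
10 \log_{10} 160 &\approx 22.0\ \text{dB} \\
X &\approx 98.1\ \text{dB} + 22.0\ \text{dB} \approx 120\ \text{dB}
\end{aligned}
```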
A paper available as
ftp://ccrma-ftp.stanford.edu/pub/DSP/Tutorials/BandlimitedInterpolation.eps.Z
explains bandlimited interpolation. Free source code, as well as an HTML
discussion of the algorithm, is available
at http://ccrma-www.stanford.edu/~jos/resample/. It all works quite well.
[From Kevin Bradley, kb+@andrew.cmu.edu:]
There is an implementation of polyphase resampling for various
rates as a part of the Sox audio toolkit at
http://home.sprynet.com/~cbagwell/sox.html. See file polyphas.c for details.
Sox also contains an implementation of bandlimited interpolation
and linear interpolation, and serves as a ready vehicle for module
experimentation.
[From Fritz M. Rothacher, f.rothacher@ieee.org:]
You can add my Ph.D. thesis on sample-rate conversion to the FAQ:
Fritz M. Rothacher, Sample-Rate Conversion: Algorithms and VLSI
Implementation, Ph.D. thesis, Integrated Systems Lab, Swiss
Federal Institute of Technology, ETH Zuerich, 1995, ISBN 3-89191-873-9
It can also be downloaded from my homepage at
http://www.guest.iis.ee.ethz.ch/~rota.
Sources of information on wavelets include:
A good introductory book on wavelets:
A more thorough book:
A couple more interesting papers:
Mac Cody's articles in Dr. Dobb's Journal, April 1992 and April 1993
A paper by Ingrid Daubechies in IEEE Trans. on Information Theory, vol. 36,
no. 5, Sept. 1990, and a book titled "Ten Lectures on Wavelets" deal
with the mathematical aspects of the wavelet transform (WT).
Binaries are available for the following platforms:
Sun Sparcstations running SunOS 4.1 or Solaris 2.3,
NeXT machines running NeXTstep 3.0 or higher, with an X server,
Silicon Graphics machines (IRIS),
DEC Alpha AXP running OSF/1 1.2 or higher,
i386/i486 PC compatible with Linux 0.99.
There is also a sample data directory containing interesting signals.
[From Fazal Majid majid@math.yale.edu]:
The current distribution, Version 2.3 (Dec 1, 2000), has been streamlined
and packaged for different systems, including Solaris, Linux, and Microsoft
Windows. Functions omitted in Version 2.3 can be found in the Version 2.01
distribution.
Send mail to wlet-tools@rice.edu (or ramesh@dsp.rice.edu)
For all the gory details, I suggest the paper:
Andrew Reilly, Gordon Frazer, and Boualem Boashash, "Analytic signal
generation: tips and traps," IEEE Transactions on Signal Processing,
vol. 42, no. 11, Nov. 1994, pp. 3241-3245.
For comp.dsp, the gist is:
If your original filter design produced an impulse response with
an even number of taps, then the filtering in 3 will introduce a
spurious half-sample delay (resampling the real signal component),
but that does not matter for many applications, and such filters
have other features to recommend them.
Andrew Reilly [Reilly@zeta.org.au]
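As a separate and simpler illustration than the analytic-signal method of the Reilly, Frazer, and Boashash paper, the classic windowed ideal Hilbert transformer has impulse response h[k] = 2/(pi*k) for odd k and 0 for even k, where k is the offset from the centre tap. The C sketch below truncates it to an odd length n and tapers it with a Welch (parabolic) window, chosen here only because it needs no trig functions; Hamming or Kaiser windows are more usual. The function name hilbert_coeffs is hypothetical.

```c
/* Fill h[0..n-1] (n odd, n >= 3) with a windowed ideal Hilbert transformer.
   Returns 0 on success, -1 if n is unsuitable. */
int hilbert_coeffs(double *h, int n)
{
    const double pi = 3.14159265358979323846;
    if (n < 3 || n % 2 == 0)
        return -1;                 /* odd length keeps the delay a whole sample count */
    int c = (n - 1) / 2;           /* centre tap index = group delay in samples */
    for (int i = 0; i < n; i++) {
        int k = i - c;
        if (k % 2 == 0) {
            h[i] = 0.0;            /* even offsets (including k = 0) vanish */
        } else {
            double r = (double)k / (c + 1);  /* Welch window argument, |r| < 1 */
            double w = 1.0 - r * r;
            h[i] = w * 2.0 / (pi * k);       /* ideal 2/(pi*k), tapered */
        }
    }
    return 0;
}
```

hilbert_coeffs(h, 31) produces a 31-tap antisymmetric filter with a delay of 15 samples; h[16] is 2/pi (about 0.637) reduced by the window to about 0.634. Keeping the length odd avoids the kind of half-sample delay mentioned above for even-length designs.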
According to the WWWebster Dictionary, an algorithm is "a procedure
for solving a mathematical problem (as of finding the greatest common
divisor) in a finite number of steps that frequently involves
repetition of an operation; broadly: a step-by-step procedure for
solving a problem or accomplishing some end especially by a computer."
Typical (although by no means the only) operations are those of
addition and multiplication. When expressing the algorithm with
pencil and paper, these operations are commonly taken to be within an
idealized, unbounded number system such as the integers or the
reals. However, when the time comes to implement the algorithm on a
computer, these "ideal" number systems must be exchanged for something
realizable. The number systems available today on common processors
and digital hardware are broadly categorized as floating-point and
fixed-point.
In a floating-point representation, the total number of bits
available is partitioned into an exponent and a mantissa (usually
alongside a sign bit). Generally speaking, the mantissa stores the
"significant digits" of the value while the exponent scales the
significant digits to the desired magnitude. The action of the
exponent is to move, or "float," the binary point depending on the
magnitude being represented; thus the
term "floating-point."
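To make the exponent/mantissa split concrete, the C sketch below pulls the three fields (sign, 8-bit biased exponent, 23 stored mantissa bits) out of a 32-bit float. It assumes float is an IEEE-754 binary32 value, which holds on essentially all current platforms but is not guaranteed by the C standard; the type and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Fields of an IEEE-754 single-precision (32-bit) number. */
typedef struct {
    unsigned sign;      /* 1 bit */
    unsigned exponent;  /* 8 bits, biased by 127 */
    uint32_t mantissa;  /* 23 stored fraction bits; the leading 1 is implicit */
} float_fields;

/* Split a float into its bit fields; memcpy avoids pointer-aliasing problems. */
float_fields split_float(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* assumes float is IEEE-754 binary32 */
    float_fields r;
    r.sign = bits >> 31;
    r.exponent = (bits >> 23) & 0xFFu;
    r.mantissa = bits & 0x7FFFFFu;
    return r;
}
```

For example, -2.5 = -1.25 x 2^1, so split_float(-2.5f) gives sign 1, biased exponent 127 + 1 = 128, and fraction bits 0.25 x 2^23 = 0x200000.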
Because floating-point representations are typically at least 32
bits long (IEEE-754 is a popular standard for 32-bit and 64-bit
floating-point numbers), they offer both high precision and high
dynamic range. These traits of floating-point numbers allow
most algorithms to be ported directly to floating-point
implementations with little or no change, and this is the key reason
floating-point representations are highly desirable. The disadvantage
of floating-point implementations is that they require a significant
amount of extra hardware over fixed-point implementations, which
translates to higher parts costs, higher power consumption, slower
execution, larger chip area, or a combination of these.
As the term "fixed-point" implies, fixed-point representations have
the binary point at a fixed location. There are two subsets of
fixed-point implementations: fractional and integer. In a fractional
fixed-point implementation, such as that provided on the Motorola 56K
series of DSPs, the binary point is assumed to sit immediately to the
right of the sign bit, so values lie in [-1, 1). In an integer fixed-point implementation,
such as that provided by the Texas Instruments TMS320C54xx series of
DSPs, the binary point is to the right of the least-significant
digit. In either case, the arithmetic operations implemented in the
hardware are essentially integer, which results in a much simpler
arithmetic logic unit in hardware that allows lower cost, lower power
consumption, faster execution, smaller chip area, or a combination of
these, over that of floating-point implementations.
In essence, a fixed-point representation is a simple integer scaled
(divided) by a power of two. If we denote an unscaled integer variable
by upper case "X" and the scaled, fixed-point variable by lower case
"x," then x = X/2^b, where b is the number of digits the binary point
is shifted left. For example, if X is a 16-bit, two's complement
integer, and b=4, then "X" has values ranging from -2^(15) to
+2^(15)-1 and with minimum step size of 1, while the scaled value "x"
ranges from -2^(11) to +2^(11) - 1/(2^4) with a minimum step size of
1/(2^4).
Note that the value of "b" is not part of the representation. You
won't see it in a register or as part of the data anywhere; it is a
parameter that the algorithm implementer must determine and maintain.
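The relation x = X/2^b can be sketched in C as a pair of conversions. This minimal illustration assumes 16-bit storage, round-half-away-from-zero, and saturation on overflow; those are implementation choices, not requirements of fixed-point arithmetic, and to_fixed and from_fixed are hypothetical names. Note that b travels as a separate argument, because it never appears in the stored data.

```c
#include <stdint.h>

/* Convert a real value to a fixed-point integer X = round(x * 2^b),
   saturating to the int16_t range. Assumes 0 <= b <= 30. */
int16_t to_fixed(double x, int b)
{
    double scaled = x * (double)(1L << b);
    scaled += (scaled >= 0) ? 0.5 : -0.5;   /* round half away from zero */
    if (scaled > 32767.0)
        return INT16_MAX;                   /* saturate instead of wrapping */
    if (scaled < -32768.0)
        return INT16_MIN;
    return (int16_t)scaled;                 /* truncation after +-0.5 rounds */
}

/* Recover the real value x = X / 2^b. */
double from_fixed(int16_t X, int b)
{
    return (double)X / (double)(1L << b);
}
```

With b = 4, to_fixed(1.0, 4) is 16, and from_fixed spans -2048 to 2047.9375 (that is, -2^11 to 2^11 - 1/2^4) in steps of 1/16, matching the b = 4 example in the text.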
Fixed-point representations impose some very different rules on
operations than their floating-point counterparts do. For example, two
variables must be scaled the same in order to be added (or
subtracted). Thus it may be necessary to shift one or the other
operand prior to adding. Another example is that when multiplying two
N-bit values with scale factors b0 and b1, the result is scaled
(b0+b1) and requires 2*N bits in general in order to avoid overflow
and maintain precision.
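The two rules above (align scale factors before adding; a product of two N-bit values needs 2N bits and carries scale b0 + b1) look like this in a C sketch. It assumes 16-bit operands and a 32-bit result, aligns by scaling up the operand with fewer fractional bits (which keeps precision but assumes the scale factors differ by at most 14 so the sum fits), and reports the result's scale through b_out; fx_add and fx_mul are hypothetical names.

```c
#include <stdint.h>

/* Add x (scaled by 2^bx) and y (scaled by 2^by). The result is scaled by
   2^max(bx, by), written to *b_out. Assumes |bx - by| <= 14. */
int32_t fx_add(int16_t x, int bx, int16_t y, int by, int *b_out)
{
    int32_t X = x, Y = y;
    if (bx < by) {
        X *= (int32_t)1 << (by - bx);   /* multiply: left-shifting a negative is undefined */
        *b_out = by;
    } else {
        Y *= (int32_t)1 << (bx - by);
        *b_out = bx;
    }
    return X + Y;                       /* both operands now share one binary point */
}

/* Multiply x (scaled by 2^b0) and y (scaled by 2^b1). The full product
   needs up to 32 bits and is scaled by 2^(b0 + b1). */
int32_t fx_mul(int16_t x, int b0, int16_t y, int b1, int *b_out)
{
    *b_out = b0 + b1;
    return (int32_t)x * (int32_t)y;
}
```

For example, 1.5 at b = 4 (stored as 24) plus 0.25 at b = 8 (stored as 64) gives 448 at b = 8, i.e. 1.75. Multiplying -1.0 by -1.0 with both at b = 15 gives 2^30 at b = 30, i.e. +1.0, a value that would not fit back into 16 bits at b = 15; this is one reason the full 2N-bit product matters.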
There are several other rules and considerations for fixed-point
arithmetic that are commonly encountered when implementing algorithms.
For more information, see
http://www.digitalsignallabs.com/papers.htm.
Randy Yates [yates@ieee.org]
Q2.1: Where can I get public domain algorithms for general-purpose DSP?
Updated 12/31/96
Netlib
EARN/BITNET: netlib%nac.no@norunix.bitnet
X.400: s=netlib; o=nac; c=no;
EUNET/uucp: nac!netlib
NSWC Library
Report No.: NSWC TR 90-21, January 1990
by Alfred H. Morris, Jr.
Dahlgren, VA 22448-5000
U.S.A.
IEEE Press book "Programs For Digital Signal Processing"
Q2.2: What are CELP and LPC? Where can I get the source for CELP and LPC?
Updated 09/10/01
NTIS
U.S. Department of Commerce
5285 Port Royal Road
Springfield, VA 22161
USA
(800) 553-6847
Q2.3: What is ADPCM? Where can I get source for it?
Updated: 04/03/01
ADPCM stands for Adaptive Differential Pulse Code Modulation. It is a
family of speech compression and decompression algorithms. A common
implementation takes 16-bit linear PCM samples and converts
them to 4-bit samples, yielding a compression rate of 4:1.
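/* compress nsample 16-bit samples into 4-bit ADPCM codes */
void adpcm_coder(short inbuf[], char outbuf[], int nsample,
struct adpcm_state *state);
/* expand 4-bit ADPCM codes back into nsample 16-bit samples */
void adpcm_decoder(char inbuf[], short outbuf[], int nsample,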
struct adpcm_state *state);
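To show how one member of the ADPCM family reaches 4:1, here is a minimal C sketch of the Intel/DVI (also called IMA) variant, using its standard 89-entry step table. The names ima_state, ima_encode, and ima_decode are hypothetical; they are not the adpcm_coder/adpcm_decoder prototypes above, and the nibble packing order (low nibble first) is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* IMA/DVI quantizer step sizes. */
static const int16_t ima_step[89] = {
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17,
    19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
    50, 55, 60, 66, 73, 80, 88, 97, 107, 118,
    130, 143, 157, 173, 190, 209, 230, 253, 279, 307,
    337, 371, 408, 449, 494, 544, 598, 658, 724, 796,
    876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358,
    5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767
};

/* How far to move the step index after each 4-bit code. */
static const int8_t ima_index_adj[16] = {
    -1, -1, -1, -1, 2, 4, 6, 8, -1, -1, -1, -1, 2, 4, 6, 8
};

typedef struct {
    int predicted;  /* last reconstructed sample */
    int index;      /* current position in ima_step, 0..88 */
} ima_state;

static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Apply one 4-bit code: reconstruct the next sample and adapt the step.
   The encoder calls this too, so both ends track identical state. */
static int ima_apply(ima_state *s, unsigned code)
{
    int step = ima_step[s->index];
    int diff = step >> 3;
    if (code & 4) diff += step;
    if (code & 2) diff += step >> 1;
    if (code & 1) diff += step >> 2;
    if (code & 8) diff = -diff;
    s->predicted = clamp_int(s->predicted + diff, -32768, 32767);
    s->index = clamp_int(s->index + ima_index_adj[code & 15], 0, 88);
    return s->predicted;
}

/* Quantize one prediction error to a sign bit plus 3 magnitude bits. */
static unsigned ima_encode_sample(ima_state *s, int sample)
{
    int step = ima_step[s->index];
    int diff = sample - s->predicted;
    unsigned code = 0;
    if (diff < 0) { code = 8; diff = -diff; }
    if (diff >= step) { code |= 4; diff -= step; }
    step >>= 1;
    if (diff >= step) { code |= 2; diff -= step; }
    step >>= 1;
    if (diff >= step) { code |= 1; }
    ima_apply(s, code);
    return code;
}

/* 16-bit samples -> 4-bit codes packed two per byte, low nibble first. */
void ima_encode(const int16_t *in, uint8_t *out, size_t n, ima_state *s)
{
    for (size_t i = 0; i < n; i++) {
        unsigned code = ima_encode_sample(s, in[i]);
        if (i % 2 == 0)
            out[i / 2] = (uint8_t)code;
        else
            out[i / 2] |= (uint8_t)(code << 4);
    }
}

/* Packed 4-bit codes -> 16-bit samples. */
void ima_decode(const uint8_t *in, int16_t *out, size_t n, ima_state *s)
{
    for (size_t i = 0; i < n; i++) {
        unsigned code = (i % 2 == 0) ? (in[i / 2] & 0x0Fu) : (in[i / 2] >> 4);
        out[i] = (int16_t)ima_apply(s, code);
    }
}
```

Encoding n samples fills (n + 1) / 2 bytes, so 16-bit input shrinks by 4:1. Because the encoder runs the decoder's update on every code it emits, the two ends stay in step without the predictor state ever being transmitted.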
Q2.4: What is GSM? Where can I get source for it?
Updated 4/27/00
Fax: +49.30.31425156, Phone: +49.30.31424315
Q2.5: How does pitch perception work, and how do I implement it on my DSP chip?
Updated 04/02/01
B.C.J. Moore, An Introduction to the Psychology of Hearing,
Academic Press, London, 1997.
Malcolm Slaney and Richard Lyon, "A Perceptual Pitch Detector,"
Proceedings of the International Conference of Acoustics, Speech,
and Signal Processing, 1990, Albuquerque, New Mexico.
Available for ftp at
ftp://worldserver.com/pub/malcolm/ICASSP90.psc.Z
Ray Meddis and M. J. Hewitt, "Virtual pitch and phase
sensitivity of a computer model of the auditory periphery,"
Journal of the Acoustical Society of America 89(6), 1991, pp. 2866-2882
and 2883-2894.
J. Goldstein, "An optimum processor theory for the
central formation of the pitch of complex tones," Journal
of the Acoustical Society of America 54, 1496-1516, 1973.
Boris Doval and Xavier Rodet, "Estimation of Fundamental Frequency
of Musical Sound Signals," Proceedings of the 1991 International
Conference on Acoustics, Speech, and Signal Processing, Toronto,
Volume 5, pp. 3657-3660.
B. Gold and L. Rabiner, "Parallel processing techniques for estimating
pitch periods of speech in the time domain," Journal of the Acoustical
Society of America, 46, pp. 441-448, 1969.
Q2.6: What standards exist for digital audio? What is AES/EBU? What is S/PDIF?
Updated 1/8/97
Q2.6.1: Where can I get copies of ITU (formerly CCITT) standards?
Q2.6.2: What standards are there for digital audio?
AES/EBU
S/P-DIF
Q2.7: What is mu-law encoding? Where can I get source for it?
Updated 9/13/99
Q2.8: How can I do CD <=> DAT sample rate conversion?
Updated 9/13/99
Q2.9: Wavelets
Updated 6/3/98
Q2.9.1: What are wavelets? Where can I get more information?
Q2.9.2: What are some good books and papers on wavelets?
Olivier Rioul and Martin Vetterli, "Wavelets and Signal Processing,"
IEEE Signal Processing Magazine, Oct. 1991, pp. 14-38.
Randy K. Young, Wavelet Theory and Its Applications,
Kluwer Academic Publishers, ISBN 0-7923-9271-X, 1993.
Ali N. Akansu and Richard A. Haddad,
Multiresolution Signal Decomposition: Transforms, Subbands, Wavelets,
Academic Press, Inc., ISBN 0-12-047140-X
Wavelets and Filter Banks: Theory and Design, IEEE Transactions on
Signal Processing, Vol. 40, No. 9, Sept. 1992, pp. 2207-2232.
Q2.9.3: Where can I get some software for wavelets?
ftp://pascal.math.yale.edu/pub/wavelets/software/xwpl
Rice Wavelet Tools
Q2.10: How do I calculate the coefficients for a Hilbert transformer?
Updated 6/3/98
Q2.11: Algorithm implementation: floating-point versus fixed-point
Fixed-Point Arithmetic: The Basics