BDTI's DSP Insider Archives
This month:
Intrinsity Unveils 2 GHz Matrix Processor
At last month’s Embedded Processor Forum, Intrinsity revealed details of its MIPS32-based FastMIPS and FastMATH processors, the first processors to use Intrinsity’s Fast14 dynamic logic technology. According to Intrinsity, Fast14 uses several novel techniques to avoid common dynamic logic problems like susceptibility to noise. Intrinsity claims Fast14 is up to three times faster than conventional static logic and backed up this claim by demonstrating a 2.2 GHz Fast14-based test chip last year. Intrinsity expects to make 2 GHz samples of its processors available in the fourth quarter of 2002. If Intrinsity meets this ambitious goal, its chips will have faster clock rates than all but a few processors; by comparison, the Pentium 4 currently tops out at 2.53 GHz.
While it is easy to appreciate the benefits of a 2 GHz clock, high clock speeds do not always translate into high performance in signal-processing applications. To address the needs of these applications, the FastMATH processor includes a DSP-oriented coprocessor composed of a 4 × 4 matrix of processing elements. Each element contains an ALU, a multiply-accumulate unit, and local registers, and can communicate with other elements in the same row or column. These elements are fed by a 512-bit data bus that connects to a 1 GHz cache.
The combination of high clock speed and matrix-processing capabilities will give the FastMATH processor impressive speed, particularly on algorithms that lend themselves to high levels of parallel processing on large blocks of data. For example, Intrinsity says that on a 1024-point FFT, the 2 GHz FastMATH processor will be about six times faster than Texas Instruments’ 600 MHz TMS320C64xx and about fifteen times faster than Motorola’s 300 MHz MSC8101.
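To see how much of the projected FFT advantage comes from clock speed alone, the claimed speedups can be divided by the clock-rate ratios. The short C sketch below does that arithmetic; the function name `per_clock_advantage` is made up for illustration, and the inputs are Intrinsity's own projections as quoted above, not measurements.

```c
#include <stdio.h>

/* Speedup that remains after dividing out the clock-rate ratio.
 * Inputs are vendor projections quoted in the article, not benchmark
 * results. The function name is hypothetical. */
double per_clock_advantage(double claimed_speedup,
                           double fast_clock_mhz,
                           double slow_clock_mhz)
{
    double clock_ratio = fast_clock_mhz / slow_clock_mhz;
    return claimed_speedup / clock_ratio;
}

/* Prints the per-clock figures for the two comparisons in the text. */
void print_fft_comparison(void)
{
    printf("vs. 600 MHz 'C64xx:  %.2fx per clock\n",
           per_clock_advantage(6.0, 2000.0, 600.0));
    printf("vs. 300 MHz MSC8101: %.2fx per clock\n",
           per_clock_advantage(15.0, 2000.0, 300.0));
}
```

Under these assumptions, roughly 1.8× and 2.25× of the projected advantage remains once the clock ratio is factored out. Because the quoted speedups are themselves approximate ("about six times", "about fifteen times"), these figures are rough as well.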
If Intrinsity delivers on its claims, the FastMATH processor will be the first embedded processor to achieve this level of DSP performance, and one of the first MIPS-based processors with serious DSP capabilities. Of course, the FastMATH processor will have a smaller speed advantage on algorithms that cannot exploit its matrix-processing capabilities. BDTI is currently using the BDTI Benchmark™ suite to evaluate the performance of the FastMATH processor on a range of DSP algorithms.
Although the FastMATH processor promises impressive speed, this speed comes at the cost of high power consumption. Intrinsity projects the FastMATH processor will consume about 15 W, which is an order of magnitude higher than the power consumption of either the ’C64xx or the MSC8101. This high power consumption will be a significant disadvantage, even in the infrastructure applications that the FastMATH processor targets. (Intrinsity estimates that its FastMIPS processor will consume about 10 W.)
The FastMIPS and FastMATH processors are expected to be available in
sample quantities in the fourth quarter of 2002; full production is
planned for the third quarter of 2003. As of this writing, Intrinsity
had not released pricing for its processors.
ARC Addresses VoIP Applications
At last month’s Embedded Processor Forum, ARC described extensions to its ARCtangent customizable processor core that target VoIP applications. The extensions include enhanced saturation and rounding support for existing ALU, shifter, and multiplier operations, and new operations like absolute value and negate. These instructions are intended to improve performance on applications that conform to the bit-exact ITU and ETSI specifications for voice compression algorithms such as G.729.
Although the new instructions are useful mainly for compliance with bit-exact standards, other added features will be useful for a broad range of signal-processing applications. Most notably, the extensions greatly improve the processor’s address-generation capabilities. The existing core, the ARCtangent-A4, contains four address registers; each of these registers has a corresponding modifier register. The extensions expand the address registers to a total of eight and expand the modifier registers to two per address register.
While expanding the address registers may seem like a trivial improvement, BDTI’s analysis shows otherwise. BDTI analyzed the existing ARCtangent-A4 core (which lacks the extensions described at the Forum) and found that restrictions on the address registers were a significant limitation when optimizing signal-processing algorithms. For example, four address pointers are barely sufficient for common signal-processing algorithms like FIR filters. Even fairly simple FIR filters may require four separate address pointers, one each for the input buffer, output buffer, delay line, and coefficients.
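The four-pointer requirement is easy to see in code. The following is a minimal C sketch of a block FIR filter with a circular delay line, assuming 16-bit Q15 samples and coefficients; the function name, signature, and data format are illustrative assumptions, not ARC or BDTI code.

```c
#include <stddef.h>
#include <stdint.h>

/* Block FIR filter with a circular delay line (illustrative sketch).
 * Four distinct pointers are live in the loop, as the text describes.
 * Assumes ntaps >= 1 and that delay[] has ntaps entries. */
void fir_block(const int16_t *in,     /* pointer 1: input buffer       */
               int16_t *out,          /* pointer 2: output buffer      */
               int16_t *delay,        /* pointer 3: delay line         */
               const int16_t *coeff,  /* pointer 4: Q15 coefficients   */
               size_t ntaps, size_t nsamples,
               size_t *delay_pos)     /* write index, kept across calls */
{
    size_t pos = *delay_pos;

    for (size_t n = 0; n < nsamples; n++) {
        delay[pos] = *in++;           /* newest sample replaces oldest */

        int64_t acc = 0;
        size_t d = pos;               /* walks from newest to oldest   */
        const int16_t *c = coeff;
        for (size_t k = 0; k < ntaps; k++) {
            acc += (int32_t)(*c++) * delay[d];
            d = (d == 0) ? ntaps - 1 : d - 1;   /* wrap around */
        }
        pos = (pos + 1 == ntaps) ? 0 : pos + 1;

        /* Scale Q30 product sum back to Q15. Right-shifting a negative
         * value is implementation-defined in C; an arithmetic shift is
         * assumed here. */
        acc >>= 15;
        if (acc > INT16_MAX) acc = INT16_MAX;   /* saturate */
        if (acc < INT16_MIN) acc = INT16_MIN;
        *out++ = (int16_t)acc;
    }
    *delay_pos = pos;
}
```

In this sketch the wraparound of the delay-line index is done with explicit comparisons. On a processor with too few address registers, one or more of these pointers has to be saved and reloaded inside the loop, which is the kind of overhead the paragraph above refers to.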
By recognizing that signal-processing performance depends not only on
fast multiplier hardware but also on features like specialized
arithmetic operations and flexible addressing, ARC may gain an
advantage over its competitors. For example, BDTI’s analysis of ARC
and ARM processors with similar multiplier hardware shows that a 170
MHz ARCtangent-A4 is about 25% faster than a 200 MHz ARM9E on typical
DSP tasks. (This analysis applies only to the older version of the
ARCtangent-A4 without the extensions described at the Forum.) It will
be interesting to see if ARC can widen this performance gap through
the addition of DSP features like those described at the Forum.
BDTI Case Study
This Month: Algorithm Transformations - More Than Meets the Eye
One of the most effective ways to optimize signal-processing software is to restructure the underlying algorithms to better match the processor’s capabilities. For example, a processor with low memory bandwidth may benefit from an algorithmic transformation that reduces the number of data transfers, even if this transformation increases the number of calculations required.
Although many common signal-processing algorithms are fairly simple, leaving only limited opportunities for algorithmic transformation, some common algorithms lend themselves to tens or even hundreds of approaches. The FFT is a prime example of this latter type of algorithm: it has dozens of variants such as decimation-in-time and decimation-in-frequency, each of which has numerous sub-variants. Because the FFT is a complicated algorithm, merely understanding the mathematics of each variant is a significant challenge; understanding how each variant maps to the target processor requires expert knowledge of both the algorithm and the processor.
Identifying the best way to implement an algorithm is particularly difficult for complex, flexible processors. On a simple, inflexible processor, some approaches can be disqualified quickly due to an obvious mismatch. For example, some processors include multiply-add instructions, but not multiply-subtract instructions; in such cases, transformations that replace multiply-subtract operations with multiply-add operations have a clear advantage. On a more complex or more flexible processor, the advantages of different approaches are often far subtler. When both the algorithm and the processor are complicated, finding the best match between the two can be a major undertaking.
One way to simplify the implementation selection process is to use a software library from the processor vendor or a third-party software developer. However, there is no guarantee that the library developer thoroughly studied the trade-offs between different implementations, or that the trade-offs of the chosen implementation match the needs of the application. For example, a library of speedy but memory-hungry algorithms is useless for an application with tight memory constraints.
Another option is to engage the services of a software-development firm specializing in DSP algorithms. BDTI has been providing such services for over five years. In one recent project, a consumer electronics vendor engaged BDTI to implement an audio decompression algorithm on a PowerPC-based product. The CPU was hampered by a small data cache and slow off-cache memory accesses, placing a premium on efficient data memory usage. BDTI transformed the cycle-dominating algorithm to re-use data whenever possible, dramatically reducing the processing time required for the algorithm.
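As a small illustration of the multiply-subtract case mentioned above, the C sketch below shows one possible transformation: negating the coefficients once, ahead of time, so that the inner loop needs only multiply-add operations. The function names and the 16-bit data format are assumptions chosen for the example; this is not the transformation BDTI used in the project described.

```c
#include <stddef.h>
#include <stdint.h>

/* Original form: one multiply-subtract per element. */
int64_t mac_sub(int64_t acc, const int16_t *c, const int16_t *x, size_t n)
{
    for (size_t i = 0; i < n; i++)
        acc -= (int32_t)c[i] * x[i];
    return acc;
}

/* One-time setup: store negated coefficients.
 * Caveat: -32768 has no 16-bit negation, so this sketch assumes no
 * coefficient takes that value. */
void negate_coeffs(const int16_t *c, int16_t *neg_c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        neg_c[i] = (int16_t)(-c[i]);
}

/* Transformed form: the same result using only multiply-add,
 * since acc - c*x == acc + (-c)*x. */
int64_t mac_add(int64_t acc, const int16_t *neg_c, const int16_t *x, size_t n)
{
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)neg_c[i] * x[i];
    return acc;
}
```

The negation costs nothing at run time if the coefficients are constants, but the caveat about the most negative value shows how even a simple transformation can have fixed-point side effects that need checking.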
To find out how BDTI can help you create efficient DSP application
software for your products, please visit
http://www.BDTI.com/products/services_software.html or contact Jeremy
Giddings (giddings@BDTI.com).
Impulse Response, by Jeff Bier
Jack-of-All-Trades
Customizable processors were all the rage at this year’s Embedded Processor Forum. Vendors from Tensilica to Toshiba touted customizable processors as the ultimate solution for DSP applications from voice-over-IP to MPEG-4 video compression. In the view of these companies, processors with fixed instruction sets are forever bound to be jacks-of-all-trades, but masters of none. A better approach, they argue, is a flexible instruction set that designers can fine-tune to do one thing well.
These vendors have a point: in most signal-processing applications, a few key functions occupy a huge chunk of the processing time. These critical functions are usually fairly simple, yet they often map poorly to standard instruction sets. Hence, minor instruction-set tweaks can result in enormous payoffs in speed, memory use, and energy efficiency. As was demonstrated at the Forum, adding a few custom instructions can improve a processor’s performance on key functions by tens or even hundreds of times.
Unfortunately, this promised land of super-efficient hardware lies on the other side of a forbidding development desert. Customizing a processor requires development of a custom chip, and the cost and difficulty of developing custom chips have been rising steadily, moving this option out of the reach of many product developers. Another cause for concern is the fact that customizable processors need customizable development tools: tools that must be rebuilt whenever the user adds new instructions. While neither of these problems is insurmountable, they add layers of complexity and opportunities for things to go wrong.
Perhaps the greatest weakness of customizable processors is their very specialization. The individuality of each implementation hinders third-party software support and code reuse. Another key trade-off arises from the need to lock in instruction-set customizations early in the product design process. If the application mix changes late in the design process, or after production begins, the benefits of prior customization may be lost.
Despite these concerns, the potential performance of customizable
processors will continue to draw adventurous designers. True,
customizable processors force product designers to make tough
trade-offs, but so do industry-standard fixed instruction-set
architectures. After all, making trade-offs is the essence of
engineering.
FPGAs for DSP: BDTI Technology Analysis
In the first volume of BDTI’s series of technology analysis reports, BDTI examines the use of FPGAs in digital signal processing applications. FPGAs for DSP evaluates the latest DSP enhancements available on FPGAs and explains why FPGAs are a practical solution for some DSP applications.
FPGAs for DSP is scheduled for publication this summer.
For more information on BDTI Focus reports, go to
http://www.BDTI.com/products/reports_focus.html.
BDTI Offers Sounding Board for DSP Marketing Presentations
Want to improve the impact of your product presentations? BDTI’s Sounding Board service helps vendors of DSP-related products (chips, cores, tools, and software) develop marketing presentations that are accurate and compelling. BDTI’s expert analysts will review your presentation, host a Q&A session, and provide specific, detailed suggestions for improvements. BDTI’s experience in DSP product development, analysis, and reporting will help you achieve technical accuracy and ensure your message has the right focus for the target audience.
Interested vendors should contact Jeremy Giddings at BDTI
(giddings@BDTI.com) for further information.
BDTI Compiler Evaluation
Baffled by competing vendor claims of compiler efficiency for DSP application development? BDTI has developed a methodology to quantitatively and qualitatively assess the relative merits of C compilers for DSP applications. Developed over several years, BDTI’s C compiler evaluation methodology is ready for commercial roll-out. BDTI welcomes a small number of early participants in a multi-client study.
Interested parties, processor vendors, and compiler vendors should
contact Jeremy Giddings at BDTI (giddings@BDTI.com) for further
information.
About BDTI
BDTI is an independent source for DSP technology analysis and optimized DSP software. From rigorous technical analyses of processors for DSP, such as the Inside series of processor analyses, to highly regarded technology training classes, BDTI is the trusted independent source for reliable information on DSP technology.
For more information, visit our Web site at http://www.BDTI.com.
The next issue of BDTI’s DSP Insider is coming in July. Previous issues of BDTI’s DSP Insider are archived on BDTI’s Web site. Follow the link from http://www.BDTI.com/dspinsider/dspinsider.html. If you have comments, suggestions, or other feedback about the DSP Insider, please send email to dspinsider@BDTI.com. BDTI’s DSP Insider is a free monthly electronic newsletter published by Berkeley Design Technology, Inc. If our newsletter was forwarded to you and you would like to receive it regularly, please register at http://www.BDTI.com/dspinsider.htm.
If you no longer wish to receive the DSP Insider, send an email
message to dspinsider@BDTI.com with the words Remove me in the
subject line.
BDTI's DSP Insider © 2002 Berkeley Design Technology, Inc.