## **Choosing and Using DSPs**

**Software Pipelining**
**Dot Product: Initial Implementation**

| Label | Instruction | Operands          | Stalls                |
|-------|-------------|-------------------|-----------------------|
| LOOP  | LDRD        | r6, [r0], #8      |                       |
|       | LDRD        | r10, [r1], #8     |                       |
|       | SUBS        | r2, r2, #4        |                       |
|       | SMLAD       | …                 | **2 stall cycles**    |
|       |             |                   | 1 stall cycle         |
|       | SMLAD       | r12, r7, r11, r12 |                       |
|       | BGT         | LOOP              |                       |

Effective throughput: 0.44 MACs per cycle

© 2006 BDTI
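
To make the loop above easier to follow, here is a minimal C sketch of what it appears to compute: a dot product over packed 16-bit samples, four multiply-accumulates per iteration. The helper names (`pack2`, `smlad`, `dot_initial`, `dot_pipelined`) are hypothetical and not from the slide. The operands of the first `SMLAD` are not legible in the source, so pairing `r6` with `r10` is an assumption. `dot_pipelined` is likewise an assumption: it shows one common way to restructure such a loop (load the next iteration's data before consuming the current data), not necessarily the version the presentation goes on to show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack two 16-bit samples into one 32-bit word, first sample in the low half. */
static uint32_t pack2(const int16_t *p)
{
    return (uint32_t)(uint16_t)p[0] | ((uint32_t)(uint16_t)p[1] << 16);
}

/* Model of SMLAD Rd, Rn, Rm, Ra: acc + lo(x)*lo(y) + hi(x)*hi(y).
   The sum is formed in unsigned arithmetic so it wraps rather than overflowing. */
static int32_t smlad(uint32_t x, uint32_t y, int32_t acc)
{
    int32_t lo = (int32_t)(int16_t)(x & 0xFFFFu) * (int16_t)(y & 0xFFFFu);
    int32_t hi = (int32_t)(int16_t)(x >> 16) * (int16_t)(y >> 16);
    return (int32_t)((uint32_t)acc + (uint32_t)lo + (uint32_t)hi);
}

/* Same order as the slide: two double-word loads, then two SMLADs.
   n must be a multiple of 4 (the loop counter steps by 4). */
int32_t dot_initial(const int16_t *a, const int16_t *b, size_t n)
{
    assert(n % 4 == 0);
    int32_t acc = 0;
    for (size_t i = 0; i < n; i += 4) {
        uint32_t a0 = pack2(a + i), a1 = pack2(a + i + 2); /* LDRD r6,  [r0], #8 */
        uint32_t b0 = pack2(b + i), b1 = pack2(b + i + 2); /* LDRD r10, [r1], #8 */
        acc = smlad(a0, b0, acc); /* assumed operands: r6, r10 */
        acc = smlad(a1, b1, acc); /* SMLAD r12, r7, r11, r12 */
    }
    return acc;
}

/* Hypothetical software-pipelined ordering: the loads for iteration k+1 are
   issued before the MACs of iteration k, so the MACs never wait on a load
   issued immediately before them. */
int32_t dot_pipelined(const int16_t *a, const int16_t *b, size_t n)
{
    assert(n % 4 == 0);
    if (n == 0) return 0;
    int32_t acc = 0;
    /* Prologue: load the first block. */
    uint32_t a0 = pack2(a), a1 = pack2(a + 2);
    uint32_t b0 = pack2(b), b1 = pack2(b + 2);
    for (size_t i = 4; i < n; i += 4) {
        /* Load the next block first. */
        uint32_t na0 = pack2(a + i), na1 = pack2(a + i + 2);
        uint32_t nb0 = pack2(b + i), nb1 = pack2(b + i + 2);
        /* Consume the block loaded one iteration earlier. */
        acc = smlad(a0, b0, acc);
        acc = smlad(a1, b1, acc);
        a0 = na0; a1 = na1; b0 = nb0; b1 = nb1;
    }
    /* Epilogue: consume the last block. */
    acc = smlad(a0, b0, acc);
    acc = smlad(a1, b1, acc);
    return acc;
}
```

Both functions return the same result; only the ordering of loads and MACs differs. In C the compiler is free to reschedule these operations, so the sketch illustrates the structure rather than guaranteeing a particular instruction order. The throughput figure on the slide is consistent with a simple count, assuming one cycle per instruction: six instructions plus three stall cycles give nine cycles for four MACs, and 4/9 ≈ 0.44 MACs per cycle.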










