BDTI Solution Certification™ Results for H.264 Decoding

BDTI Solution Certification™ for H.264 Decoding
on the ARC AV 401 Video Subsystem

Operating Point: Baseline Profile, D1 Resolution, 30 fps, 1.5 Mbps

H.264 decoder solution performance is reported as the minimum clock rate required to decode BDTI’s Primary Operating Point H.264 bitstream in real time. Two important factors that affect the minimum required clock rate are the number of output “delay buffers” used and the performance of external (main) memory. In recognition of these factors, BDTI presents the minimum clock rate required by the ARC AV 401 Subsystem for real-time operation across several output delay buffer counts and a range of external memory access times (see Figures 1 and 2).

In the figures, “0 buffers” (i.e., no buffering of output frames) indicates the clock rate required to process the single most processing-intensive frame in the video clip in real time (i.e., within 1/30th of a second). Adding delay buffers (each of which holds one decoded frame) smooths the processing load across multiple frames and significantly reduces the required clock rate. For the ARC AV 401 Subsystem, using three buffers results in a minimum required clock rate essentially equal to the minimum clock rate achievable (i.e., the rate set by the average per-frame processing load over the entire video clip). The 3-buffer case is the typical output buffering used in real-world applications; the 0-buffer case is not typical and would only be used in extremely delay-sensitive applications.
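The smoothing effect of delay buffers can be illustrated with a minimal sketch. It assumes a simple sliding-window model, which is not necessarily how BDTI derives its figures: with N delay buffers, any run of N + 1 consecutive frames must be decoded within N + 1 frame periods. The function names and the per-frame cycle counts are hypothetical.

```python
from typing import Sequence


def min_clock_hz(cycles_per_frame: Sequence[int], delay_buffers: int,
                 fps: float = 30.0) -> float:
    """Estimate the required clock rate (Hz) under a sliding-window model.

    Assumption (not BDTI's stated method): with `delay_buffers` buffered
    output frames, every run of (delay_buffers + 1) consecutive frames
    must be decoded within (delay_buffers + 1) frame periods.
    """
    window = delay_buffers + 1
    if window > len(cycles_per_frame):
        raise ValueError("more buffers than frames in the clip")
    # Largest cycle count over any window of consecutive frames.
    worst = max(
        sum(cycles_per_frame[i:i + window])
        for i in range(len(cycles_per_frame) - window + 1)
    )
    return worst * fps / window


def average_clock_hz(cycles_per_frame: Sequence[int], fps: float = 30.0) -> float:
    """Lower bound: clock rate set by the average per-frame load."""
    return sum(cycles_per_frame) * fps / len(cycles_per_frame)


# Hypothetical per-frame cycle counts, not measured data.
clip = [4_000_000, 4_000_000, 10_000_000, 4_000_000, 4_000_000, 6_000_000]
for buffers in range(3):
    print(buffers, "buffers:", min_clock_hz(clip, buffers) / 1e6, "MHz")
print("average:", average_clock_hz(clip) / 1e6, "MHz")
```

In this toy clip the single expensive frame dominates the 0-buffer case, and each added buffer pulls the requirement closer to the average.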

BDTI uses the three parameters identified below when describing the external memory access timing characteristics of a device undergoing H.264 Solution Certification. In these descriptions we use the term “burst” to refer to a sequence of accesses to words located in consecutive memory locations (where the size of each word is equal to the external bus width). For DSP-intensive algorithms such as H.264, external memory accesses are often made in bursts.

  1. Memory-Processor clock ratio: This is the ratio of the processor speed to the maximum external memory speed. For example, a Memory-Processor clock ratio of 1 indicates that the processor and memory operate at the same speed (i.e., the external memory is capable of supporting an access every processor clock cycle). However, a memory-processor clock ratio of 3 combined with a 300 MHz processor clock speed would result in a 100 MHz external memory speed (i.e., each external memory access requires a minimum of 3 processor clock cycles).

  2. Non-sequential external memory stall multiplier: This number multiplied by the memory-processor clock ratio results in the latency associated with accessing a random memory location (e.g., the first access in a burst) measured in processor clock cycles. For example, if the memory-processor ratio is 6 and the non-sequential external stall multiplier is 5, then each non-sequential memory access will require 30 processor clock cycles.

  3. Sequential external memory stall multiplier: This number multiplied by the memory-processor clock ratio results in the latency associated with accessing a sequential memory location (e.g., in a burst access, all subsequent contiguous accesses after the first non-sequential one) measured in processor clock cycles. For example, if the memory-processor ratio is 6 and the sequential external stall multiplier is set to 1, then each sequential memory access will require 6 processor cycles.
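The three parameters combine arithmetically, as the following sketch shows using the examples from the definitions above. The function names are illustrative. The burst total (one non-sequential access followed by sequential ones) follows from the definition of a burst but is an assumption about how the terms add up, since it ignores any controller overhead.

```python
def memory_clock_mhz(processor_mhz: float, clock_ratio: int) -> float:
    """External memory speed implied by the memory-processor clock ratio."""
    return processor_mhz / clock_ratio


def access_latency_cycles(clock_ratio: int, stall_multiplier: int) -> int:
    """Latency of one access in processor clock cycles."""
    return clock_ratio * stall_multiplier


def burst_latency_cycles(clock_ratio: int, nonseq_multiplier: int,
                         seq_multiplier: int, words: int) -> int:
    """Cycles for a burst: first access non-sequential, the rest sequential."""
    if words < 1:
        raise ValueError("a burst contains at least one word")
    first = access_latency_cycles(clock_ratio, nonseq_multiplier)
    rest = (words - 1) * access_latency_cycles(clock_ratio, seq_multiplier)
    return first + rest


print(memory_clock_mhz(300, 3))          # ratio 3 at 300 MHz
print(access_latency_cycles(6, 5))       # non-sequential example
print(access_latency_cycles(6, 1))       # sequential example
print(burst_latency_cycles(6, 5, 1, 8))  # hypothetical 8-word burst
```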

These parameters provide a reasonable first-order model for the performance of a solution using typical external memory devices, such as DDR2. However, the actual memory controller used in a final system may have different characteristics, and thus impact the performance of a solution.

The table following the figures summarizes the complete Solution Certification results and shows the solution clock rate required only for the minimum external memory access times reported.

Figures 1 and 2 show the minimum solution clock rate required for real-time operation, where lower is better. The only difference between Figures 1 and 2 is in the expression of the external memory access delays. In the first graph, external memory access delay is expressed in time (ns), and in the second graph delay is expressed in wait states (clock cycles). Note that since the memory-processor clock ratio is 1 in all cases for the ARC AV 401, the non-sequential and sequential external memory stall multipliers will be equal to the non-sequential and sequential memory wait states, respectively.


Figure 1.

Note that since the memory-processor clock ratio is 1, the non-sequential external memory access time is equal to the non-sequential external memory stall multiplier divided by the clock rate. The following parameters are constant for all “Non-sequential external access times” values shown in the above graph:

  • Memory-processor clock ratio: 1
  • Sequential external memory stall multiplier: 1

Figure 2.

Note that since the memory-processor clock ratio is 1, the number of non-sequential external memory wait states is equal to the non-sequential external memory stall multiplier. The following parameters are constant for all “Non-sequential external wait states” values shown in the above graph:

  • Memory-processor clock ratio: 1
  • Sequential external memory stall multiplier: 1
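The two figures express the same delay in different units, and the conversion between them can be sketched as follows. The function name and the clock rates in the usage lines are hypothetical; with a memory-processor clock ratio of 1, the stall multiplier equals the number of wait states.

```python
def access_time_ns(stall_multiplier: int, clock_mhz: float,
                   clock_ratio: int = 1) -> float:
    """Convert a stall multiplier into an access time in nanoseconds.

    Latency in processor cycles is clock_ratio * stall_multiplier;
    one cycle at clock_mhz lasts 1000 / clock_mhz nanoseconds.
    """
    cycles = clock_ratio * stall_multiplier
    return cycles * 1000.0 / clock_mhz


# Hypothetical operating points, not values read from the figures.
print(access_time_ns(10, 200))  # 10 wait states at 200 MHz
print(access_time_ns(16, 160))  # 16 wait states at 160 MHz
```

Because the access time depends on the clock rate, a fixed delay in nanoseconds corresponds to more wait states as the clock rate rises.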

BDTI H.264 Video Decoder Solution Certification Performance
Primary Operating Point: Baseline Profile, D1 (720x480) Resolution, 30 fps, 1.5 Mbps

|  | Minimum Clock Rate (MHz) | External Memory Bandwidth Utilization (MBps) | Program Memory Usage (bytes) | Static Data Memory Usage (bytes) | Dynamic Memory Usage (bytes) | Buffering Delay (seconds) |
| --- | --- | --- | --- | --- | --- | --- |
| Average over entire clip | 160 | 120 | 111k | 19k | N/A | N/A |
| Buffering 3 frames | 160 | 120 | 111k | 19k | 4.5M | 0.100 |
| Buffering 2 frames | 162 | 121 | 111k | 19k | 4M | 0.067 |
| Buffering 1 frame | 198 | 123 | 111k | 19k | 3.5M | 0.033 |
| No buffering (highest CPU load frame) | 330 | 133 | 111k | 19k | 3M | 0 |

Estimated energy consumption (see note 1): 3.9 mJ/frame (116.4 mW average, 0.65 mW/MHz for I frames and 0.73 mW/MHz for P frames)
Cost (silicon area) for licensable IP: 9.1 mm²
Cost (dollars) for chips and external devices: External 32-bit memory required; cost depends on type and latency of device chosen

Table 1.  Summary of Performance for the ARC AV 401 Subsystem
  1. Simulation-based energy estimate assuming a 160 MHz solution clock rate. All licensable IP results are based on standard BDTI core conditions (TSMC CL130G process and Artisan SAGE-X library); see BDTI Core Conditions for more information.
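Several entries in Table 1 can be cross-checked with simple arithmetic, as in the sketch below. The input values are taken from the table. The frame-size calculation assumes decoded frames are stored as 8-bit YUV 4:2:0, which is a guess at the buffer layout rather than something the report states.

```python
FPS = 30

# Bytes in one decoded D1 frame, assuming 8-bit YUV 4:2:0 storage
# (full-resolution luma plus two quarter-resolution chroma planes).
FRAME_BYTES = 720 * 480 * 3 // 2


def buffering_delay_s(buffers: int) -> float:
    """Delay added by holding `buffers` decoded frames before output."""
    return buffers / FPS


def energy_mj_per_frame(average_power_mw: float) -> float:
    """Energy per frame from average power at the operating frame rate."""
    return average_power_mw / FPS


for n in (1, 2, 3):
    print(n, "buffers:", round(buffering_delay_s(n), 3), "s")
print("energy:", round(energy_mj_per_frame(116.4), 2), "mJ/frame")
print("one frame:", FRAME_BYTES, "bytes")
```

Under that assumption one frame is roughly half a megabyte, which matches the 0.5M step in dynamic memory usage for each added buffer, and 116.4 mW at 30 fps rounds to the reported 3.9 mJ/frame.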
No reproduction or reuse is permitted without the express authorization of BDTI.

Results for licensable silicon IP solutions are based on BDTI Core Conditions, which specify a set of uniform conditions (fabrication process, temperature, voltage, etc.) used to derive the clock speed, area, and power consumption for each solution. To obtain BDTI Solution Certification for your processing engine, please contact BDTI.
