http://www.markharvey.info

Reducing Power Consumption in Xilinx FPGAs — Part 2

BlockRAMs

BlockRAMs are one of the most commonly used resources on a Xilinx FPGA, but they are also amongst the most power-hungry. The dominating factor in BlockRAM power is the amount of time that the BlockRAM is enabled. The single most effective way of reducing BlockRAM power is by using the enable pins (ENA and ENB) that are provided.

Many designers take advantage of the fact that these enable pins can be connected directly to a logic high level. This simplifies the design, as the only other strobe that needs to be handled is the write enable (WEA and WEB). However, it is also a guarantee of maximum power consumption.

Consider two identical memory blocks each made up of 32 BlockRAM primitives where the memories are accessed just 40% of the actual operating time. We can use the XPE spreadsheet to do a "what-if" analysis and compare the power consumption when the enable is tied permanently to a logic high to when it is activated only 40% of the time:



Figure 1: BlockRAM power consumption comparison

As we can clearly see, the effect of leaving the enable pins permanently high is to increase power consumption from 173mW to 432mW. A BlockRAM should only be enabled when it is actually being accessed for reads or writes.
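
The two figures above are consistent with a simple first-order model in which BlockRAM dynamic power scales linearly with the fraction of time the enable is asserted. The following is a minimal sketch of that arithmetic. The linear scaling is an assumption made for illustration, not a description of how XPE calculates its results, and the function name is hypothetical.

```python
def scaled_bram_power_mw(always_enabled_mw: float, enable_rate: float) -> float:
    """Estimate BlockRAM dynamic power for a given enable duty cycle.

    Assumption: power scales linearly with the fraction of clock cycles
    in which the enable pin is high. This is a first-order approximation,
    not the model that XPE uses internally.
    """
    if not 0.0 <= enable_rate <= 1.0:
        raise ValueError("enable_rate must be between 0 and 1")
    return always_enabled_mw * enable_rate


# Figures from the article: 432 mW with the enable tied high, 40% access rate.
estimate = scaled_bram_power_mw(432.0, 0.40)
print(f"Estimated power at 40% enable rate: {estimate:.1f} mW")
```

The estimate of roughly 172.8mW is very close to the 173mW reported by XPE, which suggests that, for this example at least, the enable rate accounts for almost all of the difference.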

Another way to help reduce power consumption is to change the WRITE_MODE attribute from its default value of WRITE_FIRST to NO_CHANGE. In NO_CHANGE mode the data outputs hold their previous value during a write, which stops the outputs of the BlockRAMs from toggling unnecessarily and stops switching activity rippling down the logic that is connected to the data outputs. Obviously, not all designs can make use of this trick.
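
To illustrate why the write mode affects output activity, here is a small behavioural sketch in Python of a single BlockRAM port. It is a simplified model written for this article (the class and its fields are hypothetical, and it ignores READ_FIRST mode, output registers and timing); it only counts how often the data output changes value.

```python
class PortModel:
    """Simplified behavioural model of one BlockRAM port (illustrative only)."""

    def __init__(self, depth: int, write_mode: str = "WRITE_FIRST"):
        if write_mode not in ("WRITE_FIRST", "NO_CHANGE"):
            raise ValueError("unsupported write_mode in this sketch")
        self.mem = [0] * depth
        self.write_mode = write_mode
        self.dout = 0           # value held on the data output
        self.dout_toggles = 0   # number of times the output changed value

    def clock(self, en: bool, we: bool, addr: int, din: int = 0) -> int:
        """Apply one rising clock edge and return the data output."""
        if not en:
            return self.dout    # port disabled: nothing happens
        new_dout = self.dout
        if we:
            self.mem[addr] = din
            if self.write_mode == "WRITE_FIRST":
                new_dout = din  # written data appears on the output
            # NO_CHANGE: the output keeps its previous value during a write
        else:
            new_dout = self.mem[addr]
        if new_dout != self.dout:
            self.dout_toggles += 1
        self.dout = new_dout
        return self.dout


wf = PortModel(16, "WRITE_FIRST")
nc = PortModel(16, "NO_CHANGE")
for addr, value in enumerate([0xA, 0xB, 0xC, 0xD]):
    wf.clock(en=True, we=True, addr=addr, din=value)
    nc.clock(en=True, we=True, addr=addr, din=value)

print("WRITE_FIRST output changes:", wf.dout_toggles)  # 4
print("NO_CHANGE output changes:  ", nc.dout_toggles)  # 0
```

In this sketch, a burst of four writes changes the WRITE_FIRST output four times, while the NO_CHANGE output stays still, so any logic driven by the data outputs sees no extra switching.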

It should be noted that stopping the clock connected to the BlockRAM will also reduce its dynamic power to zero. We will discuss clock gating in another part of this series.

Power Optimized BlockRAM Arrays

Another method for power reduction when using multiple BlockRAMs as a memory array is to create an architecture that ensures that the minimum number of BlockRAMs is enabled at any one time. Consider the case of a 2k x 36bit memory array. The most obvious way of implementing this is with four BlockRAMs, each configured as 2k x 9bits:



Figure 2: 2k x 36bit BlockRAM array, maximum speed and power consumption

This configuration guarantees maximum possible performance but also maximum power consumption as all four BlockRAMs are enabled for any access.

We can rearrange this configuration to use four BlockRAMs, each configured as 512 x 36bits. With the addition of some decoding logic on the address bus and some multiplexing on the output data bus, only one of the BlockRAMs is enabled during accesses:



Figure 3: 2k x 36bit BlockRAM array, optimized for power consumption
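
The decoding and multiplexing described above can be sketched in Python as a behavioural model. This is only an illustration under stated assumptions: it assumes the two most significant bits of the 11-bit address select the block and the lower nine bits address a word within it (the article does not say which bits are used), and all names are hypothetical.

```python
BLOCK_DEPTH = 512   # words per BlockRAM (512 x 36 configuration)
NUM_BLOCKS = 4      # four blocks give 2k words in total
DATA_MASK = (1 << 36) - 1


def decode(addr: int):
    """Split an 11-bit address into one-hot block enables and a word offset.

    Assumption: address bits [10:9] select the block, bits [8:0] the word.
    """
    if not 0 <= addr < BLOCK_DEPTH * NUM_BLOCKS:
        raise ValueError("address out of range for a 2k-word array")
    block = addr >> 9        # upper two bits
    offset = addr & 0x1FF    # lower nine bits
    enables = [i == block for i in range(NUM_BLOCKS)]
    return enables, offset


class PowerOptimizedArray:
    """2k x 36 memory built from four 512 x 36 blocks, one enabled per access."""

    def __init__(self):
        self.blocks = [[0] * BLOCK_DEPTH for _ in range(NUM_BLOCKS)]

    def write(self, addr: int, data: int) -> None:
        enables, offset = decode(addr)
        for block, enabled in zip(self.blocks, enables):
            if enabled:      # only the selected block sees the access
                block[offset] = data & DATA_MASK

    def read(self, addr: int) -> int:
        enables, offset = decode(addr)
        # Output multiplexer: pick the data from the one enabled block.
        return self.blocks[enables.index(True)][offset]


ram = PowerOptimizedArray()
ram.write(0x5A3, 0x123456789)
enables, offset = decode(0x5A3)
print(enables, hex(offset), hex(ram.read(0x5A3)))
```

For address 0x5A3 the model enables only the third block and uses offset 0x1A3. In real hardware the BlockRAM read is synchronous, so the select signal for the output multiplexer would need to be delayed to line up with the read data.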

This kind of structure can easily be hand-coded, or it can be selected as an option in CoreGen:



Figure 4: CoreGen power consumption option

It should be noted that the added logic for the decoding and output multiplexing will certainly have an impact on total system performance, although this could perhaps be improved by adding pipelining.

For small memory arrays it may be better to implement them with distributed RAM instead of BlockRAM. Let's compare the case of sixteen small memory arrays, each 32 words x 16bits. The BlockRAM implementation consumes approximately ten times the power of the distributed RAM version:





Figure 5: BlockRAM vs DistributedRAM

DSP48s

The scope for power reduction in DSP48s is much more limited. These blocks really don't contribute much to the overall power consumption. Virtually the only things that the user can control are the use of the pre-adder (not available on all families) and the M-register.



Figure 6: Virtex-6 DSP48E1s with and without M-register

In the next part of this series, I will look at clocks, PLLs, DCMs and clock gating.