Pipelining is a technique you can use to increase the throughput of the FPGA VI. In a pipelined design, you take advantage of the parallel processing capabilities of the FPGA to increase the efficiency of sequential code. To implement a pipeline, you must divide code into discrete steps and wire the inputs and outputs of each step to Feedback Nodes or shift registers in a loop.
In the following block diagram, subVIs A, B, and C execute in sequence within a single-cycle Timed Loop. As a result, the clock period of the single-cycle Timed Loop must be long enough to accommodate the sum of the propagation delays of all three subVIs.
When you wire the inputs and outputs of the subVIs to Feedback Nodes, as shown in the following block diagram, LabVIEW pipelines the subVIs. Now, the subVIs execute in parallel within each clock cycle, each operating on data from a different iteration, and the maximum clock rate is limited only by the subVI with the longest combinatorial path. By implementing a pipelined design, you might be able to increase the clock rate of the single-cycle Timed Loop and increase the throughput of the FPGA VI.
You also can use shift registers to implement a pipeline, as shown in the following block diagram.
When you implement a pipeline, the output of the final step lags behind the input, and the output is invalid for each clock cycle until the pipeline fills. The number of steps in a pipeline is called the pipeline depth, and the latency of a pipeline, measured in clock cycles, corresponds to its depth. For a pipeline of depth N, the output is invalid until the Nth clock cycle, and the output of each valid clock cycle corresponds to the input from N – 1 clock cycles earlier.
Because there are three steps in this example (subVIs A, B, and C), the improved code results in a pipeline of depth N = 3. Therefore, the output is not valid until the third clock cycle, and the output of each valid clock cycle k always corresponds to the input from clock cycle k – (N – 1), that is, k – 2, as shown in the following illustration.
In this example, subVI A processes measurement 1 during clock cycle 1, while subVIs B and C both process the default value of the shift register, yielding invalid output. During clock cycle 2, subVI A processes measurement 2, subVI B processes the output of subVI A from clock cycle 1, and subVI C processes an invalid value, yielding invalid output. During clock cycle 3, the pipeline finally fills and the output from subVI C becomes valid for the first time. SubVI A processes measurement 3, subVI B processes the output of subVI A from clock cycle 2, and subVI C processes the output of subVI B from clock cycle 2, yielding the output that corresponds to measurement 1. After the pipeline is full, all subsequent clock cycles yield valid output, with a constant lag of 2 clock cycles.
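The cycle-by-cycle behavior described above can be modeled in software. LabVIEW FPGA code is graphical, so the following Python sketch is only a text model of the three-step pipeline, not LabVIEW code. The names `simulate_pipeline` and `tag`, and the `"init"` placeholder for the register default value, are assumptions made for illustration.

```python
def simulate_pipeline(stages, inputs, default="init"):
    """Model a register-separated pipeline, one loop iteration per clock cycle.

    Assumption: the first stage reads the new input directly, and each later
    stage reads a register holding the previous stage's result from the
    preceding cycle (the role a Feedback Node or shift register plays).
    """
    depth = len(stages)
    regs = [default] * (depth - 1)  # registers start at their default value
    trace = []
    for cycle, sample in enumerate(inputs, start=1):
        # Every stage runs in the same cycle, each on a different value.
        stage_inputs = [sample] + regs
        results = [stage(value) for stage, value in zip(stages, stage_inputs)]
        # The final stage's output is valid once the pipeline has filled.
        trace.append((cycle, results[-1], cycle >= depth))
        # Latch the intermediate results for the next cycle.
        regs = results[:-1]
    return trace


def tag(name):
    """Return a stand-in stage that records which subVI touched the value."""
    return lambda value: f"{name}({value})"


trace = simulate_pipeline([tag("A"), tag("B"), tag("C")],
                          ["m1", "m2", "m3", "m4"])
for cycle, out, valid in trace:
    print(cycle, out, "valid" if valid else "invalid")
```

In this model, cycles 1 and 2 produce `C(init)` and `C(B(init))`, which are marked invalid. Cycle 3 produces `C(B(A(m1)))`, the first valid result, and cycle 4 produces `C(B(A(m2)))`, showing the constant lag of 2 clock cycles.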
Note: You must use caution to prevent undesired behavior due to the invalid outputs that occur at the beginning of pipelined execution. For example, you can use a Case structure to ensure that a control algorithm enables actuators only after N clock cycles elapse.
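As a rough text analogue of gating with a Case structure, the following Python sketch substitutes a safe value for the pipeline output until the pipeline has filled. The function name `gate_outputs`, the `safe_value` parameter, and the choice to treat cycle N as the first valid cycle are assumptions for illustration; an actual design would implement this logic on the block diagram.

```python
PIPELINE_DEPTH = 3  # assumed depth N, matching the three-step example


def gate_outputs(raw_outputs, depth=PIPELINE_DEPTH, safe_value=0.0):
    """Replace outputs produced before the pipeline fills with a safe value.

    Cycles are counted from 1. The output is treated as valid from cycle
    `depth` onward; earlier outputs are replaced by `safe_value`.
    """
    gated = []
    for cycle, value in enumerate(raw_outputs, start=1):
        if cycle >= depth:
            gated.append(value)       # pipeline is full: pass the result on
        else:
            gated.append(safe_value)  # pipeline still filling: hold safe value
    return gated


# The first two raw values stand for invalid start-up outputs.
print(gate_outputs([9.0, 9.0, 1.5, 2.5]))
```

A more conservative design could wait one additional cycle before enabling the actuators. The appropriate threshold depends on how the cycle counter is implemented.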
You can use pipelining to increase throughput by compiling a single-cycle Timed Loop in a faster clock domain. For example, the top section of the illustration below shows the execution timing of a non-pipelined loop consisting of three subVIs, each of which requires a propagation delay of 12.5 ns. The total propagation delay from subVI A to subVI C is 37.5 ns, which is too long to compile at 40 MHz. The middle section of the illustration shows how pipelining the code reduces the propagation delay to 12.5 ns, allowing the loop to compile at 40 MHz. Because the propagation delay of the pipelined loop is only 12.5 ns, the loop can compile at a clock rate as high as 80 MHz, as shown in the bottom section of the illustration.
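The clock-rate arithmetic in this example can be checked with a short Python sketch. It is a simplified model that assumes the propagation delay alone determines whether the loop meets timing; actual compilation results also depend on routing and other factors. The helper names `max_clock_mhz` and `fits` are hypothetical.

```python
def max_clock_mhz(delay_ns):
    """Highest clock rate whose period is at least the propagation delay."""
    return 1000.0 / delay_ns


def fits(delay_ns, clock_mhz):
    """Return True if the propagation delay fits within one clock period."""
    period_ns = 1000.0 / clock_mhz
    return delay_ns <= period_ns


STAGE_DELAY_NS = 12.5                     # per-subVI delay from the example
NON_PIPELINED_NS = 3 * STAGE_DELAY_NS     # A, B, and C in series: 37.5 ns

print(fits(NON_PIPELINED_NS, 40.0))       # 37.5 ns vs. 25 ns period
print(fits(STAGE_DELAY_NS, 40.0))         # 12.5 ns vs. 25 ns period
print(max_clock_mhz(STAGE_DELAY_NS))      # limit for the pipelined loop
```

Under these assumptions, the non-pipelined loop does not fit in the 25 ns period of a 40 MHz clock, while the pipelined loop fits and has an upper limit of 80 MHz.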
Note: A pipelined design increases latency over a non-pipelined design when measured in clock cycles. However, because pipelining allows you to decrease the cycle period, the overall latency measured in units of time should not change substantially.
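To make the latency comparison concrete, the following Python sketch applies the 12.5 ns per-subVI delay from the example above. It assumes that latency is counted from the start of the cycle in which an input is sampled to the end of the cycle in which the corresponding output is produced, so a pipeline of depth N takes N clock periods. The helper name `latency_ns` is hypothetical.

```python
def latency_ns(depth, period_ns):
    """End-to-end latency in ns for a pipeline of `depth` steps.

    Assumption: an input sampled at the start of one cycle yields its
    output at the end of the cycle `depth - 1` cycles later.
    """
    return depth * period_ns


# Non-pipelined: one cycle, but the period must cover all three subVIs.
print(latency_ns(1, 37.5))   # 37.5

# Pipelined at 80 MHz: three cycles of 12.5 ns each.
print(latency_ns(3, 12.5))   # 37.5

# Pipelined at 40 MHz: three cycles of 25 ns each.
print(latency_ns(3, 25.0))   # 75.0
```

In this model, the latency in time matches the non-pipelined design only when the clock period shrinks in proportion to the pipeline depth. Running the pipelined loop at 40 MHz still increases throughput, but the latency in time grows.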