Pipelining is a technique you can use to increase the clock rate and throughput of an FPGA VI. Pipelined designs take advantage of the parallel processing capabilities of the FPGA to increase the efficiency of sequential code. To implement a pipeline, divide code into discrete steps and wire the inputs and outputs of each step to Feedback Nodes or shift registers in a loop.
The following sections demonstrate FPGA VIs with standard execution versus pipelined execution in a single-cycle Timed Loop.
In the following block diagram, subVIs A, B, and C execute in sequence within a single-cycle Timed Loop. As a result, you must set the clock rate of the single-cycle Timed Loop low enough that one clock period accommodates the sum of the execution times of all three subVIs.
In the following block diagram, LabVIEW pipelines the subVIs because the inputs and outputs of the subVIs are wired to Feedback Nodes. In this FPGA VI, the subVIs execute in parallel, all within a single cycle, and the maximum clock rate is limited only by the subVI with the longest combinatorial path.
You also can use shift registers to implement pipelined code, as shown in the following block diagram.
Consider the following behaviors when implementing pipelined code:
Consider the following example.
In this example, executing subVIs A, B, and C takes three separate execution steps, resulting in a pipeline depth of N = 3. Because this code requires three execution steps, the output is not valid until Clock Cycle 3. For a pipeline of depth N, the output of each valid clock cycle C always corresponds to the input from clock cycle C – (N – 1).
|Clock Cycle 1||In Clock Cycle 1, subVI A processes the first measurement (Meas1), while subVI B and subVI C both process the default value of the shift register (Default), yielding an invalid output.|
|Clock Cycle 2||During Clock Cycle 2, subVI A processes the second measurement (Meas2), subVI B processes the output of subVI A from Clock Cycle 1, and subVI C processes an invalid input from subVI B, yielding an invalid output.|
|Clock Cycle 3||During Clock Cycle 3, the pipeline finally fills, as all inputs are valid, and the output from subVI C is valid for the first time. subVI A processes the third measurement (Meas3), subVI B processes the output of subVI A from Clock Cycle 2, and subVI C processes the output of subVI B from Clock Cycle 2, yielding the output that corresponds to the first measurement (Meas1). After the pipeline is full, all subsequent clock cycles yield valid output, with a constant lag of two clock cycles.|
|Tip Consider using a Case structure to avoid undesired behavior that results from invalid outputs and to ensure that a control algorithm enables actuators only after N clock cycles elapse.|
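The cycle-by-cycle behavior described above can be approximated in software. The following Python sketch is only a model and not LabVIEW code: `stage_a`, `stage_b`, and `stage_c` are hypothetical stand-ins for subVIs A, B, and C, the two variables `reg_ab` and `reg_bc` play the role of the shift registers or Feedback Nodes, and the `valid` flag mimics the gating that the Tip suggests implementing with a Case structure.

```python
# Hypothetical stand-ins for subVIs A, B, and C (not from the original help topic).
def stage_a(x):
    return x + 1

def stage_b(x):
    return x * 2

def stage_c(x):
    return x - 3

def run_pipeline(measurements, default=0):
    """Model a three-stage pipeline; one loop iteration is one clock cycle."""
    depth = 3          # pipeline depth N
    reg_ab = default   # register between A and B, initialized to the default value
    reg_bc = default   # register between B and C, initialized to the default value
    results = []
    for cycle, meas in enumerate(measurements, start=1):
        # Every stage reads the value latched in the previous cycle,
        # so all three stages work in the same cycle on different data.
        out_c = stage_c(reg_bc)
        out_b = stage_b(reg_ab)
        out_a = stage_a(meas)
        # Latch the new values for the next cycle.
        reg_bc = out_b
        reg_ab = out_a
        # The output is valid only once the pipeline has filled (cycle >= N).
        valid = cycle >= depth
        results.append((cycle, out_c, valid))
    return results

if __name__ == "__main__":
    for cycle, output, valid in run_pipeline([10, 20, 30, 40]):
        print(cycle, output, "valid" if valid else "invalid")
```

In this model the output in cycle 3 equals `stage_c(stage_b(stage_a(10)))`, that is, the result for the first measurement, which matches the lag of N – 1 = 2 clock cycles described in the table.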
You can use pipelining to increase throughput by compiling a single-cycle Timed Loop in a faster clock domain.
The top section of the illustration shows the execution timing of a non-pipelined loop. This code consists of three subVIs, each of which has a propagation delay of 12.5 ns. The total propagation delay from subVI A to subVI C is 37.5 ns, which exceeds the 25 ns period of a 40 MHz clock, so the loop cannot compile at 40 MHz.
The middle section of the illustration shows how pipelining the code reduces the propagation delay to 12.5 ns, allowing the loop to compile at 40 MHz.
The bottom section of the illustration shows that the loop compiles at a clock rate as high as 80 MHz because the propagation delay of the pipelined loop is only 12.5 ns.
|Note A pipelined design has higher latency than a non-pipelined design when latency is measured in clock cycles. However, because pipelining allows you to shorten the clock period, the overall latency measured in units of time should not change substantially.|
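The timing figures above can be checked with a few lines of arithmetic. The following Python sketch is an idealized calculation: it assumes the maximum clock rate is simply the reciprocal of the longest combinatorial path and ignores register setup time, routing delay, and clock uncertainty, all of which a real compilation takes into account. The helper name `max_clock_mhz` is hypothetical.

```python
def max_clock_mhz(critical_path_ns):
    """Idealized maximum clock rate in MHz for a given critical path in ns."""
    return 1000.0 / critical_path_ns

# Propagation delay of subVIs A, B, and C, taken from the example above.
stage_delays_ns = [12.5, 12.5, 12.5]

# Non-pipelined: the signal crosses all three subVIs within one clock cycle.
non_pipelined_path_ns = sum(stage_delays_ns)
# Pipelined: only the slowest single stage must fit in one clock cycle.
pipelined_path_ns = max(stage_delays_ns)

non_pipelined_fmax = max_clock_mhz(non_pipelined_path_ns)
pipelined_fmax = max_clock_mhz(pipelined_path_ns)

# Latency in time: one long cycle versus N short cycles at the fastest clock.
non_pipelined_latency_ns = 1 * non_pipelined_path_ns
pipelined_latency_ns = len(stage_delays_ns) * pipelined_path_ns

if __name__ == "__main__":
    print(f"non-pipelined: {non_pipelined_fmax:.2f} MHz, {non_pipelined_latency_ns} ns")
    print(f"pipelined:     {pipelined_fmax:.2f} MHz, {pipelined_latency_ns} ns")
```

Under these assumptions the non-pipelined loop tops out below 40 MHz, while the pipelined loop reaches 80 MHz. At that rate both versions take 37.5 ns from input to the corresponding output, which illustrates why the latency in units of time stays roughly the same.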