Optimizing FPGA VIs Using Pipelining (FPGA Module)

LabVIEW 2018 FPGA Module Help


Edition Date: March 2018
Part Number: 371599P-01

Pipelining is a technique you can use to increase the clock rate and throughput of an FPGA VI. Pipelined designs take advantage of the parallel processing capabilities of the FPGA to increase the efficiency of sequential code. To implement a pipeline, divide code into discrete steps and wire the inputs and outputs of each step to Feedback Nodes or shift registers in a loop.

The following sections demonstrate FPGA VIs with standard execution versus pipelined execution in a single-cycle Timed Loop.

Standard Execution in a Single-Cycle Timed Loop

In the following block diagram, subVIs A, B, and C execute in sequence within a single-cycle Timed Loop. As a result, you must set the clock rate of the single-cycle Timed Loop low enough to accommodate the sum of the execution times of all three subVIs.

Pipelined Execution in a Single-Cycle Timed Loop Using Feedback Nodes

In the following block diagram, LabVIEW pipelines the subVIs because the inputs and outputs of the subVIs are wired to Feedback Nodes. In this FPGA VI, the subVIs execute in parallel, all within a single cycle, and the maximum clock rate is limited only by the subVI with the longest combinatorial path.

Pipelined Execution in a Single-Cycle Timed Loop Using Shift Registers

You also can use shift registers to implement pipelined code, as shown in the following block diagram.
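The register behavior described above can be approximated in software. The following Python sketch is not LabVIEW code and makes several assumptions: the functions stage_a, stage_b, and stage_c are hypothetical stand-ins for subVIs A, B, and C, the variables reg_ab and reg_bc play the role of the Feedback Nodes or shift registers between steps, and each loop iteration represents one clock cycle.

```python
def stage_a(x):
    """Hypothetical stand-in for subVI A."""
    return x + 1


def stage_b(x):
    """Hypothetical stand-in for subVI B."""
    return x * 2


def stage_c(x):
    """Hypothetical stand-in for subVI C."""
    return x - 3


def run_pipeline(samples, default=0):
    """Model a three-step pipeline; one loop iteration is one clock cycle."""
    reg_ab = default  # register between A and B (assumed initial value)
    reg_bc = default  # register between B and C (assumed initial value)
    outputs = []
    for sample in samples:
        # Every step reads only values that were stored before this cycle,
        # so the three steps do not depend on each other within a cycle.
        out_a = stage_a(sample)
        out_b = stage_b(reg_ab)
        out_c = stage_c(reg_bc)
        # The registers capture the new values at the end of the cycle.
        reg_ab, reg_bc = out_a, out_b
        outputs.append(out_c)
    return outputs


print(run_pipeline([10, 20, 30, 40]))  # [-3, -3, 19, 39]
```

In this model, the first two outputs are computed from the register defaults and carry no meaning. The third output, 19, is the fully processed result for the first sample, 10.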

Implementing Pipelined Code

Consider the following behaviors when implementing pipelined code:

  • The output of the final step lags behind the input by one clock cycle fewer than the number of steps in the pipeline.
  • The output is invalid for each clock cycle until the pipeline fills.
  • The number of steps in a pipeline is called the pipeline depth.
  • The latency of a pipeline, measured in clock cycles, corresponds to its depth. For a pipeline of depth N, the result is invalid until the Nth clock cycle, and the output of each valid clock cycle lags behind the input by N-1 clock cycles.

Consider the following example.

In this example, subVIs A, B, and C execute as three separate steps, resulting in a pipeline depth of three. Because this code requires three execution steps, the output is not valid until Clock Cycle 3. The output of each valid clock cycle k always corresponds to the input from clock cycle k – (N – 1), where N is the pipeline depth.

  • Clock Cycle 1: subVI A processes the first measurement (Meas1), while subVI B and subVI C both process the default value of the shift register (Default), yielding an invalid output.
  • Clock Cycle 2: subVI A processes the second measurement (Meas2), subVI B processes the output of subVI A from Clock Cycle 1, and subVI C processes an invalid input from subVI B, yielding an invalid output.
  • Clock Cycle 3: The pipeline finally fills, as all inputs are valid, and the output from subVI C is valid for the first time. subVI A processes the third measurement (Meas3), subVI B processes the output of subVI A from Clock Cycle 2, and subVI C processes the output of subVI B from Clock Cycle 2, yielding the output that corresponds to the first measurement (Meas1). After the pipeline is full, all subsequent clock cycles yield valid output, with a constant lag of two clock cycles.

Tip  Consider using a Case structure to avoid undesired behavior that results from invalid outputs and to ensure that a control algorithm enables actuators only after N clock cycles elapse.
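The gating idea in the tip can be sketched in software as follows. This Python model is an illustration under stated assumptions, not a LabVIEW implementation: the function name gate_outputs, the saturating cycle counter, and the safe_value parameter are all hypothetical, and the if/else branches stand in for the two cases of a Case structure.

```python
def gate_outputs(raw_outputs, depth, safe_value=0):
    """Pass pipeline outputs through only after `depth` clock cycles elapse.

    raw_outputs: one pipeline output per clock cycle, including the
                 invalid values produced while the pipeline fills.
    depth:       pipeline depth N.
    safe_value:  assumed value to hold while the output is invalid.
    """
    cycles_elapsed = 0
    gated = []
    for value in raw_outputs:
        # Saturating counter: stops counting once the pipeline is full.
        cycles_elapsed = min(cycles_elapsed + 1, depth)
        if cycles_elapsed >= depth:
            gated.append(value)       # pipeline full: output is valid
        else:
            gated.append(safe_value)  # pipeline filling: hold the safe value
    return gated


# Depth-three pipeline whose first two outputs are invalid.
print(gate_outputs([-3, -3, 19, 39], depth=3))  # [0, 0, 19, 39]
```

The saturating counter is one possible design choice; it keeps the counter at a fixed width, which matters on an FPGA where a free-running counter would eventually roll over.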

Pipelining to Increase Throughput

You can use pipelining to increase throughput by compiling a single-cycle Timed Loop in a faster clock domain.

Non-Pipelined (40 MHz)

The top section of the illustration shows the execution timing of a non-pipelined loop. This code consists of three subVIs, each of which has a propagation delay of 12.5 ns. The total propagation delay from subVI A to subVI C is 37.5 ns, which exceeds the 25 ns clock period at 40 MHz, so the loop cannot compile at 40 MHz.

Pipelined (40 MHz)

The middle section of the illustration shows how pipelining the code reduces the propagation delay to 12.5 ns, allowing the loop to compile at 40 MHz.

Pipelined (80 MHz)

The bottom section of the illustration shows that the loop compiles at a clock rate as high as 80 MHz because the propagation delay of the pipelined loop is only 12.5 ns, which fits within the 12.5 ns clock period at 80 MHz.
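The arithmetic behind these three cases can be checked with a short Python sketch. It assumes, as the text does, that the propagation delay of the subVIs is the only contributor to the critical path; a real compilation also accounts for routing and register setup times, so treat the numbers as idealized. The function names are hypothetical.

```python
STAGE_DELAY_NS = 12.5  # propagation delay of each subVI, from the text


def clock_period_ns(freq_mhz):
    """Clock period in nanoseconds for a clock rate given in MHz."""
    return 1000.0 / freq_mhz


def meets_timing(path_delay_ns, freq_mhz):
    """True if the longest path between registers fits in one clock period."""
    return path_delay_ns <= clock_period_ns(freq_mhz)


def max_clock_mhz(path_delay_ns):
    """Highest clock rate whose period still covers the path delay."""
    return 1000.0 / path_delay_ns


non_pipelined_path_ns = 3 * STAGE_DELAY_NS  # A, B, and C in series: 37.5 ns
pipelined_path_ns = STAGE_DELAY_NS          # one subVI between registers

print(meets_timing(non_pipelined_path_ns, 40))  # False: 37.5 ns > 25 ns
print(meets_timing(pipelined_path_ns, 40))      # True: 12.5 ns <= 25 ns
print(meets_timing(pipelined_path_ns, 80))      # True: 12.5 ns <= 12.5 ns
print(max_clock_mhz(pipelined_path_ns))         # 80.0
```

Because a single-cycle Timed Loop produces one result per clock cycle once the pipeline is full, the higher clock rate translates directly into higher throughput.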

Note  A pipelined design has a higher latency than a non-pipelined design when latency is measured in clock cycles. However, because pipelining allows you to decrease the cycle period, the overall latency measured in units of time should not change substantially.
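As a rough check of this note, the following Python sketch compares latency in units of time for the two designs. It assumes that each design runs at the shortest clock period its propagation delay allows (37.5 ns non-pipelined, 12.5 ns pipelined), that the non-pipelined loop has a latency of one clock cycle, and that the depth-three pipeline has a latency of three clock cycles. The function name latency_ns is hypothetical.

```python
def latency_ns(latency_cycles, period_ns):
    """Latency in nanoseconds for a given cycle count and clock period."""
    return latency_cycles * period_ns


# Non-pipelined: one cycle, but the period must cover all three subVIs.
non_pipelined_ns = latency_ns(latency_cycles=1, period_ns=37.5)

# Pipelined: three cycles, each covering only one subVI.
pipelined_ns = latency_ns(latency_cycles=3, period_ns=12.5)

print(non_pipelined_ns)  # 37.5
print(pipelined_ns)      # 37.5
```

Under these idealized assumptions, the two latencies are equal in time even though the pipelined design takes three times as many clock cycles.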

Related Information

Feedback Node

Timed Loop (Single-Cycle)

Implementing Multiple Clock Domains

Understanding Timing Considerations for FPGA VIs
