
Document Type: Tutorial
NI Supported: Yes
Publish Date: Jan 8, 2010



FPGAs - Under the Hood


Overview

High-level design tools offer field-programmable gate array (FPGA) technology to engineers and scientists who have little or no digital hardware design expertise. Whether you use graphical programming, C, or VHDL, the synthesis process is quite complex and can leave you wondering how FPGAs really work. What actually happens inside the chip to make programs execute within configurable blocks of silicon? This white paper is intended for the nondigital designer who wants to understand the fundamental parts of an FPGA and how it all works “under the hood.” This information is still helpful when using high-level design tools, and can hopefully shed some light on the inner workings of an extraordinary technology.

Field Programmable Gate Arrays

Every FPGA chip is made up of a finite number of predefined resources with programmable interconnects to implement a reconfigurable digital circuit.



Figure 1. The Different Parts of an FPGA

FPGA chip specifications include the number of configurable logic blocks, the number of fixed-function logic blocks such as multipliers, and the size of memory resources such as embedded block RAM. There are many other parts to an FPGA chip, but these are typically the most important when selecting and comparing FPGAs for a particular application.

At the lowest level, configurable blocks of logic, such as slices or logic cells, are made up of two basic things: flip-flops and look-up tables (LUTs). This is important to note because the various FPGA families differ in the way flip-flops and LUTs are packaged together. Virtex-II FPGAs, for example, have slices with two LUTs and two flip-flops, whereas Virtex-5 FPGAs have slices with four LUTs and four flip-flops. The LUT architecture itself may also differ (4-input versus 6-input); a later section explains how LUTs work in more detail.

Table 1 lists the specifications of the FPGAs used in LabVIEW FPGA hardware targets. The number of gates has traditionally been a way to compare FPGA chips to ASIC technology, but it does not truly describe the number of individual components inside an FPGA. This is one of the reasons why Xilinx did not specify the number of gates for the new Virtex-5 family.

 

Device           Gates       Flip-Flops   LUTs     Multipliers   Block RAM (kb)
Virtex-II 1000   1 million   10,240       10,240   40            720
Virtex-II 3000   3 million   28,672       28,672   96            1,728
Spartan-3 1000   1 million   15,360       15,360   24            432
Spartan-3 2000   2 million   40,960       40,960   40            720
Virtex-5 LX30    -----       19,200       19,200   32            1,152
Virtex-5 LX50    -----       28,800       28,800   48            1,728
Virtex-5 LX85    -----       51,840       51,840   48            3,456
Virtex-5 LX110   -----       69,120       69,120   64            4,608

Table 1. FPGA Resource Specifications for Various Families
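As a rough illustration of how the slice packaging described earlier relates to the LUT counts in Table 1, the following sketch divides each LUT count by the LUTs per slice stated above (two for Virtex-II, four for Virtex-5). The resulting slice counts are derived figures, not values taken from this paper, so check them against the device data sheet before relying on them. Spartan-3 is left out because its slice layout is not stated here.

```python
# LUTs per slice as stated in the text above.
LUTS_PER_SLICE = {"Virtex-II": 2, "Virtex-5": 4}

# LUT counts copied from Table 1.
LUT_COUNTS = {
    "Virtex-II 1000": 10_240,
    "Virtex-II 3000": 28_672,
    "Virtex-5 LX30": 19_200,
    "Virtex-5 LX50": 28_800,
    "Virtex-5 LX85": 51_840,
    "Virtex-5 LX110": 69_120,
}


def slice_count(device):
    """Estimate the number of slices as LUT count / LUTs per slice."""
    family = next(f for f in LUTS_PER_SLICE if device.startswith(f))
    return LUT_COUNTS[device] // LUTS_PER_SLICE[family]


if __name__ == "__main__":
    for name in LUT_COUNTS:
        print(f"{name}: about {slice_count(name)} slices")
```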

To understand these specifications better, consider the way code is synthesized into digital circuitry. Synthesis is the process of translating high-level programming languages into true hardware implementations. For any given piece of synthesizable code, either graphical or textual, there is a corresponding circuit schematic that describes how logic blocks should be wired together. The LabVIEW FPGA Module adds logic around every block diagram function before sending the final schematic to the compiler. Let's examine a small section of block diagram code to see what the corresponding schematic looks like. Figure 2 shows an example of five Boolean signals being passed into a group of Boolean functions to graphically calculate a single binary value.

Figure 2. Small Section of a LabVIEW Block Diagram with Simple Boolean Logic  

Under normal conditions (outside the LabVIEW single-cycle timed loop), the corresponding circuit schematic that results from the Figure 2 block diagram section looks like Figure 3.

Figure 3. Circuit Schematic Corresponding to Boolean Logic in Figure 2

It may be difficult to see, but two parallel branches of circuitry are actually created. The five topmost black wires feed into the first branch, which adds a flip-flop between each Boolean operation. The five bottommost black wires go to a second chain of logic with the same number of flip-flops, which is created to keep track of the number of clock cycles needed to propagate data through the digital circuit. In total, 12 flip-flops and 12 LUTs are used to implement this schematic. The upper branch and each of its components are analyzed in the following sections.

Flip-Flops

Figure 4. Flip-Flop Symbol

Flip-flops are single-bit registers used to synchronize logic and save logical states between clock cycles. On every clock edge, a flip-flop latches the 1 or 0 (TRUE or FALSE) value on its input and holds that value constant until the next clock edge. Under normal conditions, LabVIEW FPGA places a flip-flop between every single operation to maximize the propagation time available for each operation to execute. The exception to this rule is when code is placed into a single-cycle timed loop structure. In this special loop structure, flip-flops are added only at the beginning and end of the loop iteration, and it is up to the programmer to understand timing considerations. Further details on how code within a single-cycle timed loop is synthesized are discussed in a later section. Figure 5 shows the upper branch of Figure 3, with flip-flops highlighted in red.


Figure 5. Schematic Drawing with Flip-Flops Highlighted in Red
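The latching behavior described above can be sketched in software. The following Python model is only a behavioral illustration, not how LabVIEW FPGA synthesizes anything: the class name `DFlipFlop` and the helper `run_pipeline` are hypothetical names chosen for this example. It chains several flip-flops in series and shows that a value needs one clock edge per flip-flop to reach the final output.

```python
class DFlipFlop:
    """One-bit storage element that captures its input on each clock edge."""

    def __init__(self, initial=0):
        self.q = initial  # value currently held on the output

    def clock_edge(self, d):
        """Latch input d (0 or 1) and return the newly held output."""
        self.q = 1 if d else 0
        return self.q


def run_pipeline(samples, stages=3):
    """Feed one sample per clock edge through `stages` flip-flops in series.

    Returns the value held by the last flip-flop after each clock edge.
    """
    flops = [DFlipFlop() for _ in range(stages)]
    outputs = []
    for d in samples:
        # Update from the last stage backward so every flip-flop sees the
        # value its neighbor held *before* this clock edge, as in hardware.
        for i in range(stages - 1, 0, -1):
            flops[i].clock_edge(flops[i - 1].q)
        flops[0].clock_edge(d)
        outputs.append(flops[-1].q)
    return outputs


if __name__ == "__main__":
    # The first input (1) reaches the output on the third clock edge.
    print(run_pipeline([1, 0, 1, 1, 0, 0], stages=3))
```

With three stages, the output sequence is the input sequence delayed by two edges and padded with the initial zeros, which is the kind of latency the second chain of flip-flops in Figure 3 exists to track.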

Look-Up Tables (LUTs)

 

Figure 6. Four-Input LUT

The remaining logic in the schematic shown in Figure 5 is implemented using very small amounts of RAM in the form of LUTs. It is easy to assume that the number of system gates in an FPGA refers to the number of NAND gates and NOR gates in a particular chip, but, in reality, all combinatorial logic (ANDs, ORs, NANDs, XORs, and so on) is implemented as truth tables within LUT memory. A truth table is a predefined list of outputs for every combination of inputs. (Faded visions of Karnaugh maps might be flashing through your head right now.)

Here is the quick refresher from digital logic class:

The Boolean AND operation, for example, is shown in Figure 7:

Figure 7. Boolean AND Operation

 The corresponding truth table for the two inputs of an AND operation is shown in Table 2.

Input 1   Input 2   Output
0         0         0
0         1         0
1         0         0
1         1         1

Table 2. Truth Table for Boolean AND Operation

You also can think of the inputs as the numerical index for all possible outputs, as shown in Table 3.

LUT Index   Output
0 (00)      0
1 (01)      0
2 (10)      0
3 (11)      1

 Table 3. LUT Implementation of Truth Table for Boolean AND Operation
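The idea of treating the inputs as an index into a table of outputs can be sketched in a few lines of Python. The function names `lut_index` and `lut_read` are hypothetical, and the choice to treat the first input as the most significant bit is an assumption made to match the index column of Table 3.

```python
def lut_index(inputs):
    """Combine input bits into a table index (first input = most significant bit)."""
    index = 0
    for bit in inputs:
        index = (index << 1) | (1 if bit else 0)
    return index


# Outputs for indices 0 (00) through 3 (11), copied from Table 3.
AND_LUT = [0, 0, 0, 1]


def lut_read(table, inputs):
    """Return the stored output for the given combination of inputs."""
    return table[lut_index(inputs)]


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "->", lut_read(AND_LUT, (a, b)))
```

Swapping `AND_LUT` for a different four-entry list, such as `[0, 1, 1, 0]` for XOR, changes the logic function without changing the lookup mechanism, which is what makes the LUT reconfigurable.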

Virtex-II and Spartan-3 FPGAs have four-input LUTs to implement truth tables with up to 16 combinations of four input signals. Figure 8 is an example of a four-input circuit implementation.

Figure 8. Circuit of Four Input Signals to Boolean Logic

Table 4 shows the corresponding truth table you would implement within a four-input LUT.

LUT Index   Output
0 (0000)    1
1 (0001)    1
2 (0010)    1
3 (0011)    0
4 (0100)    0
5 (0101)    0
6 (0110)    0
7 (0111)    1
8 (1000)    0
9 (1001)    0
10 (1010)   0
11 (1011)   1
12 (1100)   0
13 (1101)   0
14 (1110)   0
15 (1111)   1

Table 4. Corresponding Truth Table for Circuit Shown in Figure 8
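Because a four-input LUT holds only 16 one-bit outputs, its entire contents fit in a single 16-bit number. The sketch below packs the outputs of Table 4 into such a number and reads them back. The convention that bit i of the packed value holds the output for LUT index i is an assumption made for this illustration, and the names `pack_lut` and `read_packed` are hypothetical.

```python
# Outputs for LUT indices 0 through 15, copied from Table 4.
TABLE_4 = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1]


def pack_lut(outputs):
    """Pack a list of one-bit outputs into an integer (bit i = output for index i)."""
    value = 0
    for index, bit in enumerate(outputs):
        value |= (bit & 1) << index
    return value


def read_packed(value, index):
    """Read the output stored for a given LUT index from the packed contents."""
    return (value >> index) & 1


LUT4_CONTENTS = pack_lut(TABLE_4)

if __name__ == "__main__":
    print(f"4-input LUT contents: 0x{LUT4_CONTENTS:04X}")
    print("entries in a 4-input LUT:", 2 ** 4)
    print("entries in a 6-input LUT:", 2 ** 6)
```

The number of entries doubles with every added input, so a six-input LUT stores 64 outputs rather than 16.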

FPGAs in the Virtex-5 family use six-input LUTs, which implement truth tables with up to 64 combinations of six different input signals. This becomes increasingly important when using single-cycle timed loops in LabVIEW FPGA, as the combinatorial logic between flip-flops can become very complex. The next section describes how single-cycle timed loops optimize FPGA resource usage in LabVIEW.

Single-Cycle Timed Loops

The example code used in previous sections assumed that code was placed outside a single-cycle timed loop, and additional circuitry was synthesized to ensure synchronous dataflow execution. The single-cycle timed loop is a special structure in LabVIEW FPGA that generates a much more optimized circuit schematic, with the expectation that all branches of logic can execute within a single clock cycle. If a single-cycle timed loop is configured to run at 40 MHz, for example, all branches of logic must execute within a clock tick of 25 ns.

If the same Boolean logic from a previous example were placed inside a single-cycle timed loop, as shown in Figure 9, the corresponding circuit schematic that is generated is shown in Figure 10.



Figure 9. Simple Boolean Logic within a Single-Cycle Timed Loop

Figure 10. Circuit Schematic Corresponding to Boolean Logic in Figure 9

When compared to the previous schematic shown in Figure 3, it is clear that this implementation is much simpler. The logic between the flip-flops would require at least two 4-input LUTs on a Virtex-II or Spartan-3 FPGA (shown in Figure 11).

Figure 11. Four-Input LUT Implementation of Circuit Schematic in Figure 10

Since Virtex-5 FPGAs have 6-input LUTs, the exact same logic could be implemented within a single LUT (shown in Figure 12).

Figure 12. Six-Input LUT Implementation of Circuit Schematic in Figure 10

The single-cycle timed loop used in this example (Figure 9) is configured to run at 40 MHz, which means that the logic between any two flip-flops must execute within one clock tick of 25 ns. The maximum speed at which code can execute depends on how long signals take to propagate through the circuit. The branch of logic with the longest propagation delay is known as the critical path, and it determines the theoretical maximum clock speed for that part of the circuit. The six-input LUTs on Virtex-5 FPGAs not only reduce the total number of LUTs needed to implement a given piece of logic but also reduce the propagation delay through that piece of logic. This means that you can configure the same single-cycle timed loop for faster clock rates simply by choosing a Virtex-5-based hardware target.
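The relationship between clock rate, clock period, and critical path can be worked through numerically. In the sketch below, the branch names and delay values are made up for illustration and do not come from any real compile report. Real timing analysis also accounts for flip-flop setup time, routing, and clock skew, which this simplified model ignores.

```python
def clock_period_ns(frequency_mhz):
    """Clock period in nanoseconds for a clock frequency in megahertz."""
    return 1000.0 / frequency_mhz


def critical_path(path_delays_ns):
    """Return (name, delay) of the branch with the longest propagation delay."""
    name = max(path_delays_ns, key=path_delays_ns.get)
    return name, path_delays_ns[name]


def max_clock_mhz(path_delays_ns):
    """Theoretical maximum clock rate allowed by the critical path."""
    _, delay = critical_path(path_delays_ns)
    return 1000.0 / delay


def meets_timing(path_delays_ns, frequency_mhz):
    """True if every branch settles within one clock period."""
    return critical_path(path_delays_ns)[1] <= clock_period_ns(frequency_mhz)


if __name__ == "__main__":
    # Hypothetical propagation delays between flip-flops, in nanoseconds.
    paths = {"branch_a": 6.2, "branch_b": 11.5, "branch_c": 9.0}
    print("period at 40 MHz:", clock_period_ns(40), "ns")
    print("critical path:", critical_path(paths))
    print("max clock:", round(max_clock_mhz(paths), 1), "MHz")
    print("meets 40 MHz:", meets_timing(paths, 40))
```

Shortening the slowest branch, for example by fitting its logic into fewer LUT levels, is what raises the achievable clock rate in this model.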

For more information on the benefits of Virtex-5 FPGAs, see the Advantages of Xilinx Virtex-5 FPGAs white paper.

Multipliers and DSP Slices

Figure 13. Multiply Function

The seemingly simple task of multiplying two numbers together can get extremely resource-intensive and complex to implement in digital circuitry. To provide some frame of reference, Figure 14 is the schematic drawing of one way to implement a 4-bit by 4-bit multiplier using combinatorial logic.

Figure 14. Schematic Drawing of a 4-Bit by 4-Bit Multiplier

Now imagine multiplying two 32-bit numbers together, and you end up with more than 2,000 operations for a single multiply. Because of this, FPGAs have prebuilt multiplier circuitry to save on LUT and flip-flop usage in math and signal processing applications. Virtex-II and Spartan-3 FPGAs have 18-bit by 18-bit multipliers, so multiplying two 32-bit numbers together actually requires three multipliers for a single operation. Many signal processing algorithms involve keeping the running total of numbers being multiplied, and, as a result, higher-performance FPGAs like Virtex-5 have prebuilt multiply-accumulate circuitry. These prebuilt processing blocks, also known as DSP48 slices, integrate a 25-bit by 18-bit multiplier with adder circuitry. LabVIEW FPGA, however, uses the multiplier functionality independently. Table 5 shows multiplier resources for various FPGA families.

Device           Number of Multipliers   Type
Virtex-II 1000   40                      18x18
Virtex-II 3000   96                      18x18
Spartan-3 1000   24                      18x18
Spartan-3 2000   40                      18x18
Virtex-5 LX30    32                      DSP48 Slices
Virtex-5 LX50    48                      DSP48 Slices
Virtex-5 LX85    48                      DSP48 Slices
Virtex-5 LX110   64                      DSP48 Slices

Table 5. Multiplier Resources for Various FPGAs
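The text states that a 32-bit multiply takes three 18x18 multipliers but does not say how the operation is split. One plausible reading, assuming the result is truncated to 32 bits, is that each operand is split into 16-bit halves and only three of the four partial products are needed, because the product of the two upper halves falls entirely above bit 31. The sketch below checks that arithmetic for unsigned values. It is an illustration of the idea under those assumptions, not a description of the circuit LabVIEW FPGA actually generates, and it does not cover signed operands.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def mul32_low(a, b):
    """Low 32 bits of a * b for unsigned 32-bit a and b, using three narrow multiplies."""
    a_lo, a_hi = a & MASK16, (a >> 16) & MASK16
    b_lo, b_hi = b & MASK16, (b >> 16) & MASK16

    p0 = a_lo * b_lo  # multiplier 1: 16-bit x 16-bit
    p1 = a_hi * b_lo  # multiplier 2
    p2 = a_lo * b_hi  # multiplier 3
    # a_hi * b_hi would be shifted left by 32 bits, so it cannot change
    # the low 32 bits of the result and no fourth multiplier is needed.
    return (p0 + ((p1 + p2) << 16)) & MASK32


if __name__ == "__main__":
    a, b = 123_456_789, 987_654_321
    print(hex(mul32_low(a, b)), hex((a * b) & MASK32))
```

Each 16-bit half fits comfortably within one input of an 18-bit by 18-bit multiplier, which is why the split is made at that width in this sketch.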

Block RAM

Memory resources are another key specification to consider when selecting FPGAs. User-defined RAM, embedded throughout the FPGA chip, is useful for storing datasets or passing values between parallel loops. Depending on the FPGA family, you can configure the onboard RAM in blocks of 16 or 36 kb. You still have the option to implement datasets as arrays using flip-flops; however, large arrays quickly become expensive for FPGA logic resources. A 100-element array of 32-bit numbers could consume more than 30 percent of the flip-flops in a Virtex-II 1000 FPGA or take up less than 1 percent of the embedded block RAM. Digital signal processing algorithms often need to keep track of an entire block of data, or the coefficients of a complex equation, and without onboard memory, many processing functions do not fit within the configurable logic of an FPGA chip. Figure 15 shows the graphical functions for reading and writing to memory using block RAM.



Figure 15. Block RAM Functions for Writing and Reading to Memory
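The comparison between flip-flops and block RAM for a 100-element array of 32-bit numbers can be checked with a short calculation. The device figures below are the Virtex-II 1000 values from Table 1. Treating 1 kb as 1,024 bits is an assumption; using 1,000 bits instead changes the result only slightly.

```python
ARRAY_ELEMENTS = 100
BITS_PER_ELEMENT = 32

# Virtex-II 1000 figures from Table 1.
V2_1000_FLIP_FLOPS = 10_240
V2_1000_BLOCK_RAM_KBITS = 720

# Storing the array in flip-flops takes one flip-flop per bit.
array_bits = ARRAY_ELEMENTS * BITS_PER_ELEMENT

flip_flop_share = 100.0 * array_bits / V2_1000_FLIP_FLOPS
block_ram_share = 100.0 * array_bits / (V2_1000_BLOCK_RAM_KBITS * 1024)

if __name__ == "__main__":
    print(f"array size: {array_bits} bits")
    print(f"share of flip-flops: {flip_flop_share:.2f}%")
    print(f"share of block RAM:  {block_ram_share:.2f}%")
```

This counts only the storage bits; any logic needed to index the array would add to the flip-flop and LUT cost.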

You can also use memory blocks to hold periodic waveform data for onboard signal generation by storing one complete period as a table of values and indexing through the table sequentially. The ultimate frequency of the output signal is determined by the rate at which values are indexed, and you can use this method for dynamically changing the output frequency without introducing sharp transitions in the waveform.
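The table-based waveform generation described above can be sketched as follows. The table size of 256 entries, the 16-bit signed amplitude, and the function names are all assumptions chosen for illustration. On real hardware the table would live in block RAM and one entry would be read per update.

```python
import math

TABLE_SIZE = 256  # hypothetical number of samples stored for one period


def build_sine_table(size=TABLE_SIZE, amplitude=32767):
    """Store one complete period of a sine wave as a table of integers."""
    return [round(amplitude * math.sin(2 * math.pi * i / size)) for i in range(size)]


def output_frequency_hz(update_rate_hz, table_size=TABLE_SIZE):
    """Output frequency when one table entry is read per update."""
    return update_rate_hz / table_size


def generate(table, n_samples, start=0):
    """Index through the table sequentially, wrapping at the end of the period."""
    index = start
    samples = []
    for _ in range(n_samples):
        samples.append(table[index])
        index = (index + 1) % len(table)
    return samples


if __name__ == "__main__":
    table = build_sine_table()
    print("first samples:", generate(table, 4))
    print("frequency at 40 MHz update rate:", output_frequency_hz(40e6), "Hz")
```

Because the index simply keeps advancing through the same table, changing the update rate changes the output frequency while the waveform continues from its current position, which is why no sharp transition appears.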



Figure 16. Block RAM Functions for FIFO Buffers

The inherent parallel execution of FPGAs allows independent pieces of hardware logic to be driven by different clocks. Passing data between logic running at different rates can be tricky, and onboard memory is often used to smooth out the transfer using first-in-first-out (FIFO) buffers. You can configure FIFO buffers, shown in Figure 16, for different sizes to help ensure that data is not lost between asynchronous parts of the FPGA chip. Table 6 shows the user-configurable block RAM embedded in various FPGA families.

Device           Total RAM (kbits)   Block Size (kbits)
Virtex-II 1000   720                 16
Virtex-II 3000   1728                16
Spartan-3 1000   432                 16
Spartan-3 2000   720                 16
Virtex-5 LX30    1152                36
Virtex-5 LX50    1728                36
Virtex-5 LX85    3456                36
Virtex-5 LX110   4608                36

Table 6. Memory Resources for Various FPGAs
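The FIFO buffers mentioned above can be modeled in software as a fixed-size circular buffer with separate read and write positions. The class below is a minimal behavioral sketch with hypothetical names. It does not model clock-domain crossing or the timeout behavior of the LabVIEW FPGA FIFO functions; it only shows why a larger depth gives a slow reader more slack before data is refused.

```python
class Fifo:
    """Fixed-depth first-in-first-out buffer backed by a circular array."""

    def __init__(self, depth):
        self.depth = depth
        self.buffer = [None] * depth
        self.read_index = 0
        self.write_index = 0
        self.count = 0  # number of elements currently stored

    def write(self, value):
        """Store a value; return False if the buffer is full and the value is refused."""
        if self.count == self.depth:
            return False
        self.buffer[self.write_index] = value
        self.write_index = (self.write_index + 1) % self.depth
        self.count += 1
        return True

    def read(self):
        """Return the oldest stored value, or None if the buffer is empty."""
        if self.count == 0:
            return None
        value = self.buffer[self.read_index]
        self.read_index = (self.read_index + 1) % self.depth
        self.count -= 1
        return value


if __name__ == "__main__":
    fifo = Fifo(depth=4)
    # A fast writer produces six values before the reader starts.
    accepted = [fifo.write(n) for n in range(6)]
    print("accepted:", accepted)
    print("read back:", [fifo.read() for _ in range(4)])
```

In this run the last two writes are refused because the buffer is full, which is the data-loss condition that choosing an adequate FIFO size is meant to prevent.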

Conclusion

The adoption of FPGA technology continues to increase as higher-level tools evolve and further abstract the concepts described in this white paper. It is still important, however, to look inside the FPGA and appreciate how much is actually happening when block diagrams are compiled down to execute in silicon. Comparing and selecting hardware targets based on flip-flops, LUTs, multipliers, and block RAM is the best way to choose the right FPGA chip for your application. Understanding resource usage is extremely helpful during development, especially when optimizing for size and speed. These fundamental building blocks are not meant to be a comprehensive list of all resources, and there are many other parts to an FPGA that were not discussed. You can continue to learn more about FPGAs and digital hardware design through the recommended resources below:

Additional Resources

The Design Warrior's Guide to FPGAs - by Clive "Max" Maxfield

Browse Customer Solutions Using FPGA Technology 

Video: Introduction to LabVIEW FPGA

An Introduction to FPGA Technology: Read the Top Five Benefits

Learn about R Series Intelligent DAQ

National Instruments LabVIEW FPGA Module


Reader Comments

More reference material
Thank you for this very good introduction. Could you please suggest some recent books that would help someone with a Software Engineering background to learn more about FPGA (and perhaps related, similar) technology. Ideally a book that had a section covering the software used to program these devices.
- juan.jimenez@pobox.com - Feb 3, 2009

FPGA tables have been updated
Hello, All information has been updated to include the Virtex-5 LX110. Regards, Vineet A.
- Nov 12, 2008

Update for new Virtex 5 Chips
Will this tutorial be updated to include the specs for the Virtex 5 LX110 chip so a comparison can be made with all the available options from NI?
- Brandon Settles, Bechtel Bettis Inc.. settlesj@bettis.gov - Aug 26, 2008

 

Legal
This tutorial (this "tutorial") was developed by National Instruments ("NI"). Although technical support of this tutorial may be made available by National Instruments, the content in this tutorial may not be completely tested and verified, and NI does not guarantee its quality in any way or that NI will continue to support this content with each new revision of related products and drivers. THIS TUTORIAL IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND AND SUBJECT TO CERTAIN RESTRICTIONS AS MORE SPECIFICALLY SET FORTH IN NI.COM'S TERMS OF USE (http://ni.com/legal/termsofuse/unitedstates/us/).