High-Speed Data Streaming: Programming and Benchmarks

Overview

PXI Express is changing the way engineers design systems. This document discusses the technology that enables high-speed data streaming, application design practices that maximize streaming performance, and the data rates that can be achieved in stream-to-disk and stream-to-memory applications.

Introduction

Many engineers rely on data “streaming,” but in numerous applications data cannot be transferred as fast as it is generated or acquired. In these situations, engineers must compromise, either by using a slower sample rate so that data can be transferred over the bus, or by sampling at the necessary high speed only for the short period that onboard instrument memory allows. Neither sacrifice is desirable.

Traditionally, benchtop instrumentation systems such as oscilloscopes, logic analyzers, and arbitrary waveform generators have implemented limited data streaming. Although many instruments have very fast sampling rates and high bandwidths, the bus that returns data to the PC is often overlooked, yet it can dramatically increase overall test times. For example, the majority of acquisitions performed with stand-alone oscilloscopes are finite. The duration of the acquisition is dictated by the amount of onboard memory available in the oscilloscope (a stand-alone arbitrary waveform generator has the same limitation, except that the waveform is downloaded to the onboard device memory for generation). After the acquisition is complete, the data is transferred to the controlling PC using Ethernet or, more commonly, GPIB. Consider a case where data is sampled at 1 GS/s after an event trigger. If the device has 256 MB of onboard memory per channel, the memory would fill and the acquisition would stop after about 250 ms. If the instrument interfaces over GPIB (which has a bandwidth of about 1 MB/s), the user must wait more than 4 minutes (about 250 s) for this data to be transferred to the computer for analysis. Now compare this to an NI digitizer/oscilloscope with the same sample rate and onboard memory. The same data transfer would take less than 3 seconds over the high-bandwidth PCI/PXI bus: a more than 80x improvement! The PCI Express/PXI Express bus enables even faster data transfers.
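The arithmetic in the oscilloscope example above can be checked with a short calculation. The following is a minimal sketch in Python, not part of any NI driver; the function names are illustrative, and it assumes 1 byte per sample and decimal megabytes, which is consistent with 256 MB filling in roughly a quarter of a second at 1 GS/s.

```python
MB = 1_000_000  # decimal megabyte (assumption, chosen for round numbers)

def fill_time_s(memory_bytes, sample_rate_hz, bytes_per_sample=1):
    """Seconds until onboard memory is full at the given sample rate."""
    return memory_bytes / (sample_rate_hz * bytes_per_sample)

def transfer_time_s(data_bytes, bus_bytes_per_s):
    """Seconds needed to move a block of data across a bus."""
    return data_bytes / bus_bytes_per_s

memory = 256 * MB
print("fill time (s):", fill_time_s(memory, 1e9))
print("GPIB transfer (s):", transfer_time_s(memory, 1 * MB))  # ~1 MB/s bus
```

With these assumptions the memory fills in about 0.26 s, while the GPIB transfer takes roughly 256 s, which is why the bus dominates the total test time.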

Streaming Technology

PXI Express, built on PCI Express technology, offers dedicated bandwidth per instrument. PCI Express, available in x1, x4, x8, and x16 links (pronounced “by 1,” “by 4,” and so on), provides 250 MB/s of throughput per lane with very low latency. The x1 and x4 options are most common for instrument-class hardware and provide 250 MB/s and 1 GB/s (four lanes at 250 MB/s) of dedicated throughput, respectively. As a result, total system throughput increases as the number of instruments in a chassis increases. The figure below plots the bandwidth of various buses against their latency. Latency is the delay incurred in any transmission of data, and it is frequently overlooked in system design. Many people recognize that higher bandwidth is desirable, but high latency can also lengthen test times and should be considered when designing a system.

 

Figure 1. Bandwidth vs. Latency of Popular Instrument Buses

The PXI platform, since it is based on the high-bandwidth PCI and PCI Express buses, enables instruments to stream data to or from sources other than onboard device memory. A PXI/PXI Express digitizer or oscilloscope is able to continuously acquire at a high sample rate because the high bandwidth of the bus allows real-time data transfer to PC memory or disk at rates up to 1 GB/s so that data can be fetched before it is overwritten in device memory. 

Consequently, the bottleneck for an acquisition or generation is no longer the bus, but the reading or writing of data on the system storage: a hard drive or even a redundant array of inexpensive disks (RAID). Again, this means engineers can acquire or generate data for long periods of time at the high sampling rates they need, instead of compromising their sample rate or test time. For example, using an NI PXIe-5122 digitizer and a 12-drive RAID array with a capacity of 4 TB, data can be captured at the maximum sampling rate of 100 MS/s on both simultaneously sampled channels for more than 2.5 hours.
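The recording time in the example above follows directly from the array capacity and the data rate. The sketch below is a rough check under stated assumptions: the full 4 TB is usable, terabytes are decimal, and each sample occupies 2 bytes (the figure given later in this document for the NI PXIe-5122). The function name is hypothetical.

```python
TB = 10**12  # decimal terabyte (assumption)

def recording_time_hours(capacity_bytes, rate_bytes_per_s):
    """Hours of continuous streaming that fit in the given capacity."""
    return capacity_bytes / rate_bytes_per_s / 3600

# 100 MS/s x 2 bytes/sample x 2 channels = 400 MB/s
rate = 100e6 * 2 * 2
hours = recording_time_hours(4 * TB, rate)
print(f"{hours:.2f} h")  # prints "2.78 h"
```

This agrees with the "more than 2.5 hours" quoted above; file-system overhead would reduce the figure slightly in practice.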

What does all this mean? Many application challenges were previously impractical because they required expensive proprietary systems, but they are now feasible using commercially available PXI Express hardware. Example applications include RF/IF data streaming in signal intelligence, data recording and playback, digital video generation/streaming for image sensor and display panel testing, and other high-throughput applications.

Best Programming Practices for Stream-to-Disk Applications

It is widely recognized that the progression of applications from single-threaded to multithreaded architectures is a significant programming challenge. LabVIEW offers an ideal programming environment for multicore processors because LabVIEW applications are inherently multithreaded. As a result, LabVIEW programmers can benefit from multicore processors with little or no extra code. Multithreaded applications provide the greatest benefits to parallel test and stream-to-disk applications, and using proper programming in streaming applications allows maximum performance of PXI Express instruments. Both these benefits are attained by parallelizing the code.

The same rules of parallelism apply whether the goal is to create a stream-to-disk application or to get the most performance from the computer processor(s). In a streaming application, the two main bus- and processor-intensive tasks are 1) acquiring data from the digitizer and 2) writing data to a file. To better utilize processor resources, users can divide these tasks into separate loops. Data is passed between the loops through a LabVIEW queue, an arrangement commonly referred to as the producer/consumer architecture.


 
 
Figure 2.  Producer/Consumer Loop Architecture with Queue Structure

In the preceding figure, the top loop (the producer) acquires data from a high-speed digitizer and passes it to a queue. The bottom loop (the consumer) reads data from the queue and writes it to hard disk. LabVIEW implements the queue as a block of allocated PC memory, which serves as a temporary FIFO for data passing between the two loops. In most programming languages, sharing memory between multiple threads or processes requires significant additional programming. LabVIEW, however, manages all access to the queue memory so that read-write race conditions do not occur. The execution of a queue structure can be visualized with the following diagram.

 
Figure 3.  Data-Flow Programming Model of Queue Structure

As data is acquired from the digitizer, it is placed into memory in a first-in-first-out (FIFO) buffer using the queue structure (element 0, element 1, …, element n-1, element n). As the figure illustrates, queues can pass data between multiple loops. The Dequeue Element function accesses the same memory FIFO, removing elements in the same order in which they were added (starting with element 0). LabVIEW automatically creates independent execution threads for the two While Loops. Stream-to-disk applications benefit from this parallel execution because one task does not have to wait for the other to finish. By contrast, a purely sequential program, in which each write must complete before the next acquisition begins, can suffer drastically reduced performance.
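For readers more familiar with text-based languages, the producer/consumer pattern described above can be sketched with Python's standard `queue` and `threading` modules. This is an illustration of the pattern only, not NI's implementation: the "acquisition" is simulated, the function names are hypothetical, and an in-memory `io.BytesIO` object stands in for the data file.

```python
import io
import queue
import threading

SENTINEL = None  # placed on the queue to signal the end of the acquisition

def producer(q, n_blocks, block_size):
    """Stand-in for the acquisition loop: enqueue simulated data blocks."""
    for i in range(n_blocks):
        block = bytes([i % 256]) * block_size  # hypothetical fetched data
        q.put(block)  # waits here if the queue is already full
    q.put(SENTINEL)

def consumer(q, sink):
    """Stand-in for the file-writing loop: dequeue blocks in FIFO order."""
    while True:
        block = q.get()
        if block is SENTINEL:
            break
        sink.write(block)

def stream(n_blocks, block_size, sink, max_queued=8):
    """Run the producer and consumer in parallel threads until done."""
    q = queue.Queue(maxsize=max_queued)  # bounded FIFO shared by both loops
    threads = [
        threading.Thread(target=producer, args=(q, n_blocks, block_size)),
        threading.Thread(target=consumer, args=(q, sink)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

sink = io.BytesIO()  # replace with open(path, "wb") to write a real file
stream(n_blocks=4, block_size=3, sink=sink)
print(len(sink.getvalue()))  # 12 bytes, written in acquisition order
```

The bounded queue plays the role of the memory FIFO: if the consumer falls behind, the producer eventually waits rather than exhausting memory. A real acquisition loop cannot wait indefinitely, because the instrument's onboard memory would be overwritten, so the queue depth must be sized for the worst-case disk stall.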

Beyond overall application architecture, stream-to-disk or stream-from-disk rates can be affected by some of the following factors:

  • Running background programs such as virus scan
  • How the hard drive is formatted to group data
  • Using system restore or the recycle bin
  • Disk fragmentation
  • Location of the file on the hard drive

Ideally, dedicating a separate hard drive (or RAID array) for data helps tremendously with many of these problems.
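Because so many of these factors are system-specific, it can help to measure the sustained write rate of a candidate drive before relying on it. The following is a minimal sketch using only the Python standard library; the function name, block size, and total size are arbitrary choices for illustration, and a short test such as this only approximates the sustained rate of a long acquisition.

```python
import os
import tempfile
import time

def measure_write_mb_per_s(directory, total_mb=16, block_mb=1):
    """Write total_mb of data to a temporary file and return MB/s."""
    block = os.urandom(block_mb * 1_000_000)
    n_blocks = total_mb // block_mb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(n_blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # make sure the data reaches the disk
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # clean up the temporary file
    return n_blocks * block_mb / elapsed

print(measure_write_mb_per_s(tempfile.gettempdir()))
```

Running the measurement with a larger `total_mb`, and with background programs both enabled and disabled, gives a rough sense of how much the factors listed above cost on a particular system.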

Stream to/from Disk Benchmarks

Earlier discussion described how data streaming speed for traditional instrumentation systems is limited by the amount of data that can be pushed through the bus. The high bandwidth of PXI/PXI Express completely changes the bottleneck—the read and write speed of the storage system becomes the new limiting factor. On most PXI controllers, the hard disk is capable of speeds of around 40 MB/s. However, these disk rates can be increased significantly by using external ExpressCard or PXI Express RAID-0 hard drive configurations. RAID technology is an easy way to combine multiple hard disk drives for faster disk speeds. Current RAID-0 hard disk configurations can achieve up to 140 MB/s for ExpressCard systems and 600 MB/s for a x4 cabled PCI Express configuration. 

When calculating stream-to-disk or stream-to-memory throughput for an instrument, we can use the following equation:
Throughput (bytes/s) = Sampling rate (samples/s) × Bytes per sample × Number of channels

For an NI PXIe-5122 high-speed digitizer with a x4 connector, sampling at the maximum sampling rate of 100 MS/s on two 14-bit channels (each sample is stored as 2 bytes) translates to 400 MB/s of data over the bus. This number is well within the bandwidth limit of x4 PCI Express, so we can address stream-to-disk applications using a RAID-0 hard drive configuration. Using the NI PXIe-5122, we achieved the following benchmarks for stream-to-disk applications.


Figure 4. Maximum Stream-to-Disk Rates for NI PXIe-5122

For the NI PXIe-5122 benchmarks shown in the preceding table and also for the following NI PXIe-6537 and NI PXIe-5442 benchmarks, a PXI Express dual-core controller was used with a x4 PXI Express RAID-0 hard drive configuration. The maximum hard drive read and write speeds were tested at over 600 MB/s, and the acquisition size for the test results shown above was 40 GB.  The NI PXIe-5122 devices used in this test came with 256 MB of onboard memory, and the PXIe-5442 devices had 512 MB of onboard memory.
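The throughput equation given above, applied to the NI PXIe-5122 case, can be sketched as follows. The function names are illustrative; the 250 MB/s per-lane figure and the 2 bytes per sample come from this document, and rates are expressed in MS/s and MB/s to keep the numbers readable.

```python
PCIE_LANE_MB_S = 250  # per-lane throughput quoted earlier in this document

def throughput_mb_s(sample_rate_ms_s, bytes_per_sample, channels):
    """Throughput = sampling rate x bytes/sample x number of channels."""
    return sample_rate_ms_s * bytes_per_sample * channels

def fits_link(required_mb_s, lanes):
    """True if the required rate is within the link's nominal bandwidth."""
    return required_mb_s <= lanes * PCIE_LANE_MB_S

# NI PXIe-5122: 100 MS/s, 2 bytes per sample, 2 channels, x4 link
required = throughput_mb_s(100, 2, 2)
print(required, fits_link(required, lanes=4))  # 400 True
```

The same check against a 600 MB/s storage system shows why one two-channel digitizer can stream at full rate, while each additional device adds another 400 MB/s to the requirement.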

For an NI PXIe-6537 high-speed digital I/O module with a x1 connector, sampling at the maximum clock rate of 50 MHz on all 32 channels translates to 200 MB/s of data over the bus. Using the NI PXIe-6537 with the RAID-0 hard drive configuration, we achieved the following benchmarks for stream-to-disk and stream-from-disk applications.


Figure 5. Maximum Stream-to/from-Disk Rates for NI PXIe-6537

One number that requires an explanation is the throughput for 32 or more channels when streaming from disk (generation). The lower throughput is not a limitation of PXI Express bandwidth; it is a result of the maximum packet transfer size that the controller chipset allows.


Figure 6. Maximum Stream-to/from-Disk Rates for NI PXIe-6537 using NI PXIe-1065 and NI PXIe-8130

Because of this controller chipset limitation, generating data with the NI PXIe-6537 in Slots 7 and 8 of the NI PXIe-1065 and Slots 3 and 5 of the NI PXIe-1062Q results in lower maximum output rates. NI recommends using the NI PXIe-6537 in Slots 9 through 14 of the NI PXIe-1065 and Slot 4 of the NI PXIe-1062Q for maximum generation performance.


Figure 7. Maximum Stream-to/from-Disk Rates for PXIe-6537 High-Speed Digital I/O using PXIe-1062Q and PXIe-8130
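The 200 MB/s figure quoted above for the NI PXIe-6537 follows from the same throughput equation, with one bit per channel per clock cycle instead of whole bytes per sample. The sketch below assumes single-data-rate clocking and channels packed eight to a byte; the function name is hypothetical.

```python
def digital_throughput_mb_s(clock_mhz, channels):
    """MB/s for a digital port: one bit per channel per clock cycle."""
    return clock_mhz * channels / 8  # 8 channels (bits) per byte

# NI PXIe-6537: 50 MHz clock on all 32 channels
required = digital_throughput_mb_s(50, 32)
print(required)           # 200.0
print(required <= 250)    # within the 250 MB/s of a x1 link: True
```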

Below, we have benchmarks for the NI PXIe-5442 arbitrary waveform generator with a x4 connector. Generating at 100 MS/s, the maximum rate of the instrument, on its single 16-bit analog output channel requires 200 MB/s per device. When streaming from the RAID array, we can drive up to three devices (three channels) at full rate. Adding a fourth exceeds the bandwidth of the hard disks, but four devices can still stream at 75 MS/s each.


Figure 8. Maximum Stream-from-Disk Rates for NI PXIe-5442
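The device counts above can be reproduced from the storage bandwidth. This sketch takes 600 MB/s as the RAID read rate (the document reports "over 600 MB/s", so this is a conservative assumption) and 2 bytes per 16-bit sample; the function names are illustrative.

```python
def max_devices_at_full_rate(storage_mb_s, per_device_mb_s):
    """Number of devices the storage can feed at their full rate."""
    return int(storage_mb_s // per_device_mb_s)

def max_sample_rate_ms_s(storage_mb_s, devices, bytes_per_sample):
    """Highest per-device sample rate the storage can sustain."""
    return storage_mb_s / (devices * bytes_per_sample)

print(max_devices_at_full_rate(600, 200))  # 3
print(max_sample_rate_ms_s(600, 4, 2))     # 75.0
```

Three devices at 200 MB/s consume the full 600 MB/s; sharing the same bandwidth among four devices leaves 150 MB/s each, or 75 MS/s at 2 bytes per sample.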

 

Stream to/from Memory Benchmarks

As a variation of a stream-to-disk application, we also can stream data from a high-speed digitizer into the onboard memory of our PXI controller. This scenario conclusively shows that even in the previous example, the bus is not limiting the throughput; the disk write speed of the RAID-0 array is the bottleneck. In this experiment, the acquisition size is actually limited by the amount of available PC memory. As a result, the following performance for a stream-to-memory application using the NI PXIe-5122 high-speed digitizer can be achieved.  


Figure 9. Maximum Stream-to-Memory Rates for NI PXIe-5122

In the test described previously, a PXI Express dual-core controller with 2 GB of onboard memory was used. The acquisition length was 100,000,000 samples per channel, which requires 800 MB of PC memory for four channels (2 bytes per sample). The NI PXIe-5122 devices used in this test came with 256 MB of onboard memory. A similar test can be run with the NI PXIe-6537 high-speed digital I/O module, as shown in the following table. 


Figure 10. Maximum Stream-to/from-Memory Rates for NI PXIe-6537
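The PC memory requirement quoted above for the digitizer stream-to-memory test can be verified in the same way. This is a small sketch with a hypothetical function name; it uses decimal units and the 2 bytes per sample stated in the text.

```python
def acquisition_bytes(samples_per_channel, channels, bytes_per_sample=2):
    """PC memory needed to hold a finite multichannel acquisition."""
    return samples_per_channel * channels * bytes_per_sample

needed = acquisition_bytes(100_000_000, channels=4)
print(needed / 1e6, "MB")       # 800.0 MB
print(needed <= 2 * 10**9)      # fits in 2 GB of controller memory: True
```

In practice the operating system and application also occupy controller memory, so the usable acquisition size is somewhat less than the installed 2 GB.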

As in the stream-from-disk case described above, generation throughput with the NI PXIe-6537 is limited by the controller chipset, not by PXI Express bandwidth. Using the same setup as the digitizer test, we can stream to the NI PXIe-5442 at 200 MB/s per channel. As seen below, we can generate from memory on up to four channels at the full device sample rate.


Figure 11. Maximum Stream-from-Memory Rates for NI PXIe-5442

The most important takeaway from these stream-to/from-memory benchmarks is that system throughput rises above the read and write speed of the RAID array. The bus can therefore carry more data than the disks can sustain, which confirms that the bus was not the bottleneck in the stream-to/from-disk tests. Both stream-to/from-disk and stream-to/from-memory applications can achieve such high throughput in PXI Express because of its high-bandwidth, low-latency data bus: PCI Express.

Conclusion

PXI and PXI Express are enabling engineers to take the capabilities of their systems to the next level. The high bandwidth of the PCI bus used in the PXI platform allows high sampling rates and long acquisitions to coexist. By integrating PCI Express technology into the platform, even higher performance is possible with data rates up to 1 GB/s. Good application design can help maximize the streaming performance of a system, and several PXI Express instruments can now stream to or from PC memory or disk at their maximum sampling rates so that entire data sets can be later processed or analyzed.


Related Links:
Modular Instruments for PCI Express and PXI Express
Data Streaming Architectures in PXI Systems
NI-SCOPE Stream to Disk Examples
NI-HSDIO Stream to Disk Examples
NI-HSDIO Stream from Disk Examples
Selecting Hard Drives for Test, Measurement, and Control Systems
            

               

 

Legal
This tutorial (this "tutorial") was developed by National Instruments ("NI"). Although technical support of this tutorial may be made available by National Instruments, the content in this tutorial may not be completely tested and verified, and NI does not guarantee its quality in any way or that NI will continue to support this content with each new revision of related products and drivers. THIS TUTORIAL IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND AND SUBJECT TO CERTAIN RESTRICTIONS AS MORE SPECIFICALLY SET FORTH IN NI.COM'S TERMS OF USE (http://ni.com/legal/termsofuse/unitedstates/us/).