Hyperthreading is a feature of some versions of the Intel Pentium 4 and later processors. A hyperthreaded computer has a single physical processor but behaves like a multiprocessor computer. When you launch the Windows Task Manager on a hyperthreaded computer and click the Performance tab, Task Manager displays the usage history for two CPUs.
A hyperthreaded processor acts like multiple processors embedded on the same microchip. Some of the resources on the chip are duplicated, such as the register set. Other resources are shared, such as the execution units and the cache. Some resources, such as the buffers that store micro-operations, are partitioned, with each logical processor receiving a portion.
Optimizing an application to take advantage of hyperthreading is similar to optimizing an application for a multiprocessor system, also known as a multi-core, multiple-CPU, or SMP system, but there are some differences. For example, a hyperthreaded computer shares the execution units between logical processors, while a dual-processor computer contains two complete sets of execution units. Therefore, an application that is limited by floating-point execution units performs better on the multiprocessor computer because the execution units are not shared. The same principle applies to cache contention. If two threads contend for the cache, performance is better on a multiprocessor computer, where each processor has its own full-size cache.
The LabVIEW execution system is already built for multiprocessing. In text-based programming languages, to make an application multithreaded, you have to create multiple threads and write code to communicate among those threads. LabVIEW, however, can recognize opportunities for multithreading in VIs, and the execution system handles multithreading communications for you.
The following example takes advantage of the LabVIEW multithreaded execution system.
In this VI, LabVIEW recognizes that it can execute the two loops independently, and in a multiprocessing or hyperthreaded environment, often simultaneously.
The following example calculates prime numbers greater than two.
The block diagram evaluates all the odd numbers between three and Num Terms and determines whether each is prime. The inner For Loop returns TRUE if any number divides the term with a zero remainder.
The inner For Loop is purely computational; it does not include any I/O or wait functions. Nevertheless, the architecture of this VI prevents LabVIEW from taking advantage of any parallelism. Dataflow enforces a mandatory order for every operation in the loop, and no other execution order is possible because every operation must wait for its inputs.
You can introduce parallelism into this VI. Parallelism requires that no loop iteration depend on any other iteration. Once this condition is met, you can distribute the iterations between two loops. LabVIEW imposes one constraint: no iteration of a loop can begin before the previous iteration finishes. After you determine that this ordering is unnecessary for your calculation, you can split the process into two loops.
In the following illustration, the primes parallelism example splits the process into two loops. The top loop evaluates half of the odd numbers, and the bottom loop evaluates the other half. On a multiprocessor computer, the two-loop version is more efficient because LabVIEW can simultaneously execute code from both loops. Notice that the output of this version of the VI is two arrays instead of one as in the previous example. You can write a subVI to combine these arrays; because the calculations consume most of the execution time, the cost of the additional subVI at the end of the process is negligible.
Notice that these two example VIs do not include code for explicit thread management. The LabVIEW dataflow programming paradigm allows the LabVIEW execution system to run the two loops in different threads. In many text-based programming languages, you must explicitly create and handle threads.
Optimizing the performance of an application for a hyperthreaded computer is nearly identical to doing so for a multiprocessor computer. However, differences exist because a hyperthreaded computer shares some resources, such as the cache and execution units, between the two logical processors. If you suspect that a shared resource on a hyperthreaded computer is limiting an application, test the application with an advanced sampling performance analyzer, such as Intel VTune.
Refer to the NI Developer Zone at ni.com/zone to view the primes programming example written in C++. The C++ code example demonstrates the kind of effort required to write thread-handling code and illustrates the special coding necessary to protect data that threads share.