A product’s life cycle comprises three phases: early life, useful life, and wear-out. Failures can happen in each phase, but the failure mechanisms differ from phase to phase. Most failures that occur during a typical system deployment happen in the useful life phase. This is the phase in which the concepts of reliability, availability, serviceability, and manageability (RASM) engineering are applied.
Figure 1. The “bathtub curve” depicts the failure rate over time of a system or a product.
Availability is the measure of how often a system can perform its intended function, even in the midst of failures. For test, measurement, and control applications with demanding system uptime requirements, practices like sparing strategies and preventive maintenance schedules have traditionally improved the availability of critical system components. For the PXI platform, some of the most critical components are housed in the PXI chassis: the power supply, fan, and backplane.
Defining Availability for Your System
The availability (Av) rating of a system is the percentage of time that the system can perform its intended function (uptime) during the time frame in which it is expected to perform that function. The goal is 100 percent availability. Availability is commonly stated as a percentage or as the number of “nines” within the percentage. For example, Av = 99.9523% is stated as “three nines” of availability.
The inherent availability is defined by the following equation:
Av = MTBF/(MTBF + MTTR)
where MTBF is the mean time between failures and MTTR is the mean time to repair, both expressed in the same unit of time.
More practically, availability is defined as the following:
Avp = Uptime/(Uptime + Downtime)
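As a minimal sketch, the two definitions above and the "nines" convention can be written in Python as follows. The function names are illustrative rather than taken from any NI tool, and the small tolerance in count_nines is an assumption added to absorb floating-point rounding at exact boundaries such as 99.9%.

```python
import math


def inherent_availability(mtbf: float, mttr: float) -> float:
    """Av = MTBF / (MTBF + MTTR); both arguments must use the same time unit."""
    if mtbf <= 0 or mttr < 0:
        raise ValueError("MTBF must be positive and MTTR must be non-negative")
    return mtbf / (mtbf + mttr)


def practical_availability(uptime: float, downtime: float) -> float:
    """Avp = Uptime / (Uptime + Downtime), measured over the same period."""
    total = uptime + downtime
    if uptime < 0 or downtime < 0 or total <= 0:
        raise ValueError("uptime and downtime must be non-negative, not both zero")
    return uptime / total


def count_nines(availability: float) -> int:
    """Count the leading nines of an availability given as a fraction below 1."""
    if not 0.0 <= availability < 1.0:
        raise ValueError("availability must be in the range [0, 1)")
    # The 1e-9 tolerance is an assumption that guards against rounding error.
    return math.floor(-math.log10(1.0 - availability) + 1e-9)


if __name__ == "__main__":
    # The article's example: 99.9523% is "three nines".
    print(count_nines(0.999523))
```

An availability of exactly 100 percent is rejected by count_nines because the number of nines is then unbounded.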
Calculating the Availability of Your PXI Chassis
To calculate the availability of a PXI chassis, you need to know its functional mission. The mission, for purposes of the availability calculation, is composed of operational run-time expectations, a critical component sparing strategy, and MTTR variance of critical system components.
For the NI PXIe-1066DC chassis, these mission assumptions are defined as follows:
- 24/7 run-time operation (this is a worst-case run-time operation value; deployments with a less demanding run-time operation schedule can expect improved availability)
- A sparing strategy is in place with adequate spare power supplies, fans, and a chassis (in case the backplane and electronics fail) on-site
- MTTR is associated with the unexpected downtime; the planned downtime (scheduled maintenance) is not included
- MTTR = 0 for power supplies and fans because they are hot swappable, and switchover times are instantaneous with hot redundancy
- MTTR = 40 minutes if the backplane and electronics fail (this value may vary per installation as a function of the notification system, availability and skill of repair personnel, chassis accessibility, and location of spare chassis)
Figure 2. The NI PXIe-1066DC chassis features redundant, hot-swappable fans and power supplies to maximize its availability.
You also need to estimate the probability of chassis failure in one of two ways: (1) with the Bellcore model or (2) from empirically gathered field failure data. The Bellcore model estimates the MTBF of the NI PXIe-1066DC chassis to be 305,782 hours at 25 °C. This is not the MTBF of any single component (power supply, fan) in the chassis failing but rather the MTBF of a chassis outage that occurs because enough critical components have failed.
Now calculate the availability of the NI PXIe-1066DC chassis:
MTBF = 305,782 hours
MTTR = 40 minutes ≈ 0.67 hours
Therefore, Av = 305,782/(305,782 + 0.67) = 0.999998 = 99.9998%, or five nines (the accepted scale goes up to six)
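The calculation mixes hours and minutes, so the unit conversion is worth making explicit. The sketch below repeats it in Python with the MTBF and MTTR values given in the article. The expected yearly downtime is an extra illustration, and it assumes 24/7 operation over a 365-day year and the stated 40-minute MTTR.

```python
# Values taken from the article's mission assumptions.
MTBF_HOURS = 305_782.0      # Bellcore estimate for a chassis outage at 25 °C
MTTR_MINUTES = 40.0         # repair time if the backplane and electronics fail

# Assumption for illustration: continuous operation over a 365-day year.
HOURS_PER_YEAR = 24 * 365

# Convert MTTR to hours so that both terms use the same unit.
mttr_hours = MTTR_MINUTES / 60.0

availability = MTBF_HOURS / (MTBF_HOURS + mttr_hours)

# Compute unavailability directly instead of 1 - availability to limit
# floating-point cancellation.
unavailability = mttr_hours / (MTBF_HOURS + mttr_hours)
downtime_seconds_per_year = unavailability * HOURS_PER_YEAR * 3600.0

print(f"Av = {availability:.6f}")
print(f"Expected unplanned downtime: {downtime_seconds_per_year:.0f} s per year")
```

Under these assumptions, the expected unplanned downtime attributable to the chassis is on the order of one minute per year. A longer MTTR at a given installation increases this figure roughly in proportion.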
Planning for Failure
As the availability calculation shows, high-availability features such as redundant, hot-swappable fans and power supplies raise the availability of a PXI chassis (in this case, the NI PXIe-1066DC) to beyond five nines. To calculate the availability of the entire system, you must also take into account failures caused by software and modules. Availability depends on sparing strategy and service capabilities (as part of MTTR), so redundancy does not eliminate the need to plan for these other elements of system uptime. To aid in this planning, the NI PXIe-1066DC incorporates an Ethernet port for remotely monitoring the health and status of critical system components, including chassis fans, power supplies, and overall temperature. A preventive failure plan comprising critical component redundancy, sparing, and efficient maintenance scheduling reduces the frequency of unexpected system failures and maximizes system uptime.
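One common way to approximate whole-system availability, which the article does not itself specify, is to treat the chassis, modules, and software as a series system: the system is up only when every element is up, and failures are assumed to be independent. Under those assumptions the system availability is the product of the element availabilities. In the sketch below, the module and software figures are hypothetical placeholders, not NI data.

```python
from math import prod
from typing import Iterable


def series_availability(availabilities: Iterable[float]) -> float:
    """Availability of a series system with independent elements (assumed model)."""
    values = list(availabilities)
    if not values:
        raise ValueError("at least one element availability is required")
    for value in values:
        if not 0.0 <= value <= 1.0:
            raise ValueError("each availability must be in the range [0, 1]")
    return prod(values)


# Chassis availability from the article's MTBF (hours) and MTTR (40 minutes).
chassis_av = 305_782.0 / (305_782.0 + 40.0 / 60.0)

# Hypothetical placeholder values for illustration only.
HYPOTHETICAL_MODULES_AV = 0.9999
HYPOTHETICAL_SOFTWARE_AV = 0.9995

system_av = series_availability(
    [chassis_av, HYPOTHETICAL_MODULES_AV, HYPOTHETICAL_SOFTWARE_AV]
)
print(f"System availability (series model): {system_av:.6f}")
```

In a series model the system can never be more available than its least available element, which is why a chassis with more than five nines does not by itself guarantee the same rating for the whole system.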
—David Nosbusch email@example.com
David Nosbusch is a product marketing manager for PXI chassis and PXI timing and synchronization products at National Instruments. He earned a bachelor’s degree in electrical engineering from the University of Wisconsin–Madison.
This article first appeared in the Q2 2012 issue of Instrumentation Newsletter.
Reader Comments
The Av of the pxi chassis is imho only 99.8%
- Jun 19, 2012
This material is protected under the copyright laws of the U.S. and other countries and any uses not in conformity with the copyright laws are prohibited, including but not limited to reproduction, DOWNLOADING, duplication, adaptation and transmission or broadcast by any media, devices or processes.