
Document Type: Tutorial
NI Supported: Yes
Publish Date: Feb 16, 2009


TDMS File Format Internal Structure


Overview

This document provides a detailed description of the internal structure of the TDM Streaming file format.

Logical Structure

TDMS files organize data in a three-level hierarchy of objects. The top level consists of a single object that holds file-specific information such as author or title. Each file can contain an unlimited number of groups, and each group can contain an unlimited number of channels. In the following illustration, the example file events.tdms contains two groups, each of which contains two channels.

Every TDMS object is uniquely identified by a path. Each path is a string that includes the name of the object and the name of its owner in the TDMS hierarchy, separated by the / character. Each name is enclosed in ' characters. Any ' character within an object name is replaced with two ' characters. The following table illustrates path formatting for each type of TDMS object:

Object                                   Path
File object                              /
Group object Measured Data               /'Measured Data'
Group object Dr. T's Events              /'Dr. T''s Events'
Channel object Time in group Events      /'Events'/'Time'
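
The quoting rule above can be sketched in C. This is a minimal illustration, not part of any NI library: the function name tdms_path_append and its return convention are hypothetical. It appends one quoted path component to a buffer and doubles every ' character inside the name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: appends "/" plus the quoted form of `name` to the
 * NUL-terminated string in `out` (capacity `cap` bytes).
 * Returns 0 on success, -1 if the buffer could be too small. */
int tdms_path_append(char *out, size_t cap, const char *name)
{
    assert(out != NULL && name != NULL);
    size_t len = strlen(out);
    /* Worst case: '/', two enclosing quotes, every character doubled, NUL. */
    if (len + 2 * strlen(name) + 4 > cap)
        return -1;
    out[len++] = '/';
    out[len++] = '\'';
    for (const char *p = name; *p != '\0'; ++p) {
        if (*p == '\'')
            out[len++] = '\'';   /* escape a quote by doubling it */
        out[len++] = *p;
    }
    out[len++] = '\'';
    out[len] = '\0';
    return 0;
}
```

Starting from an empty string, one call with Dr. T's Events yields /'Dr. T''s Events', and two calls with Events and Time yield /'Events'/'Time'. The file object path is the bare / and is not produced by this helper.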

Every TDMS object can have an unlimited number of properties. Each TDMS property consists of a combination of a name (always a string), a type identifier (defined by TDMS), and a binary value. Typical data types for properties include numeric types such as integer or floating point, time stamps, and strings. TDMS properties do not support arrays or complex data types. If a TDMS file is located within a search area of the National Instruments DataFinder, all properties are automatically available for searching.
 
Only channel objects in TDMS files can contain raw data arrays. In current TDMS versions, only one-dimensional arrays are supported.
 

Binary Layout

Meta data is descriptive data stored in objects or properties. Data arrays attached to channel objects are referred to as raw data. TDMS files hold the raw data for multiple channels in a contiguous block. To extract raw data from the block, TDMS files use a raw data index, which describes the composition of the data block: the channel that corresponds to the data, the number of values the block contains for that channel, the order in which the data was stored, and so on.

TDMS Segment Layout

 
Data is written to TDMS files in segments. Every time data is appended to a TDMS file, a new segment is created. Every segment starts with a TDSm tag. These tags identify TDMS files and indicate possible data corruptions within TDMS files. The rest of the data follows the TDSm tag in order, as shown in the following illustration.
 
 
 
In order, every segment contains:
 
  • A bit field that serves as a table of contents for the segment. The Table of Contents, or ToC, indicates whether a segment contains meta data, raw data, or both, and whether or not it contains a new object list. 
  • A version number, which specifies the oldest TDMS revision with which a segment complies.
  • The combined length of the rest of the segment.
  • If the segment includes any meta data or raw data index information, the combined length of this information. Otherwise, this field contains 0.
  • Object paths, properties, and raw data indexes for all objects in the segment.
  • Concatenated raw data for all channels in the segment.

The ToC bit field is a 32-bit integer, in which each bit can be set to provide a particular piece of information about the segment:

#define kTocMetaData    (1L<<1)  // segment contains meta data
#define kTocNewObjList  (1L<<2)  // segment contains new object order
#define kTocRawData     (1L<<3)  // segment contains raw data
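
Each flag occupies a single bit of the 32-bit ToC value, so a reader tests it with a bitwise AND. The following sketch shows one way to do that; the helper name tdms_toc_has is hypothetical, and only the three flags from the listing are covered.

```c
#include <stdint.h>

/* Bit masks as given in the listing. */
#define kTocMetaData    (1L<<1)
#define kTocNewObjList  (1L<<2)
#define kTocRawData     (1L<<3)

/* Hypothetical helper: returns nonzero if every bit of `flag` is set
 * in the ToC bit field `toc`. */
int tdms_toc_has(uint32_t toc, uint32_t flag)
{
    return (toc & flag) == flag;
}
```

For example, a ToC value of 0x0000000E has all three of these flags set, while 0x00000008 marks a segment that carries raw data only.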

The version number is a 32-bit integer that identifies the oldest version of the file format with which a segment is compatible. A version number is written to every segment in a file. That way, you can append data to existing files without checking for the version or mutating the file to a newer version. You can perform mutations on a by-segment basis when the file is read.

The offset to the next segment is an unsigned 64-bit integer that records, in bytes, the combined length of the rest of the segment, so that a reader can jump directly to the start of the next segment. You can use the offset when you want to skip over certain segments. For example, if you want to read only meta data from a TDMS file and the ToC of a segment indicates that the segment contains only raw data, you can use the offset to skip to the next segment. The offset is located after the version number to facilitate changes in later versions, such as when address spaces convert to 128-bit.

To speed up reading raw data without reading meta data from the same segment, every segment contains an unsigned 64-bit integer that indicates the length of the meta data section in bytes.
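
The fixed fields at the start of a segment can be decoded as in the sketch below. It assumes the TDSm tag occupies four bytes and that the fields follow it back to back in the order listed above, for 28 bytes in total; the struct and function names are hypothetical. Values are decoded byte by byte as little-endian, so the code does not depend on the byte order of the host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TDMS_LEAD_IN_SIZE 28u   /* assumed: 4 + 4 + 4 + 8 + 8 bytes */

typedef struct {
    uint32_t toc;            /* ToC bit field */
    uint32_t version;        /* oldest revision the segment complies with */
    uint64_t next_segment;   /* combined length of the rest of the segment */
    uint64_t meta_length;    /* length of the meta data section in bytes */
} tdms_lead_in;

static uint32_t rd_u32le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint64_t rd_u64le(const unsigned char *p)
{
    return (uint64_t)rd_u32le(p) | ((uint64_t)rd_u32le(p + 4) << 32);
}

/* Returns 0 on success, -1 if fewer than TDMS_LEAD_IN_SIZE bytes are
 * available or the buffer does not start with the TDSm tag. */
int tdms_parse_lead_in(const unsigned char *buf, size_t len, tdms_lead_in *out)
{
    assert(buf != NULL && out != NULL);
    if (len < TDMS_LEAD_IN_SIZE || memcmp(buf, "TDSm", 4) != 0)
        return -1;
    out->toc          = rd_u32le(buf + 4);
    out->version      = rd_u32le(buf + 8);
    out->next_segment = rd_u64le(buf + 12);
    out->meta_length  = rd_u64le(buf + 20);
    return 0;
}
```

The sketch only decodes the fields. The byte position from which the next-segment length is counted is worth verifying against a real file before using it to skip segments.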

Meta Data

As previously explained, TDMS meta data consists of a three-level hierarchy of data objects including a file, groups, and channels. Each of these object types can include any number of user-defined properties. The meta data section of a segment has the following binary layout on disk:

  • Number of new objects in this segment.
  • Binary representation of these objects.

The binary layout of any TDMS object on disk consists of components in the following order. Depending on the information stored in a particular segment, the object might contain only a subset of these components.

  1. Object path (string), preceded by length integer (uInt32)
  2. Raw data index
    • If this object does not have any raw data assigned to it in this segment, this value will be 0xFFFFFFFF.
    • If the raw data index of this object in this segment exactly matches the index the same object had in the previous segment, this value will be 0x00000000.
    • If the object has a new raw data index, that index will be stored here:
      • Length of index information (uInt32)
      • Data type (tdsDataType, 32bit)
      • Array dimension (uInt32)  (right now 1 is the only valid value)
      • Number of values (uInt64)
      • Total size in bytes (uInt64) (only stored for variable length data types, e.g. strings)
  3. Number of properties (32bit unsigned integer)
  4. Properties. For each property, the following information is stored:
    • Name (string), preceded by length integer (uInt32) 
    • Data type (tdsDataType)
    • Value (binary, preceded by length integer (uInt32) if variable length)
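
As an illustration of the property layout in item 4, the sketch below serializes a single 32-bit integer property: a length-prefixed name, the data type, and the value. It assumes the data type is stored as a 32-bit little-endian integer, as stated for the raw data index, and uses the value 3 that tdsTypeI32 takes in the tdsDataType enum. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TDS_TYPE_I32 3u   /* value of tdsTypeI32 in the tdsDataType enum */

static size_t put_u32le(unsigned char *dst, uint32_t v)
{
    dst[0] = (unsigned char)(v & 0xFF);
    dst[1] = (unsigned char)((v >> 8) & 0xFF);
    dst[2] = (unsigned char)((v >> 16) & 0xFF);
    dst[3] = (unsigned char)((v >> 24) & 0xFF);
    return 4;
}

/* Writes a uInt32 length followed by the string bytes (no terminator). */
static size_t put_string(unsigned char *dst, const char *s)
{
    size_t n = strlen(s);
    put_u32le(dst, (uint32_t)n);
    memcpy(dst + 4, s, n);
    return 4 + n;
}

/* Serializes one I32 property as name, data type, value.
 * Returns the number of bytes written, or 0 if `cap` is too small. */
size_t tdms_put_i32_property(unsigned char *dst, size_t cap,
                             const char *name, int32_t value)
{
    assert(dst != NULL && name != NULL);
    size_t need = 4 + strlen(name) + 4 + 4;
    if (need > cap)
        return 0;
    size_t off = put_string(dst, name);
    off += put_u32le(dst + off, TDS_TYPE_I32);
    off += put_u32le(dst + off, (uint32_t)value);
    return off;
}
```

A property named id with the value -2 takes 14 bytes in this sketch: 02 00 00 00, the two name bytes, 03 00 00 00, and FE FF FF FF.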

Raw Data

In the final step of the writing process for each segment, TDMS files store actual raw data. The data arrays for all channels are concatenated in the exact order in which the channels appear in the meta information part of the segment. Numeric data is written in little-endian format. A channel cannot change data type once it is written.

String type channels are preprocessed so that they can be read faster. All strings are concatenated into a contiguous piece of memory. For each string, the offset of its first character in this contiguous piece of memory is stored in an array of uInt32 numbers. The TDMS segment contains the list of offset numbers followed by the concatenated string data, which allows you to read a string from anywhere in a TDMS file by repositioning the file pointer a maximum of three times.

All strings within TDMS files are encoded in Unicode UTF-8 format. Except in raw data channels, a uInt32 number indicating the number of characters precedes each string. Strings can be null-terminated, but they do not have to be. Numbers are stored in little-endian format.

Data Type Values

The following enum type is used to describe the data type of a property or channel in a TDMS file. For properties, the data type value will be stored in between the name and the binary value. For channels, the data type will be part of the raw data index.

typedef enum {
    tdsTypeVoid,
    tdsTypeI8,
    tdsTypeI16,
    tdsTypeI32,
    tdsTypeI64,
    tdsTypeU8,
    tdsTypeU16,
    tdsTypeU32,
    tdsTypeU64,
    tdsTypeSingleFloat,
    tdsTypeDoubleFloat,
    tdsTypeExtendedFloat,
    tdsTypeSingleFloatWithUnit = 0x19,
    tdsTypeDoubleFloatWithUnit,
    tdsTypeExtendedFloatWithUnit,
    tdsTypeString = 0x20,
    tdsTypeBoolean = 0x21,
    tdsTypeTimeStamp = 0x44
} tdsDataType;
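
A reader typically needs the size in bytes of one value of a given type. The sketch below maps the types whose width follows directly from their names to a size; the function name tds_fixed_size is hypothetical. Strings are variable length, and the extended-precision, with-unit, and Boolean types are deliberately left out because this document does not state their widths, so the function returns 0 for them.

```c
#include <stddef.h>

typedef enum {
    tdsTypeVoid,
    tdsTypeI8, tdsTypeI16, tdsTypeI32, tdsTypeI64,
    tdsTypeU8, tdsTypeU16, tdsTypeU32, tdsTypeU64,
    tdsTypeSingleFloat, tdsTypeDoubleFloat, tdsTypeExtendedFloat,
    tdsTypeSingleFloatWithUnit = 0x19,
    tdsTypeDoubleFloatWithUnit, tdsTypeExtendedFloatWithUnit,
    tdsTypeString = 0x20,
    tdsTypeBoolean = 0x21,
    tdsTypeTimeStamp = 0x44
} tdsDataType;

/* Hypothetical helper: bytes per value, or 0 if this sketch does not
 * cover the type. */
size_t tds_fixed_size(tdsDataType t)
{
    switch (t) {
    case tdsTypeI8:  case tdsTypeU8:  return 1;
    case tdsTypeI16: case tdsTypeU16: return 2;
    case tdsTypeI32: case tdsTypeU32: return 4;
    case tdsTypeI64: case tdsTypeU64: return 8;
    case tdsTypeSingleFloat:          return 4;
    case tdsTypeDoubleFloat:          return 8;
    case tdsTypeTimeStamp:            return 16;  /* two 64-bit fields */
    default:                          return 0;   /* not covered here */
    }
}
```

Because the first twelve enumerators have no explicit values, C numbers them from 0, so tdsTypeU64 is 8 and tdsTypeDoubleFloat is 10.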

The following structure represents the timestamp data type used in TDMS files:

typedef struct {
   unsigned long long fraction; 
   long long sec;      
} tdsTime;
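
The structure can be converted to a conventional Unix time as sketched below. Two assumptions are not stated in this document and should be verified: that sec counts seconds since 1904-01-01 00:00:00 UTC, and that fraction counts units of 2^-64 seconds. The function name is hypothetical, and a double cannot hold the full resolution of the fraction field.

```c
typedef struct {
    unsigned long long fraction;
    long long sec;
} tdsTime;

/* ASSUMPTION: seconds from 1904-01-01 to 1970-01-01 (both UTC),
 * that is 24107 days of 86400 seconds each. */
#define TDS_EPOCH_TO_UNIX 2082844800LL

/* ASSUMPTION: `fraction` is a count of 2^-64 second units.
 * Returns seconds since 1970-01-01 00:00:00 UTC as a double. */
double tds_time_to_unix(tdsTime t)
{
    double frac = (double)t.fraction / 18446744073709551616.0;  /* 2^64 */
    return (double)(t.sec - TDS_EPOCH_TO_UNIX) + frac;
}
```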

For further reference on exactly how TDMS files are composed, see the VI-based API for writing TDMS files (linked below).

Optimization

Applying the format definition as described in the previous sections creates valid TDMS files. However, TDMS allows for a variety of optimizations that are commonly used by NI software such as LabVIEW, LabWindows/CVI, or DIAdem. Applications that read data written by NI software need to support the optimization mechanisms described in this section.

Incremental Meta Information Example

Meta information, such as object paths, properties, and raw data indexes, is added to a segment only if it has changed. Incremental meta information is best explained by example:

In the first segment, channels 1 and 2 are written. Each has three values and several descriptive properties. The meta data section of the first segment contains paths, properties, and raw data indexes for channel 1 and channel 2. All flags of the ToC bit field are set.

In the second segment, none of the properties have changed, channel 1 and channel 2 still have three values each, and no additional channels are written to the segment. The segment does not contain any meta data. The meta data from the previous segment is still assumed valid. Only the raw data bit is set in the ToC bit field.

The third segment adds another three values to each channel. The channel 1 property status was set to valid in the first segment, but now needs to be set to error. The meta data section of the third segment now contains the object path for channel 1 and the name, type, and value of this property. In future file reads, the error value will override the previously written valid value. However, the previous valid value remains in the file unless the file is defragmented.

The fourth segment adds an additional channel, voltage, which contains five values. Since all other meta data from the previous segment is still valid, the meta data section of the fourth segment includes the object path, the properties, and the index information for channel voltage only. The raw data section contains three values for channel 1, three values for channel 2, and five values for channel voltage.

In the fifth segment, channel 2 now has 27 values. All other channels remain unchanged. The meta data section now contains the object path for channel 2, the new raw data index for channel 2, and no properties for channel 2.

In the sixth segment, you stop writing to channel 2. You only continue writing to channel 1 and channel voltage. This constitutes a change in the channel order, which requires you to write a new list of channel paths. You must set the ToC bit kTocNewObjList. The meta data section of the new segment must contain a complete list of all object paths, but no properties and no raw data indexes, unless they also change.
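
Since a reader has to carry the active channel list and raw data indexes forward from segment to segment, it can compute how many raw data bytes to expect in each segment. The sketch below does this for the example above. It assumes every channel stores 8-byte values, which the example does not specify, and the struct, function, and group names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *path;        /* object path, for readability only */
    uint64_t    n_values;    /* "Number of values" from the raw data index */
    size_t      value_size;  /* bytes per value for the channel's data type */
} channel_index;

/* Sums the raw data bytes a segment holds for its active channels. */
uint64_t segment_raw_bytes(const channel_index *ch, size_t count)
{
    uint64_t total = 0;
    for (size_t i = 0; i < count; ++i)
        total += ch[i].n_values * (uint64_t)ch[i].value_size;
    return total;
}
```

Under these assumptions, the fourth segment holds (3 + 3 + 5) × 8 = 88 bytes of raw data, the fifth holds (3 + 27 + 5) × 8 = 280 bytes, and the sixth, with channel 2 removed from the list, holds (3 + 5) × 8 = 64 bytes.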

Index Files

All data written to a TDMS file is stored in a file with the extension *.tdms. A TDMS file can be accompanied by an optional index file with the extension *.tdms_index. The index file is used to speed up reading from the *.tdms file. If a National Instruments application opens a TDMS file without an index file, the index file is automatically created. If a National Instruments application such as LabVIEW or LabWindows/CVI writes a TDMS file, the index file and the main file are created at the same time.

The index file is an exact copy of the *.tdms file, except that it does not contain any raw data and every segment starts with a TDSh tag instead of a TDSm tag. The index file contains all the information needed to precisely locate any value of any channel within the *.tdms file.
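
A reader can tell the two kinds of segment apart from the first four bytes. The following sketch shows one way to do so; the enum and function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef enum {
    TDMS_TAG_UNKNOWN,   /* neither tag, or fewer than four bytes */
    TDMS_TAG_DATA,      /* "TDSm": segment of a *.tdms file */
    TDMS_TAG_INDEX      /* "TDSh": segment of a *.tdms_index file */
} tdms_tag_kind;

/* Classifies the tag at the start of a segment. */
tdms_tag_kind tdms_classify_tag(const unsigned char *buf, size_t len)
{
    if (buf == NULL || len < 4)
        return TDMS_TAG_UNKNOWN;
    if (memcmp(buf, "TDSm", 4) == 0)
        return TDMS_TAG_DATA;
    if (memcmp(buf, "TDSh", 4) == 0)
        return TDMS_TAG_INDEX;
    return TDMS_TAG_UNKNOWN;
}
```

A tag that matches neither value at a position where a segment is expected to start may indicate a corrupted file.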

Conclusion

In brief, the TDMS File Format is designed to write and read measured data at very high speed, while maintaining a hierarchical system of descriptive information. While the binary layout by itself is very simple, the optimizations enabled by writing meta data incrementally can lead to very sophisticated file configurations.

Related Links

Developer Zone: VI-based API for Writing TDMS Files

Developer Zone: Introduction to the LabWindows/CVI TDM Streaming Library (An introductory overview of features and use cases for the TDMS file format)

 


Reader Comments

We have the same problem with the tdms documentation. Some of our customers need to store and read their data for 15y+. They require a proper tdms format documentation to be able to read their data with the programming language they prefer in the future. Thanks in advance for more detailed, updated information about the tdms format. MetaDAQ
- Andreas Hergesell, MetaDAQ, Ingenieurbüro Hergesell. andreas.hergesell@metadaq.de - Sep 3, 2008

Documentation is Poor
Actually, the documentation is just plain wrong. There are the "TDSm" or "TDSh" tags at the head of each segment, and the next segment offset goes from the beginning of that field to the next tag. Please update this page so it contains good information, it is very difficult to write code using a document that assumes usage of a non-portable, incompatible API, not to mention wrong information on the TDMS file format itself. There is also insufficient information. For example, what is the current version of TDMS? I'm reading the version number, but it means nothing to me. How do I decrypt the value I get? And the ToC bitmask. The graphic indicates 4 bits, but you state that it's a 32 bit value. So, does this mean that it's divided into 8 bit blocks, where the value for each block is 0xFF if that piece of data exists and 0x00 if it doesn't? Thanks in advance if this information is updated.
- Sam, Army Research Lab. xuancongwen@gmail.com - Jan 8, 2008

 

Legal
This tutorial (this "tutorial") was developed by National Instruments ("NI"). Although technical support of this tutorial may be made available by National Instruments, the content in this tutorial may not be completely tested and verified, and NI does not guarantee its quality in any way or that NI will continue to support this content with each new revision of related products and drivers. THIS TUTORIAL IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND AND SUBJECT TO CERTAIN RESTRICTIONS AS MORE SPECIFICALLY SET FORTH IN NI.COM'S TERMS OF USE (http://ni.com/legal/termsofuse/unitedstates/us/).