
Document Type: Tutorial
NI Supported: Yes
Publish Date: Sep 6, 2006



Programming for Multibyte Character Sets in LabWindows/CVI


Overview

This application note discusses how to write your LabWindows/CVI program so that it is compatible with multibyte character sets.

What Is Multibyte?

A traditional character in the C programming language consists of a single byte, which you can set to a particular value from the universal ASCII code. A multibyte character, on the other hand, is a character that can be composed of one or two bytes. A multibyte character set consists of all the multibyte characters required to represent a single language, such as Japanese. A multibyte character composed of only one byte is commonly referred to as a single-byte character. The first byte of a dual-byte character is referred to as the lead byte, while the second byte is referred to as the trail byte. The codepage is the name given to a numeric value that identifies a particular multibyte character set.

String Handling


The primary rule in manipulating strings that might contain multibyte characters is to always treat the lead byte and the trail byte of a dual-byte character as a single unit. Unfortunately, this affects every instance in your program where characters or strings are handled.

It is important to keep in mind the difference between the length of a string measured in bytes versus the length of a string measured in characters. In many instances, the number of bytes should be used, such as when allocating a buffer for the storage of a string, because every memory storage location of a character needs to allow for the possibility of having a two-byte character. In this case, you should continue to use the ANSI C Library function strlen, which returns the number of bytes in a string. In other cases, however, you must replace all ANSI string handling functions with the functions described in the Multibyte ANSI Extension Functions section or the macros described in the Multibyte Macros and Functions in toolbox.h section.
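The difference between byte length and character length can be made concrete with a short sketch. The code below is not part of the LabWindows/CVI libraries. It assumes the Shift-JIS encoding (code page 932), whose lead bytes lie in the ranges 0x81-0x9F and 0xE0-0xFC, and the helper names sjis_is_lead_byte, sjis_num_chars, and sjis_bytes_needed are hypothetical. In a real program, the functions listed later in this document, which follow the current code page, would take their place.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: Shift-JIS (code page 932). Lead bytes fall in
   0x81-0x9F or 0xE0-0xFC; other code pages use other ranges. */
static int sjis_is_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Hypothetical helper: length of a string in characters, not bytes. */
static size_t sjis_num_chars(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    while (*s != '\0') {
        /* A lead byte and the byte after it form one dual-byte character. */
        if (sjis_is_lead_byte((unsigned char)*s) && s[1] != '\0')
            s += 2;
        else
            s += 1;
        count++;
    }
    return count;
}

/* Hypothetical helper: buffer size is measured in bytes, so strlen
   remains the right tool when allocating storage for a copy. */
static size_t sjis_bytes_needed(const char *s)
{
    assert(s != NULL);
    return strlen(s) + 1;   /* +1 for the terminating NUL */
}
```

For the file name formed by the bytes 0x93 0xFA 0x96 0x7B followed by ".txt", strlen reports 8 bytes while sjis_num_chars reports 6 characters. Note that the trail byte 0x7B has the same value as the ASCII character '{', which is why a trail byte must never be examined on its own.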

Write all of your string processing code in a multibyte-aware manner. For example, pointers should always indicate the start of a character, and indices should always reference the start of a character. Use CmbStrInc or CmbStrDec instead of the ++ and -- operators to modify the value of pointers into your strings.

Process strings sequentially, from left to right, rather than randomly. Accessing random characters in a multibyte string is computationally expensive and can be error prone.
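To see why moving backward is more expensive than moving forward, consider the following sketch. It is an illustration only, again assuming Shift-JIS lead-byte ranges, and sjis_prev is a hypothetical helper rather than the toolbox.h implementation. Because a byte viewed in isolation might be either a single-byte character or a trail byte, the helper needs the start of the string and scans forward from there to find the character that precedes the current position.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: Shift-JIS (code page 932) lead-byte ranges. */
static int sjis_is_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Hypothetical helper: returns a pointer to the character that
   precedes cur, or NULL if cur is already at the start of the string.
   The scan restarts from the beginning on every call, so stepping
   backward through a whole string costs O(n^2) byte visits. */
static const char *sjis_prev(const char *start, const char *cur)
{
    const char *p = start;
    const char *prev = NULL;

    assert(start != NULL && cur != NULL && cur >= start);
    while (p < cur) {
        prev = p;
        /* Step over one whole character: two bytes after a lead byte. */
        if (sjis_is_lead_byte((unsigned char)*p) && p[1] != '\0')
            p += 2;
        else
            p += 1;
    }
    return prev;
}
```

In the three-byte string 'a', 0x83, 0x5C, the last byte has the same value as a backslash but is the trail byte of a dual-byte character. Calling sjis_prev with a pointer to the end of the string returns a pointer to the 0x83 lead byte, not to the 0x5C byte that a plain pointer decrement would reach.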

The following is a code example that performs a text search beginning at the end of a string.

Before multibyte changes, your code might look like the following example:

char * CVIFUNC FindFileExtension(const char *pathString)
{
    int index, count = 0;
    char *fileName;
    char *terminatorPtr;

    AssertMsg(pathString, "Null pathString parameter passed to FindFileExtension");

    fileName = FindFileName(pathString);

    if ((index = strlen (fileName)) == 0)
        return fileName;

    terminatorPtr = fileName + index;

    /* do not bother checking when index is 1 because if the dot is */
    /* in position 0, it really is not an extension */
    for (; index > 1; index--)
    {
        if (fileName[index-1] == '.')
            return &fileName[index];
        count++;
        if (count > MAX_FILE_EXTENSION_LENGTH)
            return terminatorPtr;
    }

    return terminatorPtr;
}

After multibyte changes, your code might look like the following example:

char * CVIFUNC FindFileExtension(const char *pathString)
{
    int index;
    char *fileName;
    char *ptr, *terminatorPtr;

    AssertMsg(pathString, "Null pathString parameter passed to FindFileExtension");

    fileName = FindFileName(pathString);

    if ((index = strlen (fileName)) == 0)
        return fileName;

    terminatorPtr = fileName + index;

    ptr = CmbStrPrev (fileName, terminatorPtr);

    while (ptr && ((terminatorPtr-ptr) <= MAX_FILE_EXTENSION_LENGTH+1))
    {
        /* if dot is the first char in filename, it really is not an extension */
        if (*ptr == '.' && ptr != fileName)
            return ++ptr;
        CmbStrDec (fileName, ptr);
    }

    return terminatorPtr;
}

Multibyte ANSI Extension Functions


The following table lists the multibyte-aware versions of the ANSI string handling functions.

Function      Description
_getmbcp      Get Current Code Page
_ismbblead    Get Byte Type
_mbsbtype     Get Byte Type from Context
_mbsdec       Get Previous Character
_mbsinc       Get Next Character
_mbslen       Get String Length
_mbscmp       Compare Strings
_mbsnbcmp     Compare Characters
_mbsicmp      Compare Strings (no case)
_mbsnbicmp    Compare Characters (no case)
_mbscat       Concatenate Strings
_mbsnbcat     Concatenate Characters
_mbscpy       Copy String
_mbsnbcpy     Copy Characters
_mbschr       Find First Occurrence of Character
_mbsrchr      Find Last Occurrence of Character
_mbspbrk      Find Character from Set
_mbscspn      Find Character from Set (index)
_mbsspn       Find Character Not in Set (index)
_mbsstr       Find Substring
_mbstok       Break String into Tokens

Multibyte Macros and Functions in toolbox.h


The macros and functions in toolbox.h that are listed in the following table offer generally useful multibyte-aware functionality that is comparable to the string handling functions in the ANSI C library and pointer arithmetic, such as *s++. In these macro and function descriptions, p refers to a pointer to the beginning of the string, and s refers to the character pointer that you want to move.

Name                  Description
OnMBSystem            Returns TRUE if the program is running on a multibyte system.
CmbIsSingleC          Checks if the given 16-bit character fits into a single byte.
CmbGetNumBytesInChar  Returns the number of bytes required to store the given character.
CmbIsLeadByte         Checks if the given byte is a valid lead byte. Call CmbIsLeadByte only if you know that the byte starts on a character boundary; that is, the byte cannot be a trail byte.
CmbCharCodeLeadByte   Returns the lead byte of a dual-byte character.
CmbCharCodeTrailByte  Returns the trail byte of a dual-byte character.
CmbStrByteType        Returns the type of the byte at a given offset within a string, taking into account the bytes before the given byte. If you know that p[offset] is not in the middle of a dual-byte character, you can call CmbIsLeadByte instead. Possible return values are CMB_SINGLE_BYTE, CMB_LEAD_BYTE, CMB_TRAIL_BYTE, and CMB_ILLEGAL_BYTE.
CmbStrDec             Changes s by moving it back by one character. Analogous to --s.
CmbStrInc             Changes s by moving it forward by one character. Analogous to ++s.
CmbStrPrev            Returns a pointer to the previous character in the string. Analogous to s-1.
CmbStrNext            Returns a pointer to the next character in the string. Analogous to s+1.
CmbGetC               Retrieves the character at the given pointer. Analogous to *s.
CmbSetC               Sets the character at the given pointer. CmbSetC overwrites any values at the given location and does not properly insert a dual-byte character into the middle of a string. Analogous to *s = c.
CmbGetCInc            First retrieves the character at the given position and then advances the pointer to the next character. Analogous to *s++.
CmbGetCNdxInc         First retrieves the character at the given position and then advances the index to the next character. Analogous to s[i++].
CmbSetCInc            First sets the character at the given position and then advances the pointer to the next character. The newly set character is returned. Analogous to *s++ = c.
CmbSetCNdxInc         First sets the character at the given position and then advances the index to the next character. The newly set character is returned. Analogous to s[i++] = c.
CmbIncGetC            First advances the given pointer to the next character and then returns that character. Analogous to *++s.
CmbIncSetC            First advances the given pointer to the next character and then sets and returns the new character. Analogous to *++s = c.
CmbFirstByteOfChar    Returns the first byte of the given single or dual-byte character.
CmbNumChars           Returns the number of characters in the given string.
CmbStrEq              Compares the two given strings. Returns TRUE if they are equal.
CmbStrEqI             Compares the two given strings. The comparison is not case-sensitive. Returns TRUE if they are equal.
CmbStrEqN             Compares the first n bytes of the two given strings. Returns TRUE if they are equal.
CmbStrEqNI            Compares the first n bytes of the two given strings. The comparison is not case-sensitive. Returns TRUE if they are equal.
CmbStrCmp             Equivalent to _mbscmp.
CmbStrNCmp            Equivalent to _mbsnbcmp.
CmbStrICmp            Equivalent to _mbsicmp.
CmbStrNICmp           Equivalent to _mbsnbicmp.
CmbStrCat             Equivalent to strcat.
CmbStrNCat            Equivalent to _mbsnbcat.
CmbStrCpy             Equivalent to strcpy.
CmbStrNCpy            Equivalent to _mbsnbcpy.
CmbStrSpn             Equivalent to _mbsspn.
CmbStrCSpn            Equivalent to _mbscspn.
CmbStrChr             Equivalent to _mbschr.
CmbStrRChr            Equivalent to _mbsrchr.
CmbStrTok             Equivalent to _mbstok.
CmbStrPBrk            Equivalent to _mbspbrk.
CmbStrStr             Equivalent to _mbsstr.
CmbStrUpr             Converts all the lowercase characters in the given string to uppercase characters. Converts only English characters.
CmbStrLwr             Converts all the uppercase characters in the given string to lowercase characters. Converts only English characters.
CmbStrByteIs          Returns TRUE if the character at the given offset in the given string is equal to the given single byte character. Analogous to (s[offset] == '*').
CmbStrLastChar        Returns a pointer to the last character of the given string.
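The idea of a 16-bit character code that several of these entries work with can be illustrated with a portable sketch. This is not the toolbox.h implementation. It assumes Shift-JIS lead-byte ranges and assumes that a dual-byte character is packed as (lead << 8) | trail, a common convention that has not been checked against the toolbox.h source. All function names below are hypothetical stand-ins for the behavior described for CmbGetCInc, CmbIsSingleC, CmbCharCodeLeadByte, and CmbCharCodeTrailByte.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: Shift-JIS (code page 932) lead-byte ranges. */
static int sjis_is_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Hypothetical analog of *s++ for multibyte text: returns the
   character at *s and advances *s by one whole character.
   Assumption: a dual-byte character is packed as (lead << 8) | trail.
   At the terminating NUL it returns 0 and does not advance. */
static unsigned int sjis_get_c_inc(const char **s)
{
    const unsigned char *p;
    unsigned int c;

    assert(s != NULL && *s != NULL);
    p = (const unsigned char *)*s;
    c = p[0];
    if (sjis_is_lead_byte(p[0]) && p[1] != '\0') {
        c = (c << 8) | p[1];    /* combine lead and trail bytes */
        *s += 2;
    } else if (p[0] != '\0') {
        *s += 1;
    }
    return c;
}

/* Hypothetical helpers that take a packed character code apart. */
static int char_is_single(unsigned int c)
{
    return c <= 0xFF;
}

static unsigned char char_code_lead_byte(unsigned int c)
{
    return (unsigned char)(c >> 8);
}

static unsigned char char_code_trail_byte(unsigned int c)
{
    return (unsigned char)(c & 0xFF);
}
```

A loop that repeatedly calls sjis_get_c_inc until it returns 0 visits each character exactly once, from left to right, which is the access pattern recommended in the String Handling section.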

KeyPress Event Handling


KeyPress events are sent to the callback function associated with the control that has the input focus and to the callback function associated with the control’s panel.

If your user enters a dual-byte character in a control, two EVENT_KEYPRESS events are sent to the callback functions. To process the character as a single event, ignore the first event. You can use KeyPressEventIsLeadByte to determine when you receive the first event of a dual-byte character. When you receive the second event, KeyPressEventIsTrailByte returns TRUE. You can then use GetKeyPressEventCharacter to obtain the full dual-byte character.

You can use SetKeyPressEventKey to change the character that the user entered. Do not use SetKeyPressEventKey in the first EVENT_KEYPRESS event of a dual-byte character.

The following is a code example from the user interface sample program, multikey.prj.

int CVICALLBACK KeyCallback (int panel, int control, int event,
                             void *callbackData, int eventData1, int eventData2)
{
    int character, virtual, modifiers, enabled;
    char buffer[3];

    if (event == EVENT_KEYPRESS) {
        /* remove previous keypress from string control */
        SetCtrlAttribute (panel, PANEL_KEY, ATTR_CTRL_VAL, "");

        /* ignore leading bytes - wait for complete character */
        if (!KeyPressEventIsLeadByte (eventData2)) {
            /* obtain keypress information */
            character = GetKeyPressEventCharacter (eventData2);
            virtual = GetKeyPressEventVirtualKey (eventData2);
            modifiers = GetKeyPressEventModifiers (eventData2);

            /* put the character into a buffer for display */
            memset (buffer, 0, sizeof (buffer));
            CmbSetC (buffer, character);

            /* update the original keypress information */
            SetCtrlAttribute (panel, PANEL_ORIGINAL, ATTR_CTRL_VAL, buffer);
            SetCtrlAttribute (panel, PANEL_CHARACTER, ATTR_CTRL_VAL, character);
            SetCtrlAttribute (panel, PANEL_VIRTUAL, ATTR_CTRL_VAL, virtual);
            SetCtrlAttribute (panel, PANEL_MODIFIERS, ATTR_CTRL_VAL, modifiers >> 16);

            /* is filtering enabled */
            GetCtrlVal (panel, PANEL_FILTER, &enabled);
            if (enabled) {
                /* get filter information */
                GetCtrlVal (panel, PANEL_CHAR_FILTER, &character);
                GetCtrlVal (panel, PANEL_VIRT_FILTER, &virtual);
                GetCtrlVal (panel, PANEL_MODI_FILTER, &modifiers);

                /* replace key event */
                SetKeyPressEventKey (eventData2, virtual, character, modifiers << 16, 0,
                                     virtual == 0 && !CmbIsSingleC (character));
            }
        }
    }

    return 0;
}


Legal
This tutorial (this "tutorial") was developed by National Instruments ("NI"). Although technical support of this tutorial may be made available by National Instruments, the content in this tutorial may not be completely tested and verified, and NI does not guarantee its quality in any way or that NI will continue to support this content with each new revision of related products and drivers. THIS TUTORIAL IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND AND SUBJECT TO CERTAIN RESTRICTIONS AS MORE SPECIFICALLY SET FORTH IN NI.COM'S TERMS OF USE (http://ni.com/legal/termsofuse/unitedstates/us/).