Programming for Multibyte Character Sets in LabWindows/CVI
Overview
This application note discusses how to write your LabWindows/CVI program so that it is compatible with multibyte character sets.
What Is Multibyte?
A traditional character in the C programming language consists of a single byte, which you can set to a particular value from the universal ASCII code. A multibyte character, on the other hand, is a character that can be composed of one or two bytes. A multibyte character set consists of all the multibyte characters required to represent a single language, such as Japanese. A multibyte character composed of only one byte is commonly referred to as a single-byte character. The first byte of a dual-byte character is referred to as the lead byte, while the second byte is referred to as the trail byte. The codepage is the name given to a numeric value that identifies a particular multibyte character set.
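As a rough illustration of lead and trail bytes, the following C sketch classifies the first byte of a character and reports how many bytes the character occupies. It assumes the Shift-JIS encoding (Windows code page 932), where lead bytes fall in the ranges 0x81 to 0x9F and 0xE0 to 0xFC; other code pages use different ranges. The names is_lead_byte and char_size are hypothetical helpers written for this illustration and are not part of LabWindows/CVI.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: Shift-JIS (code page 932) lead-byte ranges.
   Other multibyte character sets use different ranges. */
static int is_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Returns the number of bytes (1 or 2) occupied by the character that
   starts at s. s must point at a character boundary, not at a trail byte. */
static size_t char_size(const char *s)
{
    assert(s != NULL && *s != '\0');

    /* A lead byte followed by another byte forms one dual-byte character.
       A lead byte at the very end of the string is treated as one byte. */
    if (is_lead_byte((unsigned char)s[0]) && s[1] != '\0')
        return 2;
    return 1;
}
```

With these helpers, char_size("A") yields 1, and char_size("\x93\xFA") yields 2 because 0x93 is a lead byte under the assumed code page. In a LabWindows/CVI program, you would normally use the library functions and macros listed later in this note instead of hard-coded ranges.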
String Handling
The primary rule in manipulating strings that might contain multibyte characters is to always treat the lead byte and the trail byte of a dual-byte character as a single unit. Unfortunately, this affects every instance in your program where characters or strings are handled.
Keep in mind the difference between the length of a string in bytes and its length in characters. Use the byte count when you allocate a buffer to store a string, because any character in the string might occupy two bytes. For this purpose, continue to use the ANSI C Library function strlen, which returns the number of bytes in a string. In other cases, however, you must replace all ANSI string handling functions with the functions described in the Multibyte ANSI Extension Functions section or the macros described in the Multibyte Macros and Functions in toolbox.h section.
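The difference between byte length and character length can be sketched in portable C as follows. This is a minimal illustration that assumes Shift-JIS (code page 932) lead-byte ranges. The helpers is_lead_byte, count_chars, and copy_string are hypothetical names, not LabWindows/CVI functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: Shift-JIS (code page 932) lead-byte ranges. */
static int is_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Counts characters, treating each lead/trail pair as one character. */
static size_t count_chars(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (*s != '\0') {
        /* Step over two bytes for a dual-byte character, one otherwise. */
        s += (is_lead_byte((unsigned char)*s) && s[1] != '\0') ? 2 : 1;
        n++;
    }
    return n;
}

/* Duplicates a string. The buffer size comes from the byte count
   returned by strlen, not from the character count. Caller frees. */
static char *copy_string(const char *s)
{
    size_t bytes;
    char *copy;

    assert(s != NULL);
    bytes = strlen(s) + 1;      /* bytes, including the terminator */
    copy = malloc(bytes);
    if (copy != NULL)
        memcpy(copy, s, bytes);
    return copy;
}
```

For a string that holds two dual-byte characters followed by the letter A, strlen reports 5 bytes while count_chars reports 3 characters. Sizing the buffer from the character count would truncate the copy.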
Write all of your string processing code in a multibyte-aware manner. For example, pointers should always indicate the start of a character, and indices should always reference the start of a character. Use CmbStrInc or CmbStrDec instead of the ++ and -- operators to modify the value of pointers into your strings.
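To show what character-wise pointer movement involves, here is a hedged sketch of helpers that play roughly the role of s+1 and s-1 for multibyte strings. It assumes Shift-JIS (code page 932) lead-byte ranges, and next_char and prev_char are hypothetical names. In a LabWindows/CVI program you would use the toolbox.h macros instead of code like this.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: Shift-JIS (code page 932) lead-byte ranges. */
static int is_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Returns a pointer to the character after the one at s.
   s must point at a character boundary. Stays put at the terminator. */
static const char *next_char(const char *s)
{
    assert(s != NULL);
    if (*s == '\0')
        return s;
    return s + ((is_lead_byte((unsigned char)*s) && s[1] != '\0') ? 2 : 1);
}

/* Returns a pointer to the character before s in the string that begins
   at start, or NULL if s is at the start. The scan runs forward from
   start because the byte at s-1 cannot be classified in isolation. */
static const char *prev_char(const char *start, const char *s)
{
    const char *p = start;
    const char *prev = NULL;

    assert(start != NULL && s != NULL);
    while (p < s && *p != '\0') {
        prev = p;
        p = next_char(p);
    }
    return prev;
}
```

Note that prev_char needs the start of the string as well as the current position. Stepping backward with a plain s-1 could land on a trail byte, which is why the pointer must always be moved one whole character at a time.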
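Process strings sequentially, from left to right, rather than randomly. Accessing an arbitrary position in a multibyte string is computationally expensive and error-prone, because you cannot tell whether a byte is a lead byte, a trail byte, or a single-byte character without examining the bytes that precede it.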
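One way to picture left-to-right processing is a search for the last occurrence of a single-byte character that skips over trail bytes. The sketch below assumes Shift-JIS (code page 932), in which the trail byte of some dual-byte characters has the value 0x5C, the same value as a backslash. The helper names are hypothetical and are not LabWindows/CVI functions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: Shift-JIS (code page 932) lead-byte ranges. */
static int is_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Returns a pointer to the last occurrence of the single-byte character
   target in s, or NULL if it does not occur. The string is scanned once,
   left to right, so trail bytes are never mistaken for target. */
static const char *find_last_char(const char *s, char target)
{
    const char *last = NULL;

    assert(s != NULL);
    while (*s != '\0') {
        if (is_lead_byte((unsigned char)*s) && s[1] != '\0') {
            s += 2;             /* skip both bytes of a dual-byte character */
        } else {
            if (*s == target)
                last = s;       /* remember the most recent match */
            s += 1;
        }
    }
    return last;
}
```

For a path such as "C:\\dir\\" "\x83\x5C" ".txt", a byte-wise search from the end would stop at the 0x5C trail byte inside the file name, while find_last_char returns the real path separator before it.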
The following is a code example that performs a text search beginning at the end of a string.
Before multibyte changes, your code might look like the following example:
char * CVIFUNC FindFileExtension(const char *pathString)
{
    int index, count=0;
    char *fileName;
    char *terminatorPtr;
    AssertMsg(pathString, "Null pathString parameter passed to FindFileExtension");
    fileName = FindFileName(pathString);
    if ((index = strlen (fileName)) == 0)
        return fileName;
    terminatorPtr = fileName + index;
    for (; index > 1; index--)  /* do not bother checking when index is 1 because */
    {                           /* if the dot is in position 0, it really is not an extension */
        if (fileName[index-1] == '.')
            return &fileName[index];
        count++;
        if (count > MAX_FILE_EXTENSION_LENGTH)
            return terminatorPtr;
    }
    return terminatorPtr;
}
After multibyte changes, your code might look like the following example:
char * CVIFUNC FindFileExtension(const char *pathString)
{
    int index;
    char *fileName;
    char *ptr, *terminatorPtr;
    AssertMsg(pathString, "Null pathString parameter passed to FindFileExtension");
    fileName = FindFileName(pathString);
    if ((index = strlen (fileName)) == 0)
        return fileName;
    terminatorPtr = fileName + index;
    ptr = CmbStrPrev (fileName, terminatorPtr);
    while (ptr && ((terminatorPtr-ptr) <= MAX_FILE_EXTENSION_LENGTH+1))
    {
        if (*ptr == '.' && ptr != fileName)  /* if dot is the first char in filename, */
            return ++ptr;                    /* it really is not an extension */
        CmbStrDec (fileName, ptr);
    }
    return terminatorPtr;
}
Multibyte ANSI Extension Functions
The following table contains multibyte-aware versions of the ANSI string handling functions.
| Function | Description |
| --- | --- |
| _getmbcp | Get Current Code Page |
| _ismbblead | Get Byte Type |
| _mbsbtype | Get Byte Type from Context |
| _mbsdec | Get Previous Character |
| _mbsinc | Get Next Character |
| _mbslen | Get String Length |
| _mbscmp | Compare Strings |
| _mbsnbcmp | Compare Characters |
| _mbsicmp | Compare Strings (no case) |
| _mbsnbicmp | Compare Characters (no case) |
| _mbscat | Concatenate Strings |
| _mbsnbcat | Concatenate Characters |
| _mbscpy | Copy String |
| _mbsnbcpy | Copy Characters |
| _mbschr | Find First Occurrence of Character |
| _mbsrchr | Find Last Occurrence of Character |
| _mbspbrk | Find Character from Set |
| _mbscspn | Find Character from Set (index) |
| _mbsspn | Find Character Not in Set (index) |
| _mbsstr | Find Substring |
| _mbstok | Break String into Tokens |
Multibyte Macros and Functions in toolbox.h
The macros and functions in toolbox.h that are listed in the following table provide multibyte-aware equivalents of the string handling functions in the ANSI C library and of pointer idioms such as *s++. In these macro and function descriptions, p refers to a pointer to the beginning of the string, and s refers to the character pointer that you want to move.
| Macro or Function | Description |
| --- | --- |
| OnMBSystem | Returns TRUE if the program is running on a multibyte system. |
| CmbIsSingleC | Checks if the given 16-bit character fits into a single byte. |
| CmbGetNumBytesInChar | Returns the number of bytes required to store the given character. |
| CmbIsLeadByte | Checks if the given byte is a valid lead byte. Call CmbIsLeadByte only if you know that the byte starts on a character boundary; that is, the byte cannot be a trail byte. |
| CmbCharCodeLeadByte | Returns the lead byte of a dual-byte character. |
| CmbCharCodeTrailByte | Returns the trail byte of a dual-byte character. |
| CmbStrByteType | Returns the type of the byte at a given offset within a string, taking into account the bytes before the given byte. If you know that p[offset] is not in the middle of a dual-byte character, you can call CmbIsLeadByte instead. Possible return values are CMB_SINGLE_BYTE, CMB_LEAD_BYTE, CMB_TRAIL_BYTE, and CMB_ILLEGAL_BYTE. |
| CmbStrDec | Changes s by moving it back by one character. Analogous to --s. |
| CmbStrInc | Changes s by moving it forward by one character. Analogous to ++s. |
| CmbStrPrev | Returns a pointer to the previous character in the string. Analogous to s-1. |
| CmbStrNext | Returns a pointer to the next character in the string. Analogous to s+1. |
| CmbGetC | Retrieves the character at the given pointer. Analogous to *s. |
| CmbSetC | Sets the character at the given pointer. CmbSetC overwrites any values at the given location and does not properly insert a dual-byte character into the middle of a string. Analogous to *s = c. |
| CmbGetCInc | First retrieves the character at the given position and then advances the pointer to the next character. Analogous to *s++. |
| CmbGetCNdxInc | First retrieves the character at the given position and then advances the index to the next character. Analogous to s[i++]. |
| CmbSetCInc | First sets the character at the given position and then advances the pointer to the next character. The newly set character is returned. Analogous to *s++ = c. |
| CmbSetCNdxInc | First sets the character at the given position and then advances the index to the next character. The newly set character is returned. Analogous to s[i++] = c. |
| CmbIncGetC | First advances the given pointer to the next character and then returns that character. Analogous to *++s. |
| CmbIncSetC | First advances the given pointer to the next character and then sets and returns the new character. Analogous to *++s = c. |
| CmbFirstByteOfChar | Returns the first byte of the given single or dual-byte character. |
| CmbNumChars | Returns the number of characters in the given string. |
| CmbStrEq | Compares the two given strings. Returns TRUE if they are equal. |
| CmbStrEqI | Compares the two given strings. The comparison is not case-sensitive. Returns TRUE if they are equal. |
| CmbStrEqN | Compares the first n bytes of the two given strings. Returns TRUE if they are equal. |
| CmbStrEqNI | Compares the first n bytes of the two given strings. The comparison is not case-sensitive. Returns TRUE if they are equal. |
| CmbStrCmp | Equivalent to _mbscmp. |
| CmbStrNCmp | Equivalent to _mbsnbcmp. |
| CmbStrICmp | Equivalent to _mbsicmp. |
| CmbStrNICmp | Equivalent to _mbsnbicmp. |
| CmbStrCat | Equivalent to strcat. |
| CmbStrNCat | Equivalent to _mbsnbcat. |
| CmbStrCpy | Equivalent to strcpy. |
| CmbStrNCpy | Equivalent to _mbsnbcpy. |
| CmbStrSpn | Equivalent to _mbsspn. |
| CmbStrCSpn | Equivalent to _mbscspn. |
| CmbStrChr | Equivalent to _mbschr. |
| CmbStrRChr | Equivalent to _mbsrchr. |
| CmbStrTok | Equivalent to _mbstok. |
| CmbStrPBrk | Equivalent to _mbspbrk. |
| CmbStrStr | Equivalent to _mbsstr. |
| CmbStrUpr | Converts all the lowercase characters in the given string to uppercase characters. Converts only English characters. |
| CmbStrLwr | Converts all the uppercase characters in the given string to lowercase characters. Converts only English characters. |
| CmbStrByteIs | Returns TRUE if the character at the given offset in the given string is equal to the given single-byte character. Analogous to (s[offset] == '*'). |
| CmbStrLastChar | Returns a pointer to the last character of the given string. |
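The table describes characters as 16-bit values that hold either one byte or a lead/trail pair. The following sketch shows one plausible way such values could be read from and written to a string. It is not the toolbox.h implementation. It assumes Shift-JIS (code page 932) lead-byte ranges and assumes that the lead byte occupies the high 8 bits of the character code. The names get_char, set_char, and get_char_inc are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: Shift-JIS (code page 932) lead-byte ranges. */
static int is_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Reads the character at s as a 16-bit code (roughly analogous to *s).
   Assumption: dual-byte characters are packed as (lead << 8) | trail. */
static unsigned int get_char(const char *s)
{
    unsigned char b0;

    assert(s != NULL);
    b0 = (unsigned char)s[0];
    if (is_lead_byte(b0) && s[1] != '\0')
        return ((unsigned int)b0 << 8) | (unsigned char)s[1];
    return b0;
}

/* Writes the character code c at s (roughly analogous to *s = c) and
   returns the number of bytes written. Existing bytes are overwritten;
   nothing is shifted to make room for a dual-byte character. */
static size_t set_char(char *s, unsigned int c)
{
    assert(s != NULL);
    if (c > 0xFF) {
        s[0] = (char)(c >> 8);      /* lead byte */
        s[1] = (char)(c & 0xFF);    /* trail byte */
        return 2;
    }
    s[0] = (char)c;
    return 1;
}

/* Reads the character at *ps, then advances *ps past it (roughly
   analogous to *s++). The pointer does not move past the terminator. */
static unsigned int get_char_inc(const char **ps)
{
    unsigned int c;

    assert(ps != NULL && *ps != NULL);
    c = get_char(*ps);
    if (c != 0)
        *ps += (c > 0xFF) ? 2 : 1;
    return c;
}
```

Under these assumptions, a loop such as while ((c = get_char_inc(&p)) != 0) visits each character of a string exactly once, whether the character is one byte or two.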
KeyPress Event Handling
KeyPress events are sent to the callback function associated with the control that has the input focus and to the callback function associated with the control’s panel.
If your user enters a dual-byte character in a control, two EVENT_KEYPRESS events are sent to the callback functions. To process the character as a single event, ignore the first event. You can use KeyPressEventIsLeadByte to determine when you receive the first event of a dual-byte character. When you receive the second event, KeyPressEventIsTrailByte returns TRUE. You can then use GetKeyPressEventCharacter to obtain the full dual-byte character.
You can use SetKeyPressEventKey to change the character that the user entered. Do not use SetKeyPressEventKey in the first EVENT_KEYPRESS event of a dual-byte character.
The following is a code example from the user interface sample program, multikey.prj.
int CVICALLBACK KeyCallback (int panel, int control, int event,
                             void *callbackData, int eventData1, int eventData2)
{
    int character, virtual, modifiers, enabled;
    char buffer[3];
    if (event == EVENT_KEYPRESS) {
        /* remove previous keypress from string control */
        SetCtrlAttribute (panel, PANEL_KEY, ATTR_CTRL_VAL, "");
        /* ignore leading bytes - wait for complete character */
        if (!KeyPressEventIsLeadByte (eventData2)) {
            /* obtain keypress information */
            character = GetKeyPressEventCharacter (eventData2);
            virtual = GetKeyPressEventVirtualKey (eventData2);
            modifiers = GetKeyPressEventModifiers (eventData2);
            /* put the character into a buffer for display */
            memset (buffer, 0, sizeof (buffer));
            CmbSetC (buffer, character);
            /* update the original keypress information */
            SetCtrlAttribute (panel, PANEL_ORIGINAL, ATTR_CTRL_VAL, buffer);
            SetCtrlAttribute (panel, PANEL_CHARACTER, ATTR_CTRL_VAL, character);
            SetCtrlAttribute (panel, PANEL_VIRTUAL, ATTR_CTRL_VAL, virtual);
            SetCtrlAttribute (panel, PANEL_MODIFIERS, ATTR_CTRL_VAL, modifiers >> 16);
            /* is filtering enabled */
            GetCtrlVal (panel, PANEL_FILTER, &enabled);
            if (enabled) {
                /* get filter information */
                GetCtrlVal (panel, PANEL_CHAR_FILTER, &character);
                GetCtrlVal (panel, PANEL_VIRT_FILTER, &virtual);
                GetCtrlVal (panel, PANEL_MODI_FILTER, &modifiers);
                /* replace key event */
                SetKeyPressEventKey (eventData2, virtual, character, modifiers << 16, 0,
                                     virtual == 0 && !CmbIsSingleC (character));
            }
        }
    }
    return 0;
}
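The rule of ignoring the first of the two events can be pictured as a small state machine that holds a lead byte until its trail byte arrives. The sketch below is a standalone illustration of that idea, not a description of how LabWindows/CVI delivers or assembles key events. It assumes Shift-JIS (code page 932) lead-byte ranges and a (lead << 8) | trail character code. KeyAssembler and feed_key_byte are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: Shift-JIS (code page 932) lead-byte ranges. */
static int is_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Hypothetical state carried between key events. */
typedef struct {
    unsigned char pending_lead;     /* 0 when no lead byte is waiting */
} KeyAssembler;

/* Feeds one byte from a key event. Returns the complete character code,
   or -1 when the byte is a lead byte and the caller should wait for the
   next event before acting. */
static int feed_key_byte(KeyAssembler *st, unsigned char byte)
{
    assert(st != NULL);

    if (st->pending_lead != 0) {
        /* Second event of a dual-byte character: combine and reset. */
        int c = (st->pending_lead << 8) | byte;
        st->pending_lead = 0;
        return c;
    }
    if (is_lead_byte(byte)) {
        /* First event of a dual-byte character: remember it and wait. */
        st->pending_lead = byte;
        return -1;
    }
    return byte;                    /* ordinary single-byte character */
}
```

A caller would initialize the state to zero, pass each incoming byte to feed_key_byte, and process only the results that are not -1, which mirrors the callback's practice of skipping the lead-byte event.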
Legal
This tutorial (this "tutorial") was developed by National Instruments ("NI"). Although technical support of this tutorial may be made available by National Instruments, the content in this tutorial may not be completely tested and verified, and NI does not guarantee its quality in any way or that NI will continue to support this content with each new revision of related products and drivers. THIS TUTORIAL IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND AND SUBJECT TO CERTAIN RESTRICTIONS AS MORE SPECIFICALLY SET FORTH IN NI.COM'S TERMS OF USE (http://ni.com/legal/termsofuse/unitedstates/us/).
