The Wide-Character Functions
Overview
In 1995, a number of wide-character functions were added to the C89 standard, and were later incorporated into C99 and C++. The wide-character functions operate on characters of type wchar_t. The width of wchar_t is implementation-defined: it is commonly 16 bits on Windows and 32 bits on most Unix-like systems. For the most part, these functions parallel their char equivalents. For example, the function iswspace( ) is the wide-character version of isspace( ). In general, the wide-character functions use the same names as their char equivalents, except that a “w” is added.
For C, the wide-character functions use the headers <wchar.h> and <wctype.h>. For C++, the headers are <cwchar> and <cwctype>. This chapter will use the C header names, but references to <wchar.h> and <wctype.h> also apply to <cwchar> and <cwctype>, respectively.
The header <wctype.h> defines the types wint_t, wctrans_t, and wctype_t. Many of the wide-character functions receive a wide character as a parameter of type wint_t, which can hold any wide character as well as the value WEOF. The use of wint_t in the wide-character functions parallels the use of int in the char-based functions. The types wctrans_t and wctype_t represent a character mapping (i.e., character translation) and a character classification, respectively. The header also defines the wide-character end-of-file mark, WEOF.
In addition to defining wint_t, the header <wchar.h> defines the types wchar_t, size_t, and mbstate_t. The wchar_t type is the type of a wide-character object, and size_t is the type of value returned by sizeof. The mbstate_t type describes an object that holds the state of a multibyte to wide-character conversion. The <wchar.h> header also defines the macros NULL, WEOF, WCHAR_MAX, and WCHAR_MIN. The last two define the maximum and minimum values that can be held in an object of type wchar_t.
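The size and limits of wchar_t can be inspected directly on a given implementation. The following is a minimal sketch; the function names wchar_bits( ) and wchar_is_signed( ) are hypothetical helpers introduced only for illustration.

```c
#include <limits.h>
#include <wchar.h>

/* Number of bits occupied by a wchar_t on this implementation. */
size_t wchar_bits(void)
{
    return sizeof(wchar_t) * CHAR_BIT;
}

/* Nonzero if wchar_t is a signed type here, in which case
   WCHAR_MIN is negative; zero if wchar_t is unsigned. */
int wchar_is_signed(void)
{
    return WCHAR_MIN < 0;
}
```

The results differ between compilers and platforms, so portable code should rely on WCHAR_MIN and WCHAR_MAX rather than on a fixed width.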
Since most of the wide-character functions simply parallel their char equivalents, only a brief description of these functions is provided.
The Wide-Character Classification Functions
The header <wctype.h> provides the prototypes for the wide-character functions that support character classification. These functions categorize wide characters as to their type or convert the case of a character. The following table lists these functions along with their char equivalents, which are described in Chapter 7.
Function | char Equivalent |
---|---|
int iswalnum(wint_t ch) | isalnum( ) |
int iswalpha(wint_t ch) | isalpha( ) |
int iswblank(wint_t ch) | isblank( ) (Added by C99) |
int iswcntrl(wint_t ch) | iscntrl( ) |
int iswdigit(wint_t ch) | isdigit( ) |
int iswgraph(wint_t ch) | isgraph( ) |
int iswlower(wint_t ch) | islower( ) |
int iswprint(wint_t ch) | isprint( ) |
int iswpunct(wint_t ch) | ispunct( ) |
int iswspace(wint_t ch) | isspace( ) |
int iswupper(wint_t ch) | isupper( ) |
int iswxdigit(wint_t ch) | isxdigit( ) |
wint_t towlower(wint_t ch) | tolower( ) |
wint_t towupper(wint_t ch) | toupper( ) |
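As an illustration of how the classification and case-conversion functions combine, here is a small sketch that uppercases a wide string in place. The helper name upcase_wide( ) is an assumption made for this example; the results depend on the current locale.

```c
#include <wchar.h>
#include <wctype.h>

/* Convert each lowercase character in str to uppercase, in place.
   Returns the number of characters that iswlower() classified
   as lowercase. */
size_t upcase_wide(wchar_t *str)
{
    size_t lower_seen = 0;

    for (; *str != L'\0'; ++str) {
        if (iswlower((wint_t)*str)) {
            *str = (wchar_t)towupper((wint_t)*str);
            ++lower_seen;
        }
    }
    return lower_seen;
}
```

For example, applying upcase_wide( ) to an array holding L"Hello" leaves L"HELLO" in the array.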
In addition to the functions shown above, <wctype.h> defines the following, which provide an open-ended means of classifying characters:
wctype_t wctype(const char *attr);
int iswctype(wint_t ch, wctype_t attr_ob);
The string pointed to by attr names a property that a character may have. The function wctype( ) returns a value that represents that property and that can be passed as the attr_ob parameter to iswctype( ). If ch has the property, iswctype( ) returns nonzero. Otherwise, it returns zero. The following property strings are defined for all execution environments:
alnum | alpha | cntrl | digit |
graph | lower | print | punct |
space | upper | xdigit |
For C99, the string “blank” is also defined.
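The two-step pattern (look up a property with wctype( ), then test characters with iswctype( )) might be used as in the sketch below. The helper count_with_property( ) is hypothetical, and treating a zero return from wctype( ) as "unknown property" follows the standard's description of that function.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Count the characters in str that have the property named by prop
   (for example "digit" or "alpha"). Returns -1 if the property name
   is not recognized in the current locale. */
long count_with_property(const wchar_t *str, const char *prop)
{
    wctype_t desc = wctype(prop);
    long count = 0;

    if (desc == (wctype_t)0)
        return -1;              /* wctype() returns zero for unknown names */

    for (; *str != L'\0'; ++str) {
        if (iswctype((wint_t)*str, desc))
            ++count;
    }
    return count;
}
```

Because the property is selected by a string at run time, the same loop can serve any classification the locale supports.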
The functions wctrans( ) and towctrans( ) are also defined in <wctype.h>. They are shown here:
wctrans_t wctrans(const char *mapping);
wint_t towctrans(wint_t ch, wctrans_t mapping_ob);
The function wctrans( ) returns a value that can be passed as the mapping_ob parameter to towctrans( ). The string pointed to by mapping specifies a mapping of one character to another. This value can then be used by towctrans( ) to map ch. The mapped value is returned. The following mapping strings are supported in all execution environments:
tolower | toupper |
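A minimal sketch of applying a named mapping to a whole string follows. The helper map_wide( ) is an assumption for illustration; only the "tolower" and "toupper" mapping strings are guaranteed to exist.

```c
#include <wchar.h>
#include <wctype.h>

/* Apply the mapping named by mapping ("tolower" or "toupper") to
   every character of str, in place. Returns 0 on success, or -1 if
   the mapping name is not recognized in the current locale. */
int map_wide(wchar_t *str, const char *mapping)
{
    wctrans_t desc = wctrans(mapping);

    if (desc == (wctrans_t)0)
        return -1;              /* wctrans() returns zero for unknown names */

    for (; *str != L'\0'; ++str)
        *str = (wchar_t)towctrans((wint_t)*str, desc);

    return 0;
}
```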
The Wide-Character I/O Functions
Several of the I/O functions described in Chapter 6 have wide-character implementations. These functions are shown in the following table. The wide-character I/O functions use the header <wchar.h>. Notice that swprintf( ) and vswprintf( ) require an additional parameter not needed by their char equivalents.
Function | char Equivalent |
---|---|
wint_t fgetwc(FILE *stream) | fgetc( ) |
wchar_t *fgetws(wchar_t *str, int num, FILE *stream) | fgets( ) |
wint_t fputwc(wchar_t ch, FILE *stream) | fputc( ) |
int fputws(const wchar_t *str, FILE *stream) | fputs( ) |
int fwprintf(FILE *stream, const wchar_t *fmt, ...) | fprintf( ) |
int fwscanf(FILE *stream, const wchar_t *fmt, ...) | fscanf( ) |
wint_t getwc(FILE *stream) | getc( ) |
wint_t getwchar(void) | getchar( ) |
wint_t putwc(wchar_t ch, FILE *stream) | putc( ) |
wint_t putwchar(wchar_t ch) | putchar( ) |
int swprintf(wchar_t *str, size_t num, const wchar_t *fmt, ...) | sprintf( ) Note the addition of the parameter num, which limits the number of characters written to str. In C99, str and fmt are qualified by restrict. |
int swscanf(const wchar_t *str, const wchar_t *fmt, ...) | sscanf( ) |
wint_t ungetwc(wint_t ch, FILE *stream) | ungetc( ) |
int vfwprintf(FILE *stream, const wchar_t *fmt, va_list arg) | vfprintf( ) |
int vfwscanf(FILE * restrict stream, const wchar_t * restrict fmt, va_list arg) | vfscanf( ) (Added by C99.) |
int vswprintf(wchar_t *str, size_t num, const wchar_t *fmt, va_list arg) | vsprintf( ) Note the addition of the parameter num, which limits the number of characters written to str. In C99, str and fmt are qualified by restrict. |
int vswscanf(const wchar_t * restrict str, const wchar_t * restrict fmt, va_list arg) | vsscanf( ) (Added by C99.) |
int vwprintf(const wchar_t *fmt, va_list arg) | vprintf( ) |
int vwscanf(const wchar_t * restrict fmt, va_list arg) | vscanf( ) (Added by C99.) |
int wprintf(const wchar_t *fmt, ...) | printf( ) |
int wscanf(const wchar_t *fmt, ...) | scanf( ) |
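The extra num parameter of swprintf( ) is the main difference from sprintf( ) that a caller has to handle. The sketch below, built around a hypothetical helper named format_point( ), shows one way to use it; note that num counts wide characters, not bytes.

```c
#include <stddef.h>
#include <wchar.h>

/* Format the pair (x, y) into buf, which has room for num wide
   characters including the terminating null. Returns the number of
   wide characters written (not counting the null), or a negative
   value if the output does not fit or another error occurs. */
int format_point(wchar_t *buf, size_t num, int x, int y)
{
    return swprintf(buf, num, L"(%d, %d)", x, y);
}
```

Unlike snprintf( ), swprintf( ) reports a result that does not fit by returning a negative value rather than the length that would have been required.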
In addition to those shown in the table, the following wide-character I/O function has been added:
int fwide(FILE *stream, int how);
If how is positive, fwide( ) makes stream a wide-character stream. If how is negative, fwide( ) makes stream a char (byte-oriented) stream. If how is zero, stream is unaffected. If the stream has already been oriented to either wide or normal characters, it will not be changed. The function returns a positive value if the stream uses wide characters, a negative value if the stream uses chars, and zero if the stream has not yet been oriented. A stream that has not been explicitly oriented takes its orientation from the first I/O operation performed on it.
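The behavior of fwide( ) can be sketched with two small wrappers. The names request_wide( ) and current_orientation( ) are assumptions made for this example; each reduces the return value of fwide( ) to 1 (wide), -1 (char), or 0 (not yet oriented).

```c
#include <stdio.h>
#include <wchar.h>

/* Ask for wide orientation and report what the stream ended up as:
   1 for wide, -1 for char, 0 if it is still unoriented. */
int request_wide(FILE *stream)
{
    int r = fwide(stream, 1);
    return (r > 0) - (r < 0);
}

/* Query the orientation without changing it (how == 0). */
int current_orientation(FILE *stream)
{
    int r = fwide(stream, 0);
    return (r > 0) - (r < 0);
}
```

On a freshly opened stream, current_orientation( ) is expected to return 0. After request_wide( ) succeeds, a later call such as fwide(stream, -1) does not switch the stream back.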
The Wide-Character String Functions
There are wide-character versions of the string manipulation functions described in Chapter 7. These are shown in the following table. They use the header <wchar.h>. Note that wcstok( ) requires an additional parameter not used by its char equivalent.
Function | char Equivalent |
---|---|
wchar_t *wcscat(wchar_t *str1, const wchar_t *str2) | strcat( ) |
wchar_t *wcschr(const wchar_t *str, wchar_t ch) | strchr( ) |
int wcscmp(const wchar_t *str1, const wchar_t *str2) | strcmp( ) |
int wcscoll(const wchar_t *str1, const wchar_t *str2) | strcoll( ) |
size_t wcscspn(const wchar_t *str1, const wchar_t *str2) | strcspn( ) |
wchar_t *wcscpy(wchar_t *str1, const wchar_t *str2) | strcpy( ) In C99, str1 and str2 are qualified by restrict. |
size_t wcslen(const wchar_t *str) | strlen( ) |
wchar_t *wcsncpy(wchar_t *str1, const wchar_t *str2, size_t num) | strncpy( ) In C99, str1 and str2 are qualified by restrict. |
wchar_t *wcsncat(wchar_t *str1, const wchar_t *str2, size_t num) | strncat( ) In C99, str1 and str2 are qualified by restrict. |
int wcsncmp(const wchar_t *str1, const wchar_t *str2, size_t num) | strncmp( ) |
wchar_t *wcspbrk(const wchar_t *str1, const wchar_t *str2) | strpbrk( ) |
wchar_t *wcsrchr(const wchar_t *str, wchar_t ch) | strrchr( ) |
size_t wcsspn(const wchar_t *str1, const wchar_t *str2) | strspn( ) |
wchar_t *wcstok(wchar_t *str1, const wchar_t *str2, wchar_t **endptr) | strtok( ) Here, endptr is a pointer that holds information necessary to continue the tokenizing process. In C99, str1, str2, and endptr are qualified by restrict. |
wchar_t *wcsstr(const wchar_t *str1, const wchar_t *str2) | strstr( ) |
size_t wcsxfrm(wchar_t *str1, const wchar_t *str2, size_t num) | strxfrm( ) In C99, str1 and str2 are qualified by restrict. |
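Because wcstok( ) keeps its position in the caller-supplied endptr rather than in hidden static storage, a tokenizing loop looks slightly different from the strtok( ) version. The following sketch uses a hypothetical helper, count_tokens( ), to show the pattern.

```c
#include <stddef.h>
#include <wchar.h>

/* Count the tokens in str that are separated by any of the
   characters in delims. Like strtok(), wcstok() modifies str. */
size_t count_tokens(wchar_t *str, const wchar_t *delims)
{
    wchar_t *state = NULL;      /* continuation info kept by wcstok() */
    size_t count = 0;
    wchar_t *tok = wcstok(str, delims, &state);

    while (tok != NULL) {
        ++count;
        /* Pass NULL to continue scanning the same string. */
        tok = wcstok(NULL, delims, &state);
    }
    return count;
}
```

Since each call site owns its own state pointer, two strings can be tokenized in an interleaved fashion without interfering with each other.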
Wide-Character String Conversion Functions
The functions shown in the following table provide wide-character versions of the standard numeric and time conversion functions. These functions use the header <wchar.h>.
Function | char Equivalent |
---|---|
size_t wcsftime(wchar_t *str, size_t max, const wchar_t *fmt, const struct tm *ptr) | strftime( ) |
double wcstod(const wchar_t *start, wchar_t **end) | strtod( ) In C99, start and end are qualified by restrict. |
float wcstof(const wchar_t * restrict start, wchar_t ** restrict end) | strtof( ) (Added by C99) |
long double wcstold(const wchar_t * restrict start, wchar_t ** restrict end) | strtold( ) (Added by C99) |
long int wcstol(const wchar_t *start, wchar_t **end, int radix) | strtol( ) In C99, start and end are qualified by restrict. |
long long int wcstoll(const wchar_t * restrict start, wchar_t ** restrict end, int radix) | strtoll( ) (Added by C99) |
unsigned long int wcstoul(const wchar_t *start, wchar_t **end, int radix) | strtoul( ) In C99, start and end are qualified by restrict. |
unsigned long long int wcstoull(const wchar_t * restrict start, wchar_t ** restrict end, int radix) | strtoull( ) (Added by C99) |
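As with strtol( ), the end pointer set by wcstol( ) is what allows a caller to detect trailing garbage. A minimal sketch of strict decimal parsing follows; parse_long( ) is a hypothetical helper, and rejecting out-of-range input via errno is a design choice of this example.

```c
#include <errno.h>
#include <wchar.h>

/* Parse text as a base-10 long. Returns 1 and stores the value in
   *out if the whole string was a valid number; returns 0 if no
   digits were found, extra characters follow the number, or the
   value is out of range for long. */
int parse_long(const wchar_t *text, long *out)
{
    wchar_t *end;
    long value;

    errno = 0;
    value = wcstol(text, &end, 10);

    if (end == text || *end != L'\0' || errno == ERANGE)
        return 0;

    *out = value;
    return 1;
}
```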
Wide-Character Array Functions
The standard character array-manipulation functions, such as memcpy( ), also have wide-character equivalents. They are shown in the following table. These functions use the header <wchar.h>.
Function | char Equivalent |
---|---|
wchar_t *wmemchr(const wchar_t *str, wchar_t ch, size_t num) | memchr( ) |
int wmemcmp(const wchar_t *str1, const wchar_t *str2, size_t num) | memcmp( ) |
wchar_t *wmemcpy(wchar_t *str1, const wchar_t *str2, size_t num) | memcpy( ) |
wchar_t *wmemmove(wchar_t *str1, const wchar_t *str2, size_t num) | memmove( ) |
wchar_t *wmemset(wchar_t *str, wchar_t ch, size_t num) | memset( ) |
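One point worth illustrating is that the num parameter of these functions counts wide characters, not bytes. The sketch below right-justifies a string in a fixed-width field; pad_left( ) is a hypothetical helper written for this example.

```c
#include <wchar.h>

/* Right-justify src in a field of width wide characters, padding on
   the left with fill. dst must have room for width + 1 wide
   characters. Returns 0 on success, or -1 if src is longer than
   width (dst is then left untouched). */
int pad_left(wchar_t *dst, size_t width, const wchar_t *src, wchar_t fill)
{
    size_t len = wcslen(src);

    if (len > width)
        return -1;

    wmemset(dst, fill, width - len);            /* leading padding */
    wmemcpy(dst + (width - len), src, len);     /* the text itself */
    dst[width] = L'\0';
    return 0;
}
```

With memset( ) and memcpy( ) the same code would need each count multiplied by sizeof(wchar_t), and memset( ) could not store a multi-byte fill value at all.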
Multibyte/Wide-Character Conversion Functions
The standard library supplies various functions that support conversions between multibyte and wide characters. These functions, shown in the following table, use the header <wchar.h>. Many of these functions are restartable versions of the normal multibyte functions. The restartable version utilizes the state information passed to it in a parameter of type mbstate_t. If this parameter is null, the function will provide its own mbstate_t object.
Function | Description |
---|---|
wint_t btowc(int ch) | Converts ch into its wide-character equivalent and returns the result. Returns WEOF on error or if ch is not a valid single-byte multibyte character. |
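size_t mbrlen(const char *str, size_t num, mbstate_t *state) | Restartable version of mblen( ) as described by state. Its return values follow the same conventions as those of mbrtowc( ). |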
size_t mbrtowc(wchar_t *out, const char *in, size_t num, mbstate_t *state) | Restartable version of mbtowc( ) as described by state. Returns a positive value that indicates the length of the next multibyte character. Zero is returned if the next character is null. If an error occurs, –1 is returned and the macro EILSEQ is assigned to errno. If the conversion is incomplete, –2 is returned. Because the return type is size_t, these two values are seen by the caller as (size_t)-1 and (size_t)-2. In C99, out, in, and state are qualified by restrict. |
int mbsinit(const mbstate_t *state) | Returns true if state represents an initial conversion state. |
size_t mbsrtowcs(wchar_t *out, const char **in, size_t num, mbstate_t *state) | Restartable version of mbstowcs( ) as described by state. Also, mbsrtowcs( ) differs from mbstowcs( ) in that in is an indirect pointer to the source array. If an error occurs, the macro EILSEQ is assigned to errno. In C99, out, in, and state are qualified by restrict. |
size_t wcrtomb(char *out, wchar_t ch, mbstate_t *state) | Restartable version of wctomb( ) as described by state. |
size_t wcsrtombs(char *out, const wchar_t **in, size_t num, mbstate_t *state) | Restartable version of wcstombs( ) as described by state. Also, wcsrtombs( ) differs from wcstombs( ) in that in is an indirect pointer to the source array. If an error occurs, the macro EILSEQ is assigned to errno. In C99, out, in, and state are qualified by restrict. |
int wctob(wint_t ch) | Converts ch into its one-byte multibyte equivalent. It returns EOF on failure. |
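A typical use of the restartable functions is to step through a multibyte string one character at a time while carrying an explicit mbstate_t. The following is a minimal sketch under that assumption; count_mb_chars( ) is a hypothetical helper, and its result depends on the multibyte encoding of the current locale.

```c
#include <string.h>
#include <wchar.h>

/* Count the multibyte characters in s under the current locale.
   Returns (size_t)-1 if s contains an invalid or incomplete
   multibyte sequence. */
size_t count_mb_chars(const char *s)
{
    mbstate_t state;
    size_t remaining = strlen(s);
    size_t count = 0;

    memset(&state, 0, sizeof state);    /* initial conversion state */

    while (remaining > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, remaining, &state);

        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;          /* invalid or truncated sequence */
        if (n == 0)
            break;                      /* null character reached */

        s += n;
        remaining -= n;
        ++count;
    }
    return count;
}
```

In the default "C" locale each character of a plain ASCII string occupies one byte, so the count equals strlen( ); in a UTF-8 locale the count can be smaller than the byte length.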