The Wide-Character Functions

Overview

In 1995, a number of wide-character functions were added to the C89 standard; they were later incorporated into C99 and C++. The wide-character functions operate on characters of type wchar_t, whose width is implementation-defined (commonly 16 or 32 bits). For the most part, these functions parallel their char equivalents. For example, the function iswspace( ) is the wide-character version of isspace( ). In general, the wide-character functions use the same names as their char equivalents, except that a “w” is added or, for the string functions, “str” becomes “wcs”.
For C, the wide-character functions use the headers <wchar.h> and <wctype.h>. For C++, the headers are <cwchar> and <cwctype>. This chapter will use the C header names, but references to <wchar.h> and <wctype.h> also apply to <cwchar> and <cwctype>, respectively.
The header <wctype.h> defines the types wint_t, wctrans_t, and wctype_t. Many of the wide-character functions receive a wide character as a parameter of type wint_t, which can hold any wide character as well as WEOF. The use of wint_t in the wide-character functions parallels the use of int in the char-based functions. The types wctrans_t and wctype_t are the types of objects used to represent a character mapping (i.e., character translation) and the classification of a character, respectively. The header also defines WEOF, the wide-character end-of-file mark.
In addition to defining wint_t, the header <wchar.h> defines the types wchar_t, size_t, and mbstate_t. The wchar_t type creates a wide-character object, and size_t is the type of value returned by sizeof. The mbstate_t type describes an object that holds the state of a multibyte to wide-character conversion. The <wchar.h> header also defines the macros NULL, WEOF, WCHAR_MAX, and WCHAR_MIN. The last two define the maximum and minimum value that can be held in an object of type wchar_t.
Since most of the wide-character functions simply parallel their char equivalents, only a brief description of these functions is provided.

The Wide-Character Classification Functions

The header <wctype.h> provides the prototypes for the wide-character functions that support character classification. These functions categorize wide characters as to their type or convert the case of a character. The following table lists these functions along with their char equivalents, which are described in Chapter 7.
Function                      char Equivalent
int iswalnum(wint_t ch)       isalnum( )
int iswalpha(wint_t ch)       isalpha( )
int iswblank(wint_t ch)       isblank( ) (Added by C99)
int iswcntrl(wint_t ch)       iscntrl( )
int iswdigit(wint_t ch)       isdigit( )
int iswgraph(wint_t ch)       isgraph( )
int iswlower(wint_t ch)       islower( )
int iswprint(wint_t ch)       isprint( )
int iswpunct(wint_t ch)       ispunct( )
int iswspace(wint_t ch)       isspace( )
int iswupper(wint_t ch)       isupper( )
int iswxdigit(wint_t ch)      isxdigit( )
wint_t towlower(wint_t ch)    tolower( )
wint_t towupper(wint_t ch)    toupper( )
In addition to the functions shown above, <wctype.h> defines the following, which provide an open-ended means of classifying characters:
wctype_t wctype(const char *attr);
int iswctype(wint_t ch, wctype_t attr_ob);
The function wctype( ) returns a value that can be passed as the attr_ob parameter to iswctype( ). The string pointed to by attr names a property that a character must have. The value returned by wctype( ) can then be used by iswctype( ) to determine whether ch has that property. If it does, iswctype( ) returns nonzero. Otherwise, it returns zero. The following property strings are defined for all execution environments:
alnum
alpha
cntrl
digit
graph
lower
print
punct
space
upper
xdigit
For C99, the string “blank” is also defined.
The functions wctrans( ) and towctrans( ) are also defined in <wctype.h>. They are shown here:
wctrans_t wctrans(const char *mapping);
wint_t towctrans(wint_t ch, wctrans_t mapping_ob);
The function wctrans( ) returns a value that can be passed as the mapping_ob parameter to towctrans( ). The string pointed to by mapping specifies a mapping of one character to another. The value returned by wctrans( ) can then be used by towctrans( ) to map ch, and the mapped value is returned. The following mapping strings are supported in all execution environments:
tolower
toupper

The Wide-Character I/O Functions

Several of the I/O functions described in Chapter 6 have wide-character implementations. These functions are shown in the following table. The wide-character I/O functions use the header <wchar.h>. Notice that swprintf( ) and vswprintf( ) require an additional parameter not needed by their char equivalents.
Function
    char Equivalent
wint_t fgetwc(FILE *stream)
    fgetc( )
wchar_t *fgetws(wchar_t *str, int num, FILE *stream)
    fgets( )
    In C99, str and stream are qualified by restrict.
wint_t fputwc(wchar_t ch, FILE *stream)
    fputc( )
int fputws(const wchar_t *str, FILE *stream)
    fputs( )
    In C99, str and stream are qualified by restrict.
int fwprintf(FILE *stream, const wchar_t *fmt, ...)
    fprintf( )
    In C99, stream and fmt are qualified by restrict.
int fwscanf(FILE *stream, const wchar_t *fmt, ...)
    fscanf( )
    In C99, stream and fmt are qualified by restrict.
wint_t getwc(FILE *stream)
    getc( )
wint_t getwchar(void)
    getchar( )
wint_t putwc(wchar_t ch, FILE *stream)
    putc( )
wint_t putwchar(wchar_t ch)
    putchar( )
int swprintf(wchar_t *str, size_t num, const wchar_t *fmt, ...)
    sprintf( )
    Note the addition of the parameter num, which limits the number of characters written to str. In C99, str and fmt are qualified by restrict.
int swscanf(const wchar_t *str, const wchar_t *fmt, ...)
    sscanf( )
    In C99, str and fmt are qualified by restrict.
wint_t ungetwc(wint_t ch, FILE *stream)
    ungetc( )
int vfwprintf(FILE *stream, const wchar_t *fmt, va_list arg)
    vfprintf( )
    In C99, stream and fmt are qualified by restrict.
int vfwscanf(FILE * restrict stream, const wchar_t * restrict fmt, va_list arg)
    vfscanf( ) (Added by C99.)
int vswprintf(wchar_t *str, size_t num, const wchar_t *fmt, va_list arg)
    vsprintf( )
    Note the addition of the parameter num, which limits the number of characters written to str. In C99, str and fmt are qualified by restrict.
int vswscanf(const wchar_t * restrict str, const wchar_t * restrict fmt, va_list arg)
    vsscanf( ) (Added by C99.)
int vwprintf(const wchar_t *fmt, va_list arg)
    vprintf( )
    In C99, fmt is qualified by restrict.
int vwscanf(const wchar_t * restrict fmt, va_list arg)
    vscanf( ) (Added by C99.)
int wprintf(const wchar_t *fmt, ...)
    printf( )
    In C99, fmt is qualified by restrict.
int wscanf(const wchar_t *fmt, ...)
    scanf( )
    In C99, fmt is qualified by restrict.
In addition to those shown in the table, the following wide-character I/O function has been added:
int fwide(FILE *stream, int how);
If how is positive, fwide( ) makes stream a wide-character stream. If how is negative, fwide( ) makes stream into a char stream. If how is zero, stream is unaffected. If the stream has already been oriented to either wide or normal characters, it will not be changed. The function returns positive if the stream uses wide characters, negative if the stream uses chars, and zero if the stream has not yet been oriented. A stream’s orientation is also determined by its first use.

The Wide-Character String Functions

There are wide-character versions of the string manipulation functions described in Chapter 7. These are shown in the following table. They use the header <wchar.h>. Note that wcstok( ) requires an additional parameter not used by its char equivalent.
Function
    char Equivalent
wchar_t *wcscat(wchar_t *str1, const wchar_t *str2)
    strcat( )
    In C99, str1 and str2 are qualified by restrict.
wchar_t *wcschr(const wchar_t *str, wchar_t ch)
    strchr( )
int wcscmp(const wchar_t *str1, const wchar_t *str2)
    strcmp( )
int wcscoll(const wchar_t *str1, const wchar_t *str2)
    strcoll( )
size_t wcscspn(const wchar_t *str1, const wchar_t *str2)
    strcspn( )
wchar_t *wcscpy(wchar_t *str1, const wchar_t *str2)
    strcpy( )
    In C99, str1 and str2 are qualified by restrict.
size_t wcslen(const wchar_t *str)
    strlen( )
wchar_t *wcsncpy(wchar_t *str1, const wchar_t *str2, size_t num)
    strncpy( )
    In C99, str1 and str2 are qualified by restrict.
wchar_t *wcsncat(wchar_t *str1, const wchar_t *str2, size_t num)
    strncat( )
    In C99, str1 and str2 are qualified by restrict.
int wcsncmp(const wchar_t *str1, const wchar_t *str2, size_t num)
    strncmp( )
wchar_t *wcspbrk(const wchar_t *str1, const wchar_t *str2)
    strpbrk( )
wchar_t *wcsrchr(const wchar_t *str, wchar_t ch)
    strrchr( )
size_t wcsspn(const wchar_t *str1, const wchar_t *str2)
    strspn( )
wchar_t *wcstok(wchar_t *str1, const wchar_t *str2, wchar_t **endptr)
    strtok( )
    Here, endptr is a pointer that holds information necessary to continue the tokenizing process. In C99, str1, str2, and endptr are qualified by restrict.
wchar_t *wcsstr(const wchar_t *str1, const wchar_t *str2)
    strstr( )
size_t wcsxfrm(wchar_t *str1, const wchar_t *str2, size_t num)
    strxfrm( )
    In C99, str1 and str2 are qualified by restrict.

Wide-Character String Conversion Functions

The functions shown in the following table provide wide-character versions of the standard numeric and time conversion functions. These functions use the header <wchar.h>.
Function
    char Equivalent
size_t wcsftime(wchar_t *str, size_t max, const wchar_t *fmt, const struct tm *ptr)
    strftime( )
    In C99, str, fmt, and ptr are qualified by restrict.
double wcstod(const wchar_t *start, wchar_t **end)
    strtod( )
    In C99, start and end are qualified by restrict.
float wcstof(const wchar_t * restrict start, wchar_t ** restrict end)
    strtof( ) (Added by C99)
long double wcstold(const wchar_t * restrict start, wchar_t ** restrict end)
    strtold( ) (Added by C99)
long int wcstol(const wchar_t *start, wchar_t **end, int radix)
    strtol( )
    In C99, start and end are qualified by restrict.
long long int wcstoll(const wchar_t * restrict start, wchar_t ** restrict end, int radix)
    strtoll( ) (Added by C99)
unsigned long int wcstoul(const wchar_t *start, wchar_t **end, int radix)
    strtoul( )
    In C99, start and end are qualified by restrict.
unsigned long long int wcstoull(const wchar_t * restrict start, wchar_t ** restrict end, int radix)
    strtoull( ) (Added by C99)

Wide-Character Array Functions

The standard character array-manipulation functions, such as memcpy( ), also have wide-character equivalents. They are shown in the following table. These functions use the header <wchar.h>.
Function
    char Equivalent
wchar_t *wmemchr(const wchar_t *str, wchar_t ch, size_t num)
    memchr( )
int wmemcmp(const wchar_t *str1, const wchar_t *str2, size_t num)
    memcmp( )
wchar_t *wmemcpy(wchar_t *str1, const wchar_t *str2, size_t num)
    memcpy( )
    In C99, str1 and str2 are qualified by restrict.
wchar_t *wmemmove(wchar_t *str1, const wchar_t *str2, size_t num)
    memmove( )
wchar_t *wmemset(wchar_t *str, wchar_t ch, size_t num)
    memset( )

Multibyte/Wide-Character Conversion Functions

The standard library supplies various functions that support conversions between multibyte and wide characters. These functions, shown in the following table, use the header <wchar.h>. Many of these functions are restartable versions of the normal multibyte functions. The restartable version utilizes the state information passed to it in a parameter of type mbstate_t. If this parameter is null, the function will provide its own mbstate_t object.
Function
    Description
wint_t btowc(int ch)
    Converts ch into its wide-character equivalent and returns the result. Returns WEOF on error or if ch is not a one-byte multibyte character.
size_t mbrlen(const char *str, size_t num, mbstate_t *state)
    Restartable version of mblen( ) as described by state. Returns a positive value that indicates the length of the next multibyte character. Zero is returned if the next character is null. If an error occurs, (size_t) –1 is returned. If the multibyte character is incomplete, (size_t) –2 is returned. In C99, str and state are qualified by restrict.
size_t mbrtowc(wchar_t *out, const char *in, size_t num, mbstate_t *state)
    Restartable version of mbtowc( ) as described by state. Returns a positive value that indicates the length of the next multibyte character. Zero is returned if the next character is null. A value of –1 is returned if an error occurs and the macro EILSEQ is assigned to errno. If the conversion is incomplete, –2 is returned. In C99, out, in, and state are qualified by restrict.
int mbsinit(const mbstate_t *state)
    Returns nonzero (true) if state is null or represents an initial conversion state.
size_t mbsrtowcs(wchar_t *out, const char **in, size_t num, mbstate_t *state)
    Restartable version of mbstowcs( ) as described by state. Also, mbsrtowcs( ) differs from mbstowcs( ) in that in is an indirect pointer to the source array. If an error occurs, the macro EILSEQ is assigned to errno. In C99, out, in, and state are qualified by restrict.
size_t wcrtomb(char *out, wchar_t ch, mbstate_t *state)
    Restartable version of wctomb( ) as described by state. If an error occurs, the macro EILSEQ is assigned to errno. In C99, out and state are qualified by restrict.
size_t wcsrtombs(char *out, const wchar_t **in, size_t num, mbstate_t *state)
    Restartable version of wcstombs( ) as described by state. Also, wcsrtombs( ) differs from wcstombs( ) in that in is an indirect pointer to the source array. If an error occurs, the macro EILSEQ is assigned to errno. In C99, out, in, and state are qualified by restrict.
int wctob(wint_t ch)
    Converts ch into its one-byte multibyte equivalent. It returns EOF on failure.
