Visual C++ character sets: _UNICODE and _MBCS

Introduction

Q: I have this simple function call:

MessageBox(NULL, "Test message", "Title", MB_OK);

The compiler raises the following error and I don't understand why.

error C2664: 'MessageBoxW' : cannot convert parameter 2 from 'const char [13]' to 'LPCWSTR'
Types pointed to are unrelated; conversion requires reinterpret_cast, C-style cast or function-style cast

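A: Put simply, this happens because the project is built for Unicode, so MessageBox expands to MessageBoxW, which expects wide-character (wchar_t) strings rather than the narrow (char) literals passed here.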

The Microsoft run-time library provides Microsoft-specific generic-text mappings for many data types, routines, and other objects; these mappings are defined in TCHAR.h. There are three supported character sets:

ASCII (single-byte character set, SBCS)
MBCS (multi-byte character set)
Unicode

Which character set is used is controlled by two preprocessor symbols:

_UNICODE: if defined, Unicode is the character set used
_MBCS: if defined, MBCS is used
If neither of the above (mutually-exclusive) is defined, ASCII is the character set used

The Windows API provides separate Unicode and ASCII versions of each function that takes string arguments.

Q: How do I select the character set?
A: You have to go to Project Properties > Configuration Properties > General and change the value of the Character Set property. The three available options are:

Not Set (neither _UNICODE nor _MBCS is defined)
Use Multi-byte Character Set (_MBCS is defined)
Use Unicode Character Set (_UNICODE is defined)

Q: How exactly do the generic-text mapping directives affect the data types and functions that I'm using?
A: Names such as _itot in the C run-time library, or MessageBox in the Windows API, aren't functions at all; they are macros.

The C run-time library provides a function for each character set, plus a macro that resolves to one of those functions depending on the character set in use. For instance, the macro _itot resolves to:

_itoa, when _UNICODE is not defined
_itow, when _UNICODE is defined

Similarly, TCHAR resolves to:

char, when _UNICODE is not defined
wchar_t, when _UNICODE is defined

You can read more about the mappings in MSDN.
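
The mechanism can be imitated in a few lines of portable C++. The sketch below is not the contents of TCHAR.h: demo_tchar, demo_itot, demo_tstring, and count_label are hypothetical names, and std::to_string / std::to_wstring stand in for _itoa / _itow so that the example also compiles outside Visual C++.

```cpp
#include <string>

// Hypothetical stand-ins for TCHAR and _itot, switched by the same symbol.
#ifdef _UNICODE
typedef wchar_t demo_tchar;
#define demo_itot std::to_wstring   // wide-character variant
#else
typedef char demo_tchar;
#define demo_itot std::to_string    // narrow-character variant
#endif

typedef std::basic_string<demo_tchar> demo_tstring;

// Written once against the generic names; the preprocessor picks the variant.
inline demo_tstring count_label(int count)
{
    return demo_itot(count);
}
```

The body of count_label never mentions char or wchar_t directly, which is the point of the generic-text names: the same source builds for either character set.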

On the other hand, the Windows API comes in two versions: one for Unicode and one for ASCII/multi-byte. The MSDN page for MessageBox says:

The MessageBox function creates, displays, and operates a message box. The message box contains an application-defined message and title, plus any combination of predefined icons and push buttons.

int MessageBox(
    HWND hWnd,
    LPCTSTR lpText,
    LPCTSTR lpCaption,
    UINT uType);

Actually, MessageBox and LPCTSTR are both macros. You can see how MessageBox is defined in WinUser.h:

#ifdef UNICODE
#define MessageBox MessageBoxW
#else
#define MessageBox MessageBoxA
#endif // !UNICODE

There are actually two versions of the function: MessageBoxA for ASCII and MBCS, and MessageBoxW for Unicode. When UNICODE is defined (the Windows headers test UNICODE while the C run-time headers test _UNICODE; the Character Set project setting defines both together), MessageBox resolves to MessageBoxW and LPCTSTR to LPCWSTR (i.e. const wchar_t*); otherwise MessageBox resolves to MessageBoxA and LPCTSTR to LPCSTR (i.e. const char*).
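
The same A/W pattern can be reproduced with made-up functions to see how one source-level name ends up calling two different functions. Everything prefixed Demo or DEMO below is hypothetical and only mirrors the shape of the WinUser.h excerpt; it does not call the Windows API.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// "A" version: takes narrow (char) strings, like MessageBoxA.
inline std::size_t DemoTextLengthA(const char* text)    { return std::strlen(text); }
// "W" version: takes wide (wchar_t) strings, like MessageBoxW.
inline std::size_t DemoTextLengthW(const wchar_t* text) { return std::wcslen(text); }

// The unsuffixed name is only a macro, chosen by UNICODE at compile time.
#ifdef UNICODE
#define DemoTextLength DemoTextLengthW
typedef const wchar_t* DEMO_LPCTSTR;
#else
#define DemoTextLength DemoTextLengthA
typedef const char* DEMO_LPCTSTR;
#endif

// The suffixed versions stay callable whichever way UNICODE is set.
inline std::size_t demo_both_lengths()
{
    return DemoTextLengthA("Title") + DemoTextLengthW(L"Title");
}
```

Passing a narrow literal to DemoTextLength in a build where UNICODE is defined fails to compile for the same reason as the MessageBox call in the introduction: the macro has already turned the call into the W version, which accepts only const wchar_t*.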

Q: How do I write my program so that it builds for any of these character sets without modifying the code when the character set changes?
A: In a single-byte or multi-byte character set, string and character literals are not prefixed by anything ("string", 'c'). Unicode string and character literals, however, require the prefix L, as in L"string" and L'c'. To write literals that work either way, use the Microsoft-specific macros _T() or _TEXT(). When _UNICODE is not defined, the preprocessor expands them to the bare literal; when _UNICODE is defined, it prepends L.

The result depends on whether _UNICODE is defined:

Not defined: _T("string") becomes "string" and _T('c') becomes 'c'
Defined: _T("string") becomes L"string" and _T('c') becomes L'c'
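
A simplified imitation shows how such a macro can work. DEMO_T, demo_char, and demo_length are hypothetical names; the real _T lives in TCHAR.h and its exact definition may differ from this sketch, which relies only on the preprocessor's token-pasting operator ##.

```cpp
#include <cstddef>
#include <string>

#ifdef _UNICODE
#define DEMO_T(x) L##x   // paste an L in front: "string" -> L"string"
typedef wchar_t demo_char;
#else
#define DEMO_T(x) x      // leave the literal untouched
typedef char demo_char;
#endif

// Counts characters up to the terminator in either kind of string.
inline std::size_t demo_length(const demo_char* text)
{
    return std::char_traits<demo_char>::length(text);
}
```

A call such as demo_length(DEMO_T("string")) compiles unchanged in both configurations, because the literal and the parameter type switch together.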

Q: How do I fix the line of code mentioned at the start?
A: It should be clear now:

MessageBox(NULL, _T("Test message"), _T("Title"), MB_OK);
