Localization - is it needed?
It is now over 60 days since the last post. This thread is closed.
Posted by: Ked, Russia (524 posts)
Date: Reply #15 on Mon 11 Jun 2007 06:11 AM (UTC)
Message:
Quote: I'm a bit puzzled it was that easy - this was in the full MUSHclient compile?
Well, it's not that easy. On closer inspection it turned out that I merely confirmed that Russian is displayed, not that Unicode is displayed, but Russian characters are on most English codepages so my tests provide no proof that Japanese, for example, will also be displayed without problems.
Later yesterday I tried changing the character set for the entire solution to Unicode and sure enough - that spawned over 2000 compile errors, mostly having to do with conversion between LPCTSTR/LPCWSTR and char. Some of those errors (those that involve literal strings in assignments and function calls) are easy to solve, but the implications of fixing the rest are not as obvious to me.
Quote: The normal AfxMessageBox function (in a non-Unicode application) expects 8-bit data.
VS8 supposedly has a "Unicode version" of MFC. At least, many of the errors I mentioned above seem to indicate that AfxMessageBox expects a wide-character string (wchar_t*) instead of the char* it is getting right now all over the place. Converting literal strings to wide (with the L prefix) solves this chunk of errors.
Quote: As for changing things like dialog boxes - do you think you could make a copy of the resources that use Russian characters, and I could merge them into the existing source? I don't have the .NET compiler (yet, anyway).
Sure.

Posted by: Nick Gammon, Australia (23,120 posts), Forum Administrator
Date: Reply #16 on Mon 11 Jun 2007 07:05 AM (UTC)
Message:
Quote:
... sure enough - that spawned over 2000 compile errors, mostly having to do with conversion between LPCTSTR/LPCWSTR and char ...
Well, I tried that once, with similar results. You can put _T(...) around strings, but that introduces another heap of errors.
For example, functions that have a 'const char *' prototype will no longer accept the new strings. So you change those, and then they fail because they call something like fwrite or strlen, which still expects char *.
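The cascade described here can be reproduced in portable C++. This is only a sketch: DEMO_UNICODE, DEMO_T and demo_length are hypothetical stand-ins invented for illustration (they imitate the idea behind _T(...), they are not the Windows macros), and the overload pair is just one possible way to keep a helper compiling in both modes.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <type_traits>

// Hypothetical stand-in for _T(): wide literal if DEMO_UNICODE is defined,
// narrow literal otherwise. (Not the real Windows macro.)
#ifdef DEMO_UNICODE
  #define DEMO_T(s) L##s
#else
  #define DEMO_T(s) s
#endif

// A narrow literal is an array of char; a wide literal is an array of wchar_t.
// The two types do not convert to each other implicitly.
static_assert(std::is_same<decltype("abc"), const char (&)[4]>::value,
              "narrow literal is const char[4]");
static_assert(std::is_same<decltype(L"abc"), const wchar_t (&)[4]>::value,
              "wide literal is const wchar_t[4]");

// std::strlen (L"abc") would not compile: strlen only takes const char *.
// Providing both overloads is one way to let callers compile in either mode.
std::size_t demo_length (const char * s)    { return std::strlen (s); }
std::size_t demo_length (const wchar_t * s) { return std::wcslen (s); }
```

With DEMO_UNICODE undefined, demo_length (DEMO_T ("abc")) picks the char overload; defining it switches the same call to the wchar_t overload. A function that only has the char version stops compiling at that point, which is the kind of error described above.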
It was about at this point when my mind started to boggle. For example, Lua uses 8-bit strings, not 16-bit strings. Also, MUSHclient world files are 8-bit data, not 16-bit data. Also, incoming text from a MUD is 8-bit data.
I think the simpler solution is to stick to UTF-8 for Unicode data, provided we can solve the problem of getting dialog boxes to display in Unicode, which judging by the earlier post, someone has done.
- Nick Gammon
www.gammon.com.au, www.mushclient.com

Posted by: Nick Gammon, Australia (23,120 posts), Forum Administrator
Date: Reply #17 on Mon 11 Jun 2007 07:09 AM (UTC)
Message:
Quote:
On closer inspection it turned out that I merely confirmed that Russian is displayed, not that Unicode is displayed,
What I think you need to do, is look up the Unicode code points by referring to the appropriate page here:
http://www.unicode.org/charts/
Then you can use the "Debug Simulated Input" dialog in MUSHclient to convert Unicode code points into UTF-8 sequences in hex. Having done that you can plant those into a dialog box and confirm they are displayed correctly.
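For anyone without the dialog to hand, the same conversion can be sketched in portable C++. This is an illustration of the standard UTF-8 encoding rules, not MUSHclient's code; the names encode_utf8 and to_hex_escapes are made up for the example.

```cpp
#include <cstdint>
#include <string>

// Encode one Unicode code point as UTF-8.
// Returns an empty string for surrogates and for values above U+10FFFF.
std::string encode_utf8 (std::uint32_t cp)
{
  std::string out;
  if (cp <= 0x7F)
    out += static_cast<char> (cp);
  else if (cp <= 0x7FF)
  {
    out += static_cast<char> (0xC0 | (cp >> 6));
    out += static_cast<char> (0x80 | (cp & 0x3F));
  }
  else if (cp <= 0xFFFF)
  {
    if (cp >= 0xD800 && cp <= 0xDFFF)
      return out;               // surrogates are not valid code points
    out += static_cast<char> (0xE0 | (cp >> 12));
    out += static_cast<char> (0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char> (0x80 | (cp & 0x3F));
  }
  else if (cp <= 0x10FFFF)
  {
    out += static_cast<char> (0xF0 | (cp >> 18));
    out += static_cast<char> (0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char> (0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char> (0x80 | (cp & 0x3F));
  }
  return out;
}

// Render bytes as C-style hex escapes, ready to paste into a string literal.
std::string to_hex_escapes (const std::string & bytes)
{
  static const char digits [] = "0123456789ABCDEF";
  std::string out;
  for (unsigned char b : bytes)
  {
    out += "\\x";
    out += digits [b >> 4];
    out += digits [b & 0x0F];
  }
  return out;
}
```

For example, to_hex_escapes (encode_utf8 (0x0410)) gives \xD0\x90 for the Cyrillic capital A at U+0410.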
- Nick Gammon
www.gammon.com.au, www.mushclient.com

Posted by: Ked, Russia (524 posts)
Date: Reply #18 on Mon 11 Jun 2007 08:23 PM (UTC)
Message:
Ok, I checked a UTF-8 sequence as you suggested. I used the following call, which might even be a wrong way to do it:
::AfxMessageBox("\xd0" "\x90");
And the dialog displayed two separate characters instead of a single Cyrillic capital "А" (U+0410).
This was with the multi-byte character setting, since Unicode doesn't compile.
From what I understand in "multi-byte mode" AfxMessageBox uses the current locale's codepage to convert an array of chars, so it will display Russian or Chinese text properly as long as the corresponding codepage is selected. But it'll treat UTF-8 also according to this codepage, not UTF-8 itself. At least MSDN docs on setlocale() explicitly say that you cannot set the locale to UTF-7 or UTF-8.
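The mismatch can be shown without any Windows calls. The sketch below assumes a single-byte codepage (for example a Cyrillic one), where every byte is displayed as one character, and compares that with the number of code points the same bytes hold as UTF-8. The function name utf8_code_points is made up for the example, and it assumes well-formed input.

```cpp
#include <cstddef>
#include <cstring>

// Count code points in well-formed UTF-8 by counting every byte that is
// not a continuation byte (continuation bytes look like 10xxxxxx).
std::size_t utf8_code_points (const char * s)
{
  std::size_t n = 0;
  for (; *s != '\0'; ++s)
    if ((static_cast<unsigned char> (*s) & 0xC0) != 0x80)
      ++n;
  return n;
}
```

For "\xD0\x90", std::strlen reports 2 bytes, which a single-byte codepage shows as two characters, while utf8_code_points reports 1, the single character a UTF-8 aware display would show.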

Posted by: Nick Gammon, Australia (23,120 posts), Forum Administrator
Date: Reply #19 on Mon 11 Jun 2007 09:50 PM (UTC)
Message:
Yes, UTF-8 is an encoding system, not a locale.
This is roughly what I am using:
#include <vector>

// display message box - using UTF-8
// (nType defaults to MB_OK so that the one-argument test call compiles)
int UMessageBox (const char * lpszText, UINT nType = MB_OK)
{
  // find how big the buffer has to be (in WCHARs, including the terminating null)
  int iLength = MultiByteToWideChar (CP_UTF8, 0, lpszText, -1, NULL, 0);

  // vector to hold Unicode - always at least one (zero) element,
  // so that &v [0] is valid even if the conversion fails
  std::vector<WCHAR> v (iLength > 0 ? iLength : 1);

  // do the conversion now
  if (iLength > 0)
    MultiByteToWideChar (CP_UTF8, 0, lpszText, -1, &v [0], iLength);

  // determine icon based on type specified
  if ((nType & MB_ICONMASK) == 0)
  {
    switch (nType & MB_TYPEMASK)
    {
      case MB_OK:
      case MB_OKCANCEL:
      case MB_YESNO:
      case MB_YESNOCANCEL:
        nType |= MB_ICONEXCLAMATION;
        break;

      case MB_ABORTRETRYIGNORE:
      case MB_RETRYCANCEL:
        // No default icon for these types, since they are rarely used.
        // The caller should specify the icon.
        break;
    }
  }

  // MessageBoxW takes wide (UTF-16) strings, so the Unicode text is displayed
  return ::MessageBoxW (NULL, &v [0], L"MUSHclient", nType);
} // end of UMessageBox
And test like this:
UMessageBox ("\xC9\xB3\xC9\xA8\xC9\x95\xC9\xAE");
That UTF-8 sequence should show 4 characters that look vaguely like "Nick".
What this is doing is taking UTF-8 input, converting to wide characters (WCHAR) using MultiByteToWideChar, and then calling MessageBoxW to display the Unicode text.
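To make the conversion step concrete, here is a portable sketch of the UTF-8 to UTF-16 transformation that MultiByteToWideChar (CP_UTF8, ...) is being asked to perform. It is not the Windows implementation: utf8_to_utf16 is a made-up name, and the code assumes well-formed input (it does no checking for overlong or invalid sequences and simply stops at a truncated one).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Convert well-formed UTF-8 to UTF-16 code units.
std::u16string utf8_to_utf16 (const std::string & in)
{
  std::u16string out;
  std::size_t i = 0;
  while (i < in.size ())
  {
    unsigned char b0 = static_cast<unsigned char> (in [i]);
    std::uint32_t cp;
    std::size_t len;

    // the lead byte gives the sequence length and the top bits
    if (b0 < 0x80)      { cp = b0;        len = 1; }
    else if (b0 < 0xE0) { cp = b0 & 0x1F; len = 2; }
    else if (b0 < 0xF0) { cp = b0 & 0x0F; len = 3; }
    else                { cp = b0 & 0x07; len = 4; }

    if (i + len > in.size ())
      break;            // truncated sequence: stop (sketch only)

    // each continuation byte contributes its low 6 bits
    for (std::size_t k = 1; k < len; ++k)
      cp = (cp << 6) | (static_cast<unsigned char> (in [i + k]) & 0x3F);
    i += len;

    if (cp < 0x10000)
      out += static_cast<char16_t> (cp);
    else
    {
      // above the BMP: split into a surrogate pair
      cp -= 0x10000;
      out += static_cast<char16_t> (0xD800 + (cp >> 10));
      out += static_cast<char16_t> (0xDC00 + (cp & 0x3FF));
    }
  }
  return out;
}
```

Fed the eight test bytes from the call above, this yields the four code units U+0273, U+0268, U+0255 and U+026E, one per displayed character.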
I am basically going through the MUSHclient source changing all calls to AfxMessageBox to UMessageBox, thus facilitating the display of Unicode.
- Nick Gammon
www.gammon.com.au, www.mushclient.com

Posted by: Nick Gammon, Australia (23,120 posts), Forum Administrator
Date: Reply #20 on Mon 11 Jun 2007 09:52 PM (UTC)
Message:
I'm not sure about the L"MUSHclient" - that is displayed in the dialog box title. It is a proper name, after all, so perhaps it doesn't need translating?
- Nick Gammon
www.gammon.com.au, www.mushclient.com

Posted by: Nick Gammon, Australia (23,120 posts), Forum Administrator
Date: Reply #21 on Tue 12 Jun 2007 06:14 AM (UTC)
Message:
The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).