
Gammon Forum


Localization - is it needed?

It is now over 60 days since the last post. This thread is closed.



Posted by Ked   Russia  (524 posts)  Bio
Date Reply #15 on Mon 11 Jun 2007 06:11 AM (UTC)
Message
Quote:
I'm a bit puzzled it was that easy - this was in the full MUSHclient compile?


Well, it's not that easy. On closer inspection it turned out that I had merely confirmed that Russian is displayed, not that Unicode is displayed. Russian characters are on most English codepages, so my tests provide no proof that Japanese, for example, will also be displayed without problems.

Later yesterday I tried changing the character set for the entire solution to Unicode and, sure enough, that spawned over 2000 compile errors, mostly having to do with conversion between LPCTSTR/LPCWSTR and char. Some of those errors (those that involve literal strings in assignments and function calls) are easy to solve, but the implications of fixing the rest are not as obvious to me.

Quote:
The normal AfxMessageBox function (in a non-Unicode application) expects 8-bit data.


VS8 supposedly has a "Unicode version" of MFC. At least, many of the errors I mentioned above seem to indicate that AfxMessageBox expects a wide-character string (const wchar_t*) instead of the char* it is getting right now all over the place. Converting literal strings to wide ones (with the L prefix) solves this chunk of errors.

Quote:
As for changing things like dialog boxes - do you think you could make a copy of the resources that use Russian characters, and I could merge them into the existing source? I don't have the .NET compiler (yet, anyway).


Sure.

Posted by Nick Gammon   Australia  (23,120 posts)  Bio   Forum Administrator
Date Reply #16 on Mon 11 Jun 2007 07:05 AM (UTC)
Message
Quote:

... sure enough - that spawned over 2000 compile errors, mostly having to do with conversion between LPCTSTR/LPCWSTR and char ...


Well I tried that once with similar results. You can put _T(...) around strings but then that introduces another heap of errors.

For example, functions that have a 'const char *' prototype will no longer accept the new strings. So you change those, and then they fail because they call something like fwrite (or strlen), which still expects char *.

It was about at this point when my mind started to boggle. For example, Lua uses 8-bit strings, not 16-bit strings. Also, MUSHclient world files are 8-bit data, not 16-bit data. Also, incoming text from a MUD is 8-bit data.

I think the simpler solution is to stick to UTF-8 for Unicode data, provided we can solve the problem of getting dialog boxes to display in Unicode, which judging by the earlier post, someone has done.
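
To illustrate why UTF-8 fits the existing 8-bit code paths, here is a minimal, portable sketch (not MUSHclient code). The name legacyLength is hypothetical and stands in for any routine with a 'const char *' prototype. UTF-8 text never contains a zero byte before its terminator, whereas the same text as 16-bit data does, so a strlen-style routine cuts it short:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Stand-in for any routine with an 8-bit prototype (hypothetical name).
std::size_t legacyLength(const char* text) {
    assert(text != nullptr);
    return std::strlen(text);   // counts bytes up to the first zero byte
}

// "A" followed by Cyrillic capital A (U+0410), encoded as UTF-8.
const char kUtf8[] = "A\xD0\x90";

// The same two characters as UTF-16LE bytes, viewed through a char buffer.
// The high byte of 'A' (0x0041) is zero, so strlen stops after one byte.
const char kUtf16le[] = { '\x41', '\x00', '\x10', '\x04', '\x00', '\x00' };
```

With these definitions legacyLength(kUtf8) sees all three bytes, while legacyLength(kUtf16le) stops after the first one.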

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Nick Gammon   Australia  (23,120 posts)  Bio   Forum Administrator
Date Reply #17 on Mon 11 Jun 2007 07:09 AM (UTC)
Message
Quote:

On closer inspection it turned out that I merely confirmed that Russian is displayed, not that Unicode is displayed,


What I think you need to do, is look up the Unicode code points by referring to the appropriate page here:

http://www.unicode.org/charts/

Then you can use the "Debug Simulated Input" dialog in MUSHclient to convert Unicode code points into UTF-8 sequences in hex. Having done that you can plant those into a dialog box and confirm they are displayed correctly.
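
If the dialog is not to hand, the same conversion can be sketched in a few lines of portable C++. This is only an illustration of the UTF-8 encoding rules, not what MUSHclient does internally; encodeUtf8 and toHexEscapes are hypothetical names.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Encode one Unicode code point as UTF-8 bytes.
// Returns an empty string for surrogates and values above U+10FFFF.
std::string encodeUtf8(char32_t cp) {
    std::string out;
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Render bytes as C-style hex escapes, ready to paste into a string literal.
std::string toHexEscapes(const std::string& bytes) {
    std::string out;
    char buf[8];
    for (unsigned char b : bytes) {
        int n = std::snprintf(buf, sizeof buf, "\\x%02X", static_cast<unsigned>(b));
        assert(n == 4);   // backslash, 'x', two hex digits
        out += buf;
    }
    return out;
}
```

For example, toHexEscapes(encodeUtf8(0x0410)) gives the escapes for Cyrillic capital A, \xD0\x90. When pasting such escapes in front of ordinary text, split the literal ("\xD0\x90" "ABC"), because a \x escape swallows any hex digits that follow it.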

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Ked   Russia  (524 posts)  Bio
Date Reply #18 on Mon 11 Jun 2007 08:23 PM (UTC)
Message
Ok, I checked a UTF-8 sequence as you suggested. I used the following call, which might even be a wrong way to do it:

::AfxMessageBox("\xD0" "\x90");


And the dialog displayed two separate characters instead of a single Cyrillic capital "А" (U+0410).

This was with the multi-byte character setting, since Unicode doesn't compile.

From what I understand in "multi-byte mode" AfxMessageBox uses the current locale's codepage to convert an array of chars, so it will display Russian or Chinese text properly as long as the corresponding codepage is selected. But it'll treat UTF-8 also according to this codepage, not UTF-8 itself. At least MSDN docs on setlocale() explicitly say that you cannot set the locale to UTF-7 or UTF-8.
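
A rough sketch of that effect, assuming the active codepage is Windows-1251 (an assumption; the post does not say which codepage was selected). The decoder below is deliberately partial and covers only ASCII, byte 0x90 and the 0xC0-0xFF Cyrillic range; all function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Partial Windows-1251 decoder: one byte always yields one character.
// Bytes outside the covered ranges return U+FFFD as a placeholder only;
// the real codepage defines most of them.
char32_t decodeCp1251Byte(unsigned char b) {
    if (b < 0x80)  return b;                    // ASCII
    if (b >= 0xC0) return 0x0410 + (b - 0xC0);  // U+0410..U+044F
    if (b == 0x90) return 0x0452;               // Cyrillic small letter dje
    return 0xFFFD;                              // not covered by this sketch
}

std::u32string decodeAsCp1251(const std::string& bytes) {
    std::u32string out;
    for (unsigned char b : bytes)
        out.push_back(decodeCp1251Byte(b));
    return out;
}

// Count UTF-8 code points by counting bytes that are not continuation
// bytes (10xxxxxx). Assumes the input is well-formed UTF-8.
std::size_t countUtf8CodePoints(const std::string& bytes) {
    std::size_t n = 0;
    for (unsigned char b : bytes)
        if ((b & 0xC0) != 0x80)
            ++n;
    return n;
}

// The two bytes from the test above: one character as UTF-8,
// two characters (U+0420, U+0452) when read as Windows-1251.
void checkTwoByteExample() {
    const std::string bytes = "\xD0\x90";
    assert(countUtf8CodePoints(bytes) == 1);
    assert(decodeAsCp1251(bytes).size() == 2);
}
```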


Posted by Nick Gammon   Australia  (23,120 posts)  Bio   Forum Administrator
Date Reply #19 on Mon 11 Jun 2007 09:50 PM (UTC)
Message
Yes, UTF-8 is an encoding system, not a locale.

This is roughly what I am using:


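#include <vector>

// display message box - using UTF-8
// (nType defaults to MB_OK, as it does for AfxMessageBox)
int UMessageBox (const char * lpszText, UINT nType = MB_OK)
  {

  // find how big table has to be (in WCHARs, including the terminating null)
  int iLength = MultiByteToWideChar (CP_UTF8, 0, lpszText, -1, NULL, 0);

  // nothing to show if the conversion failed (eg. invalid parameter)
  if (iLength <= 0)
    return 0;

  // vector to hold Unicode
  std::vector<WCHAR> v;

  // adjust size
  v.resize (iLength);

  // do the conversion now
  MultiByteToWideChar (CP_UTF8, 0, lpszText, -1, &v [0], iLength);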

  // determine icon based on type specified
  if ((nType & MB_ICONMASK) == 0)
  {
    switch (nType & MB_TYPEMASK)
    {
    case MB_OK:
    case MB_OKCANCEL:
      nType |= MB_ICONEXCLAMATION;
      break;

    case MB_YESNO:
    case MB_YESNOCANCEL:
      nType |= MB_ICONEXCLAMATION;
      break;

    case MB_ABORTRETRYIGNORE:
    case MB_RETRYCANCEL:
      // No default icon for these types, since they are rarely used.
      // The caller should specify the icon.
      break;
    }
  }

  int nResult = ::MessageBoxW (NULL, &v [0], L"MUSHclient", nType);

  return nResult;

  }   // end of UMessageBox



And test like this:


   UMessageBox ("\xC9\xB3\xC9\xA8\xC9\x95\xC9\xAE");


That UTF-8 sequence should show 4 characters (U+0273, U+0268, U+0255, U+026E: ɳɨɕɮ) that look vaguely like "Nick".

What this is doing is taking UTF-8 input, converting to wide characters (WCHAR) using MultiByteToWideChar, and then calling MessageBoxW to display the Unicode text.
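
For readers without a Windows compiler, the conversion step can be approximated in portable C++. This is a simplified sketch of UTF-8 to UTF-16 decoding, not a replacement for MultiByteToWideChar: it does not reject overlong forms, and its handling of malformed input (substituting U+FFFD) is a choice made for this example. The names utf8ToUtf16 and checkNickSequence are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Decode UTF-8 into UTF-16 code units. Malformed or truncated sequences
// are replaced with U+FFFD (a simplification made for this sketch).
std::u16string utf8ToUtf16(const std::string& in) {
    const char16_t kReplacement = 0xFFFD;
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;               // continuation bytes expected
        if (b0 < 0x80)                { cp = b0;        extra = 0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; extra = 1; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; extra = 2; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; extra = 3; }
        else { out.push_back(kReplacement); ++i; continue; }

        if (extra > in.size() - i - 1) {     // sequence runs off the end
            out.push_back(kReplacement);
            break;
        }
        bool ok = true;
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (c & 0x3F);
        }
        if (!ok || cp > 0x10FFFF) { out.push_back(kReplacement); ++i; continue; }
        i += extra + 1;

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {                             // encode as a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// The eight-byte test string above decodes to four 16-bit characters.
void checkNickSequence() {
    std::u16string w = utf8ToUtf16("\xC9\xB3\xC9\xA8\xC9\x95\xC9\xAE");
    assert(w == u"\u0273\u0268\u0255\u026E");
}
```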

I am basically going through the MUSHclient source changing all calls to AfxMessageBox to UMessageBox, thus facilitating the display of Unicode.

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Nick Gammon   Australia  (23,120 posts)  Bio   Forum Administrator
Date Reply #20 on Mon 11 Jun 2007 09:52 PM (UTC)
Message
I'm not sure about the L"MUSHclient" - that is displayed in the dialog box title. It is a proper name, after all, so perhaps it doesn't need translating?

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Nick Gammon   Australia  (23,120 posts)  Bio   Forum Administrator
Date Reply #21 on Tue 12 Jun 2007 06:14 AM (UTC)
Message
See the follow-up thread for progress to date:

http://www.gammon.com.au/forum/?id=7953

- Nick Gammon

www.gammon.com.au, www.mushclient.com

The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).


Information and images on this site are licensed under the Creative Commons Attribution 3.0 Australia License unless stated otherwise.