Gammon Software Solutions forum

Subject: Full Unicode support

Posted by Nick Gammon   Australia  (18,797 posts)   Forum Administrator
Date Sun 05 Jun 2011 09:40 PM (UTC)
Message
Changed to date_modified, thanks.

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Fiendish   USA  (850 posts)   Global Moderator
Date Sun 05 Jun 2011 06:10 PM (UTC)
Message
The first plugin shown on http://www.gammon.com.au/forum/bbshowpost.php?id=2681&page=4 currently has two entries for date_written, which will cause the plugin to fail to load.

http://aardwolfclientpackage.googlecode.com/

Posted by Atltais   (8 posts)
Date Mon 09 Jun 2008 07:10 AM (UTC)

Amended on Mon 09 Jun 2008 10:09 PM (UTC) by Atltais

Message
It's a whole range, yes, and UTF8 is a superset of ASCII, which, as I understand it, is one of its biggest advantages. The larger problem, I suppose, is that (as previously stated) Windows uses UTF16 internally (source: http://msdn.microsoft.com/en-us/library/ms776459(VS.85).aspx), which complicates matters somewhat (additionally, you get into the endianness issue). With UTF8 you get 'pretty much' any character in regular use (the entirety of the BMP; past this most fonts don't even have representations anyway, but that's getting wildly off topic. e: Plus, to my understanding, UTF8 supports up to U+10FFFF anyway).

All in all, I suppose it's a relatively minor issue (since those honestly needing client-side UTF8 support can't be all that numerous) and development time may be better spent elsewhere.

edit: That is to say, endianness doesn't matter in UTF8 as it does in UTF16/32, since UTF8 is byte oriented. One thing to note, though, is that in both UTF8 and UTF16 characters aren't fixed width (as in size). That is how UTF16 can handle codes above U+FFFF (it uses a pair of 16-bit units for them), and indeed so can UTF8 (it uses up to four bytes per character).

UTF8 is as widely supported as it is simply because it's (more or less) backwards compatible with ASCII right out of the box, so it can take a standard ASCII string (if the characters are all <=0x7F, that is) and be happy with it. In any case, it's quite an undertaking to convert a program as big as MUSHclient into a 100% UTF8 program.
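A minimal sketch of the two points above (ASCII compatibility and variable width), written in Python purely for illustration. The helper name utf8_length and the sample characters are my own assumptions, not anything from MUSHclient.

```python
# Sketch only: utf8_length and the samples are hypothetical names for illustration.

def utf8_length(ch: str) -> int:
    """Return the number of bytes one character occupies in UTF-8."""
    return len(ch.encode("utf-8"))

# One character from each UTF-8 length class.
samples = ["A", "\u00e9", "\u044f", "\u20ac", "\U0001D11E"]

for ch in samples:
    print(f"U+{ord(ch):04X} takes {utf8_length(ch)} byte(s) in UTF-8")

# Text restricted to characters <= 0x7F is byte-for-byte the same
# whether it is encoded as ASCII or as UTF-8.
ascii_text = "say hello"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
print(same_bytes)
```

Anything at or below 0x7F stays a single byte, which is why plain ASCII input needs no conversion at all.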

Posted by Nick Gammon   Australia  (18,797 posts)   Forum Administrator
Date Mon 09 Jun 2008 06:33 AM (UTC)
Message
Unicode isn't one single thing, for a start. Just check out www.unicode.org to see what I mean. Basically the idea is to represent various characters (glyphs) in a consistent way by assigning a different number to each one. But how that number is stored can vary somewhat. UTF-8 uses an encoding that is indeed identical to ASCII for characters <= 0x7F; however, once you move to higher values you have heaps of options. Do you want 16-bit characters? 32-bit? In which order are the bytes: big-endian or little-endian?

Under the Windows compiler, enabling the UNICODE define switches the representation of characters from char (8-bit) to wchar_t (16-bit). Straight away this won't work for Unicode characters > 0xFFFF, since they don't fit in a single 16-bit unit. Also you can't just copy stuff from the MUD (8-bit characters) into the internal spaces (16-bit characters) without using a special conversion call.
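To make the 16-bit limit and the byte-order question concrete, here is a small Python sketch. The helper name utf16_units is hypothetical, and the code only illustrates the encodings themselves, not anything in the Windows API.

```python
# Sketch only: utf16_units is a hypothetical helper for illustration.

def utf16_units(text: str) -> int:
    """Return how many 16-bit code units the text needs in UTF-16."""
    return len(text.encode("utf-16-le")) // 2

clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, a code point above U+FFFF

print(utf16_units("A"))   # fits in one 16-bit unit
print(utf16_units(clef))  # needs a surrogate pair, i.e. two units

# The same character has different byte sequences depending on endianness.
little = "A".encode("utf-16-le").hex()
big = "A".encode("utf-16-be").hex()
print(little, big)

# The surrogate pair for the clef, shown big-endian for readability.
print(clef.encode("utf-16-be").hex())
```

So a program that treats every 16-bit unit as one character handles the BMP but miscounts anything beyond it.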

It's a can of worms, one I don't propose to open in the near future.

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Atltais   (8 posts)
Date Mon 09 Jun 2008 12:47 AM (UTC)

Amended on Mon 09 Jun 2008 12:56 AM (UTC) by Atltais

Message
Isn't Unicode (e: Well, UTF8 that is, causing additional fun/grief because Windows uses UTF16, which is a bit different.) more or less backwards compatible with ASCII for characters up to 0x7F anyway?

Granted, you would have to go from UTF16 to UTF8 for regexp, true enough.

For most MUDs it shouldn't be a problem, as long as they don't use characters over 0x7F. If they do (say, a non-Unicode, non-English MUD), you end up with a bit of a problem. Which, I suppose, is one reason why other clients aren't Unicode.

Posted by Nick Gammon   Australia  (18,797 posts)   Forum Administrator
Date Mon 09 Jun 2008 12:20 AM (UTC)
Message
Yes, I tried that, and the problem is not easily solved. For example, some things like the PCRE regexp matcher don't use wide (Unicode) strings; they use 8-bit strings. PCRE accepts UTF-8, but that means you need to convert back and forth between wide strings and UTF-8. And then there are the MUDs, most of which send 8-bit text, neither UTF-8 nor wide strings.
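The back-and-forth conversion described above might look roughly like this. It is a Python sketch under stated assumptions: Python's re module stands in for PCRE, wide strings are modelled as UTF-16LE bytes, and wide_to_utf8 / utf8_to_wide are hypothetical helper names.

```python
import re

# Sketch only: Python's re module stands in for an 8-bit matcher such as PCRE,
# and "wide strings" are modelled as UTF-16LE byte strings.

def wide_to_utf8(wide: bytes) -> bytes:
    """Convert a UTF-16LE wide string to UTF-8 bytes."""
    return wide.decode("utf-16-le").encode("utf-8")

def utf8_to_wide(data: bytes) -> bytes:
    """Convert UTF-8 bytes back to a UTF-16LE wide string."""
    return data.decode("utf-8").encode("utf-16-le")

word = "привет"
wide_line = ("You say, '" + word + "'").encode("utf-16-le")

# Step 1: wide string -> UTF-8 so the 8-bit matcher can look at it.
utf8_line = wide_to_utf8(wide_line)

# Step 2: match on the UTF-8 bytes.
match = re.search(re.escape(word.encode("utf-8")), utf8_line)

# Step 3: convert whatever matched back to a wide string for the program.
matched_wide = utf8_to_wide(match.group(0)) if match else b""
print(matched_wide.decode("utf-16-le"))
```

Every match costs two conversions, which is the overhead being objected to.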

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Atltais   (8 posts)
Date Sun 08 Jun 2008 10:22 PM (UTC)
Message
It looks like defining UNICODE would (technically) do it, but it'll cough up a whole slew of errors (about 5000 or so) when you attempt this.

Posted by Nick Gammon   Australia  (18,797 posts)   Forum Administrator
Date Wed 04 Jun 2008 03:39 AM (UTC)
Message

An example of the first plugin (on this page) in operation is here:

This demonstrates how (with a Greek version, using the codes just above), I was able to type in the command window using Greek characters, and send them. The characters arrived in the MUD converted to UTF-8, which were then echoed back correctly in the output window.

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Nick Gammon   Australia  (18,797 posts)   Forum Administrator
Date Wed 04 Jun 2008 02:46 AM (UTC)
Message
As an example of another language, if you wanted Greek encoding, you could go to this page:

http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1253.TXT

Grab the mappings for 0x80 to 0xFF (see below) and paste them into the two plugins instead of the Russian ones:


0x80	0x20AC	#EURO SIGN
0x81	      	#UNDEFINED
0x82	0x201A	#SINGLE LOW-9 QUOTATION MARK
0x83	0x0192	#LATIN SMALL LETTER F WITH HOOK
0x84	0x201E	#DOUBLE LOW-9 QUOTATION MARK
0x85	0x2026	#HORIZONTAL ELLIPSIS
0x86	0x2020	#DAGGER
0x87	0x2021	#DOUBLE DAGGER
0x88	      	#UNDEFINED
0x89	0x2030	#PER MILLE SIGN
0x8A	      	#UNDEFINED
0x8B	0x2039	#SINGLE LEFT-POINTING ANGLE QUOTATION MARK
0x8C	      	#UNDEFINED
0x8D	      	#UNDEFINED
0x8E	      	#UNDEFINED
0x8F	      	#UNDEFINED
0x90	      	#UNDEFINED
0x91	0x2018	#LEFT SINGLE QUOTATION MARK
0x92	0x2019	#RIGHT SINGLE QUOTATION MARK
0x93	0x201C	#LEFT DOUBLE QUOTATION MARK
0x94	0x201D	#RIGHT DOUBLE QUOTATION MARK
0x95	0x2022	#BULLET
0x96	0x2013	#EN DASH
0x97	0x2014	#EM DASH
0x98	      	#UNDEFINED
0x99	0x2122	#TRADE MARK SIGN
0x9A	      	#UNDEFINED
0x9B	0x203A	#SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
0x9C	      	#UNDEFINED
0x9D	      	#UNDEFINED
0x9E	      	#UNDEFINED
0x9F	      	#UNDEFINED
0xA0	0x00A0	#NO-BREAK SPACE
0xA1	0x0385	#GREEK DIALYTIKA TONOS
0xA2	0x0386	#GREEK CAPITAL LETTER ALPHA WITH TONOS
0xA3	0x00A3	#POUND SIGN
0xA4	0x00A4	#CURRENCY SIGN
0xA5	0x00A5	#YEN SIGN
0xA6	0x00A6	#BROKEN BAR
0xA7	0x00A7	#SECTION SIGN
0xA8	0x00A8	#DIAERESIS
0xA9	0x00A9	#COPYRIGHT SIGN
0xAA	      	#UNDEFINED
0xAB	0x00AB	#LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0xAC	0x00AC	#NOT SIGN
0xAD	0x00AD	#SOFT HYPHEN
0xAE	0x00AE	#REGISTERED SIGN
0xAF	0x2015	#HORIZONTAL BAR
0xB0	0x00B0	#DEGREE SIGN
0xB1	0x00B1	#PLUS-MINUS SIGN
0xB2	0x00B2	#SUPERSCRIPT TWO
0xB3	0x00B3	#SUPERSCRIPT THREE
0xB4	0x0384	#GREEK TONOS
0xB5	0x00B5	#MICRO SIGN
0xB6	0x00B6	#PILCROW SIGN
0xB7	0x00B7	#MIDDLE DOT
0xB8	0x0388	#GREEK CAPITAL LETTER EPSILON WITH TONOS
0xB9	0x0389	#GREEK CAPITAL LETTER ETA WITH TONOS
0xBA	0x038A	#GREEK CAPITAL LETTER IOTA WITH TONOS
0xBB	0x00BB	#RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0xBC	0x038C	#GREEK CAPITAL LETTER OMICRON WITH TONOS
0xBD	0x00BD	#VULGAR FRACTION ONE HALF
0xBE	0x038E	#GREEK CAPITAL LETTER UPSILON WITH TONOS
0xBF	0x038F	#GREEK CAPITAL LETTER OMEGA WITH TONOS
0xC0	0x0390	#GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
0xC1	0x0391	#GREEK CAPITAL LETTER ALPHA
0xC2	0x0392	#GREEK CAPITAL LETTER BETA
0xC3	0x0393	#GREEK CAPITAL LETTER GAMMA
0xC4	0x0394	#GREEK CAPITAL LETTER DELTA
0xC5	0x0395	#GREEK CAPITAL LETTER EPSILON
0xC6	0x0396	#GREEK CAPITAL LETTER ZETA
0xC7	0x0397	#GREEK CAPITAL LETTER ETA
0xC8	0x0398	#GREEK CAPITAL LETTER THETA
0xC9	0x0399	#GREEK CAPITAL LETTER IOTA
0xCA	0x039A	#GREEK CAPITAL LETTER KAPPA
0xCB	0x039B	#GREEK CAPITAL LETTER LAMDA
0xCC	0x039C	#GREEK CAPITAL LETTER MU
0xCD	0x039D	#GREEK CAPITAL LETTER NU
0xCE	0x039E	#GREEK CAPITAL LETTER XI
0xCF	0x039F	#GREEK CAPITAL LETTER OMICRON
0xD0	0x03A0	#GREEK CAPITAL LETTER PI
0xD1	0x03A1	#GREEK CAPITAL LETTER RHO
0xD2	      	#UNDEFINED
0xD3	0x03A3	#GREEK CAPITAL LETTER SIGMA
0xD4	0x03A4	#GREEK CAPITAL LETTER TAU
0xD5	0x03A5	#GREEK CAPITAL LETTER UPSILON
0xD6	0x03A6	#GREEK CAPITAL LETTER PHI
0xD7	0x03A7	#GREEK CAPITAL LETTER CHI
0xD8	0x03A8	#GREEK CAPITAL LETTER PSI
0xD9	0x03A9	#GREEK CAPITAL LETTER OMEGA
0xDA	0x03AA	#GREEK CAPITAL LETTER IOTA WITH DIALYTIKA
0xDB	0x03AB	#GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA
0xDC	0x03AC	#GREEK SMALL LETTER ALPHA WITH TONOS
0xDD	0x03AD	#GREEK SMALL LETTER EPSILON WITH TONOS
0xDE	0x03AE	#GREEK SMALL LETTER ETA WITH TONOS
0xDF	0x03AF	#GREEK SMALL LETTER IOTA WITH TONOS
0xE0	0x03B0	#GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
0xE1	0x03B1	#GREEK SMALL LETTER ALPHA
0xE2	0x03B2	#GREEK SMALL LETTER BETA
0xE3	0x03B3	#GREEK SMALL LETTER GAMMA
0xE4	0x03B4	#GREEK SMALL LETTER DELTA
0xE5	0x03B5	#GREEK SMALL LETTER EPSILON
0xE6	0x03B6	#GREEK SMALL LETTER ZETA
0xE7	0x03B7	#GREEK SMALL LETTER ETA
0xE8	0x03B8	#GREEK SMALL LETTER THETA
0xE9	0x03B9	#GREEK SMALL LETTER IOTA
0xEA	0x03BA	#GREEK SMALL LETTER KAPPA
0xEB	0x03BB	#GREEK SMALL LETTER LAMDA
0xEC	0x03BC	#GREEK SMALL LETTER MU
0xED	0x03BD	#GREEK SMALL LETTER NU
0xEE	0x03BE	#GREEK SMALL LETTER XI
0xEF	0x03BF	#GREEK SMALL LETTER OMICRON
0xF0	0x03C0	#GREEK SMALL LETTER PI
0xF1	0x03C1	#GREEK SMALL LETTER RHO
0xF2	0x03C2	#GREEK SMALL LETTER FINAL SIGMA
0xF3	0x03C3	#GREEK SMALL LETTER SIGMA
0xF4	0x03C4	#GREEK SMALL LETTER TAU
0xF5	0x03C5	#GREEK SMALL LETTER UPSILON
0xF6	0x03C6	#GREEK SMALL LETTER PHI
0xF7	0x03C7	#GREEK SMALL LETTER CHI
0xF8	0x03C8	#GREEK SMALL LETTER PSI
0xF9	0x03C9	#GREEK SMALL LETTER OMEGA
0xFA	0x03CA	#GREEK SMALL LETTER IOTA WITH DIALYTIKA
0xFB	0x03CB	#GREEK SMALL LETTER UPSILON WITH DIALYTIKA
0xFC	0x03CC	#GREEK SMALL LETTER OMICRON WITH TONOS
0xFD	0x03CD	#GREEK SMALL LETTER UPSILON WITH TONOS
0xFE	0x03CE	#GREEK SMALL LETTER OMEGA WITH TONOS
0xFF	      	#UNDEFINED
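For anyone curious how a table like this turns code-page bytes into UTF-8, here is a rough Python sketch. It is not the plugin's code; parse_mapping and to_utf8 are hypothetical names, only a handful of rows from the table are included, and mapping undefined bytes to U+FFFD is my own choice.

```python
# Sketch only: not the plugin's code. A few rows copied from the table above.
MAPPING_TEXT = (
    "0x80\t0x20AC\t#EURO SIGN\n"
    "0x81\t      \t#UNDEFINED\n"
    "0xE1\t0x03B1\t#GREEK SMALL LETTER ALPHA\n"
    "0xE2\t0x03B2\t#GREEK SMALL LETTER BETA\n"
    "0xE3\t0x03B3\t#GREEK SMALL LETTER GAMMA\n"
)

def parse_mapping(text: str) -> dict:
    """Build {code page byte: Unicode code point}, skipping undefined rows."""
    table = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2 or not fields[1].strip():
            continue  # undefined entry or malformed line
        table[int(fields[0], 16)] = int(fields[1], 16)
    return table

def to_utf8(data: bytes, table: dict) -> bytes:
    """Convert code-page bytes to UTF-8 using the parsed table."""
    chars = []
    for byte in data:
        if byte < 0x80:
            chars.append(chr(byte))  # ASCII range maps to itself
        else:
            # Assumption: unmapped bytes become U+FFFD (replacement character).
            chars.append(chr(table.get(byte, 0xFFFD)))
    return "".join(chars).encode("utf-8")

table = parse_mapping(MAPPING_TEXT)
print(to_utf8(b"say \xe1\xe2\xe3", table).decode("utf-8"))
```

Swapping in a different language is then just a matter of swapping the table, which is the point of the instructions above.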

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Nick Gammon   Australia  (18,797 posts)   Forum Administrator
Date Tue 03 Jun 2008 11:10 PM (UTC)
Message
Quote:

the input box itself needs to be a Unicode window, though


Well, that is the hard bit. I tried for a few days previously, but I don't think you can make individual windows in a non-Unicode program be Unicode.

The source code is freely available, if someone can make it work I would be pleased to hear from them.

Quote:

Alas, pasting from other programs is still broken,


To make that work, as I presume the text on the clipboard is UTF-8, you could make a plugin that does what my "copy from the output window" plugin does. You would hit a function key (eg. F8), the plugin kicks in, grabs the clipboard contents, converts them from UTF-8 to the code page, and puts them back. Or that is the theory at least.
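The conversion step of that theoretical plugin could be sketched like this in Python. It covers only the UTF-8 to code page step, not the clipboard access or the function key; the function name, the default of cp1251 (Russian) and the choice to replace unmappable characters with "?" are all assumptions of mine.

```python
# Sketch only: the conversion step, with no clipboard or hotkey handling.

def clipboard_utf8_to_code_page(clip: bytes, code_page: str = "cp1251") -> bytes:
    """Convert UTF-8 clipboard bytes to a single-byte code page.

    Assumption: characters the code page cannot represent become '?'.
    """
    text = clip.decode("utf-8")
    return text.encode(code_page, errors="replace")

russian = "привет"
converted = clipboard_utf8_to_code_page(russian.encode("utf-8"))
print(len(russian.encode("utf-8")), "UTF-8 bytes ->", len(converted), "code page bytes")

# Text from another script has no slot in the Russian code page.
greek = "\u03b1\u03b2\u03b3"
print(clipboard_utf8_to_code_page(greek.encode("utf-8")))
```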

I know this isn't perfect, but my initial tests with Russian seemed to show it worked smoothly enough, providing you didn't try to get too fancy and copy and paste from one application to another.

I suspect the copying/pasting thing might be related to clipboard "types" where the UTF-8 data is not necessarily stored in the TEXT data type (but I could be wrong).

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Atltais   (8 posts)
Date Tue 03 Jun 2008 11:06 PM (UTC)
Message
Applocale is a much simpler way to change the code page, as an aside.

Posted by Nick Gammon   Australia  (18,797 posts)   Forum Administrator
Date Tue 03 Jun 2008 11:05 PM (UTC)
Message

You should at least be able to get typing into the window working. I am no expert, but I managed it after trying a couple of things.

First, I added a second language as a keyboard option:

Then, I set my code page to Russian, so that when I used my extra keyboard settings, Russian is what I would see:

Now in the bottom-right corner I see a keyboard code (EN) which shows what I type will appear in English:

I type "say " into the command window (which appears as such), and then hit the keyboard modifier to switch to Russian (Alt+Left_Shift which is the default). Now the keyboard code changes:

Now what I type appears in Russian, which is what I want. However when I hit <Enter> to send it, the plugin switches the code page data into UTF-8, which is what arrives at the MUD.
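What happens on <Enter> can be pictured with this Python sketch. It is not the plugin itself; command_to_utf8 is a hypothetical name and cp1251 is assumed as the Russian code page.

```python
# Sketch only: models the code page -> UTF-8 step applied to a typed command.

def command_to_utf8(command: bytes, code_page: str = "cp1251") -> bytes:
    """Convert a command typed in the given code page to UTF-8 for sending."""
    return command.decode(code_page).encode("utf-8")

# What the command window holds: one byte per character in the code page.
typed = "say привет".encode("cp1251")

# What arrives at the MUD: UTF-8, two bytes per Cyrillic letter.
sent = command_to_utf8(typed)

print(len(typed), "bytes typed ->", len(sent), "bytes sent")
print(sent.decode("utf-8"))
```

The English part ("say ") passes through unchanged; only the bytes above 0x7F grow.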


- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Castamir   (2 posts)
Date Tue 03 Jun 2008 08:44 PM (UTC)
Message
Alas, pasting from other programs is still broken, and what's most important, so is typing things from the keyboard.

My knowledge of Windows is sketchy (I'm a Unix guy), but I would tackle the problem the following way:
* no 16-bit internal strings. That's a pain in the rear, and as you said, there are 6k strings you would have to change, not to mention all the function calls and whatnot...
* the input box itself needs to be a Unicode window, though
* when taking data from it, you would call GetWindowTextW() (always!) then WideCharToMultiByte() to CP_UTF8 if MUSHclient is in UTF-8 mode and to CP_ACP otherwise.
* ... and the same with MultiByteToWideChar() and SetWindowTextW() the other way
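The data flow in the last two bullets can be modelled as below. This is a Python sketch of the logic only, not Win32 code: the wide text is represented as UTF-16LE bytes, Python codecs stand in for WideCharToMultiByte() / MultiByteToWideChar(), cp1251 stands in for whatever CP_ACP happens to be, and both function names are hypothetical.

```python
# Sketch only: models the conversions, it does not call any Windows API.

def read_input_box(wide_text: bytes, utf8_mode: bool,
                   ansi_code_page: str = "cp1251") -> bytes:
    """Wide text from the box -> UTF-8 in UTF-8 mode, else the ANSI code page."""
    text = wide_text.decode("utf-16-le")
    target = "utf-8" if utf8_mode else ansi_code_page
    # Assumption: characters the target cannot hold become '?'.
    return text.encode(target, errors="replace")

def write_input_box(data: bytes, utf8_mode: bool,
                    ansi_code_page: str = "cp1251") -> bytes:
    """Internal 8-bit string -> wide text to put back into the box."""
    source = "utf-8" if utf8_mode else ansi_code_page
    return data.decode(source, errors="replace").encode("utf-16-le")

box = "say привет".encode("utf-16-le")

as_utf8 = read_input_box(box, utf8_mode=True)
as_ansi = read_input_box(box, utf8_mode=False)
print(len(as_utf8), len(as_ansi))
```

The internal strings stay 8-bit in both modes; only the edge of the input box deals in wide characters.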

Posted by Nick Gammon   Australia  (18,797 posts)   Forum Administrator
Date Tue 03 Jun 2008 04:02 AM (UTC)
Message
Version 4.26 has now been released, which should work properly with copying and pasting from the output window, if you install the above plugin.

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Nick Gammon   Australia  (18,797 posts)   Forum Administrator
Date Tue 03 Jun 2008 01:36 AM (UTC)
Message
I should point out that this is, fairly obviously, a hybrid "work around" solution.

The main problem really is that MUSHclient is not Unicode-enabled, and despite my best efforts previously, it was hard to get Unicode out of the command window.

What you will have here, if you set the code page appropriately (in International settings), is one language you can use in the Command window (presumably your native language). However, by enabling UTF-8 in the Output window, all languages can be displayed (with a suitable font).

The proposed "copy from output window" plugin would let you copy your *own* language from the output window, for editing and resending. Copying a different code set would result in gibberish still, because the Command window is not Unicode.
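To illustrate the gibberish, here is a small Python sketch. It assumes, for the sake of example, that the Command window is set to the Russian code page (cp1251) and that the bytes being pasted are Greek in cp1253; the variable names are mine.

```python
# Sketch only: shows the same bytes read under two different code pages.

greek = "\u03b1\u03b2\u03b3"      # alpha, beta, gamma
raw = greek.encode("cp1253")      # the bytes as a Greek code page stores them

# A Command window set to the Russian code page interprets the same bytes
# as Cyrillic letters instead.
shown = raw.decode("cp1251")

print(raw.hex())
print(shown)
print(shown == greek)
```

The bytes survive intact, but without Unicode in the Command window there is nothing to say which code page they belong to.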

- Nick Gammon

www.gammon.com.au, www.mushclient.com

This is page 2, subject is 5 pages long.