
Full Unicode support

It is now over 60 days since the last post. This thread is closed.



Posted by Nick Gammon   Australia  (22,973 posts)   Forum Administrator
Date Reply #60 on Mon 09 Jun 2008 12:20 AM (UTC)
Message
Yes, I tried that, and the problem is not easily solved. For example, some things like the PCRE regexp matcher don't use Unicode; they use 8-bit strings. PCRE accepts UTF-8, but that means you need to convert back and forth between wide strings and UTF-8. And then there are the MUDs, most of which send 8-bit text, not UTF-8 or wide strings.
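
To make the round trip concrete, here is a minimal sketch. It assumes Python's `re` module in bytes mode as a stand-in for an 8-bit matcher such as PCRE, and UTF-16-LE as the wide-string format. The function name `search_wide` is hypothetical and is not anything from MUSHclient.

```python
import re

def search_wide(wide: bytes, pattern: str):
    """Search UTF-16-LE text with a byte-oriented regex.

    Returns (matched text, byte offset, character offset), or None.
    """
    text = wide.decode("utf-16-le")   # wide string -> code points
    utf8 = text.encode("utf-8")       # code points -> 8-bit string for the matcher
    m = re.search(pattern.encode("utf-8"), utf8)
    if m is None:
        return None
    # The matcher reports offsets in bytes, so convert back to characters.
    char_offset = len(utf8[:m.start()].decode("utf-8"))
    return m.group(0).decode("utf-8"), m.start(), char_offset

# "Voc\u00ea" contains one character that takes two bytes in UTF-8.
line = "Voc\u00ea tem gold: 42".encode("utf-16-le")
print(search_wide(line, r"gold: \d+"))
```

Note that the byte offset and the character offset differ as soon as the line contains a non-ASCII character, so every position the matcher returns has to be translated as well.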

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Atltais   (8 posts)
Date Reply #61 on Mon 09 Jun 2008 12:47 AM (UTC)

Amended on Mon 09 Jun 2008 12:56 AM (UTC) by Atltais

Message
Isn't Unicode (e: well, UTF8 that is, which causes additional fun/grief because Windows uses UTF16, which is a bit different) more or less backwards compatible with ASCII for characters up to 0x7F anyway?

Granted, you would have to go from UTF16 to UTF8 for regexp, true enough.

But for most MUDs it shouldn't be a problem as long as they don't use characters over 0x7F. If they do (a non-Unicode, non-English MUD), you end up with a bit of a problem. Which, I suppose, is one reason why other clients aren't Unicode.
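
A minimal sketch of that problem: pure-ASCII bytes decode the same way under UTF-8, but an 8-bit byte above 0x7F is usually not valid UTF-8 on its own. The helper `decode_mud_line` and its Latin-1 fallback are assumptions for illustration only, not how any particular client behaves.

```python
def decode_mud_line(raw: bytes, fallback: str = "latin-1") -> str:
    """Try UTF-8 first; fall back to an assumed 8-bit code page on failure."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Hypothetical policy: treat undecodable input as legacy 8-bit text.
        return raw.decode(fallback)

print(decode_mud_line(b"You see a sword."))          # plain ASCII, unchanged
print(decode_mud_line(b"caf\xe9"))                   # Latin-1 byte 0xE9, needs the fallback
print(decode_mud_line("caf\u00e9".encode("utf-8")))  # already UTF-8
```

The fallback is only a guess: some 8-bit byte sequences happen to be valid UTF-8, so a heuristic like this can misread text, which is part of why the encoding really needs to be known up front.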

Posted by Nick Gammon   Australia  (22,973 posts)   Forum Administrator
Date Reply #62 on Mon 09 Jun 2008 06:33 AM (UTC)
Message
Unicode isn't one single thing, for a start. Just check out www.unicode.org to see what I mean. Basically the idea is to represent various characters (glyphs) in a consistent way by assigning a different number to each one. But how that number is stored can vary somewhat. UTF-8 uses an encoding system that is indeed identical to non-Unicode for characters <= 0x7F; however, once you move to higher values you have heaps of options. Do you want 16-bit characters? 32-bit? Which order are the bytes in? Big-endian or little-endian?
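
As a small illustration of those options, the sketch below stores the same character number in five different ways using Python's built-in codecs. The helper name `encodings_of` is made up for this example.

```python
def encodings_of(ch: str) -> dict:
    """Return the bytes (as hex) that several encodings use for one character."""
    names = ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be")
    return {name: ch.encode(name).hex() for name in names}

# U+20AC (euro sign): one number, five different byte sequences.
for name, hexbytes in encodings_of("\u20ac").items():
    print(name, hexbytes)

# U+0041 ("A") is a single byte 0x41 in UTF-8, the same as ASCII.
print(encodings_of("A")["utf-8"])
```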

Under the Windows compiler, enabling the UNICODE define switches the representation of characters from char (8-bit) to wchar_t (16-bit). Straight away, a single 16-bit character can't hold Unicode characters > 0xFFFF. Also, you can't just copy stuff from the MUD (8-bit characters) into the internal spaces (16-bit characters) without using a special call.
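
For reference, UTF-16 gets around the 0xFFFF limit by spending two 16-bit units (a surrogate pair) on each character above it. The sketch below does the arithmetic by hand and checks it against Python's own encoder. Both function names are hypothetical.

```python
import struct

def utf16_code_units(ch):
    """Compute the 16-bit code units UTF-16 uses for one character."""
    cp = ord(ch)
    if cp <= 0xFFFF:
        return [cp]                      # fits in a single 16-bit unit
    cp -= 0x10000                        # 20 bits remain
    return [0xD800 | (cp >> 10),         # high surrogate: top 10 bits
            0xDC00 | (cp & 0x3FF)]       # low surrogate: bottom 10 bits

def utf16_code_units_builtin(ch):
    """Same result, taken from Python's UTF-16-LE encoder."""
    data = ch.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))

for ch in ("A", "\U0001F600"):
    units = utf16_code_units(ch)
    assert units == utf16_code_units_builtin(ch)
    print("U+%04X" % ord(ch), [hex(u) for u in units])
```

Code that treats each 16-bit unit as one whole character will miscount or split such characters.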

It's a can of worms, one I don't propose to open in the near future.

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Atltais   (8 posts)
Date Reply #63 on Mon 09 Jun 2008 07:10 AM (UTC)

Amended on Mon 09 Jun 2008 10:09 PM (UTC) by Atltais

Message
It's a whole range, yes, and UTF8 is a superset of ASCII, which, as I understand it, is one of its biggest advantages. The larger problem, I suppose, is that (as previously stated) Windows uses UTF16 internally (source: http://msdn.microsoft.com/en-us/library/ms776459(VS.85).aspx), which complicates matters somewhat (additionally, you get into the endianness issue). With UTF8 you get 'pretty much' any character in regular use (the entirety of the BMP; past this most fonts don't even have representations anyway, but that's getting wildly off topic. e: Plus, to my understanding, UTF8 supports up to U+10FFFF anyway).

All in all, I suppose it's a relatively minor issue (since those honestly needing client-side UTF8 support can't be all that numerous) and development time may be better spent elsewhere.

edit: That is to say, endianness doesn't matter in UTF8 as it does in UTF16/32, since UTF8 is byte oriented. One thing to note, though, in both UTF8 [i]and[/i] UTF16, is that characters aren't fixed width (as in size). Therefore UTF16 can handle codes above U+FFFF (and indeed, so can UTF8).
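
A quick way to see the variable width is to count the storage units per character in each encoding. This is a minimal sketch using Python's built-in codecs, and the helper name `widths` is made up for the example.

```python
def widths(ch):
    """Return (UTF-8 bytes, UTF-16 16-bit units) needed for one character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le")) // 2

# From plain ASCII up to the highest code point, U+10FFFF.
for ch in ("A", "\u00e9", "\u20ac", "\U0001F600", "\U0010FFFF"):
    print("U+%04X" % ord(ch), widths(ch))
```

Because the width varies, indexing into either encoding by byte or by 16-bit unit does not, in general, land on a character boundary.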

UTF8 is as widely supported as it is simply because it's (more or less) backwards compatible with ASCII right out of the box, so it can take a standard ASCII string (if the characters are all <=0x7F, that is) and be happy with it. In any case, it's quite an undertaking to convert a program as big as MUSHclient into a 100% UTF8 program.

Posted by Fiendish   USA  (2,514 posts)   Global Moderator
Date Reply #64 on Sun 05 Jun 2011 06:10 PM (UTC)
Message
The first plugin shown on http://www.gammon.com.au/forum/bbshowpost.php?id=2681&page=4 currently has two entries for date_written, which will cause the plugin to fail to load.

https://github.com/fiendish/aardwolfclientpackage

Posted by Nick Gammon   Australia  (22,973 posts)   Forum Administrator
Date Reply #65 on Sun 05 Jun 2011 09:40 PM (UTC)
Message
Changed to date_modified, thanks.

- Nick Gammon

www.gammon.com.au, www.mushclient.com

The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).

Information and images on this site are licensed under the Creative Commons Attribution 3.0 Australia License unless stated otherwise.
