Gammon Forum

Full Unicode support

It is now over 60 days since the last post. This thread is closed.



Posted by Mandor   (4 posts)
Date Reply #30 on Wed 22 Sep 2004 12:18 PM (UTC)
Message
Hello everyone.

I am not even a newbie in Unicode support knowledge, so I ask for your patience while reading this message.

I have a mush game running. It is coded in my native language, which uses diacritic signs from the ISO-8859-2 (Latin-2) charset. The game is very automated; it is coded in a storyteller's style, and the player gets involved in different plots just by visiting different places. We never really cared about the input of special characters, but we wanted all descriptions to appear to the user in the most accurate form, with correct syntax and grammar. We required our users to use a program called PuTTY (no advertisement intended). It has an option for character set translation ("Character set translation on received data"), where we can set a charset for the option called "Received data assumed to be in which character set:". Thus, all descriptions from the game appear with our native diacritic characters. However, that program is not a very good solution for gaming purposes; it was simply the only one we could find that worked the way we wanted.

You may ask how the diacritic signs got into the game's database. Well, everything is coded without special characters, since no client supports input in this way. We just opened the database file in a simple editor that highlighted words with errors, and we made the needed corrections. It may sound like a lot of work, but frankly it is not. Anyone can try it for themselves.

My question is: how much does this differ from the Unicode support you want to implement? Could MUSHclient get a similar function before it has full Unicode support?

Thank you for your time.


--
Mandor

Posted by Nick Gammon   Australia  (22,973 posts)   Forum Administrator
Date Reply #31 on Wed 22 Sep 2004 10:27 PM (UTC)
Message
I don't mind your mentioning PuTTY - I use it myself for connecting to other hosts. It is very good, but not a good MUD client.

I'm not sure what the option you refer to does exactly. I don't have access to my copy of PuTTY right now. What is the problem with just using the correct font?

At present MUSHclient supports UTF-8 encoding for Unicode, which is described earlier in this thread. You should be able to adapt your server to output the correct UTF-8 codes without too much trouble.

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Mandor   (4 posts)
Date Reply #32 on Wed 22 Sep 2004 10:56 PM (UTC)

Amended on Wed 22 Sep 2004 10:57 PM (UTC) by Mandor

Message
As far as I can tell, MUSHclient displays every character as if it were in the Latin-1 charset.

I set the font for the output window (Courier New) and set it to display Central European characters. However, I still see Latin-1 (not Latin-2) characters in the output window. I did enable Unicode support in Settings.

It's the middle of the night for me. I will check everything in the morning and explain in detail what I have done and what the result was.

Thank you for your answer.


--
Mandor

Posted by Mandor   (4 posts)
Date Reply #33 on Wed 29 Sep 2004 12:37 PM (UTC)
Message
I must have been doing something wrong. Everything I set was in World properties -> Appearance -> Output. I set a fixed-width font and chose the Central European charset for it. I checked the UTF-8 (Unicode) option. When I restarted the world, none of my native characters were shown; they were omitted from the output.

Apologies for the delay in reply, I was off-world.


--
Mandor

Posted by Nick Gammon   Australia  (22,973 posts)   Forum Administrator
Date Reply #34 on Wed 29 Sep 2004 11:53 PM (UTC)
Message
I think you are mixing two methods here.

There are two ways of displaying non-standard (from my point of view) characters, ie. non-Latin ones.

The first way is to simply use a different code set, so that values like 0x80 to 0xFF (in hex) map to special characters for your language. This is not Unicode, and you should not check the UTF-8 box in this case. However you should choose an appropriate font that shows the correct characters that match what the MUD is sending.

The second way is to use UTF-8 encoding, which (as described earlier in this thread) encodes each Unicode character as a sequence of bytes. Characters outside the plain ASCII range become multi-byte sequences, so a single character on the screen may be sent from the MUD as two or three consecutive bytes, all with the high-order bit set. In this case you should check the UTF-8 box, and choose a suitable font that supports Unicode (eg. Lucida Sans Unicode).

However that will only work if the MUD itself sends out UTF-8 character sequences. Sounds like it doesn't.
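
To make the multi-byte point concrete, here is a minimal Python sketch (Python is just a convenient way to look at the bytes; it is not part of MUSHclient, and the helper name utf8_bytes is made up for this example). It prints the UTF-8 bytes for a plain letter, a Latin-2 letter and a Chinese character:

```python
def utf8_bytes(ch: str) -> list:
    """Return the UTF-8 byte values for a single character."""
    return list(ch.encode("utf-8"))


for ch in ["N", "ř", "中"]:
    encoded = utf8_bytes(ch)
    hex_bytes = " ".join(f"{b:02X}" for b in encoded)
    # Plain ASCII stays one byte below 0x80; everything else becomes
    # two or more bytes, each with the high-order bit set.
    print(f"U+{ord(ch):04X} {ch!r}: {hex_bytes}")
```

A server that sends the single byte F8 for "ř" is therefore not sending UTF-8, and the UTF-8 checkbox would not help in that case.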

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Mandor   (4 posts)
Date Reply #35 on Thu 30 Sep 2004 12:43 PM (UTC)
Message
You are right. As I said before, I do not have the required knowledge here.

I tried the first way you described, and I now get national characters in the output window. However, they are shown as CP 1250 and not as ISO-8859-2 (it shows character #165 instead of #161, etc.). Is there any way to change that? Or would I need to change everything one more time in the mush database? I do not like that idea...
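
For anyone following along, the #161 versus #165 mismatch can be checked with a short Python sketch. The choice of "Ą" as the test letter is an assumption for illustration (it is one letter that sits at 161 in ISO-8859-2 and at 165 in CP 1250); the codec names are Python's standard ones:

```python
ch = "Ą"

# Position of the same letter in each 8-bit code set.
pos_latin2 = ch.encode("iso8859_2")[0]
pos_cp1250 = ch.encode("cp1250")[0]
print(pos_latin2, pos_cp1250)

# What a CP 1250 font shows when it is handed the ISO-8859-2 byte unchanged.
shown = bytes([pos_latin2]).decode("cp1250")
print(repr(shown), "is displayed instead of", repr(ch))
```

Most letters share the same position in both code sets, which is why only some of the national characters come out wrong.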



--
Mandor

Posted by Nick Gammon   Australia  (22,973 posts)   Forum Administrator
Date Reply #36 on Tue 05 Oct 2004 02:10 AM (UTC)
Message
I'm not sure what the problem here is exactly, but it might need changes to the way MUSHclient selects fonts. I have added that as suggestion #531.

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Cybertiger   (1 post)
Date Reply #37 on Mon 11 Sep 2006 04:41 AM (UTC)
Message
I've probably missed the boat with answering this question, but I think any character in Windows can be accessed by holding down the Alt key and typing a number on the number pad.

I'm using Linux right now, but I know that Þ can be typed in Windows by holding down Alt and typing "0222" (though I don't know offhand what the number for the lowercase þ is).

-CT

Posted by Nick Gammon   Australia  (22,973 posts)   Forum Administrator
Date Reply #38 on Mon 11 Sep 2006 07:47 AM (UTC)
Message
You can generate characters in the range 0 to 255 in that way, certainly.

But that isn't really Unicode, it is just another way of getting non-typeable characters to appear.

You can't go higher than that (eg. to get Chinese characters) because they are stored internally in a single byte.
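
A rough Python illustration of that limit, assuming the Alt+0nnn codes index into the Windows Western code page (CP 1252 here; other locales use other code pages). The function name fits_in_one_byte is invented for the example:

```python
# Alt+0222 selects byte value 222 in the 8-bit code page.
thorn = bytes([222]).decode("cp1252")
print(repr(thorn))


def fits_in_one_byte(ch: str, codepage: str = "cp1252") -> bool:
    """True if the character has a single-byte slot in the given code page."""
    try:
        return len(ch.encode(codepage)) == 1
    except UnicodeEncodeError:
        return False


print(fits_in_one_byte("þ"))   # a Latin letter with a slot in CP 1252
print(fits_in_one_byte("中"))  # a Chinese character has no slot at all
```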

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Shadowfyr   USA  (1,786 posts)
Date Reply #39 on Mon 11 Sep 2006 07:22 PM (UTC)
Message
This is one of the most frustrating things about the various cheats to get Unicode (or just characters). There is no certainty that #165 *is* the same character in font A and in font B, so you might end up thinking it is showing #161 simply because it's mapped differently.

But, to get a bit technical about the real bloody mess this makes of things: Microsoft, when confronted with the need to use their OEM font in the console (the font used by DOS and built into all graphics cards for the PC), cheated rather than making a TrueType version of it, and used a trick like this:

if FONT = "System" then
  display char
else
  select case char
    case <blah>
      display box_drawing_x25
    ...
  end select
end if

Or something roughly like that. I suspect programs like PuTTY work the same way, and translate the "normal" characters into their local versions by substitution. The problem with both UTF-8 and MS' goofy solution is that they only work if both the sender and the receiver know what is going on, and both methods prevent you from accessing parts of Unicode. There is a lot of stuff in the code spaces for, for example, x01NN, x02NN, x03NN, etc., where x0n is the section to display and NN is the character in that section. UTF-8 can't get at those, simply because it relies on values of x8000-xFFFF to designate a "unicode" character, and anything in x00-x7F as a "normal" letter. The letter N in true Unicode is x004E, because something like 4E4F5051 would *not* be NOPQ, but something totally different. Specifically, 4E is the CJK Unified Ideograph section, while 50 is in a no man's land, where there is no alternate set (at least that my viewer shows).

BTW, check out http://www.jhlabs.com/java/unicodeviewer.html for a good look at how this mess is arranged in just one common font.

---

Now for the "good" news. It is probably possible to use the OnPluginPacketReceived plugin callback to intercept incoming text, then perform substitutions on it. There are examples of this around, though I am not sure what the links are. But even with UTF-8 turned on, there "may" be cases where the character you need is some place in the no man's land of stuff that UTF-8 won't let you get to.

This font viewer, http://www.lischke-online.de/FontViewer.php, should help anyone who wants to try to figure out a) which fonts will even work, and b) what to map the changes to in order to actually get those characters.
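
A minimal sketch of the substitution idea, written in Python rather than as a real MUSHclient plugin. The names build_table and recode_packet are hypothetical, and the choice of ISO-8859-2 to CP 1250 is taken from the earlier question in this thread. It builds a 256-entry byte table once and applies it to each incoming packet:

```python
def build_table(src: str, dst: str) -> bytes:
    """Build a 256-entry table mapping each byte in `src` to its slot in `dst`."""
    table = bytearray(range(256))
    for value in range(0x80, 0x100):
        try:
            table[value] = bytes([value]).decode(src).encode(dst)[0]
        except UnicodeError:
            pass  # no counterpart in the target code set: leave the byte alone
    return bytes(table)


LATIN2_TO_CP1250 = build_table("iso8859_2", "cp1250")


def recode_packet(packet: bytes) -> bytes:
    """Substitute bytes so ISO-8859-2 text displays correctly in a CP 1250 font."""
    return packet.translate(LATIN2_TO_CP1250)


print(recode_packet(bytes([161])))  # byte 161 moves to the CP 1250 slot for the same letter
```

A real packet hook would presumably also have to leave telnet negotiation sequences untouched, which this sketch does not attempt.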

Posted by Hex1   Australia  (8 posts)
Date Reply #40 on Thu 14 Dec 2006 09:10 AM (UTC)

Amended on Fri 15 Dec 2006 04:55 AM (UTC) by Hex1

Message
So, what's the current status of this whole Unicode thing as of right now? Fully implemented? Partially?

When I type Unicode chars it usually shows question marks, but sometimes a different character altogether (when I type 'ř' it shows as 'ø'). When I try to copy and paste 'ř' it shows as 'r('. Is the input field compatible with Unicode?

PS: My friend reported that "řščďťňäëß€おやすみなさい" showed correctly when sent from PuTTY, but on trying to copy and paste it and send it back to me it showed as question marks. So I'm assuming the input box is plain-ASCII only?

Posted by Nick Gammon   Australia  (22,973 posts)   Forum Administrator
Date Reply #41 on Thu 14 Dec 2006 07:41 PM (UTC)
Message
I think the answer is "partially implemented", because there is some support in the output window for it. This is a checkbox (in the output window configuration) that will convert UTF-8 characters received from the MUD into Unicode, and then display as Unicode.

However, after extensive investigation that is documented earlier in this thread, I think, I cannot see an easy way of making the input (command) window handle Unicode in any form.

This is because when the client was first written it was made as an 8-bit application (straight ASCII), not Unicode. I attempted a while back to convert to Unicode, but it was a very hard job. There are thousands of strings internally that would need conversion, plus the issue of integrating with certain operating system calls (eg. that load/save files, or receive data from the Internet) which may need conversion to/from Unicode to 8-bit.

The conversion of the output window to optional Unicode display was easy enough, because the output window is "hand-coded" by me, and I know what it is doing. The command window on the other hand is a simple "edit control", and the details of its implementation are hidden away in the operating system.
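
As an illustration of what an 8-bit path can do to a typed character, here is a small Python sketch. It assumes, purely as a guess and not something confirmed in this thread, that the character was reduced to one byte in the Central European code page (CP 1250) and then displayed as Latin-1:

```python
typed = "ř"

# An 8-bit application can only keep one byte per character.
one_byte = typed.encode("cp1250")

# The same byte looked up in the Latin-1 code set is a different letter.
displayed = one_byte.decode("latin-1")
print(one_byte.hex().upper(), repr(displayed))

# The UTF-8 form needs two bytes, which a one-byte-per-character path cannot carry as a single character.
print(typed.encode("utf-8").hex().upper())
```

Under that assumption the result matches the 'ř' showing as 'ø' symptom reported above.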

The frustrating thing from a testing point of view is that it is hard for me to say how the client would behave if it was running under (say) a Chinese version of Windows. I am reluctant to install such a thing because, for one thing, I wouldn't be able to read anything on the screen, and thus would be uncertain whether it was working properly or not.

I think some people have reported success with MUSHclient under various other languages, but I might be wrong.

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Atltais   (8 posts)
Date Reply #42 on Mon 02 Jun 2008 05:28 PM (UTC)
Message
Sorry for the necropost, but I'd really like UTF-8 input, so I'm bumping this. :)

In any case, if you want to test MUSHclient's behaviour under a non-English locale, I suggest installing Microsoft's AppLocale (from http://www.microsoft.com/globaldev/tools/apploc.mspx) or manually changing your locale in Regional and Language settings.

However, regardless, if you use UTF-8 encoded characters in Windows, it -should- act the same no matter what your locale is set to. If you want to test using a non-IME language such as Russian, you can pull up the onscreen keyboard to make sure that the key you press on your keyboard corresponds with the text that appears. With an IME language, however, you're stuck with either looking at the character before it's sent to the program, or using a lookup table.

Some fonts in Windows of course don't have the full gamut of characters, but I believe MS Gothic does. (At the least, I know it has Japanese and Russian)

Posted by Nick Gammon   Australia  (22,973 posts)   Forum Administrator
Date Reply #43 on Mon 02 Jun 2008 11:23 PM (UTC)

Amended on Mon 02 Jun 2008 11:24 PM (UTC) by Nick Gammon

Message

Well, after a bit of experimentation I installed the Russian keyboard, which didn't seem to make much difference (perhaps because I don't have one).

Then I switched to the Russian code page (I think), which gave these results:

I couldn't type "say" so I put auto-say on and typed "abcd". Now it seemed to send that to the MUD, because it replied appropriately. However I don't think this is UTF-8. In fact I am sure of it, because the packet debug shows:

That is one byte per character for "abcd", namely F4 E8 F1 E2 - which I presume are the values from the Russian code page.
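
Those four bytes can be checked with a short Python sketch, assuming the Russian code page in question is Windows CP 1251 (an assumption; the post only says "the code page for Russian"):

```python
packet = bytes.fromhex("F4 E8 F1 E2")

# Interpreted as CP 1251, each byte is one Cyrillic letter.
text = packet.decode("cp1251")
print(text)

# The same four letters in UTF-8 would take two bytes each.
as_utf8 = text.encode("utf-8")
print(len(packet), "bytes as sent,", len(as_utf8), "bytes if it were UTF-8")
```

The decoded letters are the ones found on the a, b, c and d keys of the standard Russian keyboard layout, which fits the packet being code-page text rather than UTF-8.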


- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Nick Gammon   Australia  (22,973 posts)   Forum Administrator
Date Reply #44 on Mon 02 Jun 2008 11:25 PM (UTC)
Message
My first question to people who use another language is, how do you type stuff like "say"?

I couldn't because if I typed "say hello" while the code page was Russian it just sent a whole lot of Russian characters, none of which was the word "say".

- Nick Gammon

www.gammon.com.au, www.mushclient.com

The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).

This is page 3, subject is 5 pages long.

Information and images on this site are licensed under the Creative Commons Attribution 3.0 Australia License unless stated otherwise.
