UTF-8 & ANSI positioning code

It is now over 60 days since the last post. This thread is closed.



Posted by FallenTree   (14 posts)  Bio
Date Wed 13 Feb 2013 03:29 AM (UTC)
Message
Hi, thanks for making MUSHclient open source!

I'm currently working on a Chinese LPMud mudlib that outputs UTF-8, which works fine after selecting UTF-8 in the output settings. But when I type something in UTF-8, it only sends garbled input. (Typing English is fine.)

Also, text selection and text wrapping sometimes result in "half characters" in UTF-8 mode. (This happens even without UTF-8: since a GBK character is as wide as 2 ASCII characters, selection apparently lets you select half a character. The same applies to wrapping.)

Also, is there a plan to support ANSI positioning codes, like clear screen, set position, etc.?

MUSHclient is the default choice for many Chinese MUD players now. Thanks again for making it available.

Cheers.

Posted by Nick Gammon   Australia  (23,102 posts)  Bio   Forum Administrator
Date Reply #1 on Wed 13 Feb 2013 06:10 AM (UTC)
Message
Quote:

I'm currently working on a Chinese LPMud mudlib that outputs UTF-8, which works fine after selecting UTF-8 in the output settings. But when I type something in UTF-8, it only sends garbled input. (Typing English is fine.)


I thought (hoped?) that was working, but it's hard for me to test. I thought someone had that working, but I'm not so sure now. It's not a Unicode application, and the UTF8 in the output window is done "manually" if that flag is on.

Quote:

Also, text selection and text wrapping sometimes result in "half characters" in UTF-8 mode. (This happens even without UTF-8: since a GBK character is as wide as 2 ASCII characters, selection apparently lets you select half a character. The same applies to wrapping.)


That shouldn't happen because it only wraps on a space, and UTF-8 has the high-order bit set for non-ASCII characters. Do you have an example? Screenshot and packet debug would be helpful.
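
To illustrate the point about the high-order bit, here is a small Python sketch (not MUSHclient code; the helper name is made up for the example). It checks that every byte of a multi-byte UTF-8 character is 0x80 or above, which is why a space byte (0x20) can never fall inside one.

```python
def utf8_bytes_are_high(text: str) -> bool:
    """Return True if every byte of the UTF-8 encoding has its high-order bit set."""
    return all(byte & 0x80 for byte in text.encode("utf-8"))


print(utf8_bytes_are_high("你好"))  # Chinese text: every byte is 0x80 or above
print(utf8_bytes_are_high("go"))    # plain ASCII: the high bit is clear
print(" ".encode("utf-8"))          # a space is always the single byte 0x20
```

This only covers UTF-8. It says nothing about other multi-byte encodings, which is where the question about GBK comes in later in the thread.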

Quote:

Also, is there a plan to support ANSI positioning codes, like clear screen, set position, etc.?


No, that has been discussed many times. It isn't practical because of the way triggers, logging, etc. work.

Quote:

MUSHclient is the default choice for many Chinese MUD players now. Thanks again for making it available.


I'm glad to hear that. How is it working though if you can't type in Chinese?

Have you done the localization? See this post:

http://www.gammon.com.au/forum/?id=7953

You should be able to get the menus, dialogs, etc. into Chinese. Also the internal messages used by the client. That post explains it all.



- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Nick Gammon   Australia  (23,102 posts)  Bio   Forum Administrator
Date Reply #2 on Wed 13 Feb 2013 07:32 AM (UTC)

Amended on Wed 13 Feb 2013 07:41 AM (UTC) by Nick Gammon

Message
This post might help:

http://www.gammon.com.au/forum/bbshowpost.php?id=7696&page=1

Quote:

Also, text selection and text wrapping sometimes result in "half characters" in UTF-8 mode. (This happens even without UTF-8: since a GBK character is as wide as 2 ASCII characters, selection apparently lets you select half a character. The same applies to wrapping.)


I can't reproduce that. For example ...

Press Shift+Ctrl+F12 to enter "Debug simulated world input".

Paste this:



\E4\BD\A0\E5\A5\BD\E4\BD\A0\E5\A5\BD\E4\BD\A0\E5\A5\BD\E4\BD\A0\E5\A5\BD\E4\BD\A0\E5\A5\BD\E4\BD\A0\E5\A5\BD\E4\BD\A0\E5\A5\BD\E4\BD\A0\E5\A5\BD\E4\BD\A0\E5\A5\BD\E4\BD\A0\E5\A5\BD\E4\BD\A0\E5\A5\BD\E4\BD\A0\E5\A5\BD\E4\BD\A0\E5\A5\BD\E4\BD\A0\E5\A5\BD\E4\BD\A0\E5\A5\BD\E4\BD\A0\E5\A5\BD\E4\BD\A0\E5\A5\BD\E4\BD\A0\E5\A5\BD\E4\BD\A0\E5\A5\BD\E4\BD\A0\E5\A5\BD\E4\BD\A0\E5\A5\BD\E4\BD\A0\E5\A5\BD



You should see this:



Hit OK.

With the output window width set to 20, I see this:



Exactly 20 Chinese characters, as expected, and wrapped at the end of a character, not in the middle of one.
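
For anyone who wants to check what the pasted test string contains, here is a short Python sketch. It assumes the `\XX` sequences are hexadecimal byte escapes and that the bytes are UTF-8; the function name is hypothetical and this is not how MUSHclient itself parses the dialog input.

```python
import re

HEX_ESCAPE = re.compile(r"\\([0-9A-Fa-f]{2})")


def decode_hex_escapes(escaped: str, encoding: str = "utf-8") -> str:
    """Convert \\XX hex escapes (plus any literal ASCII text) to bytes, then decode."""
    raw = bytearray()
    pos = 0
    for match in HEX_ESCAPE.finditer(escaped):
        # Copy any literal text that sits between two escapes.
        raw += escaped[pos:match.start()].encode("ascii")
        raw.append(int(match.group(1), 16))
        pos = match.end()
    raw += escaped[pos:].encode("ascii")
    return raw.decode(encoding)


# One repetition of the six-byte unit used in the test string above.
print(decode_hex_escapes(r"\E4\BD\A0\E5\A5\BD"))
```

Each six-byte unit decodes to two Chinese characters of three bytes each, so a wrap point that respects character boundaries must fall on a multiple of three bytes in this string.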

Please help us by advising the version of MUSHclient you are using. Use the Help menu -> About MUSHclient.

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by FallenTree   (14 posts)  Bio
Date Reply #3 on Wed 13 Feb 2013 09:16 AM (UTC)

Amended on Wed 13 Feb 2013 09:31 AM (UTC) by FallenTree

Message
I'm using version 4.84.

1) UTF-8 input issue

Does the "use UTF-8" setting in the Output settings have any effect on the input box? If I run without UTF-8 selected (all input/output is in GBK encoding, which is 2 bytes wide), things work fine. Both input and output display correctly.

See the difference between pictures 1 and 2.

I suspect the issue is that when UTF-8 is enabled, input is still being sent out in the default encoding, so it is not displayed back correctly. Just sending it in UTF-8 would be good.

2) Selection & wrapping

See pictures 3 and 4 for a demonstration of the selection issue.
I think the wrapping problem is this: when a long series of text is being received and I input a single command such as "say a" in the middle, the client tries to display it interleaved with the output text stream. It doesn't handle 2-byte characters correctly, so there is roughly a 1/2 chance of the command being displayed in the middle of a character.

3) Why not make MUSHclient Unicode compatible? Of course it means you need to use _L("") etc. in the code, but it should not be too hard :-D

Please see the pictures here:
https://github.com/sunyc/test

Posted by Nick Gammon   Australia  (23,102 posts)  Bio   Forum Administrator
Date Reply #4 on Wed 13 Feb 2013 09:25 AM (UTC)
Message
No it should not. The UTF8 setting just affects how incoming text from the MUD is perceived.

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Nick Gammon   Australia  (23,102 posts)  Bio   Forum Administrator
Date Reply #5 on Wed 13 Feb 2013 09:25 AM (UTC)
Message
What is GBK encoding?

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by FallenTree   (14 posts)  Bio
Date Reply #6 on Wed 13 Feb 2013 09:34 AM (UTC)
Message
I've edited my reply above, please recheck it. I thought the save function was just a draft... sorry about that.

You can read about GBK encoding here: http://en.wikipedia.org/wiki/GBK

It's basically designed as a 2-byte extension of ASCII: ASCII characters stay 1 byte wide and each Chinese character takes 2 bytes.

Posted by Nick Gammon   Australia  (23,102 posts)  Bio   Forum Administrator
Date Reply #7 on Wed 13 Feb 2013 09:50 PM (UTC)
Message
Quote:

3) Why not make MUSHclient Unicode compatible? Of course it means you need to use _L("") etc. in the code, but it should not be too hard :-D


I tried to do that once. It's incredibly hard. It is fine if you start off that way, but adding later, you have to change many, many things.

I think we just need to get GBK working. What I don't understand is how you are getting that far. I can't even find how to input in GBK on my Windows XP box.

As far as I can tell, it is nothing like UTF8, so that accounts for why the lines are breaking the wrong way. We could probably make a "GBK" option to handle the wrapping.

I don't understand how it is displaying at all. With UTF-8 enabled, wouldn't GBK come out wrongly?

I need a few more details:


  • What font are you using?
  • What are your regional settings in Windows?
  • Is UTF8 option on or off in the output window?
  • How are you even entering Chinese at the keyboard? When I try it gets converted to question marks.


- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by FallenTree   (14 posts)  Bio
Date Reply #8 on Wed 13 Feb 2013 10:03 PM (UTC)

Amended on Wed 13 Feb 2013 10:04 PM (UTC) by FallenTree

Message

I should have been clearer:

1) GBK characters with UTF-8 mode off: works. Input and output work fine, except for occasional selection/wrapping issues. See my picture 2.png for a demo.

2) UTF-8 Chinese characters with UTF-8 mode on: output works fine, input doesn't work. See my picture 1.png for the input issue. I hope we can solve that.

To answer your other questions:

Quote:

I think we just need to get GBK working. What I don't understand is how you are getting that far. I can't even find how to input in GBK on my Windows XP box.

I don't understand how it is displaying at all. With UTF-8 enabled, wouldn't GBK come out wrongly?


GBK (with output UTF-8 unchecked) is working. It's just that selection/wrapping sometimes breaks in the middle of a character.

Quote:

  • What font are you using?
  • What are your regional settings in Windows?
  • Is UTF8 option on or off in the output window?
  • How are you even entering Chinese at the keyboard? When I try it gets converted to question marks.


1) SimSun works, and actually FixedSys works too. As long as you've installed the Asian language package by ticking that checkbox in "Control Panel -> Language" and you can view Chinese web pages, you should have SimSun. There's also SimHei.

2) Regional settings are set to the US time/currency format. The default locale for non-Unicode applications is set to PRC, China.

3) UTF-8 is off for GBK mode.

4) The best way is to copy from a Chinese website, like this one: http://news.163.com/13/0213/07/8NJ0CTL90001121M.html . Just copy some characters from it and paste them in.




Posted by Nick Gammon   Australia  (23,102 posts)  Bio   Forum Administrator
Date Reply #9 on Wed 13 Feb 2013 11:35 PM (UTC)
Message
FallenTree said:


2) UTF-8 Chinese characters with UTF-8 mode on: output works fine, input doesn't work. See my picture 1.png for the input issue. I hope we can solve that.


Does the server have a UTF8 mode? How does it know whether to send GBK or UTF8 characters?

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Nick Gammon   Australia  (23,102 posts)  Bio   Forum Administrator
Date Reply #10 on Wed 13 Feb 2013 11:48 PM (UTC)
Message
I propose as a possible solution an alteration to the wrapping so that it wraps earlier if it finds characters with the high-order bit set (and no space to wrap on).

I think it would have to work forwards, and if a byte was found with the high-order bit set, skip the next byte, and proceed. The last such byte pair could be considered the wrap point.

Do you find that with UTF8 off, you have to double the screen width? If it takes two bytes per character, you would need to set the width to 160 to see 80 characters.
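
The forward scan proposed above could look roughly like the following Python sketch. It is only an illustration of the idea, not the MUSHclient implementation: the function name is made up, the width is counted in bytes, and it covers only the fallback case where there is no space to wrap on.

```python
def gbk_wrap_point(line: bytes, width: int) -> int:
    """Return the largest byte index <= width that does not split a 2-byte pair.

    Scans forwards. A byte with the high-order bit set is treated as the
    first byte of a pair, so the byte after it is skipped as well.
    """
    pos = 0
    while pos < len(line):
        step = 2 if line[pos] & 0x80 else 1
        if pos + step > width:
            break  # taking this character would overrun the width
        pos += step
    # Guard against a dangling lead byte at the very end of the data.
    return min(pos, len(line))


line = "你好你好你好".encode("gbk")   # 6 characters, 12 bytes
cut = gbk_wrap_point(line, 5)         # a width of 5 bytes would split a character
print(cut, line[:cut].decode("gbk"))
```

With a width of 5 the scan stops at byte 4, so the wrap lands after the second character instead of in the middle of the third.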

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by FallenTree   (14 posts)  Bio
Date Reply #11 on Thu 14 Feb 2013 12:33 AM (UTC)
Message
These are from two different servers. The servers actually have no knowledge of the output encoding; they just output what's written in the mudlib. If the text is written in UTF-8, it is output in UTF-8. Otherwise it is output in whatever encoding it was written in.

Your suggestion can work, but only for the GBK 2-byte case (since it's always one high, one low). It will not work in UTF-8 mode, since UTF-8 characters can be 2, 3 or 4 bytes wide (and don't always end on a low byte either).
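
To make the difference concrete: in UTF-8 a skip-one-byte rule is not enough, but continuation bytes are recognisable by their top two bits (`10xxxxxx`), so a boundary can be found by backing up past them. The Python sketch below shows that idea under the assumption that the width is counted in bytes; the function name is hypothetical and nothing here is taken from the MUSHclient source.

```python
def utf8_wrap_point(line: bytes, width: int) -> int:
    """Return the largest byte index <= width that falls on a character boundary."""
    pos = min(width, len(line))
    # A continuation byte has the bit pattern 10xxxxxx. If the byte at the
    # cut position is one, the cut would split a character, so back up.
    while 0 < pos < len(line) and (line[pos] & 0xC0) == 0x80:
        pos -= 1
    return pos


line = "你好".encode("utf-8")         # 2 characters, 3 bytes each
cut = utf8_wrap_point(line, 4)        # byte 4 is in the middle of the second character
print(cut, line[:cut].decode("utf-8"))
```

The same loop works for 2, 3 and 4 byte sequences because it never needs to know the length in advance.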

Without making it a Unicode application (which I thought was just a matter of using TCHAR, _TEXT() etc.), I think any attempt is going to be difficult. I will take a look at the source code later today.

I think the selection/wrapping issue is minor. Can you change the input box so that it sends a UTF-8 encoded string when output is in UTF-8 mode?

Posted by Nick Gammon   Australia  (23,102 posts)  Bio   Forum Administrator
Date Reply #12 on Thu 14 Feb 2013 12:39 AM (UTC)
Message
Quote:

... it will not work in UTF-8 mode ...


I would not do it in UTF8 mode.


Tell me this ... how does the server react to what you type?

Say I type "go east" in Chinese (往东走)? Does that match on the GBK version, the UTF-8 version, or what?


- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Nick Gammon   Australia  (23,102 posts)  Bio   Forum Administrator
Date Reply #13 on Thu 14 Feb 2013 12:42 AM (UTC)
Message
Quote:

Without making it a Unicode application (which I thought was just a matter of using TCHAR, _TEXT() etc.), I think any attempt is going to be difficult.


Yes, but there are many, many functions that take char* or const char* arguments. They would all need to be changed. Then the functions that work on the data (eg. strcpy) need to be changed to the wide versions.

Then you have things like the world files, which are currently read in a byte at a time. Believe me, I got enthusiastic once, and tried it. About a day later, I just reverted everything. Every change I made caused another two changes to be required. And those changes caused more changes, and so on.

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by FallenTree   (14 posts)  Bio
Date Reply #14 on Thu 14 Feb 2013 06:34 AM (UTC)
Message
Nick Gammon said:

Quote:

... it will not work in UTF-8 mode ...

Say I type "go east" in Chinese (往东走)? Does that match on the GBK version, the UTF-8 version, or what?


The server operates in byte mode. It basically takes whatever your input is and sends it to the mudlib function, which compares it byte by byte to decide what to do. If you say "go east", it parses that as "go" and "east" (split by " "), finds the "go" command, and passes "east" to it.

If you type "go 东", it gets "go" and "东" (in whatever encoding it might be, GBK or UTF-8), and sends that to the go command.
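
As a rough model of that byte-mode behaviour (a Python sketch with made-up names, not actual mudlib code), the command is split on the first space byte and the argument is looked up as raw bytes. The exit table is assumed to come from mudlib source saved as UTF-8.

```python
def split_command(raw: bytes) -> tuple[bytes, bytes]:
    """Split raw input on the first space byte into (verb, argument)."""
    verb, _, argument = raw.partition(b" ")
    return verb, argument


# Hypothetical exit table for a mudlib whose source files are saved as UTF-8.
EXITS = {"东".encode("utf-8"): "east room"}

for encoding in ("utf-8", "gbk"):
    verb, argument = split_command("go 东".encode(encoding))
    # The lookup compares bytes, so only input in the mudlib's own encoding matches.
    print(encoding, verb, argument, EXITS.get(argument))
```

The verb matches either way because "go" is ASCII, but the argument only matches when the client sends the same encoding the mudlib was written in.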

The problem here is that when MUSHclient's output is set to UTF-8, the input is still raw bytes. When you type "东", MUSHclient gets the input "<byte1><byte2>" and tries to draw it on the screen. But the screen is in UTF-8 mode, so <byte1><byte2> is decoded as UTF-8 and therefore isn't drawn correctly.

What it should do is this: when output is in UTF-8 mode and MUSHclient gets the input "<byte1><byte2>", it should encode it as UTF-8 "\Uxxxx\Uxxx\Uxxx\Uxxx" and then send it out and draw it on the screen.
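
The re-encoding step described above amounts to decoding the input from the local code page and encoding it again as UTF-8. Here is a minimal Python sketch, assuming the local code page is GBK; the function name is hypothetical. In a Windows C++ client the equivalent step would presumably go through `MultiByteToWideChar` and `WideCharToMultiByte`, but that is a guess, not something taken from the MUSHclient source.

```python
def input_to_utf8(raw_input: bytes, local_encoding: str = "gbk") -> bytes:
    """Re-encode input typed in the local code page as UTF-8."""
    return raw_input.decode(local_encoding).encode("utf-8")


typed = "东".encode("gbk")                        # what the input box holds: 2 bytes
print(typed.decode("utf-8", errors="replace"))    # read as UTF-8 directly: garbage
print(input_to_utf8(typed))                       # re-encoded: 3 bytes of valid UTF-8
print(input_to_utf8(typed).decode("utf-8"))
```

ASCII input is unaffected by the conversion, which matches the observation that typing English already works.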

Let me know if I haven't made it clear.

The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).


This is page 1; the subject is 2 pages long.


Information and images on this site are licensed under the Creative Commons Attribution 3.0 Australia License unless stated otherwise.