Unicode Encoding issues with world.WindowTextWidth
It is now over 60 days since the last post. This thread is closed.
Posted by Mr.lundmark on Thu 14 Oct 2010 06:41 PM (UTC), amended on Thu 14 Oct 2010 06:44 PM (UTC) by Mr.lundmark:
Hi.
Calling world.WindowTextWidth() with a string that contains an a with a circle above (this stupid forum won't even accept it, wtf?) does not work: it returns -3 because of bad UTF-8 format.
Trying to encode it gives me this error instead:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 58: ordinal not in range(128)
Posted by Nick Gammon (Australia, Forum Administrator), Reply #1 on Thu 14 Oct 2010 08:44 PM (UTC), amended on Thu 14 Oct 2010 08:45 PM (UTC) by Nick Gammon:
You can't put Unicode characters directly into strings; you have to UTF-8 encode them.
For example:
print (utils.tohex (utils.utf8encode (0xe5))) --> C3A5
t = utils.utf8decode (utils.fromhex ("C3A5"))
print (t [1]) --> 229 (which is 0xE5)
So to display the a-with-a-circle (sorry about the forum problem) this works:
win = "test_" .. GetPluginID ()
WindowCreate (win, 0, 0, 200, 200, miniwin.pos_center_all, 0, ColourNameToRGB("white")) -- create window
WindowShow (win, true) -- show it
WindowFont (win, "f", "Trebuchet MS", 14, true, false, false, false) -- define font
s = "Test<" .. utils.fromhex ("C3A5") .. ">"
width = WindowTextWidth (win, "f", s, true) -- width of text
print (width) --> 71 pixels
WindowText (win, "f",
s, -- text
5, 20, 0, 0, -- rectangle
ColourNameToRGB ("darkgreen"), -- colour
true) -- Unicode
- Nick Gammon
www.gammon.com.au, www.mushclient.com
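For readers scripting in Python rather than Lua, the hex round trip in the reply above can be sketched as follows. This is a minimal Python 3 illustration that uses only built-in string and bytes methods; the helper names `utf8_hex` and `code_point_from_hex` are made up for this example and are not part of MUSHclient.

```python
def utf8_hex(code_point):
    """Return the UTF-8 encoding of a code point as upper-case hex."""
    return chr(code_point).encode("utf-8").hex().upper()


def code_point_from_hex(hex_string):
    """Decode hex-encoded UTF-8 bytes holding one character; return its code point."""
    return ord(bytes.fromhex(hex_string).decode("utf-8"))


encoded = utf8_hex(0xE5)               # "C3A5"
decoded = code_point_from_hex("C3A5")  # 229, which is 0xE5
```

The two-byte result C3 A5 is what has to be handed to a function that expects UTF-8, rather than the single byte E5.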
Posted by Nick Gammon (Australia, Forum Administrator), Reply #2 on Thu 14 Oct 2010 10:11 PM (UTC), amended on Fri 15 Oct 2010 06:26 AM (UTC) by Nick Gammon:
Alternatively, if you only want to display characters in the range 0x00 to 0xFF, don't use Unicode mode. This works for your test character:
win = "test_" .. GetPluginID ()
WindowCreate (win, 0, 0, 200, 200, miniwin.pos_center_all, 0, ColourNameToRGB("white")) -- create window
WindowShow (win, true) -- show it
WindowFont (win, "f", "Trebuchet MS", 14, true, false, false, false) -- define font
s = "Test<" .. string.char (0xe5) .. ">"
width = WindowTextWidth (win, "f", s, false) -- width of text
print (width) --> 71 pixels
WindowText (win, "f",
s, -- text
5, 20, 0, 0, -- rectangle
ColourNameToRGB ("darkgreen"), -- colour
false) -- not Unicode
You only need to use UTF-8 encoding for characters with a code point of 0x100 (256) upwards.
- Nick Gammon
www.gammon.com.au, www.mushclient.com
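The difference between the two modes comes down to how many bytes the character occupies. Here is a rough Python 3 sketch, assuming Latin-1 is a reasonable stand-in for the single-byte character set used when the Unicode flag is false (the real code page depends on the system). The helper `fits_single_byte` is hypothetical and only illustrates the 0x100 threshold mentioned above.

```python
def fits_single_byte(text):
    """True if every character has a code point in the range 0x00 to 0xFF."""
    return all(ord(ch) <= 0xFF for ch in text)


a_ring = "\u00e5"                       # a with a circle above, code point 0xE5

single_byte = a_ring.encode("latin-1")  # one byte: E5 (non-Unicode mode)
utf8_bytes = a_ring.encode("utf-8")     # two bytes: C3 A5 (Unicode mode)

can_skip_unicode_mode = fits_single_byte("Test<" + a_ring + ">")
needs_unicode_mode = not fits_single_byte("Test<\u0151>")  # U+0151 is above 0xFF
```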
Posted by Mr.lundmark, Reply #3 on Fri 15 Oct 2010 11:47 AM (UTC):
Ah great. My issue, though, is that I'm matching text from the world, so I actually get the Unicode string from a trigger. Is there any way to decode it properly so that WindowTextWidth can calculate its width? Currently I have to do a replace on all the letters that I know cause issues, which is both time-consuming and ugly.
Thanks for the fast answers!
Posted by Nick Gammon (Australia, Forum Administrator), Reply #4 on Fri 15 Oct 2010 09:22 PM (UTC):
When you say "Unicode string", do you mean the MUD is sending UTF-8? Or just characters in the range 0x80 to 0xFF? Those aren't really Unicode, because Unicode and ASCII only share the values 0x00 to 0x7F. Beyond that, the text has to be UTF-8 encoded (or sent as two bytes per character, or some other method).
But if they are just sending stuff like 0xE5 for the a-with-a-circle-on-top character, just tell WindowTextWidth that it isn't Unicode. The Unicode argument really means "is it UTF-8?".
- Nick Gammon
www.gammon.com.au, www.mushclient.com
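One way to answer the "is it UTF-8?" question for raw bytes is simply to try decoding them. The sketch below is Python 3 and assumes the incoming text is available as a bytes object; `looks_like_utf8` is a hypothetical helper, not a MUSHclient function. Note that pure ASCII also passes the check, since ASCII is valid UTF-8.

```python
def looks_like_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


utf8_line = b"Test<\xc3\xa5>"    # a-with-a-circle sent as UTF-8 (C3 A5)
single_byte_line = b"Test<\xe5>"  # same character sent as the lone byte E5

use_unicode_flag = looks_like_utf8(utf8_line)
use_unicode_flag_for_raw = looks_like_utf8(single_byte_line)
```

A lone 0xE5 followed by an ASCII character is not a valid UTF-8 sequence, which matches the -3 result reported at the top of the thread.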
Posted by Mr.lundmark, Reply #5 on Sat 16 Oct 2010 07:39 AM (UTC):
Yeah, it's UTF-8. When I send it to WindowTextWidth with the Unicode argument set to false, it says that it can't decode those. I think the string received from a trigger is actually in a Python unicode format (u"blah" instead of "blah"). I can't do a UTF-8 decode on that because it complains about the 0x80 to 0xFF range.
Posted by Worstje (Netherlands), Reply #6 on Sat 16 Oct 2010 02:32 PM (UTC):
A u"Something" string is not in any specific encoding. It merely represents Unicode code points. If you want it as UTF-8, which is a way to represent Unicode code points, you'll want to .encode it to UTF-8. Not decode, as that applies to a normal "" string, which is in a certain encoding.
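The encode/decode distinction can be illustrated with a short sketch. It is written for Python 3, where a plain `str` plays the role of the Python 2 `u"..."` string discussed in this thread and `bytes` plays the role of the encoded `"..."` string. The variable names are illustrative only, and the WindowTextWidth call appears only as a comment because it needs a running MUSHclient.

```python
# Text as it might arrive from a trigger: code points, no encoding yet.
text = "Test<\u00e5>"

# Encoding to UTF-8 turns the code points into bytes (E5 becomes C3 A5).
utf8_bytes = text.encode("utf-8")

# Encoding with the ASCII codec fails, as in the error quoted in the first post.
try:
    text.encode("ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = str(exc)

# Decoding goes the other way: bytes in a known encoding back to code points.
round_trip = utf8_bytes.decode("utf-8")

# In a MUSHclient script the encoded value would then be passed with the
# Unicode argument set to true, along the lines of (not run here):
#   world.WindowTextWidth(win, "f", utf8_bytes, True)
```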
Posted by Mr.lundmark, Reply #7 on Sat 16 Oct 2010 02:57 PM (UTC):
That works perfectly, Worstje, thanks!
The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).