MUSHclient ➜ International ➜ Full Unicode support
Posted by: Atltais (8 posts)
Date: Reply #45 on Mon 02 Jun 2008 11:30 PM (UTC)
Message:
I just use Alt+Left Shift to switch between languages.
Posted by: Nick Gammon, Australia (23,140 posts), Forum Administrator
Date: Reply #46 on Tue 03 Jun 2008 12:36 AM (UTC), amended on Sun 05 Jun 2011 09:39 PM (UTC) by Nick Gammon
Message:
Ah OK, I get the picture now. You type "say" in EN mode, switch to RU or whatever, and type what you want to say in Russian? It all becomes clearer now. :)
I think this plugin below might help. The basic problem is to get the input window to send Unicode, which it isn't designed to do.
What this plugin does is take the code-page characters, and turn them into UTF-8 for sending. It builds up a table on-the-fly from the conversion downloaded from www.unicode.org. In this particular case I used the 1251 code page (Cyrillic) however the general idea could be used for any code page, as you just access the correct table from www.unicode.org.
For example, taking the entry for:
0xC0 0x0410 #CYRILLIC CAPITAL LETTER A
The script parses the line and extracts the 0xC0 and 0x0410. The 0xC0 is turned into a single byte, which becomes the key of a table entry (this is the byte the input window holds when you type "CYRILLIC CAPITAL LETTER A" on the keyboard). Then the 0x0410 is converted into a UTF-8 sequence by calling utils.utf8encode. This is the value that 0xC0 "maps to". In this case it is 0xD0 0x90.
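As a minimal, standalone sketch of that single-entry conversion: the `utf8_encode` helper below is a stand-in written for this example (inside MUSHclient the plugin calls utils.utf8encode instead), and it only covers code points below 0x10000, an assumption that holds for the code-page tables discussed here. `hex` is just a display helper.

```lua
-- Stand-in for MUSHclient's utils.utf8encode (assumption: code point < 0x10000)
function utf8_encode (cp)
  if cp < 0x80 then
    return string.char (cp)
  elseif cp < 0x800 then
    return string.char (0xC0 + math.floor (cp / 0x40),
                        0x80 + cp % 0x40)
  else
    return string.char (0xE0 + math.floor (cp / 0x1000),
                        0x80 + math.floor (cp / 0x40) % 0x40,
                        0x80 + cp % 0x40)
  end
end

-- display helper: show each byte of a string as two hex digits
function hex (s)
  local out = {}
  for i = 1, #s do
    out [i] = string.format ("%02X", s:byte (i))
  end
  return table.concat (out, " ")
end

local entry = "0xC0 0x0410 #CYRILLIC CAPITAL LETTER A"
local from, to = string.match (entry, "^0x(%x+)%s+0x(%x+)")

key = string.char (tonumber (from, 16))   -- the single code-page byte
value = utf8_encode (tonumber (to, 16))   -- its UTF-8 byte sequence

print (hex (key), hex (value))
```

Run in a plain Lua interpreter, this prints C0 followed by D0 90, matching the values quoted above.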
Now we are ready to roll. The plugin then intercepts all text sent to the MUD by using OnPluginSend. It does a table lookup to convert the code-page values into UTF-8. The original text is dropped (by returning false) and the new text is sent instead.
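The lookup step can be tried outside MUSHclient. This sketch hard-codes two entries of the table (an excerpt chosen for the example; the plugin builds all of them from the mapping text) and wraps the gsub call in a hypothetical `to_utf8` function. In the plugin itself the result is passed to Send from within OnPluginSend, which then returns false.

```lua
-- excerpt of the lookup table: code page 1251 byte -> UTF-8 bytes
unicode_table = {
  ["\228"] = "\208\180",  -- 0xE4 -> U+0434 CYRILLIC SMALL LETTER DE
  ["\224"] = "\208\176",  -- 0xE0 -> U+0430 CYRILLIC SMALL LETTER A
}

-- hypothetical wrapper around the substitution done in OnPluginSend
function to_utf8 (sText)
  -- the extra parentheses drop gsub's second result (the match count)
  return (string.gsub (sText, "[\128-\255]", unicode_table))
end

-- "say" followed by two Cyrillic letters, as typed in code page 1251
outgoing = to_utf8 ("say \228\224")
```

Bytes below 0x80 never match the pattern, so plain ASCII commands pass through untouched. A high byte with no table entry is left as it is, because gsub keeps the original match when the table lookup yields nil.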
To make this work, you need to configure Windows to display the correct code page, by using Control Panel -> Regional and Language Options -> Advanced. Set the "Language for non-Unicode programs" (such as MUSHclient) to the appropriate language (I used Russian for my test).
Now, when you switch the keyboard to Russian mode (Alt+Left-Shift), you see Russian characters in the input box as you type. With the plugin installed they are converted to UTF-8 on their way out to the MUD.
To see them correctly displayed on the way back (eg. when you say something and the MUD echoes the said text), you need to check the UTF-8 (Unicode) check box in the Output window configuration.
Copy between the lines below and save this text as Translate_Unicode.xml - then use File -> Plugins to install it as a MUSHclient plugin. For a different language than Russian simply find the correct table from the Unicode web site.
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE muclient>
<!-- Saved on Tuesday, June 03, 2008, 10:16 AM -->
<!-- MuClient version 4.25 -->
<!-- Plugin "Translate_Unicode" generated by Plugin Wizard -->
<muclient>
<plugin
name="Translate_Unicode_RU"
author="Nick Gammon"
id="bb1c8d004c596b19748fc66c"
language="Lua"
purpose="Translate sent text into UTF-8 (for Russian)"
date_written="2008-06-03 10:11:10"
date_modified="2008-06-04 13:20:00"
requires="4.25"
version="1.1"
>
</plugin>
<!-- Script -->
<script>
<![CDATA[
-- see http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1251.TXT
-- <------------- replace here for other languages ------------->
conversion = [[
0x80 0x0402 #CYRILLIC CAPITAL LETTER DJE
0x81 0x0403 #CYRILLIC CAPITAL LETTER GJE
0x82 0x201A #SINGLE LOW-9 QUOTATION MARK
0x83 0x0453 #CYRILLIC SMALL LETTER GJE
0x84 0x201E #DOUBLE LOW-9 QUOTATION MARK
0x85 0x2026 #HORIZONTAL ELLIPSIS
0x86 0x2020 #DAGGER
0x87 0x2021 #DOUBLE DAGGER
0x88 0x20AC #EURO SIGN
0x89 0x2030 #PER MILLE SIGN
0x8A 0x0409 #CYRILLIC CAPITAL LETTER LJE
0x8B 0x2039 #SINGLE LEFT-POINTING ANGLE QUOTATION MARK
0x8C 0x040A #CYRILLIC CAPITAL LETTER NJE
0x8D 0x040C #CYRILLIC CAPITAL LETTER KJE
0x8E 0x040B #CYRILLIC CAPITAL LETTER TSHE
0x8F 0x040F #CYRILLIC CAPITAL LETTER DZHE
0x90 0x0452 #CYRILLIC SMALL LETTER DJE
0x91 0x2018 #LEFT SINGLE QUOTATION MARK
0x92 0x2019 #RIGHT SINGLE QUOTATION MARK
0x93 0x201C #LEFT DOUBLE QUOTATION MARK
0x94 0x201D #RIGHT DOUBLE QUOTATION MARK
0x95 0x2022 #BULLET
0x96 0x2013 #EN DASH
0x97 0x2014 #EM DASH
0x98 #UNDEFINED
0x99 0x2122 #TRADE MARK SIGN
0x9A 0x0459 #CYRILLIC SMALL LETTER LJE
0x9B 0x203A #SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
0x9C 0x045A #CYRILLIC SMALL LETTER NJE
0x9D 0x045C #CYRILLIC SMALL LETTER KJE
0x9E 0x045B #CYRILLIC SMALL LETTER TSHE
0x9F 0x045F #CYRILLIC SMALL LETTER DZHE
0xA0 0x00A0 #NO-BREAK SPACE
0xA1 0x040E #CYRILLIC CAPITAL LETTER SHORT U
0xA2 0x045E #CYRILLIC SMALL LETTER SHORT U
0xA3 0x0408 #CYRILLIC CAPITAL LETTER JE
0xA4 0x00A4 #CURRENCY SIGN
0xA5 0x0490 #CYRILLIC CAPITAL LETTER GHE WITH UPTURN
0xA6 0x00A6 #BROKEN BAR
0xA7 0x00A7 #SECTION SIGN
0xA8 0x0401 #CYRILLIC CAPITAL LETTER IO
0xA9 0x00A9 #COPYRIGHT SIGN
0xAA 0x0404 #CYRILLIC CAPITAL LETTER UKRAINIAN IE
0xAB 0x00AB #LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0xAC 0x00AC #NOT SIGN
0xAD 0x00AD #SOFT HYPHEN
0xAE 0x00AE #REGISTERED SIGN
0xAF 0x0407 #CYRILLIC CAPITAL LETTER YI
0xB0 0x00B0 #DEGREE SIGN
0xB1 0x00B1 #PLUS-MINUS SIGN
0xB2 0x0406 #CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
0xB3 0x0456 #CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
0xB4 0x0491 #CYRILLIC SMALL LETTER GHE WITH UPTURN
0xB5 0x00B5 #MICRO SIGN
0xB6 0x00B6 #PILCROW SIGN
0xB7 0x00B7 #MIDDLE DOT
0xB8 0x0451 #CYRILLIC SMALL LETTER IO
0xB9 0x2116 #NUMERO SIGN
0xBA 0x0454 #CYRILLIC SMALL LETTER UKRAINIAN IE
0xBB 0x00BB #RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0xBC 0x0458 #CYRILLIC SMALL LETTER JE
0xBD 0x0405 #CYRILLIC CAPITAL LETTER DZE
0xBE 0x0455 #CYRILLIC SMALL LETTER DZE
0xBF 0x0457 #CYRILLIC SMALL LETTER YI
0xC0 0x0410 #CYRILLIC CAPITAL LETTER A
0xC1 0x0411 #CYRILLIC CAPITAL LETTER BE
0xC2 0x0412 #CYRILLIC CAPITAL LETTER VE
0xC3 0x0413 #CYRILLIC CAPITAL LETTER GHE
0xC4 0x0414 #CYRILLIC CAPITAL LETTER DE
0xC5 0x0415 #CYRILLIC CAPITAL LETTER IE
0xC6 0x0416 #CYRILLIC CAPITAL LETTER ZHE
0xC7 0x0417 #CYRILLIC CAPITAL LETTER ZE
0xC8 0x0418 #CYRILLIC CAPITAL LETTER I
0xC9 0x0419 #CYRILLIC CAPITAL LETTER SHORT I
0xCA 0x041A #CYRILLIC CAPITAL LETTER KA
0xCB 0x041B #CYRILLIC CAPITAL LETTER EL
0xCC 0x041C #CYRILLIC CAPITAL LETTER EM
0xCD 0x041D #CYRILLIC CAPITAL LETTER EN
0xCE 0x041E #CYRILLIC CAPITAL LETTER O
0xCF 0x041F #CYRILLIC CAPITAL LETTER PE
0xD0 0x0420 #CYRILLIC CAPITAL LETTER ER
0xD1 0x0421 #CYRILLIC CAPITAL LETTER ES
0xD2 0x0422 #CYRILLIC CAPITAL LETTER TE
0xD3 0x0423 #CYRILLIC CAPITAL LETTER U
0xD4 0x0424 #CYRILLIC CAPITAL LETTER EF
0xD5 0x0425 #CYRILLIC CAPITAL LETTER HA
0xD6 0x0426 #CYRILLIC CAPITAL LETTER TSE
0xD7 0x0427 #CYRILLIC CAPITAL LETTER CHE
0xD8 0x0428 #CYRILLIC CAPITAL LETTER SHA
0xD9 0x0429 #CYRILLIC CAPITAL LETTER SHCHA
0xDA 0x042A #CYRILLIC CAPITAL LETTER HARD SIGN
0xDB 0x042B #CYRILLIC CAPITAL LETTER YERU
0xDC 0x042C #CYRILLIC CAPITAL LETTER SOFT SIGN
0xDD 0x042D #CYRILLIC CAPITAL LETTER E
0xDE 0x042E #CYRILLIC CAPITAL LETTER YU
0xDF 0x042F #CYRILLIC CAPITAL LETTER YA
0xE0 0x0430 #CYRILLIC SMALL LETTER A
0xE1 0x0431 #CYRILLIC SMALL LETTER BE
0xE2 0x0432 #CYRILLIC SMALL LETTER VE
0xE3 0x0433 #CYRILLIC SMALL LETTER GHE
0xE4 0x0434 #CYRILLIC SMALL LETTER DE
0xE5 0x0435 #CYRILLIC SMALL LETTER IE
0xE6 0x0436 #CYRILLIC SMALL LETTER ZHE
0xE7 0x0437 #CYRILLIC SMALL LETTER ZE
0xE8 0x0438 #CYRILLIC SMALL LETTER I
0xE9 0x0439 #CYRILLIC SMALL LETTER SHORT I
0xEA 0x043A #CYRILLIC SMALL LETTER KA
0xEB 0x043B #CYRILLIC SMALL LETTER EL
0xEC 0x043C #CYRILLIC SMALL LETTER EM
0xED 0x043D #CYRILLIC SMALL LETTER EN
0xEE 0x043E #CYRILLIC SMALL LETTER O
0xEF 0x043F #CYRILLIC SMALL LETTER PE
0xF0 0x0440 #CYRILLIC SMALL LETTER ER
0xF1 0x0441 #CYRILLIC SMALL LETTER ES
0xF2 0x0442 #CYRILLIC SMALL LETTER TE
0xF3 0x0443 #CYRILLIC SMALL LETTER U
0xF4 0x0444 #CYRILLIC SMALL LETTER EF
0xF5 0x0445 #CYRILLIC SMALL LETTER HA
0xF6 0x0446 #CYRILLIC SMALL LETTER TSE
0xF7 0x0447 #CYRILLIC SMALL LETTER CHE
0xF8 0x0448 #CYRILLIC SMALL LETTER SHA
0xF9 0x0449 #CYRILLIC SMALL LETTER SHCHA
0xFA 0x044A #CYRILLIC SMALL LETTER HARD SIGN
0xFB 0x044B #CYRILLIC SMALL LETTER YERU
0xFC 0x044C #CYRILLIC SMALL LETTER SOFT SIGN
0xFD 0x044D #CYRILLIC SMALL LETTER E
0xFE 0x044E #CYRILLIC SMALL LETTER YU
0xFF 0x044F #CYRILLIC SMALL LETTER YA
]]
-- <------------- end of part to be replaced for other languages ------------->
-- convert from above code page into UTF-8
unicode_table = {}

function OnPluginInstall ()
  require "getlines"
  for line in getlines (conversion) do
    -- a mapped entry looks like: 0xC0 0x0410 #CYRILLIC CAPITAL LETTER A
    local from, to = string.match (line, "^0x(%x+)%s+0x(%x+)")
    if from and to then
      from = tonumber (from, 16) -- convert from hex to decimal
      to = tonumber (to, 16) -- ditto
      -- key: single code-page byte, value: its UTF-8 byte sequence
      unicode_table [string.char (from)] = utils.utf8encode (to)
    else -- look for an undefined code point
      from = string.match (string.lower (line), "^0x(%x+)%s+%#undefined")
      if from then
        from = tonumber (from, 16) -- convert from hex to decimal
        unicode_table [string.char (from)] = "?" -- don't want bad UTF-8
      end -- if undefined code
    end -- if found
  end -- for
  ColourNote ("white", "green",
              GetPluginInfo (GetPluginID (), 1) .. " plugin installed")
end -- OnPluginInstall

-- replace bytes with high-order bit set with UTF-8 equivalents
function OnPluginSend (sText)
  -- extra parentheses discard gsub's second return value (the match count)
  Send ((string.gsub (sText, "[\128-\255]", unicode_table)))
  return false -- drop the original text; the converted text was sent above
end -- OnPluginSend
]]>
</script>
</muclient>
- Nick Gammon
www.gammon.com.au, www.mushclient.com
Posted by: Atltais (8 posts)
Date: Reply #47 on Tue 03 Jun 2008 12:59 AM (UTC)
Message:
Neat, thanks! (Though it's not a complete fix, any sort of progress is progress :D)
I'm not actually Russian. (though I do know a bit of it) For an all-Russian MUD I'd imagine people would at least attempt to change commands to be more native.
On a game that I work on we've been looking at using UTF8 for various stuff and have a working implementation of it MUD-side, but many of us use MUSHclient so it's a bit tricky. :)
Have you taken a look at Uniscribe? http://www.microsoft.com/typography/developers/uniscribe/default.htm
Sadly, I have no GUI experience to speak of, so I'm not really of much help.
Posted by: Nick Gammon, Australia (23,140 posts), Forum Administrator
Date: Reply #48 on Tue 03 Jun 2008 01:19 AM (UTC)
Message:
I hadn't read it but am doing so now.
Internationalization of MUDs is somewhat of a tricky issue - I had to recompile SMAUG to even test my code, as it assumed that every character over 0x7F should be discarded.
Anyway, my suggested solution has raised its own issues. It seems that once you turn on UTF-8 in the Output display configuration, any text entered into the command window that is not straight ASCII (ie. contains bytes of 0x80 or above) fails alias processing (if you have any aliases at all), with a message about "Error execution regular expression: Bad UTF8".
I hadn't noticed this before, because I normally type English text.
It seems I have to release a new version of MUSHclient that doesn't bother attempting to match UTF-8 in the *command* window (that is, aliases), because it won't be UTF-8, it will be text localized to a particular code page.
There is also an issue of copying and pasting from the output window to the command window (eg. copying some text to echo it, or a player's name) - if the text in the output window is UTF-8 then it turns into gibberish in the command window. So, the conversion from code page to UTF-8 has to be reversed when copying and pasting.
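The reversal amounts to inverting the lookup table and matching whole UTF-8 sequences rather than single bytes. A minimal sketch, assuming a two-entry excerpt of the code page 1251 table and a hypothetical `to_codepage` helper:

```lua
-- excerpt of the forward table: code page 1251 byte -> UTF-8 bytes
unicode_table = {
  ["\228"] = "\208\180",  -- 0xE4 -> U+0434 CYRILLIC SMALL LETTER DE
  ["\224"] = "\208\176",  -- 0xE0 -> U+0430 CYRILLIC SMALL LETTER A
}

-- invert it: UTF-8 sequence -> code-page byte
reverse_table = {}
for byte, seq in pairs (unicode_table) do
  reverse_table [seq] = byte
end

-- hypothetical helper: turn UTF-8 text back into code-page text
function to_codepage (text)
  -- a lead byte (0xC0 to 0xF7) followed by one or more
  -- continuation bytes (0x80 to 0xBF) forms one sequence
  return (string.gsub (text, "[\192-\247][\128-\191]+", reverse_table))
end

echoed = "You say '\208\180\208\176'"   -- UTF-8, as shown in the output window
pasted = to_codepage (echoed)            -- code-page bytes for the command window
```

Sequences that have no entry in the reverse table are left unchanged, so text from another code page would still arrive in the command window as raw UTF-8 bytes.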
- Nick Gammon
www.gammon.com.au, www.mushclient.com
Posted by: Nick Gammon, Australia (23,140 posts), Forum Administrator
Date: Reply #49 on Tue 03 Jun 2008 01:21 AM (UTC), amended on Wed 04 Jun 2008 02:50 AM (UTC) by Nick Gammon
Message:
The plugin below lets you copy selected text from the output window, if it is in UTF-8 format, and converts it back to the appropriate code-page symbols.
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE muclient>
<!-- Saved on vrijdag, augustus 03, 2007, 2:06 -->
<!-- MuClient version 4.14 -->
<!-- Plugin "CopyScript" generated by Plugin Wizard -->
<!-- Amended slightly by Nick Gammon, from Worstje's version, on 17 Feb 2008 -->
<!-- Also amended on 3rd June 2008 to convert UTF-8 back to non-UTF-8 for use in the command window -->
<muclient>
<plugin
name="Unicode_Copy_Output"
author="Worstje"
id="29ce226131a0af3140c35141"
language="Lua"
purpose="Allows you to use CTRL+C for the output window if 'All typing goes to command window' is turned on."
save_state="n"
date_written="2007-08-03 02:04:12"
requires="4.00"
version="2.0"
>
</plugin>
<aliases>
<alias
match="^Copy_Output:Copy:29ce226131a0af3140c35141$"
enabled="y"
regexp="y"
omit_from_output="y"
sequence="100"
script="CopyScript"
>
</alias>
</aliases>
<!-- Script -->
<script>
<![CDATA[
-- THIS VERSION CONVERTS UTF-8 back to code-page text
-- See: http://www.gammon.com.au/forum/?id=2681&page=4
-- Thank you, Shaun Biggs, for taking your time to write the CopyScript
-- (formerly Copy2) function below. It was slightly altered by me to suit
-- my usage (wordwrapped lines and no \r\n at start of selection).
-- See forum: http://www.gammon.com.au/forum/?id=8052
-- see http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1251.TXT
-- <------------- replace here for other languages ------------->
conversion = [[
0x80 0x0402 #CYRILLIC CAPITAL LETTER DJE
0x81 0x0403 #CYRILLIC CAPITAL LETTER GJE
0x82 0x201A #SINGLE LOW-9 QUOTATION MARK
0x83 0x0453 #CYRILLIC SMALL LETTER GJE
0x84 0x201E #DOUBLE LOW-9 QUOTATION MARK
0x85 0x2026 #HORIZONTAL ELLIPSIS
0x86 0x2020 #DAGGER
0x87 0x2021 #DOUBLE DAGGER
0x88 0x20AC #EURO SIGN
0x89 0x2030 #PER MILLE SIGN
0x8A 0x0409 #CYRILLIC CAPITAL LETTER LJE
0x8B 0x2039 #SINGLE LEFT-POINTING ANGLE QUOTATION MARK
0x8C 0x040A #CYRILLIC CAPITAL LETTER NJE
0x8D 0x040C #CYRILLIC CAPITAL LETTER KJE
0x8E 0x040B #CYRILLIC CAPITAL LETTER TSHE
0x8F 0x040F #CYRILLIC CAPITAL LETTER DZHE
0x90 0x0452 #CYRILLIC SMALL LETTER DJE
0x91 0x2018 #LEFT SINGLE QUOTATION MARK
0x92 0x2019 #RIGHT SINGLE QUOTATION MARK
0x93 0x201C #LEFT DOUBLE QUOTATION MARK
0x94 0x201D #RIGHT DOUBLE QUOTATION MARK
0x95 0x2022 #BULLET
0x96 0x2013 #EN DASH
0x97 0x2014 #EM DASH
0x98 #UNDEFINED
0x99 0x2122 #TRADE MARK SIGN
0x9A 0x0459 #CYRILLIC SMALL LETTER LJE
0x9B 0x203A #SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
0x9C 0x045A #CYRILLIC SMALL LETTER NJE
0x9D 0x045C #CYRILLIC SMALL LETTER KJE
0x9E 0x045B #CYRILLIC SMALL LETTER TSHE
0x9F 0x045F #CYRILLIC SMALL LETTER DZHE
0xA0 0x00A0 #NO-BREAK SPACE
0xA1 0x040E #CYRILLIC CAPITAL LETTER SHORT U
0xA2 0x045E #CYRILLIC SMALL LETTER SHORT U
0xA3 0x0408 #CYRILLIC CAPITAL LETTER JE
0xA4 0x00A4 #CURRENCY SIGN
0xA5 0x0490 #CYRILLIC CAPITAL LETTER GHE WITH UPTURN
0xA6 0x00A6 #BROKEN BAR
0xA7 0x00A7 #SECTION SIGN
0xA8 0x0401 #CYRILLIC CAPITAL LETTER IO
0xA9 0x00A9 #COPYRIGHT SIGN
0xAA 0x0404 #CYRILLIC CAPITAL LETTER UKRAINIAN IE
0xAB 0x00AB #LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0xAC 0x00AC #NOT SIGN
0xAD 0x00AD #SOFT HYPHEN
0xAE 0x00AE #REGISTERED SIGN
0xAF 0x0407 #CYRILLIC CAPITAL LETTER YI
0xB0 0x00B0 #DEGREE SIGN
0xB1 0x00B1 #PLUS-MINUS SIGN
0xB2 0x0406 #CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
0xB3 0x0456 #CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
0xB4 0x0491 #CYRILLIC SMALL LETTER GHE WITH UPTURN
0xB5 0x00B5 #MICRO SIGN
0xB6 0x00B6 #PILCROW SIGN
0xB7 0x00B7 #MIDDLE DOT
0xB8 0x0451 #CYRILLIC SMALL LETTER IO
0xB9 0x2116 #NUMERO SIGN
0xBA 0x0454 #CYRILLIC SMALL LETTER UKRAINIAN IE
0xBB 0x00BB #RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0xBC 0x0458 #CYRILLIC SMALL LETTER JE
0xBD 0x0405 #CYRILLIC CAPITAL LETTER DZE
0xBE 0x0455 #CYRILLIC SMALL LETTER DZE
0xBF 0x0457 #CYRILLIC SMALL LETTER YI
0xC0 0x0410 #CYRILLIC CAPITAL LETTER A
0xC1 0x0411 #CYRILLIC CAPITAL LETTER BE
0xC2 0x0412 #CYRILLIC CAPITAL LETTER VE
0xC3 0x0413 #CYRILLIC CAPITAL LETTER GHE
0xC4 0x0414 #CYRILLIC CAPITAL LETTER DE
0xC5 0x0415 #CYRILLIC CAPITAL LETTER IE
0xC6 0x0416 #CYRILLIC CAPITAL LETTER ZHE
0xC7 0x0417 #CYRILLIC CAPITAL LETTER ZE
0xC8 0x0418 #CYRILLIC CAPITAL LETTER I
0xC9 0x0419 #CYRILLIC CAPITAL LETTER SHORT I
0xCA 0x041A #CYRILLIC CAPITAL LETTER KA
0xCB 0x041B #CYRILLIC CAPITAL LETTER EL
0xCC 0x041C #CYRILLIC CAPITAL LETTER EM
0xCD 0x041D #CYRILLIC CAPITAL LETTER EN
0xCE 0x041E #CYRILLIC CAPITAL LETTER O
0xCF 0x041F #CYRILLIC CAPITAL LETTER PE
0xD0 0x0420 #CYRILLIC CAPITAL LETTER ER
0xD1 0x0421 #CYRILLIC CAPITAL LETTER ES
0xD2 0x0422 #CYRILLIC CAPITAL LETTER TE
0xD3 0x0423 #CYRILLIC CAPITAL LETTER U
0xD4 0x0424 #CYRILLIC CAPITAL LETTER EF
0xD5 0x0425 #CYRILLIC CAPITAL LETTER HA
0xD6 0x0426 #CYRILLIC CAPITAL LETTER TSE
0xD7 0x0427 #CYRILLIC CAPITAL LETTER CHE
0xD8 0x0428 #CYRILLIC CAPITAL LETTER SHA
0xD9 0x0429 #CYRILLIC CAPITAL LETTER SHCHA
0xDA 0x042A #CYRILLIC CAPITAL LETTER HARD SIGN
0xDB 0x042B #CYRILLIC CAPITAL LETTER YERU
0xDC 0x042C #CYRILLIC CAPITAL LETTER SOFT SIGN
0xDD 0x042D #CYRILLIC CAPITAL LETTER E
0xDE 0x042E #CYRILLIC CAPITAL LETTER YU
0xDF 0x042F #CYRILLIC CAPITAL LETTER YA
0xE0 0x0430 #CYRILLIC SMALL LETTER A
0xE1 0x0431 #CYRILLIC SMALL LETTER BE
0xE2 0x0432 #CYRILLIC SMALL LETTER VE
0xE3 0x0433 #CYRILLIC SMALL LETTER GHE
0xE4 0x0434 #CYRILLIC SMALL LETTER DE
0xE5 0x0435 #CYRILLIC SMALL LETTER IE
0xE6 0x0436 #CYRILLIC SMALL LETTER ZHE
0xE7 0x0437 #CYRILLIC SMALL LETTER ZE
0xE8 0x0438 #CYRILLIC SMALL LETTER I
0xE9 0x0439 #CYRILLIC SMALL LETTER SHORT I
0xEA 0x043A #CYRILLIC SMALL LETTER KA
0xEB 0x043B #CYRILLIC SMALL LETTER EL
0xEC 0x043C #CYRILLIC SMALL LETTER EM
0xED 0x043D #CYRILLIC SMALL LETTER EN
0xEE 0x043E #CYRILLIC SMALL LETTER O
0xEF 0x043F #CYRILLIC SMALL LETTER PE
0xF0 0x0440 #CYRILLIC SMALL LETTER ER
0xF1 0x0441 #CYRILLIC SMALL LETTER ES
0xF2 0x0442 #CYRILLIC SMALL LETTER TE
0xF3 0x0443 #CYRILLIC SMALL LETTER U
0xF4 0x0444 #CYRILLIC SMALL LETTER EF
0xF5 0x0445 #CYRILLIC SMALL LETTER HA
0xF6 0x0446 #CYRILLIC SMALL LETTER TSE
0xF7 0x0447 #CYRILLIC SMALL LETTER CHE
0xF8 0x0448 #CYRILLIC SMALL LETTER SHA
0xF9 0x0449 #CYRILLIC SMALL LETTER SHCHA
0xFA 0x044A #CYRILLIC SMALL LETTER HARD SIGN
0xFB 0x044B #CYRILLIC SMALL LETTER YERU
0xFC 0x044C #CYRILLIC SMALL LETTER SOFT SIGN
0xFD 0x044D #CYRILLIC SMALL LETTER E
0xFE 0x044E #CYRILLIC SMALL LETTER YU
0xFF 0x044F #CYRILLIC SMALL LETTER YA
]]
-- <------------- end of part to be replaced for other languages ------------->
-- map UTF-8 sequences back to bytes of the above code page
unicode_table = {}

function OnPluginInstall ()
  require "getlines"
  for line in getlines (conversion) do
    local from, to = string.match (line, "^0x(%x+)%s+0x(%x+)")
    if from and to then
      from = tonumber (from, 16) -- convert from hex to decimal
      to = tonumber (to, 16) -- ditto
      -- key: UTF-8 byte sequence, value: single code-page byte
      unicode_table [utils.utf8encode (to)] = string.char (from)
    end -- if found
  end -- for
end -- OnPluginInstall

-- some long alias that no-one will ever want to type
Accelerator ("Ctrl+C", "Copy_Output:Copy:29ce226131a0af3140c35141")

function CopyScript(name, line, wildcs)
  -- find selection in output window, if any
  local first_line, last_line = GetSelectionStartLine(),
        math.min (GetSelectionEndLine(), GetLinesInBufferCount ())
  local first_column, last_column = GetSelectionStartColumn(), GetSelectionEndColumn()

  -- nothing selected, do normal copy
  if first_line <= 0 then
    DoCommand("copy")
    return
  end -- if nothing to copy from output window

  local copystring = ""

  -- iterate to build up copy text
  for line = first_line, last_line do
    if line < last_line then
      copystring = copystring .. GetLineInfo(line).text:sub (first_column) -- copy rest of line
      first_column = 1
      -- Is this a new line or merely the continuation of a paragraph?
      if GetLineInfo (line, 3) then
        copystring = copystring .. "\r\n"
      end -- new line
    else
      copystring = copystring .. GetLineInfo(line).text:sub (first_column, last_column - 1)
    end -- if
  end -- for loop

  -- Get rid of a spurious extra new line at the start.
  if copystring:sub (1, 2) == "\r\n" then
    copystring = copystring:sub (3)
  end -- if newline at start

  -- convert UTF-8 sequences back to code-page bytes
  -- see: http://www.gammon.com.au/forum/?id=2681&page=4
  copystring = string.gsub (copystring, "[\192-\247][\128-\191]+", unicode_table)

  -- finally can set clipboard contents
  SetClipboard(copystring)
end -- function CopyScript
]]>
</script>
</muclient>
- Nick Gammon
www.gammon.com.au, www.mushclient.com
Posted by: Nick Gammon, Australia (23,140 posts), Forum Administrator
Date: Reply #50 on Tue 03 Jun 2008 01:27 AM (UTC)
Message:
Quote:
I'm not actually Russian.
Me neither. ;)
If the general idea works, you should be able to publish customized plugins for the languages which are in use on your MUD. I presume there aren't hundreds.
Basically look up the code-page translations on unicode.org, insert them into the plugin at the appropriate place, and save with some sort of suffix (eg. Translate_Unicode.RU.xml).
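One way to keep per-language variants manageable is to isolate the table building in a function that takes the mapping text as its argument. The sketch below is an assumption about how that could look, not part of the posted plugins: `build_tables` and `utf8_encode` are hypothetical names, `utf8_encode` stands in for utils.utf8encode so that the code runs in plain Lua, and the two-line Greek excerpt is only sample input.

```lua
-- Stand-in for MUSHclient's utils.utf8encode (assumption: code point < 0x10000)
function utf8_encode (cp)
  if cp < 0x80 then
    return string.char (cp)
  elseif cp < 0x800 then
    return string.char (0xC0 + math.floor (cp / 0x40),
                        0x80 + cp % 0x40)
  else
    return string.char (0xE0 + math.floor (cp / 0x1000),
                        0x80 + math.floor (cp / 0x40) % 0x40,
                        0x80 + cp % 0x40)
  end
end

-- Hypothetical helper: build the "send" and "copy" lookup tables
-- from mapping text in the unicode.org format
function build_tables (conversion)
  local send_table, copy_table = {}, {}
  for line in string.gmatch (conversion, "[^\r\n]+") do
    local from, to = string.match (line, "^%s*0x(%x+)%s+0x(%x+)")
    if from then
      local byte = string.char (tonumber (from, 16))
      local seq = utf8_encode (tonumber (to, 16))
      send_table [byte] = seq   -- code-page byte -> UTF-8
      copy_table [seq] = byte   -- UTF-8 -> code-page byte
    else
      -- undefined code points are sent as "?" rather than as bad UTF-8
      from = string.match (string.lower (line), "^%s*0x(%x+)%s+#undefined")
      if from then
        send_table [string.char (tonumber (from, 16))] = "?"
      end
    end
  end
  return send_table, copy_table
end

-- sample input: two lines in the CP1253 (Greek) format
greek = [[
0xE1 0x03B1 #GREEK SMALL LETTER ALPHA
0xFF #UNDEFINED
]]

send_table, copy_table = build_tables (greek)
```

With a function like this, a per-language plugin would differ only in the mapping text it passes in.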
- Nick Gammon
www.gammon.com.au, www.mushclient.com
Posted by: Nick Gammon, Australia (23,140 posts), Forum Administrator
Date: Reply #51 on Tue 03 Jun 2008 01:36 AM (UTC)
Message:
I should point out that this is, fairly obviously, a hybrid "work around" solution.
The main problem really is that MUSHclient is not Unicode-enabled, and despite my best efforts previously, it was hard to get Unicode out of the command window.
What you will have here, if you set the code page appropriately (in International settings), is one language you can use in the Command window (presumably your native language). However, by enabling UTF-8 in the Output window, all languages can be displayed (with a suitable font).
The proposed "copy from output window" plugin would let you copy your *own* language from the output window, for editing and resending. Copying a different code set would result in gibberish still, because the Command window is not Unicode.
- Nick Gammon
www.gammon.com.au, www.mushclient.com
Posted by: Nick Gammon, Australia (23,140 posts), Forum Administrator
Date: Reply #52 on Tue 03 Jun 2008 04:02 AM (UTC)
Message:
Version 4.26 has now been released, which should work properly with copying and pasting from the output window, if you install the above plugin.
- Nick Gammon
www.gammon.com.au, www.mushclient.com
Posted by: Castamir (2 posts)
Date: Reply #53 on Tue 03 Jun 2008 08:44 PM (UTC)
Message:
Alas, pasting from other programs is still broken, and what's most important, so is typing things from the keyboard.
My knowledge of Windows is sketchy (I'm a Unix guy), but I would tackle the problem the following way:
* no 16-bit internal strings. That's a pain in the rear, and as you said, there's 6k strings you would have to change, not to mention all function calls and what not...
* the input box itself needs to be an Unicode window, though
* when taking data from it, you would call GetWindowTextW() (always!) then WideCharToMultiByte() to CP_UTF8 if mushclient is in UTF-8 mode and to CP_ACP otherwise.
* ... and the same with MultiByteToWideChar() and SetWindowTextW() the other way
Posted by: Nick Gammon, Australia (23,140 posts), Forum Administrator
Date: Reply #54 on Tue 03 Jun 2008 11:05 PM (UTC)
Message:
You should at least be able to get typing into the window working. I am no expert, but I managed it after trying a couple of things.
First, I added a second language as a keyboard option:

Then, I set my code page to Russian, so that when I used my extra keyboard settings, Russian is what I would see:

Now in the bottom-right corner I see a keyboard code (EN) which shows what I type will appear in English:

I type "say " into the command window (which appears as such), and then hit the keyboard modifier to switch to Russian (Alt+Left_Shift which is the default). Now the keyboard code changes:

Now what I type appears in Russian, which is what I want. However when I hit <Enter> to send it, the plugin switches the code page data into UTF-8, which is what arrives at the MUD.
- Nick Gammon
www.gammon.com.au, www.mushclient.com
Posted by: Atltais (8 posts)
Date: Reply #55 on Tue 03 Jun 2008 11:06 PM (UTC)
Message:
Applocale is a much simpler way to change the code page, as an aside.
Posted by: Nick Gammon, Australia (23,140 posts), Forum Administrator
Date: Reply #56 on Tue 03 Jun 2008 11:10 PM (UTC)
Message:
Quote:
the input box itself needs to be an Unicode window, though
Well that is the hard bit. I tried for a few days previously, but I don't think you can make individual windows in a non-Unicode program be Unicode.
The source code is freely available, if someone can make it work I would be pleased to hear from them.
Quote:
Alas, pasting from other programs is still broken,
To make that work, presuming the text on the clipboard is UTF-8, you could make a plugin that does what my "copy from the output window" plugin does. You would hit a function key (eg. F8), the plugin kicks in, grabs the clipboard contents, converts them from UTF-8 to the code page, and puts them back. Or that is the theory at least.
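A rough sketch of the conversion such a plugin would need, under the presumption stated above that the clipboard really holds UTF-8. `clipboard_to_codepage` is a hypothetical helper, the table is a two-entry excerpt for code page 1251, and replacing unmapped sequences with "?" is a design choice made here, not something the existing plugins do. The MUSHclient wiring is only hinted at in a comment and is untested.

```lua
-- excerpt of the reverse table: UTF-8 bytes -> code page 1251 byte
reverse_table = {
  ["\208\180"] = "\228",  -- U+0434 CYRILLIC SMALL LETTER DE -> 0xE4
  ["\208\176"] = "\224",  -- U+0430 CYRILLIC SMALL LETTER A  -> 0xE0
}

-- hypothetical helper: convert UTF-8 clipboard text to code-page text
function clipboard_to_codepage (text)
  return (string.gsub (text, "[\192-\247][\128-\191]+",
    function (seq)
      -- sequences outside this code page cannot be shown in the
      -- command window, so mark them instead of passing raw bytes on
      return reverse_table [seq] or "?"
    end))
end

-- Inside a plugin, the script run by the function key might do
-- something like this (assumption, not tested in MUSHclient):
--   SetClipboard (clipboard_to_codepage (GetClipboard ()))

converted = clipboard_to_codepage ("\208\180\208\176")   -- two Cyrillic letters
foreign = clipboard_to_codepage ("\206\177 ok")          -- Greek alpha, then ASCII
```

Here the Cyrillic letters come back as the two code-page bytes 0xE4 0xE0, while the Greek letter, which code page 1251 cannot represent, becomes "?".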
I know this isn't perfect, but my initial tests with Russian seemed to show it worked smoothly enough, providing you didn't try to get too fancy and copy and paste from one application to another.
I suspect the copying/pasting thing might be related to clipboard "types" where the UTF-8 data is not necessarily stored in the TEXT data type (but I could be wrong).
- Nick Gammon
www.gammon.com.au, www.mushclient.com
Posted by: Nick Gammon, Australia (23,140 posts), Forum Administrator
Date: Reply #57 on Wed 04 Jun 2008 02:46 AM (UTC)
Message:
As an example of another language, if you wanted Greek encoding, you could go to this page:
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1253.TXT
Grab the mappings for 0x80 to 0xFF (see below) and paste them into the two plugins instead of the Russian ones:
0x80 0x20AC #EURO SIGN
0x81 #UNDEFINED
0x82 0x201A #SINGLE LOW-9 QUOTATION MARK
0x83 0x0192 #LATIN SMALL LETTER F WITH HOOK
0x84 0x201E #DOUBLE LOW-9 QUOTATION MARK
0x85 0x2026 #HORIZONTAL ELLIPSIS
0x86 0x2020 #DAGGER
0x87 0x2021 #DOUBLE DAGGER
0x88 #UNDEFINED
0x89 0x2030 #PER MILLE SIGN
0x8A #UNDEFINED
0x8B 0x2039 #SINGLE LEFT-POINTING ANGLE QUOTATION MARK
0x8C #UNDEFINED
0x8D #UNDEFINED
0x8E #UNDEFINED
0x8F #UNDEFINED
0x90 #UNDEFINED
0x91 0x2018 #LEFT SINGLE QUOTATION MARK
0x92 0x2019 #RIGHT SINGLE QUOTATION MARK
0x93 0x201C #LEFT DOUBLE QUOTATION MARK
0x94 0x201D #RIGHT DOUBLE QUOTATION MARK
0x95 0x2022 #BULLET
0x96 0x2013 #EN DASH
0x97 0x2014 #EM DASH
0x98 #UNDEFINED
0x99 0x2122 #TRADE MARK SIGN
0x9A #UNDEFINED
0x9B 0x203A #SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
0x9C #UNDEFINED
0x9D #UNDEFINED
0x9E #UNDEFINED
0x9F #UNDEFINED
0xA0 0x00A0 #NO-BREAK SPACE
0xA1 0x0385 #GREEK DIALYTIKA TONOS
0xA2 0x0386 #GREEK CAPITAL LETTER ALPHA WITH TONOS
0xA3 0x00A3 #POUND SIGN
0xA4 0x00A4 #CURRENCY SIGN
0xA5 0x00A5 #YEN SIGN
0xA6 0x00A6 #BROKEN BAR
0xA7 0x00A7 #SECTION SIGN
0xA8 0x00A8 #DIAERESIS
0xA9 0x00A9 #COPYRIGHT SIGN
0xAA #UNDEFINED
0xAB 0x00AB #LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0xAC 0x00AC #NOT SIGN
0xAD 0x00AD #SOFT HYPHEN
0xAE 0x00AE #REGISTERED SIGN
0xAF 0x2015 #HORIZONTAL BAR
0xB0 0x00B0 #DEGREE SIGN
0xB1 0x00B1 #PLUS-MINUS SIGN
0xB2 0x00B2 #SUPERSCRIPT TWO
0xB3 0x00B3 #SUPERSCRIPT THREE
0xB4 0x0384 #GREEK TONOS
0xB5 0x00B5 #MICRO SIGN
0xB6 0x00B6 #PILCROW SIGN
0xB7 0x00B7 #MIDDLE DOT
0xB8 0x0388 #GREEK CAPITAL LETTER EPSILON WITH TONOS
0xB9 0x0389 #GREEK CAPITAL LETTER ETA WITH TONOS
0xBA 0x038A #GREEK CAPITAL LETTER IOTA WITH TONOS
0xBB 0x00BB #RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0xBC 0x038C #GREEK CAPITAL LETTER OMICRON WITH TONOS
0xBD 0x00BD #VULGAR FRACTION ONE HALF
0xBE 0x038E #GREEK CAPITAL LETTER UPSILON WITH TONOS
0xBF 0x038F #GREEK CAPITAL LETTER OMEGA WITH TONOS
0xC0 0x0390 #GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
0xC1 0x0391 #GREEK CAPITAL LETTER ALPHA
0xC2 0x0392 #GREEK CAPITAL LETTER BETA
0xC3 0x0393 #GREEK CAPITAL LETTER GAMMA
0xC4 0x0394 #GREEK CAPITAL LETTER DELTA
0xC5 0x0395 #GREEK CAPITAL LETTER EPSILON
0xC6 0x0396 #GREEK CAPITAL LETTER ZETA
0xC7 0x0397 #GREEK CAPITAL LETTER ETA
0xC8 0x0398 #GREEK CAPITAL LETTER THETA
0xC9 0x0399 #GREEK CAPITAL LETTER IOTA
0xCA 0x039A #GREEK CAPITAL LETTER KAPPA
0xCB 0x039B #GREEK CAPITAL LETTER LAMDA
0xCC 0x039C #GREEK CAPITAL LETTER MU
0xCD 0x039D #GREEK CAPITAL LETTER NU
0xCE 0x039E #GREEK CAPITAL LETTER XI
0xCF 0x039F #GREEK CAPITAL LETTER OMICRON
0xD0 0x03A0 #GREEK CAPITAL LETTER PI
0xD1 0x03A1 #GREEK CAPITAL LETTER RHO
0xD2 #UNDEFINED
0xD3 0x03A3 #GREEK CAPITAL LETTER SIGMA
0xD4 0x03A4 #GREEK CAPITAL LETTER TAU
0xD5 0x03A5 #GREEK CAPITAL LETTER UPSILON
0xD6 0x03A6 #GREEK CAPITAL LETTER PHI
0xD7 0x03A7 #GREEK CAPITAL LETTER CHI
0xD8 0x03A8 #GREEK CAPITAL LETTER PSI
0xD9 0x03A9 #GREEK CAPITAL LETTER OMEGA
0xDA 0x03AA #GREEK CAPITAL LETTER IOTA WITH DIALYTIKA
0xDB 0x03AB #GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA
0xDC 0x03AC #GREEK SMALL LETTER ALPHA WITH TONOS
0xDD 0x03AD #GREEK SMALL LETTER EPSILON WITH TONOS
0xDE 0x03AE #GREEK SMALL LETTER ETA WITH TONOS
0xDF 0x03AF #GREEK SMALL LETTER IOTA WITH TONOS
0xE0 0x03B0 #GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
0xE1 0x03B1 #GREEK SMALL LETTER ALPHA
0xE2 0x03B2 #GREEK SMALL LETTER BETA
0xE3 0x03B3 #GREEK SMALL LETTER GAMMA
0xE4 0x03B4 #GREEK SMALL LETTER DELTA
0xE5 0x03B5 #GREEK SMALL LETTER EPSILON
0xE6 0x03B6 #GREEK SMALL LETTER ZETA
0xE7 0x03B7 #GREEK SMALL LETTER ETA
0xE8 0x03B8 #GREEK SMALL LETTER THETA
0xE9 0x03B9 #GREEK SMALL LETTER IOTA
0xEA 0x03BA #GREEK SMALL LETTER KAPPA
0xEB 0x03BB #GREEK SMALL LETTER LAMDA
0xEC 0x03BC #GREEK SMALL LETTER MU
0xED 0x03BD #GREEK SMALL LETTER NU
0xEE 0x03BE #GREEK SMALL LETTER XI
0xEF 0x03BF #GREEK SMALL LETTER OMICRON
0xF0 0x03C0 #GREEK SMALL LETTER PI
0xF1 0x03C1 #GREEK SMALL LETTER RHO
0xF2 0x03C2 #GREEK SMALL LETTER FINAL SIGMA
0xF3 0x03C3 #GREEK SMALL LETTER SIGMA
0xF4 0x03C4 #GREEK SMALL LETTER TAU
0xF5 0x03C5 #GREEK SMALL LETTER UPSILON
0xF6 0x03C6 #GREEK SMALL LETTER PHI
0xF7 0x03C7 #GREEK SMALL LETTER CHI
0xF8 0x03C8 #GREEK SMALL LETTER PSI
0xF9 0x03C9 #GREEK SMALL LETTER OMEGA
0xFA 0x03CA #GREEK SMALL LETTER IOTA WITH DIALYTIKA
0xFB 0x03CB #GREEK SMALL LETTER UPSILON WITH DIALYTIKA
0xFC 0x03CC #GREEK SMALL LETTER OMICRON WITH TONOS
0xFD 0x03CD #GREEK SMALL LETTER UPSILON WITH TONOS
0xFE 0x03CE #GREEK SMALL LETTER OMEGA WITH TONOS
0xFF #UNDEFINED
- Nick Gammon
www.gammon.com.au, www.mushclient.com
Posted by: Nick Gammon, Australia (23,140 posts), Forum Administrator
Date: Reply #58 on Wed 04 Jun 2008 03:39 AM (UTC)
Message:
An example of the first plugin (on this page) in operation is here:

This demonstrates how (with a Greek version, using the codes just above) I was able to type in the command window using Greek characters, and send them. The characters arrived in the MUD converted to UTF-8, and were then echoed back correctly in the output window.
- Nick Gammon
www.gammon.com.au, www.mushclient.com
Posted by: Atltais (8 posts)
Date: Reply #59 on Sun 08 Jun 2008 10:22 PM (UTC)
Message:
It looks like defining UNICODE would (technically) do it, but it'll cough up a whole slew of errors (about 5000 or so) when you attempt this.