UTF-8 Plug-In Encoding Not Recognized ?!?

Posted by 1of10   Canada  (54 posts)  Bio
Date Sat 10 Apr 2004 08:20 AM (UTC)

Amended on Sat 10 Apr 2004 08:26 AM (UTC) by 1of10

I'm not sure if this is the fault of my editor incorrectly saving the files, or if MUSHclient has a small bug in it when loading certain edited UTF-8 files...

My editor is UltraEdit-32 v10.10c. MUSHclient is v3.47.

My editor status line says the file format is U8-UNIX (or U8-DOS).

When I attempt to load one of these edited files, I get the following error:

Line 1: Expected '<', got "ÿ" (content not permitted here) (problem in this file)

I can only solve this problem by converting the file "UTF-8 to ASCII" and changing the 'encoding=' in the first line of the plug-in to "ASCII."

At first, while writing this post, I thought my editor might be handling UTF-8 improperly (maybe it still is?). I was using 10.10a, and just now upgraded to 10.10c. The changelog mentions improved UTF-8 handling. However, the problem still exists...

Oddly enough, when I examine both an ASCII-encoded, edited file and a UTF-8-encoded, unedited file (as originally distributed with MUSHclient), the same two characters are always present before the '<' of the <?xml ...?> tag: ÿþ

Posted by Nick Gammon   Australia  (23,133 posts)  Bio   Forum Administrator
Date Reply #1 on Sat 10 Apr 2004 10:47 PM (UTC)
The characters you refer to are hex FF FE, which are used as a "Unicode marker" (a byte order mark) at the start of a text file (e.g. by Notepad).

MUSHclient does not at present detect that marker. I have added it as suggestion #515 for it to do so.
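
For anyone who wants to check their own files, here is a rough Python sketch of how such a marker could be detected. This is only an illustration, not MUSHclient's code; the function name sniff_bom and the BOMS table are made up for the example. The byte order mark constants come from Python's standard codecs module.

```python
import codecs

# Known byte order marks. The UTF-32 entries come first because the
# UTF-32 LE mark (FF FE 00 00) begins with the UTF-16 LE mark (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length) if data starts with a known
    byte order mark, otherwise (None, 0). Hypothetical helper."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

Calling sniff_bom on the first few bytes of a file that starts with FF FE followed by a 16-bit '<' reports "utf-16-le" with a marker length of 2, which matches the ÿþ seen when those two bytes are displayed as Latin-1 characters.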

Posted by Nick Gammon   Australia  (23,133 posts)  Bio   Forum Administrator
Date Reply #2 on Fri 16 Apr 2004 04:13 AM (UTC)
After checking the file (a Unicode file created with Notepad), it is really a 16-bit Unicode (UTF-16) file, not a UTF-8 file, so strictly speaking MUSHclient *is* handling UTF-8 (however not 16-bit Unicode).

However I have changed MUSHclient v 3.48 to detect these "indicator bytes" and convert the file from 16-bit Unicode to UTF-8, and then process it.
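
As a rough illustration of the idea (detect the indicator bytes, then convert to UTF-8 before parsing), here is a short Python sketch. It is not the actual MUSHclient implementation, which is not shown in this thread; the function name to_utf8 is hypothetical, and the sketch assumes the only markers of interest are the two UTF-16 ones.

```python
import codecs


def to_utf8(data: bytes) -> bytes:
    """If data starts with a UTF-16 byte order mark, strip the mark and
    transcode the rest to UTF-8. Otherwise return data unchanged.
    Hypothetical helper; UTF-32 input is deliberately not handled."""
    if data.startswith(codecs.BOM_UTF16_LE):    # FF FE
        text = data[2:].decode("utf-16-le")
    elif data.startswith(codecs.BOM_UTF16_BE):  # FE FF
        text = data[2:].decode("utf-16-be")
    else:
        return data
    return text.encode("utf-8")


# Example: a file saved as "Unicode" by an editor such as Notepad
sample = codecs.BOM_UTF16_LE + '<?xml version="1.0"?>'.encode("utf-16-le")
converted = to_utf8(sample)
```

After conversion the data starts with '<' as the XML parser expects, so the "Expected '<', got ÿ" style of error would no longer be triggered by the marker bytes.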

Posted by Nick Gammon   Australia  (23,133 posts)  Bio   Forum Administrator
Date Reply #3 on Mon 26 Apr 2004 02:23 AM (UTC)
Version 3.48 should recognise those files correctly now.

