Quote:
I ran into this problem with MUSH not giving back precisely what it stores too. Nick, is there a chance you could have a look into this and see if it can be fixed up?
I am a bit lost about what "pickling" is exactly. I thought you did that to vegetables, like onions.
Anyway, I am assuming from the general gist here that you are using some function to dump something into binary format, which is saved in a variable, which is in a plugins save file or a world file, and then that variable either won't load in correctly next time, or has some characters omitted or replaced. Is that it?
Maybe it would help to construct a very small test case so we can agree on the exact problem. For example:
SetVariable ("test", "test\n")
A quick test seems to show that any combination of \n, \r, and \r\n save correctly - that is, you can examine the saved variable and it is saved as written. However, attempting to save \0 does not work because MUSHclient uses 0x00 as a string terminator in a number of places, hence the warning under functions like the Base 64 decode.
It has been a standard practice when using C libraries for a long time to use 0x00 as a string terminator (rightly or wrongly), and when I started writing MUSHclient I used that convention. Various functions (like strlen) which are used extensively, detect string lengths by scanning a string for that terminator.
Lua uses a different method, which is to store the length separately. That is why the Lua versions of those functions can handle the 0x00 byte, but only for storage in Lua variables. Once the strings are placed into MUSHclient variables the same problem is likely to rear its head.
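To make the difference between the two conventions concrete, here is a minimal sketch in Python (chosen only because the pickling in question is being done from Python). It uses the standard ctypes module to mimic a C-style, 0x00-terminated read. It is an illustration of the general idea, not MUSHclient's actual code.

```python
import ctypes

data = b"ab\x00cd"  # five bytes with a 0x00 in the middle

# Length-counted view (the approach Lua takes): all five bytes are kept.
counted_length = len(data)

# C-style view: create_string_buffer copies the bytes into a char array,
# and .value reads it back the way strlen-based code would, stopping at
# the first 0x00 byte.
buf = ctypes.create_string_buffer(data)
c_style = buf.value

print(counted_length)  # 5
print(c_style)         # b'ab'
```

Everything after the first 0x00 is silently dropped in the C-style view, which is the kind of loss described above.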
However back to the pickled onions problem, things change when *loading* the saved variables. You can look at the source for the XML parser in MUSHclient (xmlparse.cpp), to see a couple of things it does to the string data whilst loading it:
// convert tabs to spaces, we don't want tabs in our data
m_strxmlBuffer.Replace ('\t', ' '); // line 216
And a bit further on (line 675 onwards):
// copy if not nested, and not inside an element definition
// -- omit carriage returns
if (iDepth == 0 &&
    !bInside &&
    *pi != '\r')
  {
  // make linefeeds into carriage return/linefeed
  if (*pi == '\n')
    *po++ = '\r';
  *po++ = *pi; // copy if not inside an element
  }
What I read from this is that you will find:
- Tabs (0x09) will be converted to spaces
- Carriage returns (0x0D) will be omitted
- Line feeds (0x0A) will be loaded as 0x0D 0x0A
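The three effects listed above can be modelled with a short Python sketch. The function name simulate_load is made up for this example, and it only mirrors the two snippets quoted from xmlparse.cpp. It is not the real parser and ignores everything else the loader does.

```python
def simulate_load(text):
    """Model of the quoted loader snippets (hypothetical helper, not MUSHclient code)."""
    # Tabs become spaces (the Replace call near line 216).
    text = text.replace("\t", " ")

    out = []
    for ch in text:
        if ch == "\r":
            continue          # carriage returns are omitted
        if ch == "\n":
            out.append("\r")  # linefeeds become carriage return/linefeed
        out.append(ch)
    return "".join(out)


saved = "a\tb\r\nc\nd\r"
loaded = simulate_load(saved)
print(repr(loaded))  # 'a b\r\nc\r\nd'
```

Note that a lone trailing carriage return vanishes and a lone linefeed grows into two bytes, so binary data containing either will not come back byte-for-byte.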
I don't remember the reason for converting tabs to spaces, but there must have been one. Perhaps it was because people were using external editors to create plugins, and put tabs inside the XML, which were then not recognised later on in the XML parser. As for the carriage-returns/linefeeds, it is basically doing what it does to incoming MUD data: trying to normalise the various line endings into standard Windows CR/LF form. For example, if you edited a multi-line trigger on a Unix platform, and only had linefeeds between lines, it would not display correctly on Windows.
What I recommend is that you convert "binary" strings to printable form yourself before saving them (e.g. using Base64Encode) and convert them back after loading (e.g. using Base64Decode). However, you will still have problems with a string that has 0x00 in it. In that case, if Python is indeed creating strings with 0x00 in them, I assume it will have tools to convert them into printable form, like its own version of Base64Encode / Base64Decode.
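As a rough sketch of that approach on the Python side, the standard pickle and base64 modules can do the conversion before the text ever reaches a MUSHclient variable. The helper names to_storable and from_storable are invented for this example, and the step of actually storing the text (for instance with SetVariable) is left out.

```python
import base64
import pickle


def to_storable(obj):
    """Pickle an object and return plain ASCII text that is safe to store."""
    raw = pickle.dumps(obj)  # may contain 0x00, tabs, CR and LF
    return base64.b64encode(raw).decode("ascii")


def from_storable(text):
    """Reverse to_storable: decode the Base64 text and unpickle it."""
    return pickle.loads(base64.b64decode(text))


original = {"name": "orc", "hp": [1, 2, 3], "raw": b"\x00\t\r\n"}
text = to_storable(original)   # this is what would go into the variable
restored = from_storable(text)

print(restored == original)
```

The output of b64encode uses only letters, digits, "+", "/" and "=", so it contains no 0x00, tab, carriage return or linefeed for the save/load steps to alter. As usual with pickle, only unpickle data you saved yourself.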
To be honest, when I wrote the variables stuff (and the XML loader) I expected people to want to store things like mob names, not pure binary data with embedded nulls, carriage-returns and linefeeds. If I had expected that, then MUSHclient would have been written from the start with more robust string handling - that is, the ability to store strings with 0x00 in them, which basically means none of the standard C string libraries could be used.