Parsing XML documents
It is now over 60 days since the last post. This thread is closed.
Posted by Nick Gammon (Australia, 23,120 posts), Forum Administrator
Date: Sun 13 Nov 2005 01:17 AM (UTC), amended on Sun 13 Nov 2005 01:26 AM (UTC) by Nick Gammon
Version 3.69 of MUSHclient adds a new function (xmlread) to the "utils" table, which uses MUSHclient's internal XML parser to parse an XML string you supply. This lets you parse triggers, aliases etc. that you have copied to the clipboard as text (or created with the ExportXML script routine), and see exactly what each value is set to. Or, by reading a MUSHclient world file into memory as a string, you could parse that.
The XML parser is not necessarily 100% industry-standard XML parsing; however, it is the method MUSHclient uses for its own XML documents, and should be reasonably compatible with standard XML unless you use some of the fancier XML extensions. It should certainly parse the XML output by MUSHclient itself (eg. triggers, aliases, world files, plugins), as that is the same routine it uses to read them in.
You pass the parser a single string, which is the XML to be parsed. If parsing succeeds, three results are returned:
- The root node (all other nodes are children of this node)
- The root document name (eg. "muclient")
- A table of custom entities in the document, or nil if no custom entities
If the parsing fails, three results are returned:
- nil - to indicate failure
- The error reason
- The line the error occurred at
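You can pass the first two results to "assert" to quickly check whether parsing succeeded: on failure the first result is nil, so assert raises an error using the second result (the error reason) as its message.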
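Because success and failure share the same three return slots, a caller has to look at the first result before it knows how to read the other two. The sketch below wraps that convention in a small helper. `check_xml_result` is a hypothetical name, and the values passed to it are made-up stand-ins for what utils.xmlread would return (the error text in particular is invented).

```lua
-- Hypothetical helper: interpret the three results returned by utils.xmlread.
-- On success: root node, document name, entities table (or nil).
-- On failure: nil, error reason, line number.
local function check_xml_result(a, b, c)
  if a == nil then
    -- failure: b is the reason, c is the line the error occurred at
    return nil, string.format("XML error on line %s: %s", tostring(c), tostring(b))
  end
  -- success: hand back the root node, document name and entities unchanged
  return a, b, c
end

-- Stand-in values (not real parser output) to show both paths
local root, name = check_xml_result({ name = "", nodes = {} }, "muclient", nil)
print(name)  -- muclient

local bad, msg = check_xml_result(nil, "Unexpected end of file", 12)
print(msg)   -- XML error on line 12: Unexpected end of file
```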
Each node consists of a table with the following entries:
- name - name of the node (eg. <trigger>foo</trigger> - the name is "trigger")
- content - contents of the node (eg. <trigger>foo</trigger> - the content is "foo")
- empty - boolean indicating whether the node is empty (eg. <br/> is an empty node)
- line - which line in the XML string the node occurred on (eg. line 5)
- attributes - a table of attributes for this node, keyed by the attribute name
(eg. "world_file_version"="15"). Attribute names have to be unique, so we can use a keyed lookup to find them.
The attributes table is not present if there are no attributes defined.
- nodes - a table of child nodes, keyed by ascending number (the order they appeared in). Each child node has the same contents as described above.
Children are not necessarily unique (eg. there may be more than one <trigger> node in a document) so they are keyed by number, and not by node name.
The nodes table is not present if there are no children of this node.
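Since children are keyed by number, and both the attributes and nodes tables may be missing, lookups need a little care. The sketch below shows two small helpers for that. `children_named` and `get_attr` are hypothetical names, and the data is a hand-built table shaped like the description above, not real parser output.

```lua
-- Child nodes are keyed by number (names may repeat), so finding all
-- children with a given name means scanning the "nodes" array.
local function children_named(node, name)
  local found = {}
  -- the "nodes" table is absent when a node has no children
  for _, child in ipairs(node.nodes or {}) do
    if child.name == name then
      table.insert(found, child)
    end
  end
  return found
end

-- The "attributes" table is also optional, so guard the lookup.
local function get_attr(node, key)
  return node.attributes and node.attributes[key]
end

-- Hand-built table shaped like the structure described above
local triggers = {
  name = "triggers", line = 1, content = "",
  nodes = {
    { name = "trigger", line = 2, content = "", attributes = { match = "hp *" } },
    { name = "trigger", line = 3, content = "", attributes = { match = "sp *" } },
    { name = "comment", line = 4, content = "note" },
  },
}

for _, t in ipairs(children_named(triggers, "trigger")) do
  print(t.line, get_attr(t, "match"))
end
print(get_attr(triggers.nodes[3], "match"))  -- nil: no attributes table
```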
Example of use:
a, b, c = utils.xmlread [[
<foo width="1" height="2">
contents of foo
<bar west="true" fish="bicycle">
child of foo
</bar>
</foo>
<goat blood="100">eep</goat>
]]
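if not a then
  -- on failure, a is nil, b is the error reason and c is the line number
  print ("error on line = ", c)
end -- if
assert (a, b)  -- raises the error reason if parsing failed
tprint (a)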
Output:
"line"=0
"name"=""
"content"=""
"nodes":
  1:
    "line"=2
    "name"="foo"
    "nodes":
      1:
        "line"=4
        "name"="bar"
        "content"="
child of foo
"
        "attributes":
          "fish"="bicycle"
          "west"="true"
    "content"="
contents of foo
"
    "attributes":
      "height"="2"
      "width"="1"
  2:
    "line"=8
    "name"="goat"
    "content"="eep"
    "attributes":
      "blood"="100"
You can see from the above that the "root" node is really just an unnamed node which is the placeholder for the top level nodes (ie. the first "real" node is a child of the root node). In this case the node "foo" is the first child of the root node. The node "goat" is the 2nd child of the root node.
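The placeholder root matters when walking the tree: it has no name of its own, so its children form the real top level. The sketch below builds an indented outline of element names and skips the unnamed root. `outline` is a hypothetical name, and the table is a hand-built copy of the tprint output above rather than live parser output.

```lua
-- Walk the tree depth-first and build an outline of element names.
-- The unnamed root is only a placeholder, so it is skipped and its
-- children become the top level (depth 0).
local function outline(node, depth, out)
  out = out or {}
  depth = depth or 0
  local child_depth = depth
  if node.name ~= "" then
    table.insert(out, string.rep("  ", depth) .. node.name)
    child_depth = depth + 1
  end
  for _, child in ipairs(node.nodes or {}) do
    outline(child, child_depth, out)
  end
  return out
end

-- Hand-built copy of the tree shown in the tprint output above
local root = {
  name = "", line = 0, content = "",
  nodes = {
    { name = "foo", line = 2, content = "\ncontents of foo\n",
      attributes = { width = "1", height = "2" },
      nodes = {
        { name = "bar", line = 4, content = "\nchild of foo\n",
          attributes = { west = "true", fish = "bicycle" } },
      } },
    { name = "goat", line = 8, content = "eep", attributes = { blood = "100" } },
  },
}

print(table.concat(outline(root), "\n"))
-- foo
--   bar
-- goat
```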
Custom entities are declared in the <!DOCTYPE> directive, like this:
<!DOCTYPE muclient [
<!ENTITY afk_command "afk" >
<!ENTITY timer_mins "5" >
<!ENTITY timer_secs "0" >
<!ENTITY afk_message "You are now afk." >
<!ENTITY not_afk_message "You are no longer afk." >
]>
If your XML document contains such entries, they will appear in the "custom entities" table returned as the 3rd result from utils.xmlread.
Note that custom entities are automatically replaced in the body of the document, so it is not possible to reconstruct from the nodes where they occurred.
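To make the substitution concrete, the sketch below expands &name; references from a table of name/value pairs. It assumes the entities table is keyed by entity name with the replacement text as the value (the post does not spell out its layout), and it illustrates the idea only; it is not MUSHclient's parser code. `expand_entities` is a hypothetical name. Once text has gone through a step like this, nothing in the result records which parts came from an entity.

```lua
-- Replace &name; references using a name -> value table.
-- Unknown entities (including standard ones such as &amp;, which this
-- sketch does not handle) are left untouched.
local function expand_entities(text, entities)
  return (text:gsub("&([%w_]+);", function(name)
    return entities[name]  -- returning nil keeps the original match
  end))
end

-- Assumed shape: entity name -> replacement text
local entities = {
  afk_command = "afk",
  afk_message = "You are now afk.",
}

print(expand_entities("<send>&afk_command;</send>", entities))
-- <send>afk</send>
```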
Another example, parsing a standard alias:
<aliases>
<alias
name="test"
match="eat"
sequence="100"
>
<send>eat food</send>
</alias>
</aliases>
Using tprint to print the result gives this:
"line"=0
"name"=""
"nodes":
  1:
    "line"=2
    "name"="aliases"
    "nodes":
      1:
        "line"=3
        "name"="alias"
        "nodes":
          1:
            "line"=8
            "content"="eat food"
            "name"="send"
        "content"="
"
        "attributes":
          "sequence"="100"
          "name"="test"
          "match"="eat"
    "content"="
"
"content"=""
Here you can see the first child node (key 1, name "aliases") is the "all aliases" node. Under that (a child of that) is the node (key 1, name "alias") for the first alias. That also has a child, the <send> node, which is what the alias sends.
Thus the hierarchy is:
root (unnamed) -> aliases -> alias -> send
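Following such a hierarchy by name can be done with a short loop. This is a minimal sketch: `find_path` is a hypothetical helper that takes the first matching child at each level, so documents with repeated nodes (several <alias> entries, say) would need a scan of every child instead. The table is a hand-built copy of the alias tprint output above.

```lua
-- Follow a list of element names down from the root, taking the first
-- child with each name. Returns nil if any step is missing.
local function find_path(node, path)
  for _, wanted in ipairs(path) do
    local next_node = nil
    for _, child in ipairs(node.nodes or {}) do
      if child.name == wanted then
        next_node = child
        break
      end
    end
    if not next_node then
      return nil
    end
    node = next_node
  end
  return node
end

-- Hand-built copy of the alias tree from the tprint output above
local root = {
  name = "", line = 0, content = "",
  nodes = {
    { name = "aliases", line = 2, content = "\n",
      nodes = {
        { name = "alias", line = 3, content = "\n",
          attributes = { name = "test", match = "eat", sequence = "100" },
          nodes = {
            { name = "send", line = 8, content = "eat food" },
          } },
      } },
  },
}

local send = find_path(root, { "aliases", "alias", "send" })
print(send.content)  -- eat food
```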
- Nick Gammon
www.gammon.com.au, www.mushclient.com
Reply #1, posted by Nick Gammon (Australia, 23,120 posts), Forum Administrator
Date: Sun 13 Nov 2005 03:18 AM (UTC), amended on Sun 13 Nov 2005 03:33 AM (UTC) by Nick Gammon
As an example of using the returned XML information, here is a simple Lua function that writes its contents out as XML again. It does not handle every case (such as the doctype and entities); however, it shows the general idea ...
-- Tell outputs text without a newline; print ends the line (MUSHclient functions)
function writenode (node)
  -- root node won't have a name
  if node.name ~= "" then
    -- show node name followed by attributes (if any)
    Tell ("<" .. node.name)
    if node.attributes then
      print ""  -- end the line after the element name
      for k, v in pairs (node.attributes) do
        print (" " .. k .. '="' .. FixupHTML (v) .. '"')
      end -- doing attributes
    end -- if
    if node.empty then
      print ("/>")
      return -- no closing tag
    else
      Tell (">")
    end -- if
  end -- if have a node name
  -- print node contents (guard against a missing content field)
  Tell (FixupHTML (node.content or ""))
  -- do children, in document order
  if node.nodes then
    for _, v in ipairs (node.nodes) do
      writenode (v)
    end -- for
  end -- of having children
  -- root node won't have a name
  if node.name ~= "" then
    -- closing tag
    print ("</" .. node.name .. ">")
  end -- if have a node name
end -- writenode
If we call this for the above alias, like this:
writenode (a)
We get the following XML output, which is similar to what we had in the first place:
<aliases>
<alias
sequence="100"
name="test"
match="eat"
>
<send>eat food</send>
</alias>
</aliases>
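Because Tell, print and FixupHTML are MUSHclient functions, writenode only runs inside the client. A variant that builds the XML as a string can be tried in plain Lua. This sketch assumes a small hypothetical `escape` helper in place of FixupHTML (covering &, <, > and "), sorts attribute names so the output is repeatable (pairs order is unspecified), and, like writenode, ignores the doctype and entities. `node_to_xml` is likewise a hypothetical name.

```lua
-- Escape the characters that are special in XML text and attribute values
-- (hypothetical stand-in for MUSHclient's FixupHTML).
local function escape(s)
  return (s:gsub("[&<>\"]", {
    ["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;", ['"'] = "&quot;",
  }))
end

-- Rebuild XML text from a node table, appending pieces to "out".
local function node_to_xml(node, out)
  out = out or {}
  local named = node.name ~= ""   -- the root node has no name
  if named then
    table.insert(out, "<" .. node.name)
    if node.attributes then
      local keys = {}
      for k in pairs(node.attributes) do table.insert(keys, k) end
      table.sort(keys)  -- pairs() order is unspecified; sort for stable output
      for _, k in ipairs(keys) do
        table.insert(out, " " .. k .. '="' .. escape(node.attributes[k]) .. '"')
      end
    end
    if node.empty then
      table.insert(out, "/>")
      return out
    end
    table.insert(out, ">")
  end
  table.insert(out, escape(node.content or ""))
  for _, child in ipairs(node.nodes or {}) do
    node_to_xml(child, out)
  end
  if named then
    table.insert(out, "</" .. node.name .. ">")
  end
  return out
end

local root = {
  name = "", content = "",
  nodes = {
    { name = "alias", content = "",
      attributes = { name = "test", match = "eat" },
      nodes = { { name = "send", content = "eat food" } } },
    { name = "br", content = "", empty = true },
  },
}

print(table.concat(node_to_xml(root)))
-- <alias match="eat" name="test"><send>eat food</send></alias><br/>
```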
- Nick Gammon
www.gammon.com.au, www.mushclient.com
Reply #2, posted by Ked (Russia, 524 posts)
Date: Sun 13 Nov 2005 07:29 AM (UTC)
Would it be possible to also add a string splitting function, and possibly a microsecond timestamping one, to the same table? I found this code for splitting in PIL (Programming in Lua) and compiled it as per your instructions on extending Lua. It seems to work OK, though I can't tell whether it's really up to speed:
#define LUA_API __declspec(dllexport)
#pragma comment( lib, "lua.lib" )
#pragma comment( lib, "lualib.lib" )
#include <string.h>  /* strchr */
#include "lua.h"
#include "lauxlib.h"
#include "lualib.h"
/* split (s, sep) -> table of substrings; only the first character of sep is used */
static int l_split (lua_State *L) {
  const char *s = luaL_checkstring(L, 1);
  const char *sep = luaL_checkstring(L, 2);
  const char *e;
  int i = 1;
  lua_newtable(L);  /* result */
  /* repeat for each separator */
  while ((e = strchr(s, *sep)) != NULL) {
    lua_pushlstring(L, s, e-s);  /* push substring */
    lua_rawseti(L, -2, i++);
    s = e + 1;  /* skip separator */
  }
  /* push last substring */
  lua_pushstring(L, s);
  lua_rawseti(L, -2, i);
  return 1;  /* return the table */
}
static const luaL_reg strlib[] =
{
  {"split", l_split},
  {NULL, NULL}
};
/*
** Open test library
*/
LUALIB_API int luaopen_strlib (lua_State *L)
{
  luaL_openlib(L, "strlib", strlib, 0);
  return 1;
}
Reply #3, posted by Nick Gammon (Australia, 23,120 posts), Forum Administrator
Date: Sun 13 Nov 2005 07:55 PM (UTC)
I tried out your code and it seems to work OK. You realise that it does something similar to what I describe at:
http://www.gammon.com.au/forum/bbshowpost.php?bbsubject_id=6034
It is certainly faster: 11 seconds in my test, compared to 34 seconds the Lua way (for 1,000,000 iterations).
It also behaves slightly differently, returning 1 more element in the case of the speedwalk example:
1="north"
2="north"
3="north"
4="north"
5="north"
6="north"
7="east"
8="east"
9="east"
10="south"
11="south"
12="south"
13="south"
14="ne"
15="ne"
16="ne"
17=""
Your entry 17 seems to be the "empty" text beyond the final newline. I'm not sure whether it should really be there, although you might argue it should be. A test you could add is:
/* push last substring, if it exists */
if (*s)
{
lua_pushstring(L, s);
lua_rawseti(L, -2, i);
}
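For comparison, the same behaviour can be written in plain Lua. This sketch mirrors the C routine: it splits on a single plain character (no pattern matching), keeps empty fields between separators, and keeps a final empty field only when `keep_trailing` is true, matching the original C code rather than the `if (*s)` variant. `split_like_c` and `keep_trailing` are hypothetical names.

```lua
-- Split s on the single character sep, like the C l_split routine.
-- Empty fields between separators are kept; a trailing empty field is
-- kept only when keep_trailing is true.
local function split_like_c(s, sep, keep_trailing)
  local result = {}
  local pos = 1
  while true do
    -- plain find (4th argument true) so sep is not treated as a pattern
    local e = s:find(sep, pos, true)
    if not e then break end
    table.insert(result, s:sub(pos, e - 1))
    pos = e + 1
  end
  local last = s:sub(pos)
  if keep_trailing or last ~= "" then
    table.insert(result, last)
  end
  return result
end

local walk = "north\nnorth\neast\n"
print(#split_like_c(walk, "\n", true))   -- 4 (last entry is "")
print(#split_like_c(walk, "\n", false))  -- 3
```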
Anyway, apart from the speed increase, this doesn't really add anything that can't be done in straight Lua, and I am a bit reluctant to expand the library with utilities that can already be written reasonably quickly and easily in straight Lua. However, feel free to argue for its inclusion; it isn't much extra code.
This contrasts with the new things I recently added:
- Directory scanner
- XML parser
The directory scanner probably wasn't possible at all in MUSHclient and Lua, and the XML parser would have been tedious to write in Lua.
Quote:
... possibly a microsecond timestamping one to the same table?
You mean, like GetInfo (232)?
http://www.gammon.com.au/scripts/doc.php?function=GetInfo
That is already available from the above function call.
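GetInfo (232) only exists inside MUSHclient. For timing plain Lua code elsewhere, a rough harness can use standard Lua's os.clock, which measures processor time and may have coarser resolution than a high-resolution timer. This is a sketch under those assumptions: `time_it` and `gmatch_split` are hypothetical names, and the iteration count is a parameter so a quick run need not use the 1,000,000 iterations mentioned above.

```lua
-- Run fn(...) n times and return the elapsed processor time in seconds.
local function time_it(n, fn, ...)
  local start = os.clock()
  for _ = 1, n do
    fn(...)
  end
  return os.clock() - start
end

-- Something to time: split on newlines with string.gmatch,
-- skipping empty pieces
local function gmatch_split(s)
  local t = {}
  for piece in s:gmatch("[^\n]+") do
    t[#t + 1] = piece
  end
  return t
end

local elapsed = time_it(10000, gmatch_split, "north\nnorth\neast\nsouth\n")
print(string.format("%.3f seconds", elapsed))
```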
- Nick Gammon
www.gammon.com.au, www.mushclient.com
Reply #4, posted by Ked (Russia, 524 posts)
Date: Mon 14 Nov 2005 01:49 AM (UTC)
As for GetInfo (232), I must have missed it, but it works great. Lua's timing was so bad that I had to Google for a way to cram RDTSC into a DLL.
As for the split function... I think it's needed. Why Lua's string library doesn't have it, especially since their documentation has the code for it, is beyond me; this is a very basic and very useful thing. Lua has table.concat but is missing string.split for some absurd reason. Basically, if you want to do something as trivial as checking a trigger name in a function by splitting it, you need to either copy and paste the split() function into your script, require a file that has it, or compile it yourself and load it as a library.
What I am getting at is that this is a pretty standard thing. I grew used to it first in VBScript and then in Python, and Lua not having it out of the box is a bit of a shock. I understand your concern that it doesn't fit ideologically with XML and directory parsing, but at the same time it seems to be a very small and very useful thing that could maybe slip by.
Reply #5, posted by Nick Gammon (Australia, 23,120 posts), Forum Administrator
Date: Mon 14 Nov 2005 02:34 AM (UTC), amended on Mon 14 Nov 2005 02:47 AM (UTC) by Nick Gammon
OK, I agree it is rather asymmetric of Lua to provide a function to turn a table into a string (table.concat), but not vice versa.
I'll add that as another "utils" function.
I have added a test that the separator is a single character. With your code you could conceivably pass a multi-character string, which would give unexpected results (only its first character is used), or an empty string, which would fail in strange ways.
I have also added a "split count" as an optional 3rd argument. A couple of MUSHclient callbacks (MXP, I think) are passed something like:
arg=value
where "value" might contain any characters, including the "=" symbol. In this case you would split with a count of 1, so that only the first "=" is used.
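A plain-Lua sketch of the two additions described above: the separator must be exactly one character, and an optional count limits how many splits are made, so the rest of the string (including any further separators) stays in the last field. It illustrates the described behaviour only and is not meant to reproduce the utils function exactly; `split_n` is a hypothetical name.

```lua
-- Split s on the single character sep, making at most count splits
-- (no limit when count is nil).
local function split_n(s, sep, count)
  assert(type(sep) == "string" and #sep == 1, "separator must be a single character")
  assert(count == nil or count >= 0, "count must not be negative")
  local result = {}
  local pos = 1
  local splits = 0
  while count == nil or splits < count do
    local e = s:find(sep, pos, true)   -- plain search, no patterns
    if not e then break end
    table.insert(result, s:sub(pos, e - 1))
    pos = e + 1
    splits = splits + 1
  end
  table.insert(result, s:sub(pos))     -- remainder, separators and all
  return result
end

local parts = split_n("url=http://x/?a=1", "=", 1)
print(parts[1])  -- url
print(parts[2])  -- http://x/?a=1
```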
- Nick Gammon
www.gammon.com.au, www.mushclient.com
The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).