Posted by Nick Gammon (Forum Administrator), Australia
Just to kick-start this, here is my preliminary attempt at a JSON grammar in LPEG.
-- See: http://www.ietf.org/rfc/rfc4627.txt?number=4627
jsongrammar = lpeg.P {
  "JSON_text"; -- main rule name
  -- A JSON text is a serialized object or array.
  JSON_text = lpeg.V "object" + lpeg.V "array";
  -- separators
  name_separator = lpeg.V "ws" * ":" * lpeg.V "ws";
  value_separator = lpeg.V "ws" * "," * lpeg.V "ws";
  -- whitespace
  ws = ( lpeg.P " " + "\t" + "\n" + "\r" )^0; -- zero or more of space, tab, newline, cr
  -- A JSON value MUST be an object, array, number, or string, or one of
  -- the following three literal names: false, null, true
  value = lpeg.C ( -- capture values for now
    lpeg.P "false" +
    "null" +
    "true" +
    lpeg.V "object" +
    lpeg.V "array" +
    lpeg.V "number" +
    lpeg.V "string"
  );
  -- objects are comma-separated name/value pairs inside curly brackets
  begin_object = lpeg.V "ws" * "{" * lpeg.V "ws";
  end_object = lpeg.V "ws" * "}" * lpeg.V "ws";
  object = lpeg.V "begin_object" *
    ( lpeg.V "member" * ( lpeg.V "value_separator" * lpeg.V "member" )^0 )^-1 *
    lpeg.V "end_object";
  -- object member is a name/value pair
  member = lpeg.V "string" * lpeg.V "name_separator" * lpeg.V "value";
  -- arrays are comma-separated values inside square brackets
  begin_array = lpeg.V "ws" * "[" * lpeg.V "ws";
  end_array = lpeg.V "ws" * "]" * lpeg.V "ws";
  array = lpeg.V "begin_array" *
    ( lpeg.V "value" * ( lpeg.V "value_separator" * lpeg.V "value" )^0 )^-1 *
    lpeg.V "end_array";
  -- inside a string is \x or anything other than a quote
  char = (lpeg.P '\\' * lpeg.P (1)) +
    (lpeg.P (1) - '"');
  -- strings are "<something>"
  string = lpeg.P '"' * (lpeg.V "char"^0) * '"';
  -- numbers
  number = (lpeg.P '-')^-1 * lpeg.V "int" * (lpeg.V "frac")^-1 * (lpeg.V "exp")^-1;
  digit = lpeg.R "09"; -- any digit
  digit1_9 = lpeg.R "19"; -- digits 1 to 9
  e = lpeg.P "e" + "E"; -- e or E
  exp = lpeg.V "e" * (lpeg.P "-" + "+")^-1 * lpeg.V "digit"^1; -- exponent
  frac = lpeg.P "." * lpeg.V "digit"^1; -- fractional part
  -- integer part: a lone zero, or a non-zero digit followed by more digits
  -- (the RFC says: int = zero / ( digit1-9 *DIGIT ))
  int = lpeg.P "0" + (lpeg.V "digit1_9" * lpeg.V "digit"^0 );
} -- end of jsongrammar
result = lpeg.match (lpeg.Ct (jsongrammar),
[[
{
"Image": {
"Width": 800,
"Height": 600,
"Title": "View from \"15th\" \r \n \\ Floor",
"Thumbnail": {
"Url": "http://www.example.com/image/481989943",
"Height": 125,
"Width": "100",
"Nick": -1234e24
},
"IDs": [116, 943, 234, 38793]
}
}
]])
require "tprint" -- table printer supplied with MUSHclient
if result then
  print "matched!"
  tprint (result)
else
  print "no match"
end -- if
The output is not great. I threw in an lpeg.C (capture) around the value part, so I can see which JSON values are being matched. This is the result:
matched!
1="{
"Width": 800,
"Height": 600,
"Title": "View from \"15th\" \r \n \\ Floor",
"Thumbnail": {
"Url": "http://www.example.com/image/481989943",
"Height": 125,
"Width": "100",
"Nick": -1234e24
},
"IDs": [116, 943, 234, 38793]
}
"
2="800"
3="600"
4=""View from \"15th\" \r \n \\ Floor""
5="{
"Url": "http://www.example.com/image/481989943",
"Height": 125,
"Width": "100",
"Nick": -1234e24
}"
6=""http://www.example.com/image/481989943""
7="125"
8=""100""
9="-1234e24"
10="[116, 943, 234, 38793]
"
11="116"
12="943"
13="234"
14="38793"
I didn't look at the Lua JSON module (currently supplied with MUSHclient) but did look at the RFC which defines the JSON syntax, and tried to simply follow its grammar rules.
I gave up a bit on their fairly tedious string definitions and resorted to matching on "<something>" where the <something> is any character other than a quote, or the sequence \x where x can be anything. To be pedantic, the x should be limited to the escape characters listed in the RFC, and the grammar should also check their somewhat imaginative idea of \uxxxx (four hex digits) for Unicode sequences.
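For anyone who wants the pedantic version, here is a rough sketch of what a stricter string rule might look like, going by the string grammar in the RFC. The names hex, escape, unescaped, strict_string and is_json_string are made up for this example and are not part of the grammar above. It assumes the lpeg module can be loaded with require (inside MUSHclient the lpeg global is already available).

```lua
local lpeg = require "lpeg"

-- Hypothetical stricter replacement for the "char" and "string" rules.
local hex = lpeg.R ("09", "af", "AF")  -- one hexadecimal digit

-- a backslash followed by one of the permitted escape characters,
-- or by u and exactly four hex digits
local escape = lpeg.P "\\" *
               ( lpeg.S '"\\/bfnrt' +
                 lpeg.P "u" * hex * hex * hex * hex )

-- any character except a quote, a backslash or a control character
local unescaped = lpeg.P (1) - lpeg.S '"\\' - lpeg.R "\0\31"

local strict_string = lpeg.P '"' * (escape + unescaped)^0 * '"'

-- true if the whole of s is one valid JSON string (-1 means "end of subject")
local function is_json_string (s)
  return lpeg.match (strict_string * -1, s) ~= nil
end

print (is_json_string [["View from \"15th\" \r \n \\ Floor"]])  --> true
print (is_json_string [["caf\u00e9"]])                          --> true
print (is_json_string [["bad \x escape"]])                      --> false
```

This only validates the string. It does not decode the escapes into the characters they stand for.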
The output should be written in such a way that it produces nice Lua tables, but this at least kicks off the grammar match.
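As a possible direction for the table output, here is a minimal sketch that uses table captures (lpeg.Ct) plus a small helper function to build nested Lua tables. It is a cut-down grammar written for illustration, not a drop-in change to the one above: the names json, to_object and NULL are hypothetical, string escapes are left undecoded, and JSON null is mapped to a sentinel table because nil cannot be stored in a Lua table. It assumes the lpeg module can be loaded with require.

```lua
local lpeg = require "lpeg"
local P, R, S, V = lpeg.P, lpeg.R, lpeg.S, lpeg.V
local C, Cc, Ct = lpeg.C, lpeg.Cc, lpeg.Ct

local NULL = {}  -- hypothetical sentinel standing in for JSON null

local ws = S " \t\n\r"^0  -- optional whitespace

-- numbers: capture the text, then convert it with tonumber
local int    = P "0" + R "19" * R "09"^0
local frac   = P "." * R "09"^1
local exp    = S "eE" * S "+-"^-1 * R "09"^1
local number = C (P "-"^-1 * int * frac^-1 * exp^-1) / tonumber

-- strings: capture what is between the quotes (escapes are not decoded)
local str = P '"' * C ((P "\\" * 1 + (1 - P '"'))^0) * P '"'

-- turn a list of { name, value } pairs into a keyed table
local function to_object (members)
  local t = {}
  for _, m in ipairs (members) do
    t [m [1]] = m [2]
  end
  return t
end

local json = P {
  "text",
  text   = ws * (V "object" + V "array") * ws * -1,
  value  = str + number + V "object" + V "array" +
           P "true" * Cc (true) + P "false" * Cc (false) +
           P "null" * Cc (NULL),
  -- an array becomes a Lua table indexed from 1
  array  = P "[" * ws *
           Ct ((V "value" * ws * (P "," * ws * V "value" * ws)^0)^-1) *
           P "]",
  -- each member is captured as a { name, value } pair
  member = Ct (str * ws * P ":" * ws * V "value"),
  object = P "{" * ws *
           (Ct ((V "member" * ws * (P "," * ws * V "member" * ws)^0)^-1) / to_object) *
           P "}",
}

local t = lpeg.match (json,
  [[ { "Image": { "Width": 800, "Title": "View", "IDs": [116, 943, 234, 38793] } } ]])

print (t.Image.Width)    --> 800
print (t.Image.IDs [4])  --> 38793
```

Building the object in a helper function rather than with a fold capture keeps the sketch independent of which LPEG version is installed.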
References:
http://json.org/
http://www.ietf.org/rfc/rfc4627.txt?number=4627
- Nick Gammon
www.gammon.com.au, www.mushclient.com