Posted by Nick Gammon (Australia), Forum Administrator
Lua has a built-in pattern-matching system (similar to, but simpler than, regular expressions), including:
- string.find
- string.gfind (renamed string.gmatch in Lua 5.1)
- string.gsub
However, these use Lua's own pattern syntax, which differs from the regular expression syntax that MUSHclient users are used to (for example, %d rather than \d for a digit).
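For comparison, here is roughly the match used in the PCRE example further down, written with Lua's own patterns. It is a small sketch that runs in plain Lua 5.1 or later with no MUSHclient extension; the variable names are only for illustration. Note that captures come back as extra return values rather than in a table.

```lua
-- Plain Lua patterns (no PCRE involved): %a matches a letter and
-- %d a digit, where PCRE would use classes such as \w or \d.
subject = "john goes east"
first, last, who, where = string.find (subject, "(%a+) goes (%a+)")
print (first, last, who, where)  --> 1	14	john	east

-- string.gsub returns the new string and the number of replacements.
masked, count = string.gsub ("room 42", "%d", "#")
print (masked, count)  --> room ##	2
```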
You can use those; however, MUSHclient's Lua implementation also includes, as an extension, the PCRE (Perl Compatible Regular Expressions) library. Its interface is closely modelled on the lrexlib library written by Reuben Thomas and Shmuel Zeigerman.
Documentation for the PCRE syntax is at:
http://mushclient.com/pcre/pcrepattern.html
This functionality is in the table "rex", which has three functions:
- rex.new (p [, cf [, lo]]) - compiles and returns a PCRE regular expression object for the pattern p, subject to compilation flags cf (a number) and locale lo (a string).
- rex.flags () - returns a table of the flags for use in regular expressions
- rex.version () - returns the current version of PCRE
If you use this library, you get exactly the same behaviour as with MUSHclient's regular expressions (it is the same code). The only thing to be aware of is that, in quoted Lua string literals, each backslash must be "escaped" by putting another backslash in front of it, because backslashes already have a meaning in Lua strings. For example, the pattern \d+ is written as "\\d+".
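The escaping rule can be checked in plain Lua 5.1 or later without MUSHclient. This sketch also shows Lua's long-bracket strings, a standard Lua feature that performs no escape processing and so avoids doubling backslashes; the variable names are only for illustration.

```lua
-- In a quoted Lua string, "\\" stands for one backslash, so this is
-- the three-character PCRE pattern \d+ (one or more digits).
quoted = "\\d+"

-- Lua's long-bracket strings do no escape processing, so the
-- backslash can be written just once.
long = [[\d+]]

print (quoted, #quoted, quoted == long)  --> \d+	3	true

-- Either string could then be compiled, e.g.  re = rex.new (quoted)
```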
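Using a PCRE regular expression takes two steps: first compile the pattern with rex.new, then call one of the compiled object's methods (such as match) on the string you want to test. Here is an example: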
-- create new regular expression
re = rex.new ("(?P<who>.+) goes (?P<where>.+)")
-- match regular expression to a string
a, b, c = re:match ("john goes east")
-- print results
print (a, b, c) --> 1 14 table: 00638D20
-- examine table of matching patterns:
table.foreach (c, print)
-- output:
1 john
2 east
where east
who john
This example shows that:
- Wildcard 1 is "john"
- Wildcard 2 is "east"
- Named wildcard "where" is "east"
- Named wildcard "who" is "john"
Also, the match starts at column 1 (columns are counted from 1) and ends at column 14, the last character of the string.
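table.foreach was deprecated in Lua 5.1 and removed in 5.2, so pairs and ipairs are the portable way to walk the capture table. The sketch below uses a hypothetical helper, split_captures (not part of the rex library), to separate numbered wildcards from named ones. The captures table is a stand-in copied from the output above; inside MUSHclient you would pass the third result of re:match instead.

```lua
-- Hypothetical helper: split a capture table (as returned in the
-- third result of re:match) into numbered and named wildcards.
function split_captures (t)
  local numbered, named = {}, {}
  for i, v in ipairs (t) do          -- 1, 2, ... in order
    numbered [i] = v                 -- false if that group did not match
  end
  for k, v in pairs (t) do
    if type (k) == "string" then     -- named groups such as "who"
      named [k] = v
    end
  end
  return numbered, named
end

-- Stand-in for the table shown in the output above.
captures = { "john", "east", who = "john", where = "east" }
numbered, named = split_captures (captures)
print (numbered [1], numbered [2], named.who, named.where)
--> john	east	john	east
```

Because ipairs stops only at nil, a false entry for an unmatched group is kept in its position.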
The three functions exposed by a compiled regular expression are:
- r:match (s [, st [, ef]])
- This function returns the start and end points of the first match of the compiled regexp r in the string s, starting from offset st (a number), subject to execution flags ef (a number). The offset is 1-relative (that is, the first column is number 1). A negative offset is counted from the right; for example, offset -2 starts 2 characters from the right-hand side of the string.
- Substring matches ("captures" in Lua terminology) are returned as a third result, in a table (this table contains false in the positions where the corresponding sub-pattern did not match).
- r:exec (s [, st [, ef]])
- This function is like r:match, except that the table returned as the third result contains the offsets of the substring matches rather than the substrings themselves.
For example, if the whole match is at offsets 10,20 and substring matches are at offsets 12,14 and 16,19 then the function returns the following: 10, 20, { 12,14,16,19 }.
- r:gmatch (s, f [, n [, ef]])
- Tries to match the regex r against s up to n times (or as many as possible if n is either not given or is not a positive number), subject to execution flags ef.
- Each time there is a match, f is called as f(m, t), where m is the matched string and t is a table of substring matches (this table contains false in the positions where the corresponding sub-pattern did not match).
- If f returns a true value, gmatch returns immediately.
- gmatch returns the number of matches made.
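The offset table from r:exec can be turned back into substrings with string.sub. This is a sketch with a hypothetical helper, offsets_to_strings (not part of the rex library). It assumes the offsets come in start/end pairs that are 1-relative and inclusive, like the whole-match results, and that an unmatched group has false in its slots; the subject string is made up so that its offsets match the exec example above.

```lua
-- Hypothetical helper (not part of the rex library): convert the
-- offset table returned as the third result of r:exec into a table
-- of substrings. Assumes 1-relative, inclusive start/end pairs, and
-- false in the slots of a group that did not match.
function offsets_to_strings (s, offsets)
  local result, n = {}, 0
  for i = 1, #offsets, 2 do
    local first, last = offsets [i], offsets [i + 1]
    n = n + 1
    if first then
      result [n] = string.sub (s, first, last)
    else
      result [n] = false  -- sub-pattern did not match
    end
  end
  return result
end

-- Illustrative subject whose offsets match the exec example:
-- whole match at 10,20; substrings at 12,14 and 16,19.
subject = "123456789abcdefghijk"
parts = offsets_to_strings (subject, { 12, 14, 16, 19 })
print (string.sub (subject, 10, 20), parts [1], parts [2])
--> abcdefghijk	cde	ghij
```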
Here is an example of gmatch:
-- create new regular expression which matches words
re = rex.new ("([[:word:]]+)")
-- make a function to process the results
function f (m, t)
  Note ("match = ", m)
end
-- match regular expression to a string
re:gmatch ("john goes east", f)
-- output:
match = john
match = goes
match = east
Since Lua supports anonymous (i.e. inline) functions, you can do the match in one statement:
re:gmatch ("john goes east",
function (m, t) Note ("match = ", m) end)
In this case the unnamed function is passed directly to the gmatch function.
And, to simplify further, we can create and execute the regular expression in one statement, thus not needing to make a variable for the compiled regular expression either:
rex.new ("([[:word:]]+)"):gmatch ("john goes east",
function (m, t) Note ("match = ", m) end)
To use the 'version' function, try this:
print (rex.version ()) --> 4.3 21-May-2003
To use the 'flags' function, try this:
table.foreach (rex.flags (), print)
-- output (sorted):
ANCHORED 16
CASELESS 1
DOLLAR_ENDONLY 32
DOTALL 4
EXTENDED 8
EXTRA 64
MULTILINE 2
NOTBOL 128
NOTEMPTY 1024
NOTEOL 256
NO_AUTO_CAPTURE 4096
UNGREEDY 512
UTF8 2048
The following are valid compile-time flags:
- ANCHORED
- CASELESS
- DOLLAR_ENDONLY
- DOTALL
- EXTENDED
- EXTRA
- MULTILINE
- NO_AUTO_CAPTURE
- UNGREEDY
- UTF8
The following are valid execution-time flags:
- ANCHORED
- NOTBOL
- NOTEOL
- NOTEMPTY
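Since cf and ef are plain numbers and every flag is a distinct power of two, several flags can be combined by adding their values. Lua 5.1 has no bitwise operators (they arrived in 5.3), so addition is the simple route. The sketch below assumes a hypothetical helper, combine_flags, and uses a stand-in table copied from the rex.flags () output above; inside MUSHclient you would use rex.flags () directly.

```lua
-- Stand-in for rex.flags (), using the values listed above.
-- Inside MUSHclient you would write:  flags = rex.flags ()
flags = { ANCHORED = 16, CASELESS = 1, DOLLAR_ENDONLY = 32, DOTALL = 4,
          EXTENDED = 8, EXTRA = 64, MULTILINE = 2, NOTBOL = 128,
          NOTEMPTY = 1024, NOTEOL = 256, NO_AUTO_CAPTURE = 4096,
          UNGREEDY = 512, UTF8 = 2048 }

-- Hypothetical helper: combine named flags into one number. Adding
-- distinct powers of two gives the same result as a bitwise OR, so
-- duplicates are skipped to avoid counting a flag twice.
function combine_flags (...)
  local seen, total = {}, 0
  for _, name in ipairs ({ ... }) do
    local value = flags [name]
    assert (value, "unknown flag: " .. tostring (name))
    if not seen [name] then
      seen [name] = true
      total = total + value
    end
  end
  return total
end

cf = combine_flags ("CASELESS", "MULTILINE")
print (cf)  --> 3
-- In MUSHclient:  re = rex.new ("^who$", cf)
```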
- Nick Gammon
www.gammon.com.au, www.mushclient.com