Gammon Forum

Regular expressions


Posted by Nick Gammon   Australia  (23,133 posts)  Bio   Forum Administrator
Date Wed 24 Nov 2004 08:09 AM (UTC)

Amended on Sat 05 Nov 2005 04:56 AM (UTC) by Nick Gammon

Message
Lua has an inbuilt regular expression system, including:


  • string.find

  • string.gfind (renamed string.gmatch in Lua 5.1)

  • string.gsub


However, these use Lua's own pattern syntax, which is different from the regular expression syntax that MUSHclient users are used to.

You can use those. However, the MUSHclient implementation also includes, as an extension, the PCRE (Perl Compatible Regular Expression) library. This is closely modelled on the lrexlib library written by Reuben Thomas and Shmuel Zeigerman.

Documentation for the PCRE syntax is at:


http://mushclient.com/pcre/pcrepattern.html


This functionality is in the table "rex", which has three functions:


  • rex.new (p [, cf [, lo]])

    Compiles and returns a PCRE regular expression object for the pattern p, subject to compilation flags cf (a number) and locale lo (a string).

  • rex.flags ()

    Returns a table of the flags for use in regular expressions.

  • rex.version ()

    Returns the current version of PCRE.


If you use this library, you get exactly the same behaviour as you do with MUSHclient regular expressions (it is the same code). The only thing you need to be aware of is that, if you write the pattern as a quoted Lua string literal, backslashes need to be "escaped" (because backslashes in Lua strings already have a meaning) by putting another backslash in front of them.
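As a small illustration of the escaping rule, here is a sketch in plain Lua (it does not need the rex table). The variable names are made up for the example. It writes the PCRE pattern \d+\s\w+ twice: once as a quoted string with doubled backslashes, and once using Lua's long-bracket string syntax, where backslashes are taken literally.

```lua
-- Two ways to write the PCRE pattern \d+\s\w+ as a Lua string

escaped = "\\d+\\s\\w+"   -- quoted string: every backslash is doubled
long    = [[\d+\s\w+]]    -- long brackets: backslashes are literal

print (escaped)           --> \d+\s\w+
print (escaped == long)   --> true
```

Either form can be passed to rex.new. The long-bracket form is often easier to read when the pattern contains many backslashes.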

To use a PCRE regular expression you have two steps:


  • Compile it

  • Execute it


Here is an example:



-- create new regular expression

re = rex.new ("(?P<who>.+) goes (?P<where>.+)")

-- match regular expression to a string

a, b, c = re:match ("john goes east")

-- print results

print (a, b, c) --> 1 14 table: 00638D20

-- examine table of matching patterns:

table.foreach (c, print)

-- output:

1 john
2 east
where east
who john


This example shows that:


  • Wildcard 1 is "john"

  • Wildcard 2 is "east"

  • Named wildcard "where" is "east"

  • Named wildcard "who" is "john"


Also, the first matching column (1-relative) is 1, and the last matching column is 14.

The three functions exposed by a compiled regular expression are:


  • r:match (s [, st [, ef]])


    • This function returns the start and end point of the first match of the compiled regexp r in the string s, starting from offset st (a number), subject to execution flags ef (a number). The offset is 1-relative (that is, the first column is number 1). If you give a negative number then that is counted from the right. For example, offset -2 starts 2 characters from the right-hand side of the string.

    • Substring matches ("captures" in Lua terminology) are returned as a third result, in a table (this table contains false in the positions where the corresponding sub-pattern did not match).


  • r:exec (s [, st [, ef]])

    • This function is like r:match, except that the table returned as the third result contains the offsets of the substring matches rather than the substring matches themselves.

      For example, if the whole match is at offsets 10,20 and substring matches are at offsets 12,14 and 16,19 then the function returns the following: 10, 20, { 12,14,16,19 }.


  • r:gmatch (s, f [, n [, ef]])


    • Tries to match the regex r against s up to n times (or as many as possible if n is either not given or is not a positive number), subject to execution flags ef.

    • Each time there is a match, f is called as f(m, t), where m is the matched string and t is a table of substring matches (this table contains false in the positions where the corresponding sub-pattern did not match).

    • If f returns a true value, then gmatch immediately returns.

    • gmatch returns the number of matches made.


    Here is an example of gmatch:

    
    
    -- create new regular expression which matches words
    
    re = rex.new ("([[:word:]]+)")
    
    -- make a function to process the results
    
    function f (m, t)
      Note ("match = ", m)
    end
    
    -- match regular expression to a string
    
    re:gmatch ("john goes east", f)
    
    -- output:
    
    match = john
    match = goes
    match = east
    
    


    Now with Lua you can create anonymous functions (i.e. inline functions), so you can do the match in one statement:

    
    re:gmatch ("john goes east", 
               function (m, t) Note ("match = ", m) end)
    


    In this case the unnamed function is passed directly to the gmatch function.

    And, to simplify further, we can create and execute the regular expression in one statement, thus not needing to make a variable for the compiled regular expression either:

    
    rex.new ("([[:word:]]+)"):gmatch ("john goes east", 
               function (m, t)  Note ("match = ", m) end)
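Going back to r:exec, described earlier: the offsets table it returns can be turned back into substrings with string.sub. The following is only a sketch. The helper name captures_from_offsets is made up for this example, the offsets are written out by hand rather than taken from a real r:exec call, and it assumes every sub-pattern matched and that you are on Lua 5.1 or later (for the # operator).

```lua
-- Hypothetical helper: convert a table of offset pairs
-- { start1, end1, start2, end2, ... } into the matching substrings

function captures_from_offsets (s, offsets)
  local captures = {}
  for i = 1, #offsets, 2 do
    captures [#captures + 1] = string.sub (s, offsets [i], offsets [i + 1])
  end
  return captures
end

-- hand-written offsets for "john" (columns 1 to 4) and "east" (columns 11 to 14)

subject = "john goes east"
caps = captures_from_offsets (subject, { 1, 4, 11, 14 })

print (caps [1], caps [2]) --> john east
```

This might be useful if you want the positions of the substrings (say, to colour part of a line) but also need their text.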
    






To use the 'version' function, try this:


print (rex.version ()) --> 4.3 21-May-2003



To use the 'flags' function, try this:


table.foreach (rex.flags (), print)

-- output (sorted):

ANCHORED 16
CASELESS 1
DOLLAR_ENDONLY 32
DOTALL 4
EXTENDED 8
EXTRA 64
MULTILINE 2
NOTBOL 128
NOTEMPTY 1024
NOTEOL 256
NO_AUTO_CAPTURE 4096
UNGREEDY 512
UTF8 2048



The following are valid compile-time flags:


  • ANCHORED
  • CASELESS
  • DOLLAR_ENDONLY
  • DOTALL
  • EXTENDED
  • EXTRA
  • MULTILINE
  • NO_AUTO_CAPTURE
  • UNGREEDY
  • UTF8



The following are valid execution-time flags:


  • ANCHORED
  • NOTBOL
  • NOTEOL
  • NOTEMPTY

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Nick Gammon   Australia  (23,133 posts)  Bio   Forum Administrator
Date Reply #1 on Thu 25 Nov 2004 11:22 PM (UTC)

Amended on Sat 05 Nov 2005 05:01 AM (UTC) by Nick Gammon

Message
From the lrexlib documentation, a useful thing to do is this:


-- Default constructor 
setmetatable(rex, {__call =
             function (self, p, cf, lo)
               return self.new(p, cf, lo)
             end})


If you do that, then you can make a regexp like this:


r = rex ("(a|b|c)")


What this has effectively done is overload the "call" behaviour for rex. This lets you treat what is really a table as a function and call it using the function-call syntax. If you do, it actually calls the rex.new function in the rex library.


Now we can make a quick "find string" function like this:


-- partial string.find equivalent
function rex.find(s, p, st)
  return rex(p):match(s, st)
end


This returns the result of doing a PCRE match on the first argument, using the second argument as a pattern. For example:


print (rex.find ("I see foo here", "foo"))





[EDIT]

However, I can't really see what this achieves. To do that example you could simply do:


-- partial string.find equivalent
function rex.find(s, p, st)
  return rex.new (p):match(s, st)
end


This bypasses the need to do the table "call". You simply make rex.find do the "new" followed by the "match".

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Nick Gammon   Australia  (23,133 posts)  Bio   Forum Administrator
Date Reply #2 on Sat 05 Nov 2005 04:57 AM (UTC)
Message
For documentation on using PCRE see PCRE Regular Expression Details.

- Nick Gammon

www.gammon.com.au, www.mushclient.com


Information and images on this site are licensed under the Creative Commons Attribution 3.0 Australia License unless stated otherwise.