Gammon Forum

String.find not working for regex strings in table

It is now over 60 days since the last post. This thread is closed.


Posted by Natasi   (79 posts)
Date Wed 23 May 2012 02:00 AM (UTC)
I have a table (example below) with about 700 entries. For each incoming trigger line I loop through the table and test the line against each key with string.find. The problem is that string.find does not handle any of the regex in the keys; it only matches entries that are plain text.

triggerstext = {}
triggerstext["Your mouth turns up as your face (.*?) into a smile."]=function() Note("Look mom, I found you!"); end

Here is the function I parse the trigger with:

function parseAll(label, trigger, wildcard)
  for pattern, trig in pairs (triggerstext) do
    local a = string.find (trigger, pattern)
    if a then
      trig ()
    end
  end
end

So the above only ever matches plain text, never the regex portion. I have also written this as a regex version using rex.new, but it is extremely slow when going over 700 entries in the table:

for pattern, trig in pairs (triggerstext) do
  local re = rex.new (pattern)  -- compiles this pattern again for every line
  local a, b, c = re:match (trigger)
  if a then
    trig (c)  -- fires the handler
  end
end

Any help with the string.find issue would be appreciated, and if anyone knows why the regex version is so insanely slow, that would help too. My Lua is not advanced enough yet to figure this out.

Posted by Fiendish   USA  (2,534 posts)   Global Moderator
Date Reply #1 on Wed 23 May 2012 02:42 AM (UTC)
Lua has no built-in regular expressions. string.find and its relatives use Lua's own pattern format, described here: http://www.lua.org/pil/20.2.html
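
A small plain-Lua sketch of why "(.*?)" fails (runnable outside MUSHclient): in a Lua pattern, "?" is not a lazy modifier. Right after ".*" it starts a new pattern item that matches a literal question mark, so "(.*?)" only matches lines that actually contain a "?". Lua's shortest-match form of ".*" is ".-".

```lua
local line = "Your mouth turns up as your face Nick into a smile."

-- "?" right after ".*" is its own pattern item: a literal "?".
-- This line contains no "?", so the search fails.
print(string.find(line, "face (.*?) into"))                      -- prints: nil

-- The same pattern succeeds once the subject really contains a "?".
print(string.match("your face Nick? into", "face (.*?) into"))   -- prints: Nick

-- ".-" is Lua's lazy (shortest match) version of ".*".
print(string.match(line, "face (.-) into"))                      -- prints: Nick
```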

You probably want to precompile your rexes once, instead of compiling all 700 of them again for every line.

https://github.com/fiendish/aardwolfclientpackage

Posted by Natasi   (79 posts)
Date Reply #2 on Wed 23 May 2012 04:32 AM (UTC)
Ahhhh I was using (.*?) and should have been using (.*)... that fixed that issue, thanks!

As for precompiling for re:match, I tried that using the method below and it was still very slow. When the script loads I precompile all the triggers, then go through them each time the trigger fires.

function preParse()
  local i = 1
  for pattern, trig in pairs (plaintext) do
    _G ["re_"..i] = rex.new (pattern)
    i = i + 1
  end
end


function parseAll1(label, trigger, wildcard)
  local tt = 1
  for pattern, trig in pairs (plaintext) do
    local a, b, c = _G["re_"..tt]:match (trigger)
    if a then
      trig(c)
    end
    tt = tt + 1
  end
end


Is there a better way of writing this?
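
One possible restructuring (a sketch, not something posted in this thread): keep each pattern and its handler together in one array and walk it with ipairs. The re_1, re_2, ... globals above only line up with their handlers because pairs happens to visit plaintext in the same order both times, which the Lua manual does not promise; an array also gives a fixed priority order. With MUSHclient's rex you could store rex.new(pattern) in the same record once at load time; the sketch uses plain Lua patterns so it runs anywhere. The names triggers, add_trigger and parse_line are hypothetical.

```lua
local triggers = {}  -- array: ipairs visits entries in insertion order

-- Hypothetical helper: register a pattern together with its handler.
function add_trigger(pattern, handler)
  -- With MUSHclient's rex, a precompiled rex.new(pattern) could be stored here too.
  triggers[#triggers + 1] = { pattern = pattern, handler = handler }
end

-- Hypothetical dispatcher: the first matching entry wins.
function parse_line(line)
  for _, t in ipairs(triggers) do
    local capture = string.match(line, t.pattern)
    if capture ~= nil then
      t.handler(capture)
      return true
    end
  end
  return false
end

add_trigger("^Your mouth turns up as your face (.-) into a smile%.$",
  function(who) print("Look " .. who .. ", I found you!") end)

parse_line("Your mouth turns up as your face Nick into a smile.")
-- prints: Look Nick, I found you!
```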

Posted by Nick Gammon   Australia  (23,133 posts)   Forum Administrator
Date Reply #3 on Wed 23 May 2012 05:45 AM (UTC)
Your original one failed because the Lua regex does not recognize (.*?) syntax. This works:


triggerstext = {}
triggerstext["Your mouth turns up as your face (.*) into a smile."]=function() Note("Look mom, I found you!"); end

function parseAll(label, trigger, wildcard)

  for pattern, trig in pairs (triggerstext) do
   if string.find (trigger, pattern) then
      trig ()
      break
   end
  end

end

local start = utils.timer ()
parseAll ("foo", "Your mouth turns up as your face Nick into a smile.")
print (utils.timer () - start)



Output:


Look mom, I found you!
0.00070651416899636


So under a millisecond for detecting that. I added the "break" because once you get a match you don't need to keep trying.

Most of that time would have been processing the match, because if I send in a non-matching string it only took about 7 microseconds.
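
utils.timer is a MUSHclient function. Outside MUSHclient, a rough plain-Lua stand-in is os.clock, which measures CPU time with coarse resolution, so it is better to average over many runs. This sketch builds 700 hypothetical anchored patterns and compares a line that matches none of them (all 700 are tried) with one that matches only the last; the numbers depend on your machine, so none are shown. The names patterns, scan and average_seconds are hypothetical.

```lua
-- 700 hypothetical anchored patterns standing in for the real table.
local patterns = {}
for i = 1, 700 do
  patterns[i] = "^Line number " .. i .. " says (.-)%.$"
end

-- Returns the index of the first matching pattern, or nil.
function scan(line)
  for i = 1, #patterns do
    if string.find(line, patterns[i]) then
      return i
    end
  end
  return nil
end

-- os.clock() has coarse resolution, so average over many calls.
function average_seconds(fn, arg, runs)
  local start = os.clock()
  for _ = 1, runs do
    fn(arg)
  end
  return (os.clock() - start) / runs
end

print("no match:    ", average_seconds(scan, "There is a shop here.", 1000))
print("last matches:", average_seconds(scan, "Line number 700 says hello.", 1000))
```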

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Nick Gammon   Australia  (23,133 posts)   Forum Administrator
Date Reply #4 on Wed 23 May 2012 05:56 AM (UTC)
What would also speed this up is making this trigger (the one that does the 700 matches) a low-priority trigger, placed further down your trigger list behind other triggers that are more likely to match, such as a prompt. Then, when a prompt arrives, your prompt trigger fires first, which saves testing even a single item in this table.
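
In MUSHclient that ordering comes from each trigger's sequence number (lower numbers are evaluated first). If you do everything from one script function instead, the same idea (test the common, cheap case before the long list) can be sketched like this. The prompt format and the names classify, prompt_pattern and rare_patterns are hypothetical.

```lua
-- Hypothetical prompt format such as "<100hp 50mp>"; adjust to your MUD.
local prompt_pattern = "^<%d+hp %d+mp>"

-- Stands in for the ~700-entry table.
local rare_patterns = {
  "^Your mouth turns (.-) as your face (.-) into a smile%.$",
}

function classify(line)
  -- The most common line type is checked first with a single pattern...
  if string.find(line, prompt_pattern) then
    return "prompt"
  end
  -- ...and only lines that are not prompts reach the long list.
  for i, pattern in ipairs(rare_patterns) do
    if string.find(line, pattern) then
      return "trigger " .. i
    end
  end
  return nil
end

print(classify("<100hp 50mp>"))                                         -- prints: prompt
print(classify("Your mouth turns up as your face Nick into a smile."))  -- prints: trigger 1
print(classify("There is a shop here."))                                -- prints: nil
```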

Posted by Nick Gammon   Australia  (23,133 posts)   Forum Administrator
Date Reply #5 on Wed 23 May 2012 05:57 AM (UTC)
Amended on Wed 23 May 2012 05:58 AM (UTC) by Nick Gammon
Look up string.find in the inbuilt help to find the exact Lua regexp syntax. The Lua regexp is quite fast, but its syntax differs a bit from the one used by triggers.

This is the web version (you get the same stuff in the help):

http://www.gammon.com.au/scripts/doc.php?lua=string.find
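
One syntax difference that often catches people: Lua's magic characters are ^ $ ( ) % . [ ] * + - ? and they are escaped with "%" rather than a backslash. So the trailing "." in "into a smile." matches any character unless it is written "%.". A small helper (escape_pattern is a hypothetical name) that escapes literal text for use inside a pattern:

```lua
-- Prefix every Lua pattern magic character with "%" so that the text
-- is matched literally. The extra parentheses keep only the first
-- result of string.gsub (the string, not the replacement count).
function escape_pattern(text)
  return (string.gsub(text, "[%^%$%(%)%%%.%[%]%*%+%-%?]", "%%%0"))
end

print(escape_pattern("I like pie."))  -- prints: I like pie%.

-- Unescaped, "." accepts the "s", so this line matches:
print(string.find("I like pies", "^I like pie.$") ~= nil)  -- prints: true

-- Escaped, only a real full stop is accepted:
print(string.find("I like pies", "^" .. escape_pattern("I like pie.") .. "$") ~= nil)  -- prints: false
```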

Posted by Natasi   (79 posts)
Date Reply #6 on Wed 23 May 2012 11:26 AM (UTC)
If my triggertext had multiple wildcards in it, how would I capture those?

In the code below, I would want to see the first wildcard, which would be 'up', and the second, which would be 'Nick'. I did a quick version with a, b, c = string.find() and could get 'up' by outputting c, but unless I added d, e, f (etc.) it wouldn't catch additional wildcards.

triggerstext = {}
triggerstext["Your mouth turns (.*) as your face (.*) into a smile."]=function() Note("Look mom, I found you!"); end

function parseAll(label, trigger, wildcard)
  for pattern, trig in pairs (triggerstext) do
    if string.find (trigger, pattern) then
      trig ()
      break
    end
  end
end

local start = utils.timer ()
parseAll ("foo", "Your mouth turns up as your face Nick into a smile.")
print (utils.timer () - start)

Posted by Natasi   (79 posts)
Date Reply #7 on Wed 23 May 2012 11:34 AM (UTC)
Another issue I've come across: it seems alternations like (wildcard1|wildcard2) do not work with Lua's string.find. Would it be wise to split all my triggers apart (I have about 1284 '|'-grouped triggers) into individual entries, or should I try to find a way to speed up the regex version instead?

Posted by Nick Gammon   Australia  (23,133 posts)   Forum Administrator
Date Reply #8 on Wed 23 May 2012 11:59 AM (UTC)
Natasi said:

If my triggertext had multiple wildcards in it, how would I capture those?


Like this:


triggerstext = {}
triggerstext["Your mouth turns (.*) as your face (.*) into a smile."]
             =
             function(where, who) 
               Note("Look " .. who .. ", I found you! and your face turned " .. where); 
             end

function parseAll(label, trigger, wildcard)

  for pattern, trig in pairs (triggerstext) do
    local results = { string.match (trigger, pattern) }
    if next (results) then
      trig (unpack (results))
      break
   end
  end

end

local start = utils.timer ()
parseAll ("foo", "Your mouth turns up as your face Nick into a smile.")
print ((utils.timer () - start) )



Output:


Look Nick, I found you! and your face turned up
0.00056068558478728
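
A variation on the code above (a sketch; dispatch and parse_all are hypothetical names): instead of building a new results table for every pattern on every line, pass string.match's return values straight through as varargs. string.match returns nil when there is no match and otherwise all the captures, so the first value says whether to call the handler. This version walks an array with ipairs for a fixed order and uses print instead of MUSHclient's Note so it runs in plain Lua.

```lua
-- Hypothetical trigger list: one record per pattern, in priority order.
local triggers = {
  { pattern = "^Your mouth turns (.-) as your face (.-) into a smile%.$",
    handler = function(where, who)
      print("Look " .. who .. ", I found you! and your face turned " .. where)
    end },
}

-- Receives string.match's results as varargs. string.match returns a
-- single nil when there is no match, so check the first value.
local function dispatch(handler, ...)
  if (...) == nil then
    return false
  end
  handler(...)
  return true
end

function parse_all(line)
  for _, t in ipairs(triggers) do
    if dispatch(t.handler, string.match(line, t.pattern)) then
      return true  -- first match wins, like the "break" above
    end
  end
  return false
end

parse_all("Your mouth turns up as your face Nick into a smile.")
-- prints: Look Nick, I found you! and your face turned up
```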



Quote:

but it is extremely slow when going over 700 entries in the table :


...

Quote:

I have about 1284 '|' grouped triggers


They seem to be multiplying!

Yes, the "|" operator does not work in Lua regexps.
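
The splitting does not have to be done by hand. A sketch (add_alternatives is a hypothetical helper) that registers each alternative as its own entry sharing one handler. It assumes "|" only separates whole alternatives, as in the "A|B|C" entries Natasi shows in the next reply; a "|" inside a group such as "(up|down)" would still need rewriting by hand, and each piece must already be written in Lua pattern syntax.

```lua
-- Hypothetical helper: split "A|B|C" on "|" and register each
-- alternative as its own entry with the same handler.
-- Assumption: "|" never appears inside a group or as literal text.
function add_alternatives(list, spec, handler)
  for alternative in string.gmatch(spec, "[^|]+") do
    list[#list + 1] = { pattern = alternative, handler = handler }
  end
end

local triggers = {}
add_alternatives(triggers,
  "^Your mouth turns (.-) as your face (.-) into a smile%.$|^I like pie%.$|^This is another one to match$",
  function(...) print("matched", ...) end)

print(#triggers)  -- prints: 3
for _, t in ipairs(triggers) do
  print(t.pattern)
end
```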

Posted by Natasi   (79 posts)
Date Reply #9 on Wed 23 May 2012 04:03 PM (UTC)
Thanks Nick!

The 700 entries referred to this:

triggerstext["Your mouth turns (.*) as your face (.*) into a smile."]...



The 1284 refers to the multiple alternatives within a single entry, like this:

triggerstext["Your mouth turns (.*) as your face (.*) into a smile.|I like pie.|This is another one to match"]...


So I take it I just have to split them apart. Just to be sure: will string.match/find over the soon-to-be almost 2000 triggers still parse faster than the regex over 700?

Posted by Worstje   Netherlands  (899 posts)
Date Reply #10 on Wed 23 May 2012 04:13 PM (UTC)
Natasi, it very likely will. The reason is lower complexity: PCRE has tons of options, but like a car, the extra weight takes its toll. Lua patterns have fewer features and are thus less complex, so they can generally execute equivalent patterns faster.

Also, I have experience using several thousand built-in MUSHclient triggers myself, and while they lagged things a little, it was never something I needed to pay much heed to. This was 4+ years ago, so I imagine performance has only improved since.

Posted by Nick Gammon   Australia  (23,133 posts)   Forum Administrator
Date Reply #11 on Thu 24 May 2012 02:00 AM (UTC)
It seems like a lot, but yes you will have to live without the "|" if you are going to use Lua.

I suggest you "anchor" them; that should make it much faster:


"^Your mouth turns (.*) as your face (.*) into a smile.$"


The reason is, without the anchor, say it gets:


There is a shop here.


Now it compares the regexp "Y" (as in "Your mouth") to "T" (as in "There is a shop here."). It finds no match. But without the anchor, it then moves in one character and tries again. After all, the line might say:


There is a shop here. Your mouth turns blue as your face crinkles into a smile.


It gets no match on the "h" and tries again. And again. And again.

But with the anchor (the "^" character) if it doesn't match at the start of the line it stops immediately.
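
The effect is easy to see with the example line above (a plain-Lua sketch; the captures use the lazy ".-" form and the full stop is escaped as "%."):

```lua
local line = "There is a shop here. Your mouth turns blue as your face crinkles into a smile."

-- Unanchored: after failing at position 1, string.find retries from each
-- later position and eventually matches partway through the line.
local s = string.find(line, "Your mouth turns (.-) as your face (.-) into a smile%.")
print(s)  -- prints: 23

-- Anchored: "^" only allows a match starting at position 1, so the
-- search gives up after the first attempt.
print(string.find(line, "^Your mouth turns (.-) as your face (.-) into a smile%.$"))  -- prints: nil
```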

If that isn't fast enough, you could split them up a bit.

For example, all the triggers starting with "You" could be put into a group (a separate table). Then a single test (does the line start with "You"?) could let you decide whether to test a further 100 triggers or not.
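
A sketch of that grouping, keyed on the line's whole first word rather than a prefix such as "You" (so "You" and "Your" land in different groups). The names buckets and find_trigger are hypothetical and the patterns are only examples:

```lua
-- Hypothetical groups, keyed by the first word of the line.
local buckets = {
  You  = { "^You are hungry%.$", "^You have (%d+) gold%.$" },
  Your = { "^Your mouth turns (.-) as your face (.-) into a smile%.$" },
}

-- Returns the bucket key and index of the first matching pattern, or nil.
function find_trigger(line)
  local first_word = string.match(line, "^(%S+)")
  local bucket = first_word and buckets[first_word]
  if not bucket then
    return nil  -- a single table lookup rejected the line
  end
  for i, pattern in ipairs(bucket) do
    if string.find(line, pattern) then
      return first_word, i
    end
  end
  return nil
end

print(find_trigger("You have 50 gold."))      -- returns "You", 2
print(find_trigger("There is a shop here."))  -- prints: nil
```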

The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).

Information and images on this site are licensed under the Creative Commons Attribution 3.0 Australia License unless stated otherwise.