Single trigger with multiple regex matches



Posted by Soft   (7 posts)  [Biography] bio
Date Wed 25 Jan 2017 01:40 AM (UTC)

Amended on Wed 25 Jan 2017 03:12 AM (UTC) by Soft

Message
I'm trying to use a single trigger to match and send a string pattern. Perhaps it should be set up as a multi-line trigger.

Here is the MUD output I'm trying to match:

Inn of the Dripping Dagger
 N-Battered and gouged wooden door S-Selduth Street  W-Inn of the Drip U-Secon Floor of
 D-The Cellar of t
This cozy old inn is well known as  a favorite watering hole and  resting
place  for hire-swords and it  has a reputation for jovial horseplay that
keeps more timid visitors away. It is a low-ceilinged taproom with two sets of


I want to pull 5 different strings out of that text:


  1. N-Battered and gouged wooden door
  2. S-Selduth Street
  3. W-Inn of the Drip
  4. U-Secon Floor of
  5. D-The Cellar of t


This is the regex I'm using:

\b(?:(?:[NSWUD]|[NS][EW])-)(?:(?:[\w ](?!(?:[NSWUD]|[NS][EW])-))+)\b


This regex seems to work exactly as I intend. I feel quite sure that I've implemented the trigger logic improperly.

The actual trigger:

		<trigger
				name="ExitTrigger"
				enabled="y"
				regexp="y"
				multi_line="n"
				keep_evaluating="y"
				match="\b(?:(?:[NSWUD]|[NS][EW])-)(?:(?:[\w ](?!(?:[NSWUD]|[NS][EW])-))+)\b"
				send_to="12"
				sequence="100">
				<send>ListExit "%0"</send>
		</trigger>


Currently this trigger fires once for each line that contains the direction descriptions, and captures only the first matching sequence. I'd like to capture all matching sequences on a given line. The regex works as intended when I check it with regexpal.

Posted by Worstje   Netherlands  (899 posts)  [Biography] bio
Date Reply #1 on Wed 25 Jan 2017 12:39 PM (UTC)

Amended on Wed 25 Jan 2017 12:40 PM (UTC) by Worstje

Message
From what I recall (it's been a couple of years since I've been very active with MUSHclient), the functionality is slightly different than you'd expect.

The keep_evaluating trigger option only determines whether other triggers will still be matched against the same line; it does not make the same trigger match again.

The repeat_on_same_line trigger option controls whether the same trigger gets matched repeatedly to the same line... but only for the purpose of changing the styles with the relevant settings. (Making it useful for stuff like name highlights and the like.) Presumably, there are a lot of gotchas to repeating scripts on the same line that Nick felt were better avoided altogether.

For what you want, the best option is to just pull the functionality out into a script, and do the repeat matching in a script. The Lua 're' (regular expression) module is an interface to the same matching engine MUSHclient uses, so if you take a moment to figure out how that works you can just toss your regexp into a loop there and process it as needed.

If you can't figure it out, ask here and I'm sure someone can help you with the details.

Posted by Soft   (7 posts)  [Biography] bio
Date Reply #2 on Wed 25 Jan 2017 01:45 PM (UTC)
Message
Thanks for your reply Worstje. I discovered much the same regarding the keep_evaluating flag last night. I wasn't aware of the repeat_on_same_line flag.

I'm new to Lua, and wasn't aware of the 're' module. I even resorted to string.match at one point, simply because I didn't do enough digging regarding regex modules/libraries. Your suggestion of breaking the regex out into a script was already my fallback plan if I couldn't get it all working with a single trigger. Looks like I'm going down that route today. Thanks for your help!

Posted by Worstje   Netherlands  (899 posts)  [Biography] bio
Date Reply #3 on Wed 25 Jan 2017 02:30 PM (UTC)
Message
Having slept, I realize what some reasons for not matching triggers on the same line might be.

(.*+) is a popular pattern - it is probably one of the most common ones! Combine that with the fact that the option would require keeping track of the last match position to avoid an infinite loop, and it would be really simple for a basic trigger to match 50+ times and cause terrible performance. Hell, given a carelessly enough written trigger, I can imagine various capture groups going crazy with backtracking and causing some sort of exponentially rising match count due to all the ways characters could be matched against the regex.

If a seemingly simple trigger can make the client slow as hell, it could easily give the program a reputation for being way too slow, difficult to use and generally unfit for duty. People who make such simple triggers are not the kind to figure out why that might be the case; they just see it and go 'well, it is simple and something similar works fine on X, I'm switching to something else and will advise everybody to not use MUSHclient because it sucks!'.
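
A side note on the position tracking mentioned above: here is a minimal plain-Lua sketch of a repeat-match loop that cannot stall on an empty match. It uses Lua's built-in string.find rather than MUSHclient's trigger engine, and find_all is a hypothetical helper name, so treat it as an illustration of the idea only.

```lua
-- Hypothetical helper: collect every match of a Lua pattern in text.
function find_all(text, pattern)
  local matches, pos = {}, 1
  while pos <= #text + 1 do
    local s, e = string.find(text, pattern, pos)
    if not s then break end
    matches[#matches + 1] = text:sub(s, e)
    -- An empty match comes back with e == s - 1. Resuming at e + 1 would
    -- retry the same position forever, so step past s instead.
    if e >= s then
      pos = e + 1
    else
      pos = s + 1
    end
  end
  return matches
end

print(#find_all("a1b22", "%d+"))  -- non-empty matches only
print(#find_all("a1b22", "%d*"))  -- also yields empty matches, yet terminates
```

Even with the guard, a pattern that can match the empty string produces a match at nearly every position, which is the kind of match-count blow-up described above.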

Posted by Soft   (7 posts)  [Biography] bio
Date Reply #4 on Wed 25 Jan 2017 03:50 PM (UTC)
Message
Thanks again for your insight Worstje. Here's what I've got now:


	<trigger
		name="ExitTrigger"
		enabled="y"
		regexp="y"
		multi_line="y"
		lines_to_match="2"
		keep_evaluating="y"
		match="((?:[NSWUD]|[NS][EW])-).*"
		send_to="12"
		sequence="100">
		<send>ListExit "%0"</send>
	</trigger>


Which pulls out two separate strings:


N-Battered and gouged wooden door S-Selduth Street  W-Inn of the Drip U-Secon Floor of

D-The Cellar of t


I've imported the LPEG.re module, and am attempting to use re.match on each of those two strings.

It seems that with re.match, I should be able to use my original regex to pull out all of the individual strings simultaneously. I can also loop through and pull out one of the individual strings on each loop.

Now I'm having issues implementing either of those regex patterns in the LPEG.re syntax. I really appreciate your time and help in this matter.

Posted by Worstje   Netherlands  (899 posts)  [Biography] bio
Date Reply #5 on Wed 25 Jan 2017 07:05 PM (UTC)

Amended on Thu 26 Jan 2017 01:59 PM (UTC) by Worstje

Message
Sorry, it took a while. The matter was a bit more complicated than I would have liked (I tried to brute-force a solution with negative look-aheads, but your output wasn't friendly enough for that), so I had to code something up manually, only to get myself into an infinite loop and lose all my progress. Whoops. xD

Also, my apologies for misdirecting you. The module you need is 'rex', not 're'. I'm getting my languages and scripting environments confused a little bit!

<triggers>
  <trigger
   enabled="y"
   match="^(.*?([NSWEUD]|[NS][EW])-\w.*)$"
   regexp="y"
   send_to="12"
   sequence="100"
  >
  <send>local line = "%0"

-- Match up to the first letter of the description for accuracy.
local exit_re = rex.new("\\\\s([NSWEUD]|[NS][EW])-\\\\w")

local s, e, m = exit_re:match(line, 1)
while (s ~= nil) do

  local new_s, new_e, new_m = exit_re:match(line, s+1)

  local exit_name = m[1]
  local exit_description = line:sub(e, (new_s or 0)-1)
  -- You may want to add a trim() to exit_description since your game
  -- seems to be littered with spaces in unexpected places. :-)

  -- BEGIN -- Do what you want at this point.

  Note("Exit found: &lt;" .. exit_name .. "&gt;")
  Note("Exit description: &lt;" .. exit_description .."&gt;")

  -- END -- Do what you want at this point.  

  s,e,m = new_s, new_e, new_m
end</send>
  </trigger>
</triggers>


There's your trigger. I did some shenanigans with the trigger itself to get the entire line to match for manual processing. I personally always use the Script Function thing because it is far easier to code for; it also offers some extra features that aren't easy to do in this format. Additionally, it gives the bonus of not having a backslash-infestation with regexes like in this particular case. :-)

The way it basically works is to first match a space, followed by a direction, followed by a dash, followed by one letter. The latter is to be as precise as possible with regards to really matching an exit to avoid false positives. Then it looks for the same pattern a second time, but one character after where it found its first match. This should give the second exit. Next, I just fiddle with the indexes and collected information to put the values you want into useful variables. (Note that s or new_s becomes nil if it cannot find a match, so I need to default it to zero, which will give a -1 and be interpreted by Lua as 'until the end of the string'.) Finally, we take the second match we did initially, and use it as a seed for the next loop.
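
To make the index juggling above easier to follow, here is a minimal sketch of the same scan-ahead loop written with Lua's built-in string.find, so it runs outside MUSHclient. Assumptions: split_exits is a hypothetical name, and because Lua patterns have no alternation, [NSEWUD][EW]? is a looser stand-in for the rex pattern in the trigger (it would also accept pairs such as EW).

```lua
-- Looser stand-in for the PCRE "\s([NSWEUD]|[NS][EW])-\w" (assumption, see above).
local EXIT_PATTERN = "%s([NSEWUD][EW]?)%-%w"

-- Hypothetical helper: split one exit line into direction/description pairs.
function split_exits(line)
  local exits = {}
  local s, e, dir = line:find(EXIT_PATTERN, 1)
  while s do
    -- Look for the next exit, starting one character after this match began.
    local next_s, next_e, next_dir = line:find(EXIT_PATTERN, s + 1)
    -- e points at the first letter of the description. The description ends
    -- just before the next match; (nil or 0) - 1 == -1 means "to end of string".
    local description = line:sub(e, (next_s or 0) - 1)
    exits[#exits + 1] = { direction = dir, description = description }
    -- The look-ahead match seeds the next iteration.
    s, e, dir = next_s, next_e, next_dir
  end
  return exits
end

local sample = " N-Battered and gouged wooden door S-Selduth Street  W-Inn of the Drip U-Secon Floor of"
for _, exit in ipairs(split_exits(sample)) do
  print(exit.direction, "<" .. exit.description .. ">")
end
```

Descriptions keep any trailing spaces from the MUD output, which is why a trim is still worth adding.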

If you can't figure out how to do the (right-)trim of the exit description, let me know and I'll get you a snippet. It should be easy enough to find on the interwebs, though!
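
For reference, one common way to write such a right-trim in plain Lua. This is only a sketch, and rtrim is a name chosen here for illustration.

```lua
-- Strip trailing whitespace, leaving leading whitespace untouched.
function rtrim(s)
  -- (.-) captures as little as possible, so %s* swallows the whole trailing run.
  -- The outer parentheses discard any extra return values.
  return (s:match("^(.-)%s*$"))
end

print("<" .. rtrim("Selduth Street  ") .. ">")
```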

Posted by Nick Gammon   Australia  (22,982 posts)  [Biography] bio   Forum Administrator
Date Reply #6 on Wed 25 Jan 2017 09:57 PM (UTC)
Message
Another interesting way of handling this is by using LPEG.

http://www.gammon.com.au/lpeg

First we get a trigger to match what roughly looks like an exit line, then we process the line in LPEG:


<triggers>
  <trigger
   enabled="y"
   match="^\s+([NSWEUD]|[NS][EW])-\w.*"
   regexp="y"
   send_to="12"
   sequence="100"
  >
  <send>

require "re"

exitsGrammar = re.compile[[
  ExitLine         &lt;- {| (" "+ Exit)+ |}
  Direction        &lt;- ( [NS] [EW] / [NSEWUD] ) "-" 
  Exit             &lt;- { Direction (. !Direction)+  }
]]

exits = exitsGrammar:match ("%0")

for k, v in ipairs (exits) do
  print (v)
end -- for

</send>
  </trigger>
</triggers>


For advice on how to copy the above and paste it into MUSHclient, please see Pasting XML.



Breaking up the grammar (and converting the "&lt;" entities back to how they look on the screen):


  ExitLine         <- {| (" "+ Exit)+ |}


An exit line consists of one or more sets of spaces followed by an exit. The brackets "{| ... |}" mean to capture into a table.



  Direction        <- ( [NS] [EW] / [NSEWUD] ) "-" 


A direction is either N or S followed by E or W, or one of "NSEWUD". Whichever choice is taken we also need a hyphen afterwards. This makes sure that things like N in the middle of a description aren't considered an exit.


  Exit             <- { Direction (. !Direction)+  }


An exit is a direction, followed by one or more characters, each of which is not followed by another direction. Thus it gradually consumes characters until it hits another direction. The braces mean that this is the thing that is captured.

Example output from your test:


N-Battered and gouged wooden door
S-Selduth Street 
W-Inn of the Drip
U-Secon Floor of
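
For anyone without the re module to hand, the consume-until-lookahead idea can be hand-translated into plain Lua. This is a rough sketch rather than an equivalent of the LPEG grammar: direction_at and parse_exit_line are hypothetical names, and it assumes, as the description above says, that a hyphen is required after either form of direction.

```lua
-- Length of a direction marker (letters plus hyphen) starting at position i,
-- or nil if there is none. Mirrors: ( [NS] [EW] / [NSEWUD] ) "-"
function direction_at(line, i)
  if line:find("^[NS][EW]%-", i) then return 3 end
  if line:find("^[NSEWUD]%-", i) then return 2 end
  return nil
end

-- Mirrors: ExitLine <- (" "+ Exit)+  with each Exit captured as a string.
function parse_exit_line(line)
  local exits, i = {}, 1
  while i <= #line do
    -- " "+ : at least one space must precede every exit.
    local _, last_space = line:find("^ +", i)
    if not last_space then break end
    i = last_space + 1
    local len = direction_at(line, i)
    if not len then break end
    local start = i
    i = i + len
    -- (. !Direction)+ : consume a character only while the position right
    -- after it does not begin another direction.
    local consumed = 0
    while i <= #line and not direction_at(line, i + 1) do
      i = i + 1
      consumed = consumed + 1
    end
    if consumed == 0 then break end
    exits[#exits + 1] = line:sub(start, i - 1)
  end
  return exits
end

local sample = " N-Battered and gouged wooden door S-Selduth Street  W-Inn of the Drip U-Secon Floor of"
for _, exit in ipairs(parse_exit_line(sample)) do
  print(exit)
end
```

As with the grammar, the space immediately before the next direction is left unconsumed so that the following " "+ can match it.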

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Soft   (7 posts)  [Biography] bio
Date Reply #7 on Wed 25 Jan 2017 11:50 PM (UTC)

Amended on Thu 26 Jan 2017 12:37 AM (UTC) by Soft

Message
EDIT: Oops, I refreshed the page from this morning and created a double post.

Thanks for your input Nick. I really appreciate both yours and Worstje's help in this matter. I'm attempting to implement both solutions now, so I can understand any pros/cons.

Posted by Soft   (7 posts)  [Biography] bio
Date Reply #8 on Thu 26 Jan 2017 01:06 AM (UTC)
Message
After attempting to implement both of your solutions, I keep running into pattern matching issues.

With Worstje's solution, the variables "s, e, m" were always nil, even after tweaking and double-checking the regex.

With Nick's solution, I get a pattern error. I'm not sure if this is due to not importing the module properly, but I made sure to require tprint, lpeg (as well as character classes and functions, idk if that was necessary), and re on initialization.


pattern error near '| (" "+ Exit)+ |}
  D...'
stack traceback:
	[C]: in function 'error'
	X:\MUSHclient\lua\re.lua:85: in function <X:\MUSHclient\lua\re.lua:81>
	[C]: in function 'match'
	X:\MUSHclient\lua\re.lua:200: in function 'compile'
	[string "Trigger: "]:8: in main chunk


I've reviewed the line but nothing jumps out at me as different from the LPEG documentation that Nick linked.


require "re"

exitsGrammar = re.compile[[
  ExitLine		&lt;- {| (" "+ Exit)+ |}
  Direction		&lt;- ( [NS] [EW] ) / [NSEWUD] "-" 
  Exit			&lt;- { Direction (. !Direction)+  }
]]

exits = exitsGrammar:match ("%0")

for k, v in ipairs (exits) do
  print (v)
end -- for

Posted by Worstje   Netherlands  (899 posts)  [Biography] bio
Date Reply #9 on Thu 26 Jan 2017 02:30 AM (UTC)

Amended on Thu 26 Jan 2017 02:39 AM (UTC) by Worstje

Message
I am not sure why my example wouldn't work. I tested it with the example text from your first post and I got the results you wanted out of it.

I'll see if I can figure it out.

Problem found: the forum ate some backslashes. Oddly enough they do appear whilst I am editing the post, but on the forum itself they don't show. Weird. I'm sure Nick will fix that issue at some point. :-)

Make sure there are a total of _4_ backslashes before both the s near the start as well as the w on the line that defines the exit_re variable. (Which means you need to add 2 extra in front of each.)

Posted by Nick Gammon   Australia  (22,982 posts)  [Biography] bio   Forum Administrator
Date Reply #10 on Thu 26 Jan 2017 03:20 AM (UTC)
Message
You have to double backslashes for the forum. MUSHclient will do that for you. See Edit menu: Convert Clipboard Forum Codes.

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Soft   (7 posts)  [Biography] bio
Date Reply #11 on Thu 26 Jan 2017 04:37 AM (UTC)
Message
Thanks so much for all your help guys, I got it working pretty much perfectly. There was one bug I noticed when directional text butts up against an ASCII minimap, but I don't think there's much to be done there, and the directions are still readable. I threw the plugin up on github and put a thank you to both of you in there. I really appreciate the help, and Nick I love MUSHclient. I'm blown away by your dedication over the years, awesome job well done.

https://github.com/codypersinger/MUSHclient_FK_Plugin

Posted by Nick Gammon   Australia  (22,982 posts)  [Biography] bio   Forum Administrator
Date Reply #12 on Thu 26 Jan 2017 05:04 AM (UTC)
Message
Thanks for the acknowledgements!

BTW, I notice from your screenshot you missed a possible easy way of handling it. The exit direction is a different colour to the description.

If you made a trigger script you get style runs as the fourth argument. By "walking" the styles you would probably find that you got:

yellow:direction
green:description
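
A rough sketch of what walking the styles could look like. Assumptions, since the real style runs for this MUD are not shown here: each run is a table with text and textcolour fields (as passed to a trigger script function), the direction marker and its description arrive as separate runs, and the first non-blank run on the line is a direction. exits_from_styles, the sample table and its colour numbers are all hypothetical.

```lua
-- Hypothetical helper: group style runs into direction/description pairs.
-- The direction colour is taken from the first non-blank run instead of
-- being hard-coded, so a different colour scheme per area would still work.
function exits_from_styles(styles)
  local exits, direction_colour = {}, nil
  for _, run in ipairs(styles) do
    if run.text:match("%S") then  -- skip runs that are only whitespace
      direction_colour = direction_colour or run.textcolour
      if run.textcolour == direction_colour then
        -- Same colour as the first marker: this run starts a new exit.
        exits[#exits + 1] = { direction = run.text, description = "" }
      else
        -- Any other colour: part of the current exit's description.
        local exit = exits[#exits]
        exit.description = exit.description .. run.text
      end
    end
  end
  return exits
end

-- Illustrative data only; in a real trigger script the fourth argument
-- (styles) would be passed in place of this table.
local sample_styles = {
  { text = " ",                                textcolour = 12632256 },
  { text = "N-",                               textcolour = 65535 },
  { text = "Battered and gouged wooden door ", textcolour = 32768 },
  { text = "S-",                               textcolour = 65535 },
  { text = "Selduth Street",                   textcolour = 32768 },
}

for _, exit in ipairs(exits_from_styles(sample_styles)) do
  print(exit.direction, exit.description)
end
```

Inside MUSHclient this would be called from a script function declared with the usual four parameters (name, line, wildcards, styles).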

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Soft   (7 posts)  [Biography] bio
Date Reply #13 on Thu 26 Jan 2017 06:43 AM (UTC)
Message
You're right of course Nick. Unfortunately the color of the strings changes somewhat often depending on what area you are in, and I'm not familiar enough with the game to know how many possibilities there are.

The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).

Information and images on this site are licensed under the Creative Commons Attribution 3.0 Australia License unless stated otherwise.
