[Home] [Downloads] [Search] [Help/forum]

Gammon Software Solutions forum

See www.mushclient.com/spam for dealing with forum spam. Please read the MUSHclient FAQ!

[Folder]  Entire forum
-> [Folder]  MUSHclient
. -> [Folder]  General
. . -> [Subject]  RegExp VM

Home  |  Users  |  Search  |  FAQ
Username:
Register forum user name
Password:
Forgotten password?
(New message)
Subject: RegExp VM
Name:
Your forum user name.
Register forum user name
Password:
Your forum password.
Forgotten password?
Message:
Message to be posted (in English, please).
Forum codes:
Check this if your message uses 'forum codes' or templates (auto-detected for new posts).
Forum codes Templates

Save this message ...


Subject review (reverse sequence)

Pages: 1 2  3  

Posted by Nick Gammon   Australia  (18,772 posts)  [Biography] bio   Forum Administrator
Date Wed 28 Apr 2004 09:34 PM (UTC)  quote  ]
Message
Yes, perhaps time to stop discussing it. :)

I don't think my method is perfect, however it was quick and easy to do - easier than some sort of on-the-fly state machine, and the easy solution has the advantage of simplicity, and simplicity has the advantage that it is likely to have less bugs.

For those that recall the many, many, discussions we had in the past about multi-line triggers and why they are hard to do, this forum thread confirms that it is a problem that doesn't have a quick, simple solution.

It reminds me of the saying:

Quote:

For every problem there is a Solution. Neat, Simple, Wrong!


My solution, which is now available for playing with, may not be the neatest in the world, but if it works to easily process stat rolls, or player stats, or inventory lists, or multi-line room descriptions, then I will be happy. :)

- Nick Gammon

www.gammon.com.au, www.mushclient.com
[Go to top] top

Posted by David Haley   USA  (3,881 posts)  [Biography] bio   Moderator
Date Wed 28 Apr 2004 06:34 PM (UTC)  quote  ]
Message
Well, fine, the point may be moot. Nonetheless, I still think that you are trying to do something that just can't be correct in all cases. The day you bring me a working program or algorithm, I'll be convinced; until then let us agree to disagree on the feasibility of your project. :)

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org
[Go to top] top

Posted by Shadowfyr   USA  (1,774 posts)  [Biography] bio
Date Wed 28 Apr 2004 05:48 PM (UTC)  quote  ]

Amended on Wed 28 Apr 2004 05:51 PM (UTC) by Shadowfyr

Message
Geeze.. I think you are still missing the point. lol Both our methods can do the same thing, mine just doesn't need the bucket. Having a script function that returns an array? Umm. Try 'split'... Anyway, I just said I thought taking extra time to create and search a buffer, that mine doesn't need, is inelegant and a hack. This doesn't mean it doesn't work. Inelegant means clumsy and excessive when another method could be more efficient, hack in this case means "the first thing that came to mind", rather than the most efficient solution. I can't see how forcing the user to define a buffer to search, even when the trigger is absolutely and undeniably not greedy, can ever be called efficient or elegant, therefore at the very least *some* other method must be better, even if you don't agree it is mine.

The only thing that mine does that is a hack is the third option for immediately acting on each matched line. Some goofy thing that generates an array from a bloc kof lines is still a hack, since you are not handling the lines as they arrive, in fact it is what you end up doing if you want to do anything useful with Nick's output anyway. No, it doesn't solve any problem I see with the design.

It isn't about "if" they can both do the same things, it is about how they do them and the overhead they introduce that can slow down operation of what, up to this point, has been a client that strives to avoid things that can seriously slow it down.

But again, this is a mute point. I doubt I could successfully write a pre-parser and without a working version of mine, Nick's can and will get the job done. My personal opinion is simply that we are paying a higher price for the behavior than we might have if my system could have been developed instead.

main {
__if (Schrodinger_Cat is Alive or version >= "XP"){
____if version = "Vista" then Performance /= Number_of_Cores;
____call Functional_Code();}
__else
____call Crash_Windows();}
[Go to top] top

Posted by David Haley   USA  (3,881 posts)  [Biography] bio   Moderator
Date Wed 28 Apr 2004 05:26 AM (UTC)  quote  ]
Message
But if you think about it, that's the only way for the regular expression to behave coherently. You can't have a wildcard for every line in the list, simply because you don't know how many lines there are.

Again let's consider:

^Inventory:$(.*?)^$

This time note that I have the non-greedy specifier. This regular expression will match on "Inventory:" followed by a newline followed by anything followed by a blank line.

How are you going to express the notion of making a new wildcard for every line?

It seems to me that the cleanest solution is to just get the set of lines matched, split them by newline, and then do whatever you need to do with each line.

Furthermore, let's assume you wanted to have a sword in your inventory. You could check that like so:

^Inventory:$(.*?)sword(.*?)^$

So I'm not sure what the problem is. You say that we have to go looking in the bucket for what we were looking for... well... let's be more precise, what were we looking for and we can't we find it with the current method?

I'd imagine that a scripting function could be added that given a block of text, returns an array of each line. Basically a split function, that operates on newlines. Would that solve your problem?

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org
[Go to top] top

Posted by Shadowfyr   USA  (1,774 posts)  [Biography] bio
Date Wed 28 Apr 2004 04:47 AM (UTC)  quote  ]
Message
Quote:
It seems to me as if you're breaking the elegance of the system for the sake of convenience. It just reeks of "hack" to me. :)


Ah, true, but not more inelegant or a hack than dumping a copy of every line into a bucket and telling your assistant to dig through it after every line you drop in to see if now contains everything you wanted to find. ;) lol

main {
__if (Schrodinger_Cat is Alive or version >= "XP"){
____if version = "Vista" then Performance /= Number_of_Cores;
____call Functional_Code();}
__else
____call Crash_Windows();}
[Go to top] top

Posted by Nick Gammon   Australia  (18,772 posts)  [Biography] bio   Forum Administrator
Date Wed 28 Apr 2004 03:52 AM (UTC)  quote  ]
Message
Quote:

It also has a built in limit, (I assume from another post I saw), so even if the user specifies 200 lines, it cannot match, since the internal limit of the buffer is 100.


There is a built-in limit, yes, basically to limit the number of lines that are stored "just in case" you may want to do a multi-line trigger. After all, each line takes memory.

However you will not be allowed to ask to match more than the number of lines in this limit. It has been increased from 100 to 200 in case some people have big inventories, or "who" lists or whatever.

- Nick Gammon

www.gammon.com.au, www.mushclient.com
[Go to top] top

Posted by David Haley   USA  (3,881 posts)  [Biography] bio   Moderator
Date Wed 28 Apr 2004 02:33 AM (UTC)  quote  ]
Message
Quote:
Huh?? If I have it match three specific lines and *only* those lines, then obviously it is going to return "all matched lines". Your arguing symantics here. It does match the least amount possible, but it returns all that it matched, not all that it *could* match.


I'm not sure I see what you're talking about. The problem of greedy vs. non-greedy comes when you have wildcards in your regular expression. So it wouldn't be matching three specific lines; it's matching a block of text.

I was just commenting on the fact that your statement wasn't really precise in an algorithmic sense.

Quote:
Please, enlighten me how providing a user defined limit or having it automatically fail when it hits 200 lines (check Nick's release notes, this is what his does) differs from Nick's?


I misread your original statement, I did not see the part about failing upon hitting the limit.

Quote:
This is a special case, it is not intended to act like a normal multiline trigger, but to provide a single trigger to solve a problem that currently requires several triggers working together to work. Again, this is *not* the normal behaviour it would have, but an option that could be turned on in cases where such behaviour is wanted. Yes, it doesn't technically work the way multiline triggers are supposed to, but it isn't supposed to in this case and it would be off by default.


It seems to me as if you're breaking the elegance of the system for the sake of convenience. It just reeks of "hack" to me. :)

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org
[Go to top] top

Posted by Shadowfyr   USA  (1,774 posts)  [Biography] bio
Date Wed 28 Apr 2004 01:19 AM (UTC)  quote  ]

Amended on Wed 28 Apr 2004 01:20 AM (UTC) by Shadowfyr

Message
Quote:
1. Non-greedy - Store all matched lines until all are found or it fails.

If you're storing all matched lines, you're not really being non-greedy. A non-greedy expression would match the *least* amount of lines possible.


Huh?? If I have it match three specific lines and *only* those lines, then obviously it is going to return "all matched lines". Your arguing symantics here. It does match the least amount possible, but it returns all that it matched, not all that it *could* match.

Quote:
2. Greedy with a user defined limit - Store all matched lines, but fail if the limit is exceeded.

This is greedy, yes. Although there is the problem of matching lines that may not have the terminating condition - imagine you have a list that is 105 lines long, but your limit is 100. When you reach 100, what do you do? How do you *know* that the text beyond those 100 lines still matches the regular expression? Perhaps everything you have "matched" up to now simply does not correspond to the regular expression.


Please, enlighten me how providing a user defined limit or having it automatically fail when it hits 200 lines (check Nick's release notes, this is what his does) differs from Nick's?

If it is 105 lines long and you tell Nick's version to look for 100 lines, it still isn't going to find 105 lines. Nick's answer *and* mine is that the limit means that it is *a limit*. If by that point it has captured all the lines it was asked to, in cases like "(?s)^You have\:$(.*)*" then it is considered a match, it can't look for 105 lines, because you told it to "only" find 100. If the lines is "(?s)^You have\:$(.*)*^$", then the rules change. In this case it would fail if you have 105 lines and return nothing, again because you told it to stop at 100 lines. This is no different if you use mine or Nick's, but mine only *requires* it in the first case, since it can correctly match the second case, even if you have 200, 500, or 1,000,000 lines. It already knows 'when' to stop in such a case.

Quote:
3. Act like each match is a seperate event and return each line + wildcards as they are matched, instead of storing the entire result.

One cannot determine wildcard matches until one has the whole match, because the wildcards are built during the parsing of the whole text.
How does one "return" lines and wildcards as they are matched? Again, you have the problem of terminating conditions.


This is a special case, it is not intended to act like a normal multiline trigger, but to provide a single trigger to solve a problem that currently requires several triggers working together to work. Again, this is *not* the normal behaviour it would have, but an option that could be turned on in cases where such behaviour is wanted. Yes, it doesn't technically work the way multiline triggers are supposed to, but it isn't supposed to in this case and it would be off by default.

main {
__if (Schrodinger_Cat is Alive or version >= "XP"){
____if version = "Vista" then Performance /= Number_of_Cores;
____call Functional_Code();}
__else
____call Crash_Windows();}
[Go to top] top

Posted by David Haley   USA  (3,881 posts)  [Biography] bio   Moderator
Date Wed 28 Apr 2004 (UTC)  quote  ]
Message
My point is that I don't believe such a parser is even possible to build, to be precise and correct at the same time.

Quote:
1. Non-greedy - Store all matched lines until all are found or it fails.

If you're storing all matched lines, you're not really being non-greedy. A non-greedy expression would match the *least* amount of lines possible.
Quote:
2. Greedy with a user defined limit - Store all matched lines, but fail if the limit is exceeded.

This is greedy, yes. Although there is the problem of matching lines that may not have the terminating condition - imagine you have a list that is 105 lines long, but your limit is 100. When you reach 100, what do you do? How do you *know* that the text beyond those 100 lines still matches the regular expression? Perhaps everything you have "matched" up to now simply does not correspond to the regular expression.
Quote:
3. Act like each match is a seperate event and return each line + wildcards as they are matched, instead of storing the entire result.

One cannot determine wildcard matches until one has the whole match, because the wildcards are built during the parsing of the whole text.
How does one "return" lines and wildcards as they are matched? Again, you have the problem of terminating conditions.

Basically, like I said I have the feeling that you're operating under what is convenient to use, but not what is correct in all cases. I'm truly not convinced that it's possible to write a parser as you are suggesting, that can somehow "guess" if input beyond what it received is valid or not.

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org
[Go to top] top

Posted by Shadowfyr   USA  (1,774 posts)  [Biography] bio
Date Tue 27 Apr 2004 11:43 PM (UTC)  quote  ]
Message
I have conceeded tha tmy idea does not work with greedy expressions. I did suggest that in those cases it is just as easy to allow the user to specify an 'optional' limit on the number of lines. Or did you miss reading this? You seem to still be operating on the assumption of the original idea, without taking into acount the changes I proposed to make it work more like Nick's. The point here is 'optional' *with* 'greedy triggers', not 'always required'. Nick's always requires it, even when the regular expression itself has no requirement for this. And as I said, it is a difference between a *slight* increased overhead for all regexp and a *major* increase in overhead the moment you start looking for large numbers of lines, since Nick's has to retest the entire buffer of lines it builds, every single time. Mine only tests one line at a time, but *could* be made to do:

1. Non-greedy - Store all matched lines until all are found or it fails.
2. Greedy with a user defined limit - Store all matched lines, but fail if the limit is exceeded.
3. Act like each match is a seperate event and return each line + wildcards as they are matched, instead of storing the entire result.

Nick's does:

1. Assume all triggers are greedy, require a limit to be set and store everything until all matches succeed or one fails. It also has a built in limit, (I assume from another post I saw), so even if the user specifies 200 lines, it cannot match, since the internal limit of the buffer is 100. Unless I misread the meaning of that post where you asked how many lines users are likely to want to test Nick?

Mine is more flexible because it doesn't need the buffer, but can do exactly the same behaviour, when needed.

However, the point is mute anyway, since I don't have the skill with parsers to build one that can correctly delegate each section of the expression to the state machine instructions. Thus, unless someone else built and proved the concept, it isn't going to happen.

main {
__if (Schrodinger_Cat is Alive or version >= "XP"){
____if version = "Vista" then Performance /= Number_of_Cores;
____call Functional_Code();}
__else
____call Crash_Windows();}
[Go to top] top

Posted by Nick Gammon   Australia  (18,772 posts)  [Biography] bio   Forum Administrator
Date Tue 27 Apr 2004 06:23 AM (UTC)  quote  ]
Message
This is pretty-much my point. That particular hypothetical pattern is matching:

Inventory:
(anything at all)
(a blank line)

How does the state engine know to move from (anything at all) to (a blank line) given that (a blank line) falls into the category of (anything at all)?

This is why the regexp executer really needs to take the text as a whole block, you can't really break it into smaller pieces and have an external decision about when to move from one piece to another.

Even having a simpler expression:

Inventory:
(anything)
You have x items.

... still has that problem. The phrase "You have x items." still matches the middle expression of (anything).


- Nick Gammon

www.gammon.com.au, www.mushclient.com
[Go to top] top

Posted by David Haley   USA  (3,881 posts)  [Biography] bio   Moderator
Date Tue 27 Apr 2004 05:37 AM (UTC)  quote  ]
Message
Consider this:


Inventory:
   a sword
   a spear

< hp ma mv > inventory
Inventory:
   a sword
   a spear



Now consider the regular expression:
^Inventory:$(.*)^$

Or something to that effect. How do you know when to stop matching? Do you get the first bit? Or the whole thing?

In this case it's clear, the answer is that you only want the first. But that means your algorithm is non-greedy. What if you want it to be greedy?

Perl has a non-greedy specifier, ?, which could work. But Nick already mentioned that as the solution.

How would you propose, Shadowfyr, to deal with greedy vs. non-greedy regular expressions?

I mean no offense but it seems that you are operating under the "it's obvious what I meant" principle. As you are well aware, you can't design a system based on what the user meant; you can *only* design systems that do *exactly* what the user specifies. Otherwise you are sacrificing correctness.

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org
[Go to top] top

Posted by Shadowfyr   USA  (1,774 posts)  [Biography] bio
Date Tue 27 Apr 2004 02:56 AM (UTC)  quote  ]

Amended on Tue 27 Apr 2004 02:58 AM (UTC) by Shadowfyr

Message
Umm.. When the termination string you put into the trigger tells it to or it fails to match something? lol Seriously, if someone makes one that is so sloppy it matches forever, why is it the fault of the algorythm or client that they messed up? Even Nick appears to be using a bailout (according to one post I read today) of about 100 lines, so the buffer can't be set to like 4000 lines. If the trigger is designed right, it should have an obvious point to stop, if not, telling to to look for 20 lines is convenient to have, maybe, but all your doing is specifically telling it to bail out of the matching 'before' whatever internal limit the client itself uses. But having it optional, when you need it is significantly different than using the wrong number and having to buffer 100 lines, when you actually only match 10, or worse, not matching 11, because you told it the bailout should be 10.

main {
__if (Schrodinger_Cat is Alive or version >= "XP"){
____if version = "Vista" then Performance /= Number_of_Cores;
____call Functional_Code();}
__else
____call Crash_Windows();}
[Go to top] top

Posted by David Haley   USA  (3,881 posts)  [Biography] bio   Moderator
Date Tue 27 Apr 2004 12:31 AM (UTC)  quote  ]
Message
Quote:

In that case, the *only* difference between my state machine and Nick's system is that mine wouldn't need to be told how many lines are needed before hand.


How do you know when to stop matching?

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org
[Go to top] top

Posted by Shadowfyr   USA  (1,774 posts)  [Biography] bio
Date Mon 26 Apr 2004 11:16 PM (UTC)  quote  ]
Message
Yeah Nick. Already forgot that \z bit. lol

And Ksilyan, like I said, I failed to consider the way multilines need to match. If I had, I would have suggested returning the entire thing in a wildcard, just like Nick is doing. In that case, the *only* difference between my state machine and Nick's system is that mine wouldn't need to be told how many lines are needed before hand. The bahaviour would be basically identical in all respects. I am not sure exactly what you think I need to 'formalize' anything. I wasn't thinking carefully enough when suggesting that such a state machine based system should return each line and group of wildcards immediately. But now that I am thinking in those terms, it could have done both with the addition of one simple switch, which could have told it to treat each match as a seperate event when you needed that. As a rule, it isn't, as you point out, how multiline triggers normally work, so making the default behaviour return the result 'in total' would make more sense.

Given this, the real difference between the approaches is mine providing one option that Nick's can't, and the fact that his executes 'after' the lines are all available, while mine would process each individual line one after the other and return the result as soon as all criteria have been met. Mine would have a slight added overhead for all instances, the overhead for his grows for each line it needs to test against. But *both* could work exactly the same way with respect to 'how' and 'when' they return results. I just goofed up when initially thinking about them and what 'multiline' really means.

main {
__if (Schrodinger_Cat is Alive or version >= "XP"){
____if version = "Vista" then Performance /= Number_of_Cores;
____call Functional_Code();}
__else
____call Crash_Windows();}
[Go to top] top

The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).

To show them in your local time you can join the forum, and then set the 'time correction' field in your profile to the number of hours difference between your location and UTC time.


6,760 views.

This is page 1, subject is 3 pages long: 1 2  3  [Next page]

[Reply to this subject]  Reply to this subject   [New subject]  Start a new subject   [Refresh] Refresh page

Go to topic:           Search the forum


[Go to top] top

[Home]

Written by Nick Gammon - 5K

Comments to: Gammon Software support
[RH click to get RSS URL] Forum RSS feed ( http://www.gammon.com.au/rss/forum.xml )

[Best viewed with any browser - 2K]    [Internet Contents Rating Association (ICRA) - 2K]    [Web site powered by FutureQuest.Net]