lpeg code translate to lpeg re
It is now over 60 days since the last post. This thread is closed.
Posted by
| Albert Chan
(55 posts) Bio
|
Date
| Sat 20 Jan 2018 08:12 PM (UTC) |
Message
| i enjoyed the lpeg tutorial: http://www.gammon.com.au/lpeg
however, i am puzzled about the lpeg re examples.
how do i translate the lpeg upto function example into lpeg re ?
what is the lpeg re equivalent of lpeg p1 - p2 ? | Top |
|
Posted by
| Albert Chan
(55 posts) Bio
|
Date
| Reply #1 on Sat 20 Jan 2018 10:18 PM (UTC) |
Message
| to make my lpeg re questions more concrete: I was unable to
translate the lua pattern "(.*)and(.*)" using lpeg re:
local C, P = lpeg.C, lpeg.P
-- my attempt for lua pattern "(.+)and(.*)"
local lpeg_pat = C((P(1) - 'and')^1) * 'and' * C(P(1)^0)
local re_pat = re.compile "{ (. ! 'and')* . } 'and' {.*}"
-- lua pattern "(.*)and(.*)"
lpeg_pat = C((P(1) - 'and')^0) * 'and' * C(P(1)^0)
-- what is the lpeg re equivalent code ? | Top |
|
Posted by
| Nick Gammon
Australia (23,133 posts) Bio
Forum Administrator |
Date
| Reply #2 on Sat 20 Jan 2018 10:32 PM (UTC) |
Message
| It looks like you have to reverse the order. This works:
require "re"
target = "foo and bar"
local re_pat = re.compile "{ (!'and' .)*} 'and' {.*}"
print (lpeg.match (re_pat, target))
Output:
foo bar
The pattern is basically saying:
{ <-- start of capture
( <-- start of group
!'and' <-- assert not matching 'and' without consuming input
. <-- consume one character
) <-- end of group
* <-- match zero or more of the preceding
} <-- end of capture
'and' <-- consume the 'and'
{.*} <-- capture rest of input
|
- Nick Gammon
www.gammon.com.au, www.mushclient.com | Top |
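The reversed-order pattern explained above can be checked against its lpeg API counterparts. This is a minimal sketch, assuming standalone Lua with LPeg 1.x installed (inside MUSHclient, lpeg is already a global); the names lpeg_pat and diff_pat are only illustrative.

```lua
local lpeg = require "lpeg"
local re = require "re"
local C, P = lpeg.C, lpeg.P

-- re form from the post above
local re_pat = re.compile "{ (!'and' .)* } 'and' {.*}"

-- the same pattern with the lpeg API: -P'and' plays the role of !'and'
local lpeg_pat = C((-P'and' * P(1))^0) * 'and' * C(P(1)^0)

-- the difference form: P(1) - 'and' matches one character not starting 'and'
local diff_pat = C((P(1) - 'and')^0) * 'and' * C(P(1)^0)

-- all three should print the same two captures
for _, pat in ipairs { re_pat, lpeg_pat, diff_pat } do
  print(pat:match("foo and bar"))
end
```

In other words, lpeg `p1 - p2` can be written in re as `!p2 p1`: the negative lookahead has to come before the part that consumes input.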
|
Posted by
| Nick Gammon
Australia (23,133 posts) Bio
Forum Administrator |
Date
| Reply #3 on Sat 20 Jan 2018 10:39 PM (UTC) |
Message
| I don't see how to implement "upto" in re, since I can't see how to pass arguments to functions. |
- Nick Gammon
www.gammon.com.au, www.mushclient.com | Top |
|
Posted by
| Albert Chan
(55 posts) Bio
|
Date
| Reply #4 on Sat 20 Jan 2018 11:24 PM (UTC) |
Message
| your solution of reversing the order is very nice !
you should consider adding this trick to the tutorial,
to complement the lpeg upto function example.
with this insight, can i assert the following ?
P(1) - 'and' == -P('and') * 1 == re.compile "! 'and' ."
or, to generalize
P(1) - (P'and' + 'or' + 'not')
== -(P'and' + 'or' + 'not') * 1
== re.compile "!('and' / 'or' / 'not') ." | Top |
|
Posted by
| Nick Gammon
Australia (23,133 posts) Bio
Forum Administrator |
Date
| Reply #5 on Sat 20 Jan 2018 11:48 PM (UTC) |
Message
| Yes, I think you are right, although you still need to repeat that pattern. So 'upto' could be written:
function upto (what)
return C((-P(what) * P(1))^1) * P(what)
end -- upto
Instead of:
function upto (what)
return C((P(1) - P(what))^1) * P(what)
end -- upto
Those two versions also do capturing, which you can remove by deleting the 'C' character.
I'll try to add this to the LPEG web page explanation. |
- Nick Gammon
www.gammon.com.au, www.mushclient.com | Top |
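One possible way to get an "upto" in re, despite re patterns not taking arguments, is the optional defs table of re.compile: a %name in the pattern string is looked up in that table. The sketch below assumes standalone Lua with LPeg 1.x; upto_re is a hypothetical helper, not something from the tutorial.

```lua
local lpeg = require "lpeg"
local re = require "re"
local C, P = lpeg.C, lpeg.P

-- lpeg version from the post above
local function upto(what)
  return C((-P(what) * P(1))^1) * P(what)
end

-- hypothetical re counterpart: %what is resolved through the defs table
local function upto_re(what)
  return re.compile("{ (!%what .)+ } %what", { what = P(what) })
end

print(upto("and"):match("foo and bar"))
print(upto_re("and"):match("foo and bar"))
```

Passing the delimiter as a pattern in defs avoids having to splice it into the re string, so no quoting of the delimiter is needed.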
|
Posted by
| Albert Chan
(55 posts) Bio
|
Date
| Reply #6 on Sun 21 Jan 2018 12:31 AM (UTC) |
Message
| FYI, both versions of upto generate exactly the same parse tree code.
Many thanks to Sean Conner, who actually recompiled a debug version of lpeg
to go through the parse tree code. He also got your reversed-order
answer (by trial and error with the debug parse tree).
his response is on the lua mailing list, jan 20, 2018 5:54pm | Top |
|
Posted by
| Albert Chan
(55 posts) Bio
|
Date
| Reply #7 on Sun 21 Jan 2018 01:34 AM (UTC) Amended on Sun 21 Jan 2018 01:35 AM (UTC) by Nick Gammon
|
Message
| I also noticed that the lpeg upto trick cannot translate the lua pattern "(.*)and(.*)".
re.compile "{(! 'and' .)*} 'and' {.*}" corresponds to the lua pattern "(.-)and(.*)",
the non-greedy pattern.
the 2 examples in your lpeg tutorial return the same match only because
words are separated by spaces: %a+ is greedy, function upto is NOT.
:-( | Top |
|
Posted by
| Nick Gammon
Australia (23,133 posts) Bio
Forum Administrator |
Date
| Reply #8 on Sun 21 Jan 2018 01:39 AM (UTC) |
Message
| What are you expecting?
require "re"
target = "foo and bar"
print "====="
local re_pat = re.compile "{ (!'and' .)*} 'and' {.*}"
print (lpeg.match (re_pat, target))
print "---"
print (string.match (target, "(.*)and(.*)"))
print (string.match (target, "(.-)and(.*)"))
Output is:
=====
foo bar
---
foo bar
foo bar
That looks the same to me. For what input do you expect a difference, and what do you expect that difference to be? |
- Nick Gammon
www.gammon.com.au, www.mushclient.com | Top |
|
Posted by
| Nick Gammon
Australia (23,133 posts) Bio
Forum Administrator |
Date
| Reply #9 on Sun 21 Jan 2018 01:43 AM (UTC) |
Message
| Maybe here:
target = "foo and bar and whatever"
Output:
=====
foo bar and whatever
---
foo and bar whatever
foo bar and whatever
So yes, it looks non-greedy. |
- Nick Gammon
www.gammon.com.au, www.mushclient.com | Top |
|
Posted by
| Nick Gammon
Australia (23,133 posts) Bio
Forum Administrator |
Date
| Reply #10 on Sun 21 Jan 2018 03:10 AM (UTC) |
Message
| I worked it out with this grammar:
target = "foo and bar and whatever"
c = re.compile [[
parse <- {| {noDelim} lastDelim |} -- look for all up to the last delimiter followed by the last part
delim <- 'and' -- our delimiter
noDelim <- (!lastDelim .)* -- zero or more characters without the last delimiter
lastDelim <- delim {(!delim .)*} !. -- the delimiter without any more delimiters and then end of subject
]]
result = lpeg.match (c, target)
for k, v in ipairs (result) do
print (k, v)
end -- for
Output:
1 foo and bar
2 whatever
|
- Nick Gammon
www.gammon.com.au, www.mushclient.com | Top |
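For comparison, the re grammar above can also be written with the lpeg API, which may help when translating in either direction. A minimal sketch, assuming standalone Lua with LPeg 1.x; the rule names are kept from the re grammar, and the variable name grammar is only illustrative.

```lua
local lpeg = require "lpeg"
local C, Ct, P, V = lpeg.C, lpeg.Ct, lpeg.P, lpeg.V

local grammar = P {
  "parse",                                            -- initial rule
  parse     = Ct(C(V"noDelim") * V"lastDelim"),       -- {| {noDelim} lastDelim |}
  delim     = P"and",                                 -- 'and'
  noDelim   = (-V"lastDelim" * 1)^0,                  -- (!lastDelim .)*
  lastDelim = V"delim" * C((-V"delim" * 1)^0) * -1,   -- delim {(!delim .)*} !.
}

local result = grammar:match("foo and bar and whatever")
for k, v in ipairs(result) do
  print(k, v)
end
```

The correspondences are: `!p` becomes `-p`, `.` becomes `1`, `!.` becomes `-1`, `{ p }` becomes `C(p)`, and `{| p |}` becomes `Ct(p)`.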
|
Posted by
| Albert Chan
(55 posts) Bio
|
Date
| Reply #11 on Sun 21 Jan 2018 04:50 AM (UTC) Amended on Sun 21 Jan 2018 04:52 AM (UTC) by Albert Chan
|
Message
| i have a simpler and faster lpeg re (about 2x speed), but the
last 'and' is appended to the front string
pat = re.compile "{g <- . g / 'and'} {.*}"
= pat:match("this and that and this and more")
this and that and this and
more
any way to remove the last "and" ? | Top |
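One possible way to drop the trailing "and" from the first capture, not taken from the thread, is to end the recursion with a lookahead (&'and') instead of consuming the delimiter, and then match 'and' outside the capture. The sketch assumes standalone Lua with LPeg 1.x; pat_re, pat_api and last_and are hypothetical names.

```lua
local lpeg = require "lpeg"
local re = require "re"
local C, P, V = lpeg.C, lpeg.P, lpeg.V

-- re form: g recurses to the end of the subject, then backs up until
-- the lookahead &'and' succeeds, which happens first at the last 'and'
local pat_re = re.compile "{g <- . g / &'and'} 'and' {.*}"

-- the same idea with the lpeg API
local last_and = P { P(1) * V(1) + #P'and' }
local pat_api = C(last_and) * 'and' * C(P(1)^0)

print(pat_re:match("this and that and this and more"))
print(pat_api:match("this and that and this and more"))
```

Because the rule recurses once per character, a long subject may run into LPeg's backtrack stack limit; lpeg.setmaxstack can raise it if that turns out to be a problem.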
|
Posted by
| Nick Gammon
Australia (23,133 posts) Bio
Forum Administrator |
Date
| Reply #12 on Sun 21 Jan 2018 05:48 AM (UTC) Amended on Sun 21 Jan 2018 06:02 AM (UTC) by Nick Gammon
|
Message
| By my measurements, yours is not 2x faster. In some cases it is slightly faster. Arguably, if it isn't providing the results you want, then the speed doesn't matter. You could always remove the trailing "and" with a string.sub, but that would take time. I made up a test bed:
require "re"
require "tprint"
c = re.compile [[
parse <- {| {noDelim} lastDelim |} -- look for all up to the last delimiter followed by the last part
delim <- 'and' -- our delimiter
noDelim <- (!lastDelim .)* -- zero or more characters without the last delimiter
lastDelim <- delim {(!delim .)*} !. -- the delimiter without any more delimiters and then end of subject
]]
pat = re.compile "{| {g <- . g / 'and'} {.*} |}" -- Albert Chan pattern
function showResults (result, start, finish)
if not result then
print ("no match")
else
tprint (result)
end -- if
print (string.format ("Time taken = %0.3f us", (finish - start) * 1e6))
end -- showResults
function test (which)
print (string.rep ("=", 20))
print ("Testing:", which)
print (string.rep ("-", 10))
print "Nick"
start = utils.timer ()
result = lpeg.match (c, which)
finish = utils.timer ()
showResults (result, start, finish)
print (string.rep ("-", 10))
print "Albert"
start = utils.timer ()
result = lpeg.match (pat, which)
finish = utils.timer ()
showResults (result, start, finish)
end -- test
tests = {
"foo and bar and whatever",
"foo and bar",
"XandY",
"foo",
"Xand",
"andY",
"and",
"",
}
for _, v in ipairs (tests) do
test (v)
end -- for
You will notice that in the very case you were interested in (multiple instances of the word "and"), your expression is almost 4 times as slow.
====================
Testing: foo and bar and whatever
----------
Nick
1="foo and bar "
2=" whatever"
Time taken = 11.733 us
----------
Albert
1="foo and bar and"
2=" whatever"
Time taken = 43.302 us
====================
Testing: foo and bar
----------
Nick
1="foo "
2=" bar"
Time taken = 5.029 us
----------
Albert
1="foo and"
2=" bar"
Time taken = 4.749 us
====================
Testing: XandY
----------
Nick
1="X"
2="Y"
Time taken = 4.749 us
----------
Albert
1="Xand"
2="Y"
Time taken = 4.749 us
====================
Testing: foo
----------
Nick
no match
Time taken = 3.073 us
----------
Albert
no match
Time taken = 2.794 us
====================
Testing: Xand
----------
Nick
1="X"
2=""
Time taken = 4.470 us
----------
Albert
1="Xand"
2=""
Time taken = 5.867 us
====================
Testing: andY
----------
Nick
1=""
2="Y"
Time taken = 4.749 us
----------
Albert
1="and"
2="Y"
Time taken = 4.470 us
====================
Testing: and
----------
Nick
1=""
2=""
Time taken = 5.029 us
----------
Albert
1="and"
2=""
Time taken = 4.470 us
====================
Testing:
----------
Nick
no match
Time taken = 3.073 us
----------
Albert
no match
Time taken = 3.073 us
I took the compile part out of the timing, because you should really only do that once, and the speed you are really interested in is execution speed (that is, match speed).
Having said all that, your pattern looks nice and elegant. :) |
- Nick Gammon
www.gammon.com.au, www.mushclient.com | Top |
|
Posted by
| Albert Chan
(55 posts) Bio
|
Date
| Reply #13 on Sun 21 Jan 2018 06:11 AM (UTC) |
Message
| my mistake.
i was comparing my pattern against yours,
but mine does multiple returns, while yours saves the captures in a table.
i am new to lpeg ...
how do i convert your re pattern to do multiple returns ? | Top |
|
Posted by
| Nick Gammon
Australia (23,133 posts) Bio
Forum Administrator |
Date
| Reply #14 on Sun 21 Jan 2018 06:29 AM (UTC) Amended on Sun 21 Jan 2018 06:38 AM (UTC) by Nick Gammon
|
Message
| See my "parse" line above. You put the pattern inside these symbols:
{| ... |}
Or to not do that, remove those symbols. |
- Nick Gammon
www.gammon.com.au, www.mushclient.com | Top |
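The difference between the table form and the multiple-return form can be seen side by side in the sketch below. It assumes standalone Lua with LPeg 1.x; the names rules, as_table and as_values are only illustrative, and the rules are the ones from the grammar earlier in the thread.

```lua
local re = require "re"

-- rules shared by both variants (same as the grammar earlier in the thread)
local rules = [[
  delim     <- 'and'
  noDelim   <- (!lastDelim .)*
  lastDelim <- delim {(!delim .)*} !.
]]

-- with {| |}: the captures are collected into one table
local as_table = re.compile("parse <- {| {noDelim} lastDelim |}\n" .. rules)

-- without {| |}: each capture comes back as a separate return value
local as_values = re.compile("parse <- {noDelim} lastDelim\n" .. rules)

local t = as_table:match("foo and bar and whatever")
print(#t, t[1], t[2])

local first, last = as_values:match("foo and bar and whatever")
print(first, last)
```

When timing two patterns against each other, it is probably fairer to use the same form for both, since building the table adds its own cost.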
|
The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).