Notice: Any messages purporting to come from this site telling you that your password has expired, or that you need to verify your details, confirm your email, resolve issues, making threats, or asking for money, are
spam. We do not email users with any such messages. If you have lost your password you can obtain a new one by using the
password reset link.
Due to spam on this forum, all posts now need moderator approval.
Entire forum
➜ Programming
➜ General
➜ lpeg code translate to lpeg re
lpeg code translate to lpeg re
|
It is now over 60 days since the last post. This thread is closed.
Refresh page
Pages: 1
2 3
4
5
Posted by
| Albert Chan
(55 posts) Bio
|
Date
| Reply #15 on Sun 21 Jan 2018 07:02 AM (UTC) Amended on Mon 22 Jan 2018 06:41 AM (UTC) by Nick Gammon
|
Message
| redoing your benchmark on my machine, my version is slightly faster.
I guess it depends on many factors, machine setups ...
i now have lpeg re pattern that return without the extra "and"
lpeg re pattern :
"{ g <- . g / &'and' } 'and' {.*}"
| Top |
|
Posted by
| Albert Chan
(55 posts) Bio
|
Date
| Reply #16 on Sun 21 Jan 2018 01:37 PM (UTC) Amended on Mon 22 Jan 2018 06:41 AM (UTC) by Nick Gammon
|
Message
| To summarize what I have learn so far, upto function have 2 versions:
local C, P, V = lpeg.C, lpeg.P, lpeg.V
function upto_first(what) -- non-greedy version
return C( (-P(what) * 1)^0 ) * P(what)
end
function upto_last(what) -- greedy version
return C {1 * V(1) + #P(what)} * P(what)
end
| Top |
|
Posted by
| Albert Chan
(55 posts) Bio
|
Date
| Reply #17 on Sun 21 Jan 2018 05:26 PM (UTC) Amended on Tue 23 Jan 2018 07:44 PM (UTC) by Albert Chan
|
Message
| this is a response to reply #3: lpeg upto version in lpeg re:
function upto(what) -- non-greedy match
return string.format( [[ {(! %q .)*} %q ]], what, what)
end
pat = re.compile( upto('and') .. "{.*}" )
= pat:match("this and that and whatever")
this
that and whatever
| Top |
|
Posted by
| Albert Chan
(55 posts) Bio
|
Date
| Reply #18 on Mon 22 Jan 2018 01:33 AM (UTC) Amended on Mon 22 Jan 2018 06:43 AM (UTC) by Nick Gammon
|
Message
| discovered a way to combine reply #16 2 upto functions as one:
function upto(what, greedy)
local t1, t2 = #P(what), 1 * V(1)
if greedy then t1, t2 = t2, t1 end
return C {t1 + t2} * P(what)
end
function all() return C(P(1)^0) end
text = 'this or that of whatever'
= (upto('and', false) * all()):match(text) -- lua pattern "(.-)and(.*)
this
that and whatever
= (upto('and', true) * all()):match(text) -- lua pattern "(.*)and(.*)
this and that
whatever
| Top |
|
Posted by
| Albert Chan
(55 posts) Bio
|
Date
| Reply #19 on Mon 22 Jan 2018 04:00 PM (UTC) Amended on Tue 23 Jan 2018 07:45 PM (UTC) by Albert Chan
|
Message
| My previous upto have problem with long strings in greedy mode.
= upto('and', true):match( ('*and'):rep(50) )
backtrack stack overflow (current limit is 400)
...
Is this a lpeg bug ? only handle max string length of 199 ?
Above example only need 3 backtracks !
below is same upto function that avoid backtrack overflow:
function upto(what, greedy)
what = P(what)
local head = 1 - what
if greedy then head = what * -(head^0 * -1) + head^1 end
return C(head^0) * what
end
| Top |
|
Posted by
| Nick Gammon
Australia (23,133 posts) Bio
Forum Administrator |
Date
| Reply #20 on Mon 22 Jan 2018 09:15 PM (UTC) |
Message
| See LPEG reference
Quote:
lpeg.setmaxstack (max)
Sets a limit for the size of the backtrack stack used by LPeg to track calls and choices. (The default limit is 400.) Most well-written patterns need little backtrack levels and therefore you seldom need to change this limit; before changing it you should try to rewrite your pattern to avoid the need for extra space. Nevertheless, a few useful patterns may overflow. Also, with recursive grammars, subjects with deep recursion may also need larger limits.
You have a recursing grammar, so this can be an issue for you. |
- Nick Gammon
www.gammon.com.au, www.mushclient.com | Top |
|
Posted by
| Albert Chan
(55 posts) Bio
|
Date
| Reply #21 on Mon 22 Jan 2018 10:05 PM (UTC) Amended on Mon 22 Jan 2018 10:27 PM (UTC) by Albert Chan
|
Message
| retry recursive upto_last with bigger stack indeed solve the problem
lpeg.setmaxstack(201 * 2) -- max string length 200 ???
= upto_last('and'):match( ('*and'):rep(50) )
*and*and*and ... | Top |
|
Posted by
| Albert Chan
(55 posts) Bio
|
Date
| Reply #22 on Tue 23 Jan 2018 03:15 PM (UTC) Amended on Tue 23 Jan 2018 07:48 PM (UTC) by Albert Chan
|
Message
| the benchmark numbers in reply 12 might be flawed
my re pattern looks for 'and' from end-of-string, whereas your re grammar start from the beginning.
The timings will be affected by location of last 'and'
you might want to try benchmark with this non-backtract pattern:
(this new re pattern is fast and can handle long strings)
pat = re.compile "{g <- &('and' (!'and' .)* !.) / .g} 'and' {.*}"
= pat:match "this and that and whatever"
this and that
whatever | Top |
|
Posted by
| Nick Gammon
Australia (23,133 posts) Bio
Forum Administrator |
Date
| Reply #23 on Tue 23 Jan 2018 10:29 PM (UTC) |
Message
| I added a table capture ( "{| ... |}" ) to your pattern to I could check more accurately what the matches were. For the first match (with the two "and"s) your pattern is still slower. Perhaps if you post exactly how you are timing yours?
====================
Testing: foo and bar and whatever
----------
Nick
1="foo and bar "
2=" whatever"
Time taken = 11.454 us
----------
Albert
1="foo and bar "
2=" whatever"
Time taken = 43.860 us
====================
Testing: foo and bar
----------
Nick
1="foo "
2=" bar"
Time taken = 5.029 us
----------
Albert
1="foo "
2=" bar"
Time taken = 4.749 us
====================
Testing: XandY
----------
Nick
1="X"
2="Y"
Time taken = 4.749 us
----------
Albert
1="X"
2="Y"
Time taken = 5.587 us
====================
Testing: foo
----------
Nick
no match
Time taken = 3.073 us
----------
Albert
no match
Time taken = 3.632 us
====================
Testing: Xand
----------
Nick
1="X"
2=""
Time taken = 4.749 us
----------
Albert
1="X"
2=""
Time taken = 4.470 us
====================
Testing: andY
----------
Nick
1=""
2="Y"
Time taken = 5.029 us
----------
Albert
1=""
2="Y"
Time taken = 6.425 us
====================
Testing: and
----------
Nick
1=""
2=""
Time taken = 4.470 us
----------
Albert
1=""
2=""
Time taken = 4.749 us
====================
Testing:
----------
Nick
no match
Time taken = 3.073 us
----------
Albert
no match
Time taken = 3.352 us
I tried averaging out the results over 20 trials, and removing the table capture. This time I get somewhat faster results:
====================
Testing: foo and bar and whatever
----------
Nick
foo and bar
whatever
Time taken = 1.355 us
----------
Albert
foo and bar
whatever
Time taken = 2.500 us
====================
Testing: foo and bar
----------
Nick
foo
bar
Time taken = 0.726 us
----------
Albert
foo
bar
Time taken = 0.629 us
====================
Testing: XandY
----------
Nick
X
Y
Time taken = 0.615 us
----------
Albert
X
Y
Time taken = 0.545 us
====================
Testing: foo
----------
Nick
no match
Time taken = 0.461 us
----------
Albert
no match
Time taken = 0.405 us
====================
Testing: Xand
----------
Nick
X
Time taken = 0.601 us
----------
Albert
X
Time taken = 0.531 us
====================
Testing: andY
----------
Nick
Y
Time taken = 0.629 us
----------
Albert
Y
Time taken = 0.573 us
====================
Testing: and
----------
Nick
Time taken = 0.824 us
----------
Albert
Time taken = 1.257 us
====================
Testing:
----------
Nick
no match
Time taken = 0.475 us
----------
Albert
no match
Time taken = 0.405 us
(Time is average time).
Code to reproduce:
require "re"
require "tprint"
local ITERATIONS = 20
local result1, result2
print (string.rep ("*", 80))
c = re.compile [[
parse <- {noDelim} lastDelim -- look for all up to the last delimiter followed by the last part
delim <- 'and' -- our delimiter
noDelim <- (!lastDelim .)* -- zero or more characters without the last delimiter
lastDelim <- delim {(!delim .)*} !. -- the delimiter without any more delimiters and then end of subject
]]
pat = re.compile "{g <- &('and' (!'and' .)* !.) / .g} 'and' {.*}" -- Albert Chan pattern
function showResults (result1, result2, start, finish)
if not result1 then
print ("no match")
else
if type (result1) == 'table' then
tprint (result1)
else
print (result1)
print (result2)
end -- if
end -- if
print (string.format ("Time taken = %0.3f us", ((finish - start) / ITERATIONS) * 1e6))
end -- showResults
function test (which)
print (string.rep ("=", 20))
print ("Testing:", which)
print (string.rep ("-", 10))
print "Nick"
start = utils.timer ()
for i = 1, ITERATIONS do
result1, result2 = lpeg.match (c, which)
end -- for
finish = utils.timer ()
showResults (result1, result2, start, finish)
print (string.rep ("-", 10))
print "Albert"
start = utils.timer ()
for i = 1, ITERATIONS do
result1, result2 = lpeg.match (pat, which)
end -- for
finish = utils.timer ()
showResults (result1, result2, start, finish)
end -- test
tests = {
"foo and bar and whatever",
"foo and bar",
"XandY",
"foo",
"Xand",
"andY",
"and",
"",
}
for _, v in ipairs (tests) do
test (v)
end -- for
|
- Nick Gammon
www.gammon.com.au, www.mushclient.com | Top |
|
Posted by
| Albert Chan
(55 posts) Bio
|
Date
| Reply #24 on Tue 23 Jan 2018 11:27 PM (UTC) Amended on Wed 24 Jan 2018 02:19 AM (UTC) by Albert Chan
|
Message
| This was unexpected. Multiple returns benchmark suggest both patterns are similar.
my machine is pentium 3 866Mhz, WinXP, luajit 1.1.8, lpeg 1.0.1
"foo and bar and whatever"
my pentium your machine
iterations 1 million 20
time in us Albert Nick Albert Nick
return tables 14.4 15.9 43.9 11.5
return value 6.7 8.2 2.5 1.4
cost of table 7.7 7.7 41.4 10.1
i have since discovered a version without lookahead,
but for your really short string test, the timings is likely worse.
Algorithmetically this is faster, but there is an extra split call
pat = re.compile(
"(g <- 'and' / .g)+ => split",
{ split = function(s,i) return true, s:sub(1, i-4), s:sub(i) end }
)
| Top |
|
Posted by
| Nick Gammon
Australia (23,133 posts) Bio
Forum Administrator |
Date
| Reply #25 on Wed 24 Jan 2018 01:05 AM (UTC) |
Message
| Good questions. I've modified the test to summarize results.
With 100 iterations, and my test first:
Test Nick Albert
------------------------------------------------
foo and bar and whatever 0.933 1.011
foo and bar 0.629 0.548
XandY 0.799 0.494
foo 0.363 0.341
Xand 0.673 0.886
andY 0.503 0.615
and 0.483 0.455
0.413 0.467
And your test first:
Test Albert Nick
------------------------------------------------
foo and bar and whatever 1.109 0.914
foo and bar 0.556 0.626
XandY 0.489 0.525
foo 0.352 0.369
Xand 0.478 0.511
andY 0.478 0.520
and 0.472 0.503
0.338 0.344
Overall it seems faster the more iterations in testing, however it seems that for the ones where my grammar is faster, it is still faster, and vice-versa.
The differences seem less over more iterations.
With your newest version without lookahead:
Test Nick Albert
------------------------------------------------
foo and bar and whatever 0.967 1.193
foo and bar 0.584 0.765
XandY 0.489 0.701
foo 0.369 0.346
Xand 0.106 0.735
andY 0.517 1.137
and 0.508 0.729
0.508 0.335
I suspect the differences may be in how you are testing. Are you using MUSHclient or running a test using a standalone program? If you are using C or C++ with a more modern compiler than what I have, that might alter the results. |
- Nick Gammon
www.gammon.com.au, www.mushclient.com | Top |
|
Posted by
| Nick Gammon
Australia (23,133 posts) Bio
Forum Administrator |
Date
| Reply #26 on Wed 24 Jan 2018 01:05 AM (UTC) |
Message
| I see you are using luajit which in itself will probably give somewhat different results to me because of the way it works. |
- Nick Gammon
www.gammon.com.au, www.mushclient.com | Top |
|
Posted by
| Nick Gammon
Australia (23,133 posts) Bio
Forum Administrator |
Date
| Reply #27 on Wed 24 Jan 2018 01:10 AM (UTC) |
Message
| Testing with a longer string:
string.rep ("c", 1000) .. " and " .. string.rep ("a", 1000) .. " and " .. string.rep ("b", 1000)
I get timings:
Nick: 49.353 Albert: 22.584
Yours seems better for longer strings. |
- Nick Gammon
www.gammon.com.au, www.mushclient.com | Top |
|
Posted by
| Albert Chan
(55 posts) Bio
|
Date
| Reply #28 on Wed 24 Jan 2018 02:13 AM (UTC) Amended on Wed 24 Jan 2018 02:40 AM (UTC) by Albert Chan
|
Message
| My re patterns with split does no lookahead, but has a bigger big O constant
Bypassing lpeg match-time-capture overhead might make it faster:
do
local sub, match = string.sub, lpeg.match
local pat = re.compile "(g <- 'and' / .g)+"
function split_last_and(text)
local i = match(pat, text)
if i==nil then return nil end
return sub(text, 1, i-4), sub(text, i)
end
end
= split_last_and "this and that and whatever"
this and that
whatever | Top |
|
Posted by
| Nick Gammon
Australia (23,133 posts) Bio
Forum Administrator |
Date
| Reply #29 on Wed 24 Jan 2018 04:21 AM (UTC) |
Message
| I'm curious to know how your grammar works so well without overflowing the recursion stack. I have to assume that the grammar:
... does tail-recursion so that effectively turns the recursion into an iteration. Do you agree?
(In other words, the right-recursion into g can be replaced by a tail call, which simply goes to the start of g without adding to the call stack).
In assembler the equivalent would be:
foo equ * # start of foo
# do stuff, and under certain conditions:
jsr foo # call foo again
ret # leave foo
This can be replaced by:
foo equ * # start of foo
# do stuff, and under certain conditions:
jmp foo # call foo again
The first version adds to the call stack for each recursion, the second version does not, but is functionally equivalent. |
- Nick Gammon
www.gammon.com.au, www.mushclient.com | Top |
|
The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).
To show them in your local time you can join the forum, and then set the 'time correction' field in your profile to the number of hours difference between your location and UTC time.
147,265 views.
This is page 2, subject is 5 pages long:
1
2 3
4
5
It is now over 60 days since the last post. This thread is closed.
Refresh page
top