lpeg code translate to lpeg re
It is now over 60 days since the last post. This thread is closed.
Posted by Albert Chan
Reply #60 on Sun 11 Feb 2018 04:38 PM (UTC), amended on Mon 12 Feb 2018 02:40 PM (UTC) by Albert Chan
Just wanted to show a faster and more controlled way to do gsub in re.
Unlike string.gsub, re.gsub takes no replacement limit: it always replaces every match.
That means re.gsub cannot do this:
t = 'this and that and whatever'
=string.gsub(t, 't%w*', string.upper, 2)
THIS and THAT and whatever
Here is an lpeg re pattern that can:
sub2 = re.compile("{~ (>('t'%w*) -> upper)^-2 .* ~}", {upper=string.upper})
=sub2:match(t)
THIS and THAT and whatever
I used my patched '>' matching prefix. The pattern above translates literally to this:
sub2 = re.compile("{~ (g <- ('t'%w*) -> upper / .[^t]* g)^-2 .* ~}", {upper=string.upper})
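For reference, the string.gsub behaviour that the pattern is meant to reproduce can be checked in plain Lua, without LPeg. This is only a small sketch, assuming a stock Lua 5.1 or later interpreter; the variable names are mine.

```lua
local t = 'this and that and whatever'

-- The fourth argument caps the number of replacements.
local limited, n1 = string.gsub(t, 't%w*', string.upper, 2)

-- Without it, every match is replaced (the only mode re.gsub offers).
local all, n2 = string.gsub(t, 't%w*', string.upper)

print(limited, n1)  --> THIS and THAT and whatever   2
print(all, n2)      --> THIS and THAT and whaTEVER   3
```

The third match is the 'tever' inside 'whatever', which is why the limit matters here.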

Posted by Albert Chan
Reply #61 on Sun 11 Feb 2018 07:11 PM (UTC), amended on Mon 12 Feb 2018 10:11 PM (UTC) by Albert Chan
On second thought, re.gsub can do limited replacements with a helper function:
function upto(rep, n)
  local f = rep
  -- wrap a constant replacement in a function
  if type(f) ~= 'function' then f = function() return rep end end
  return function (s)
    if n <= 0 then return end -- no replace
    n = n - 1
    return f(s)
  end
end
= re.gsub(t, "'t'%w*", upto(string.upper, 2))
THIS and THAT and whatever
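The helper can also be tried without LPeg, because string.gsub accepts a function replacement too and keeps the original text when that function returns nothing. The sketch below assumes stock Lua 5.1 or later and uses string.gsub as a stand-in for re.gsub; it also shows that the counter lives in the closure, so one generated function cannot be reused.

```lua
local function upto(rep, n)
  local f = rep
  -- wrap a constant replacement in a function
  if type(f) ~= 'function' then f = function() return rep end end
  return function(s)
    if n <= 0 then return end  -- budget used up: keep the match as is
    n = n - 1
    return f(s)
  end
end

local t = 'this and that and whatever'
local limit2 = upto(string.upper, 2)

local first  = string.gsub(t, 't%w*', limit2)  -- uses up both replacements
local second = string.gsub(t, 't%w*', limit2)  -- same closure, nothing left

print(first)   --> THIS and THAT and whatever
print(second)  --> this and that and whatever
```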

Posted by Albert Chan
Reply #62 on Mon 12 Feb 2018 07:25 PM (UTC), amended on Mon 12 Feb 2018 10:17 PM (UTC) by Albert Chan
The upto() function above is for demonstration only.
There are many problems with it:
The function generated by upto cannot be memoized in re.gsub.
Even if it could be, the generated function is "dead" after n replacements.
And because upto returns a function, back references will not work:
= re.gsub(t, "%w+", "(%0)")
(this) (and) (that) (and) (whatever)
= re.gsub(t, "%w+", upto("(%0)", 1)) -- this fails to recognize %0
(%0) and that and whatever
-- this gsub-style pattern handles it all
= re.match(t, "{~ (g <- %w+ -> '(%0)' / .%W* g)? .* ~}")
(this) and that and whatever
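The same back-reference loss shows up in plain Lua, which makes the cause easier to see: a string replacement has %0 expanded, while a string returned from a function replacement is inserted verbatim. A minimal sketch, assuming stock Lua 5.1 or later, with string.gsub standing in for re.gsub:

```lua
local t = 'this and that and whatever'

-- String replacement: %0 expands to the whole match.
local a = string.gsub(t, '%w+', '(%0)', 1)

-- Function replacement: the returned string is used as is, no % expansion.
local b = string.gsub(t, '%w+', function() return '(%0)' end, 1)

print(a)  --> (this) and that and whatever
print(b)  --> (%0) and that and whatever
```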

Posted by Albert Chan
Reply #63 on Wed 14 Feb 2018 01:46 AM (UTC), amended on Thu 15 Feb 2018 03:40 AM (UTC) by Albert Chan
My previous lpeg re pattern for greedy search may fail if the search pattern contains a repeated sub-pattern, so that occurrences can overlap.
(I learned this from http://www.inf.puc-rio.br/~roberto/docs/peg.pdf, page 11)
-- greedy search for 'xuxu': my old way does not work, it misses the overlapping occurrence at position 3
= re.match( 'xuxuxui', "(g <- 'xuxu' / .g)+")
5
-- with a known pattern, we can change the search:
= re.match('xuxuxui', "(g <- 'xu'^+2 / .g)+ ")
7
However, it may be hard to recognize a repeated pattern, say %a%u.
And how should the search be changed? Is %a%u+ correct?
This is Roberto's solution to the repeated sub-patterns problem
(the published pattern is flawed: it had * instead of +, which returns 1 when there is no match)
= re.match( 'xuxuxui', "(g <- &'xuxu' . / .g)+")
4
The returned position is one past the start of the last occurrence, but the result is correct:
final position = 4 - 1 + fixedlen('xuxu') = 3 + 4 = 7
I have an improvement to his pattern that avoids the position arithmetic
(note: the loop never checks the first position, but that is OK because the trailing 'xuxu' covers it)
= re.match( 'xuxuxui', "(. (g <- &'xuxu' / .g))* 'xuxu' ")
7
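The overlap problem can be reproduced in plain Lua with string.find, which may help when checking expected positions. This is only a sketch, assuming stock Lua 5.1 or later and a non-empty literal; after_last is a hypothetical helper of mine, not part of LPeg or re. Resuming after the end of each match mirrors the old pattern, while resuming one character after the start of each match also sees overlapping occurrences.

```lua
-- Return the position just past the last occurrence of the literal `lit`
-- in `s`, or nil if there is none. `lit` is assumed to be non-empty.
local function after_last(s, lit, overlapping)
  local pos, init = nil, 1
  while true do
    local b, e = string.find(s, lit, init, true)  -- true = plain text search
    if not b then return pos end
    pos = e + 1
    -- overlapping: retry from the next character; otherwise skip the match
    init = overlapping and (b + 1) or (e + 1)
  end
end

print(after_last('xuxuxui', 'xuxu', false))  --> 5
print(after_last('xuxuxui', 'xuxu', true))   --> 7
```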

Posted by Albert Chan
Reply #64 on Wed 14 Feb 2018 02:20 PM (UTC), amended on Sat 17 Feb 2018 06:27 PM (UTC) by Albert Chan
What I have learned so far about translating the Lua pattern "(.*)" .. z .. "(.*)" to lpeg re:
Note: these patterns use my patched lpeg: https://github.com/achan001/LPeg-anywhere
This works with all z, even with repeated sub-patterns, say 'andand' (1)
pat1 = re.compile("{(. >&%z)*} %z {.*}", {z=z})
If the text is short (short enough not to overflow the backtrack stack), we can do a true greedy match
pat2 = re.compile("{g <- .g / &%z} %z {.*}", {z=z})
My patched lpeg can optimize the above and use less backtrack stack (2)
pat3 = re.compile("{g <- .[^%z]* g / &%z} %z {.*}", {z=z})
It can even be turned into a tail call, without worrying about the stack (3)
pat4 = re.compile("{.* %b <&%z} %z {.*}", {z=z})
It does not have to match from the end of the string (4)
pat5 = re.compile("{(>%z)* %b <&%z} %z {.*}", {z=z})
(1) '>' is for forward match: >%pat == (g <- %pat / . [^%pat]* g)
(2) with my patched lpeg: [^%pat] == non-head-chars of %pat
(3) '<' is for backward match: <%pat == (g <- %pat / %b g)
(4) the loop (>%z)* moves just beyond the correct position
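For comparison, here is the plain Lua behaviour that these re patterns are meant to match: the first capture is greedy, so the split happens at the last occurrence of z, including an overlapping one. A minimal sketch, assuming stock Lua 5.1 or later; split_last is a hypothetical name, and z is interpreted as a Lua pattern, so magic characters in it would need escaping.

```lua
-- Split `s` around the last occurrence of the Lua pattern `z`.
local function split_last(s, z)
  return string.match(s, '(.*)' .. z .. '(.*)')
end

local h1, t1 = split_last('this and that and whatever', 'and')
local h2, t2 = split_last('xuxuxui', 'xuxu')

print('[' .. h1 .. ']', '[' .. t1 .. ']')  --> [this and that ]   [ whatever]
print('[' .. h2 .. ']', '[' .. t2 .. ']')  --> [xu]   [i]
```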