[Home] [Downloads] [Search] [Help/forum]


Register forum user name Search FAQ

Gammon Forum

[Folder]  Entire forum
-> [Folder]  Programming
. -> [Folder]  General
. . -> [Subject]  lpeg functions

lpeg functions

It is now over 60 days since the last post. This thread is closed.     [Refresh] Refresh page


Pages: 1 2  

Posted by Albert Chan   (55 posts)  [Biography] bio
Date Mon 29 Jan 2018 02:04 AM (UTC)
Message
lpeg manual mentioned P(true) and P(false). why are these needed ?

x true = x
true x = x

so P(true) will always be optimized away.

At first, I thought P(false) is to retain captures, then switched to alternative.

(lpeg codes A with captures ...) * P(false) + (lpeg codes B ...)

But it does not. All the captures are gone.

Above just optimized to (lpeb codes B ...)

The only reason lpeg does not do this optimization is due to its
match-time-capture function.

I cannot think of 1 case where P(boolean) is needed.

May that were the reason lpeg re doesn't have them as variables.
[Go to top] top

Posted by Nick Gammon   Australia  (22,975 posts)  [Biography] bio   Forum Administrator
Date Reply #1 on Mon 29 Jan 2018 03:56 AM (UTC)

Amended on Mon 29 Jan 2018 03:57 AM (UTC) by Nick Gammon

Message
The only way I can explain that is from the LPEG documentation:

Quote:


patt1 * patt2

Returns a pattern that matches patt1 and then matches patt2, starting where patt1 finished. The identity element for this operation is the pattern lpeg.P(true), which always succeeds.


Note the reference to "the identity element for this operation". With that as a clue, and looking up identity elements, eg.

Algebraic patterns — Identity element

It appears that, useful or not, lpeg.P(true) and lpeg.P(false) are useful in situation where you may (sometimes) want to make something that always succeeds. Perhaps you are writing a function to combine THIS and THAT, and sometimes THAT needs to always match. Being able to return lpeg.P (true) gives you a useful pattern that you can use in that case.

- Nick Gammon

www.gammon.com.au, www.mushclient.com
[Go to top] top

Posted by Nick Gammon   Australia  (22,975 posts)  [Biography] bio   Forum Administrator
Date Reply #2 on Mon 29 Jan 2018 04:11 AM (UTC)
Message
As an analogy, say we had this Lua code:


a = b * 1


Now you might say that multiplying by one (the identity operation) is not particularly useful. However what about:


a = b * foo ()


Where 'foo' is a function that returns a number, and sometimes that number is 1.

Now it is useful for the writer of the function 'foo' to have a way of expressing "identity operation" (ie. do nothing, or "always succeed").

Conversely you might it want to fail in which case you return zero. And the equivalent to that in LPEG would be lpeg.P (false).

- Nick Gammon

www.gammon.com.au, www.mushclient.com
[Go to top] top

Posted by Nick Gammon   Australia  (22,975 posts)  [Biography] bio   Forum Administrator
Date Reply #3 on Mon 29 Jan 2018 04:22 AM (UTC)
Message
Maybe this is a good example, I'm not sure.

I want to match "foo" or "bar" or nothing at all followed by "nick". So:


lpeg.match (C ((P"foo" + P"bar" + P(true))) * "nick", "foonick")


That matches foo OR bar OR (true) and then checks for nick.

Results:


foonick  --> captures "foo"
barnick  --> captures "bar"
nick     --> captures ""
others   --> fail

- Nick Gammon

www.gammon.com.au, www.mushclient.com
[Go to top] top

Posted by Albert Chan   (55 posts)  [Biography] bio
Date Reply #4 on Mon 29 Jan 2018 10:48 AM (UTC)

Amended on Mon 05 Feb 2018 10:46 PM (UTC) by Albert Chan

Message
since above pattern, literally translated to lpeg re:
"{ 'foo' / 'bar' / '' }"

lpeg.re do have lpeg.P(true) as variable ... it is ''
P(true) will be optimized away anyway, so above is same as
"{ 'foo' / 'bar' / }"

To push my understanding a bit furthur ...
expr ^ 0 == P { expr * V(1) + P(true) }

However, with current lpeg implementation, RHS is less efficient

Tried push even furthur to represent P(false) as lpeg re !''

The pcode are different, but they are equivalent, since not true == false
[Go to top] top

Posted by Albert Chan   (55 posts)  [Biography] bio
Date Reply #5 on Mon 29 Jan 2018 12:22 PM (UTC)

Amended on Mon 29 Jan 2018 01:08 PM (UTC) by Albert Chan

Message
Just wanted to share how to use lpeg.P(function)

this was one of my lpeg re pattern for regex "(.*)and(.*)", using =>

pat = re.compile( 
    "(g <- 'and' / .g)+ => split", 
    {split = function(s, i) return s:sub(1, i-4), s:sub(i) end}
)

Redo above using lpeg.P(function). Just search and replace "=> " to "%"

pat = re.compile( 
    "(g <- 'and' / .g)+ %split", 
    {split = function(s, i) return s:sub(1, i-4), s:sub(i) end}
)

The split function will implicitly cast as lpeg.P object. Although unnecessary, we can rewrite split explicitly

{split = lpeg.P( function(s, i) return s:sub(1, i-4), s:sub(i) end )}
[Go to top] top

Posted by Albert Chan   (55 posts)  [Biography] bio
Date Reply #6 on Tue 30 Jan 2018 01:28 PM (UTC)

Amended on Wed 31 Jan 2018 03:01 AM (UTC) by Albert Chan

Message
Found a real use for P(true) in re.lua

local function mult (p, n)
  local np = mm.P(true)
  while n >= 1 do
    if n%2 >= 1 then np = np * p end
    p = p * p
    n = n/2
  end
  return np
end

So, mult(p, 0) return P(true), which will optimized away

function name mult is misleading, it should be named re_pow

P.S.
I recently patched lpeg.B so it can move back from current position (upto 255 bytes)
%b = lpeg.B(-1) = move back 1 byte

To optimize %b ^ n to lpeg.B(-n), all I need is 1 line above P(true)

if p == Predef.b then return mm.B(-n) end
[Go to top] top

Posted by Albert Chan   (55 posts)  [Biography] bio
Date Reply #7 on Tue 30 Jan 2018 03:00 PM (UTC)
Message
snippet from re.lua:

local m = require"lpeg"

-- 'm' will be used to parse expressions, and 'mm' will be used to
-- create expressions; that is, 're' runs on 'm', creating patterns
-- on 'mm'
local mm = m

why the need for both m and mm ?
[Go to top] top

Posted by Albert Chan   (55 posts)  [Biography] bio
Date Reply #8 on Tue 30 Jan 2018 06:52 PM (UTC)
Message
Tried to understand meaning of lpeg.B(-patt) (hint, it is not -lpeg.B(patt))

... it is lpeg.P(-patt) !

lpeg.B(-patt) does nothing, except to make its argument a lpeg object

Worse, this is just based on my informal tests, it might do something nasty !

The best I can say, lpeg.B(-patt) is undefined
[Go to top] top

Posted by Nick Gammon   Australia  (22,975 posts)  [Biography] bio   Forum Administrator
Date Reply #9 on Tue 30 Jan 2018 10:15 PM (UTC)
Message
Not sure. Both of these match:


print (lpeg.match (P(3) * B(-P('foo')) * 'bar', 'foobar'))  --> 7
print (lpeg.match (P(3) * B( P('foo')) * 'bar', 'foobar'))  --> 7

- Nick Gammon

www.gammon.com.au, www.mushclient.com
[Go to top] top

Posted by Nick Gammon   Australia  (22,975 posts)  [Biography] bio   Forum Administrator
Date Reply #10 on Tue 30 Jan 2018 10:18 PM (UTC)
Message
Albert Chan said:

why the need for both m and mm ?


Not sure, unless it is for documentation.

- Nick Gammon

www.gammon.com.au, www.mushclient.com
[Go to top] top

Posted by Albert Chan   (55 posts)  [Biography] bio
Date Reply #11 on Tue 30 Jan 2018 11:31 PM (UTC)

Amended on Tue 30 Jan 2018 11:58 PM (UTC) by Albert Chan

Message
both of your examples confirm my finding: B(-patt) = P(-patt), which is silly.

foo, bar, foobar, = P'foo', P'bar', P'foobar'

-- B match back characters: (foo == "foo")
= (3 * B(foo) * bar):match "foobar" => 7

-- B is actually P, matching: (foo ~= "bar")
= (3 * P(-foo) * bar):match "foobar" => 7
= (3 * B(-foo) * bar):match "foobar" => 7

both return the same position, but doing different things ... try these:

= (3 * B( foobar) * bar):match "foobar" => nil
= (3 * B(-foobar) * bar):match "foobar" => 7

-> these are undefined behavior, can change in the future
-> NEVER use negation argument for lpeg.B
[Go to top] top

Posted by Albert Chan   (55 posts)  [Biography] bio
Date Reply #12 on Wed 31 Jan 2018 12:10 AM (UTC)

Amended on Wed 31 Jan 2018 01:50 AM (UTC) by Albert Chan

Message
find a bug in re.lua

pat = re.compile "[^a]^2"  -- ok
pat = re.compile "[a]^2"   -- bad
.\re.lua: attempt to perform arithmetic on local 'p' (a string value)
[Go to top] top

Posted by Albert Chan   (55 posts)  [Biography] bio
Date Reply #13 on Wed 31 Jan 2018 02:15 AM (UTC)

Amended on Wed 31 Jan 2018 11:31 PM (UTC) by Albert Chan

Message
Above class range bug turns out to be an example of using P(false) !
There are many ways to fix the bug:

1. P(false) way
make sure more than 1 items -> P(false) + item = P(item)
-> so add a m.Cc(false) * in front of item
-> this guaranteed folding run at least once.

2. complement (likely more efficient)
since complement function always run, return c == '^' and any - p or m.P(p)
-> this also guaranteed class range is lpeg object

3. patch mult(p, n) (somewhat patchy ...)
all other suffixes ^+n ^-n ? * + use lpeg __pow function, which convert p to lpeg object implicitly.
-> add 1 line in mult(p,n) : p = m.P(p)
[Go to top] top

Posted by Nick Gammon   Australia  (22,975 posts)  [Biography] bio   Forum Administrator
Date Reply #14 on Wed 31 Jan 2018 04:03 AM (UTC)
Message
Albert Chan said:

The best I can say, lpeg.B(-patt) is undefined


I think I can explain what is happening, possibly it's a bug.

Try this:


print (lpeg.match (P(3) * B( -P('foo')) * 3, 'foobar'))  --> 7
print (lpeg.match (P(3) * B( -P('foo')) * 3, 'foofoo'))  --> nil


In other words, the B(-P'foo') contains an assertion (not a match) (B(P'foo') would be a match) and since it doesn't consume any characters the lpeg.B does not move backwards any characters.

If you print the tree for:


P(3) * B( P('foo')) * 3


You get:


seq
  any
  seq
    any
    seq
      any
      seq
        behind 3
          seq
            char 'f'
            seq
              char 'o'
              char 'o'
        seq
          any
          seq
            any
            any



You will notice that the lpeg.B generates "behind 3" because it knows it has to go back the length of 'foo' (and then try matching).

However this:


P(3) * B( -P('foo')) * 3


Generates:


seq
  any
  seq
    any
    seq
      any
      seq
        behind 0
          not
            seq
              char 'f'
              seq
                char 'o'
                char 'o'
        seq
          any
          seq
            any
            any


Now it generates "behind 0" (so it is matching the current position, not 3 back) and checking if that equals 'foo' (and then negating the result).

I'm not sure if this is a bug or not. You could argue that an assertion like -P'foo' should not backtrack because it doesn't move the position of the match pointer.

Judging by the generated opcodes, "behind 3" just moves the current pointer backwards, and as the documentation says:

Quote:

Pattern patt must match only strings with some fixed length ...


That means that, if it matches, then the pointer is guaranteed to be back where it started (by consuming input). And since assertions don't consume input, the pointer is not moved backwards.

- Nick Gammon

www.gammon.com.au, www.mushclient.com
[Go to top] top

The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).

To show them in your local time you can join the forum, and then set the 'time correction' field in your profile to the number of hours difference between your location and UTC time.


56,844 views.

This is page 1, subject is 2 pages long: 1 2  [Next page]

It is now over 60 days since the last post. This thread is closed.     [Refresh] Refresh page

Go to topic:           Search the forum


[Go to top] top

Quick links: MUSHclient. MUSHclient help. Forum shortcuts. Posting templates. Lua modules. Lua documentation.

Information and images on this site are licensed under the Creative Commons Attribution 3.0 Australia License unless stated otherwise.

[Home]


Written by Nick Gammon - 5K   profile for Nick Gammon on Stack Exchange, a network of free, community-driven Q&A sites   Marriage equality

Comments to: Gammon Software support
[RH click to get RSS URL] Forum RSS feed ( https://gammon.com.au/rss/forum.xml )

[Best viewed with any browser - 2K]    [Hosted at HostDash]