Gammon Forum
Question about Regex library issue

It is now over 60 days since the last post. This thread is closed.


Posted by Tarcas   (2 posts)
Date Mon 11 Sep 2017 01:00 AM (UTC)

Amended on Mon 11 Sep 2017 01:02 AM (UTC) by Tarcas

Message
Hi Nick,
I'd like to start by thanking you profusely for creating the regex library. I see that this was some years ago, but it looks like it should solve my problem nicely.

Unfortunately, I have run into an issue. Probably my fault, but I can't figure out how. I have an expression that I'm searching for:

char result=ms.Match ("([rw])(%d+)(%w)");

and I'm returning the list of captures per your example code:

      if (result == REGEXP_MATCHED)
        {
           Serial.println("Matched.");
           for (int j = 0; j < 3; j++)
          {
            Serial.print ("Capture number: ");
            Serial.println (j, DEC);
            Serial.print ("Text: '");
            Serial.print (ms.GetCapture (buf, j));
            Serial.println ("'");
          }
          // matching offsets in ms.capture
        }

It works great with one small issue that I'm trying to solve. The %d+ in the middle captures (or at least prints) nothing if the matched expression is at the beginning of the line. If I insert a space or one or more random characters before the text that I want matched, it works fine and I get my digits.

I've tried changing the expression to match the beginning of the line, and it didn't help my issue. I've also tried changing the %d+ to a %w+ thinking that maybe digits caused it, but there was no change (except that it would match on letters as well as digits, of course.)
Everything I've done does correctly print the leading "r" or "w" and the trailing alphanumeric character. Just not the digits in the middle if the string is at the beginning of the line.

Short of prefixing an extra character every time I send a code to match, I'm not sure where to go with this. Is this some gotcha with captures that I've just never heard of?

Example output:

Attempting match on: r55t
r55t
Matched.
Capture number: 0
Text: 'r'
Capture number: 1
Text: ''
Capture number: 2
Text: 't'
Attempting match on:  r55t
 r55t
Matched.
Capture number: 0
Text: 'r'
Capture number: 1
Text: '55'
Capture number: 2
Text: 't'

I would welcome your insights.

Posted by Nick Gammon   Australia  (22,973 posts)   Forum Administrator
Date Reply #1 on Sat 16 Sep 2017 06:40 AM (UTC)

Amended on Sat 16 Sep 2017 06:42 AM (UTC) by Nick Gammon

Message
It would have helped to provide the full code. I could reproduce your issue with this code:


#include <Regexp.h>

void setup ()
{
  Serial.begin (115200);
  Serial.println ();

  // match state object
  MatchState ms;

  // what we are searching (the target)
  char buf [10] = "r55t";
  ms.Target (buf);  // set its address
  Serial.println (buf);

  char result=ms.Match ("([rw])(%d+)(%w)");
  
  if (result == REGEXP_MATCHED)
    {
    Serial.println ("Matched.");
    for (int j = 0; j < 3; j++)
      {
      Serial.print ("Capture number: ");
      Serial.println (j, DEC);
      Serial.print ("Text: '");
      // note: buf is also the match target, so this copy overwrites it
      Serial.print (ms.GetCapture (buf, j));
      Serial.println ("'");
      }
    // matching offsets in ms.capture
    }
  else
    Serial.println ("No match");

}  // end of setup

void loop () {}


However, what you are doing wrong here (assuming you did something similar) is using "buf" for two different purposes.

According to the documentation:

Quote:

After a successful match, this copies the matching string from the target buffer to another memory location, with a null-terminator.


(My emphasis, on "another memory location")

However, by using "buf" here both for the source text and as the location to copy the string to, you are corrupting the source text with the capture.

If you use a different buffer for the capture text, it works:



#include <Regexp.h>

void setup ()
{
  Serial.begin (115200);
  Serial.println ();

  // match state object
  MatchState ms;

  // what we are searching (the target)
  char buf [10] = "r55t";
  ms.Target (buf);  // set its address
  Serial.println (buf);

  char result=ms.Match ("([rw])(%d+)(%w)");
  
  if (result == REGEXP_MATCHED)
    {
    Serial.println ("Matched.");
    for (int j = 0; j < 3; j++)
      {
      Serial.print ("Capture number: ");
      Serial.println (j, DEC);
      Serial.print ("Text: '");
      // separate destination, so the target in buf is left untouched
      char captureBuf [10];
      Serial.print (ms.GetCapture (captureBuf, j));
      Serial.println ("'");
      }
    // matching offsets in ms.capture
    }
  else
    Serial.println ("No match");

}  // end of setup

void loop () {}

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Tarcas   (2 posts)
Date Reply #2 on Sat 16 Sep 2017 12:30 PM (UTC)
Message
Thank you very much! That is exactly what I did. Now I'm confused about why I was able to get the 3rd capture if the first capture overwrote the data and prevented the 2nd from coming through.

Posted by Nick Gammon   Australia  (22,973 posts)   Forum Administrator
Date Reply #3 on Mon 18 Sep 2017 05:03 AM (UTC)
Message
Well, the buffer would have had in it:


r55t


After getting capture "r" the buffer would now have:


r(null)5t


When attempting to get the second capture (offset 1) it would hit the 0x00 byte written there when getting the "r", and so find an empty string.

However when getting the "t" we are past the corrupted part, so the "t" is still there.

- Nick Gammon

www.gammon.com.au, www.mushclient.com