I found the Lua manual a little confusing when it described the find and find-and-replace functions in the Lua string library, so I am doing my own documentation. :)
string.find
This is your basic "string within a string" finder. You pass it a source string, a pattern to look for, and an optional starting point (defaulting to 1).
eg.
print (string.find ("mend both legs at once", "legs")) --> 11 14
This example returns the start and end points of the word "legs" (columns 11 to 14).
print (string.find ("mend both legs at once", "goats")) --> nil
A return value of nil means the pattern was not found. Since nil is considered false in an if test you can simply write something like this:
test = "mend both legs at once"
if string.find (test, "legs") then
  print "found!"
else
  print "not found"
end -- if
You can also specify a starting column, if you want to skip part of the initial string:
print (string.find ("mend both legs at once", "legs", 5)) --> 11 14
print (string.find ("mend both legs at once", "legs", 15)) --> nil
The first example still returned exactly the same numbers, since "legs" was found past column 5. However supplying column 15 meant it wasn't found.
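The starting column is also handy for stepping through a string to pick up every occurrence, not just the first. Here is a minimal sketch of that idea. The helper name find_all is made up for illustration (it is not in the string library), and the # length operator assumes Lua 5.1 or later.

```lua
-- find_all is a hypothetical helper, not a library function.
-- It returns a table holding the start column of every match of pattern in s.
-- Assumes the pattern never matches an empty string (that would loop forever).
function find_all (s, pattern)
  local starts = {}
  local pos = 1
  while true do
    local first, last = string.find (s, pattern, pos)
    if not first then
      break
    end -- if
    starts [#starts + 1] = first
    pos = last + 1 -- resume just past the previous match
  end -- while
  return starts
end -- find_all

starts = find_all ("fish eats fish", "fish")
print (table.concat (starts, ", ")) --> 1, 11
```

Each search restarts one column past the end of the previous match, so overlapping matches are not reported.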
Plain matches
You can also specify a fourth argument, the "plain" argument (true or false). If true, the search string is treated as plain text, so none of the pattern characters have any special meaning. Because the arguments are positional, you must also give the third argument (start position) in order to supply the fourth.
print (string.find ("I see %a here", "%a")) --> 1 1
print (string.find ("I see %a here", "%a", 1, true)) --> 7 8
In the first case we are searching for %a, but %a is a special pattern (see below) meaning "any letter". Hence it matched at column 1, since that column holds a letter.
In the second case we have turned "plain match" on (and also specified column 1 as the starting position). Now it matches literally %a at column 7.
The "plain" argument would be very handy for situations where you let the user specify a search pattern, where the thing they are searching for is quite likely to contain periods, brackets, question marks, and so on.
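As a rough sketch of that situation, here is a small wrapper that treats whatever the user typed as literal text. The name contains is my own invention for this example, not something the library provides.

```lua
-- contains is a hypothetical helper: true if text occurs literally inside s.
function contains (s, text)
  return string.find (s, text, 1, true) ~= nil
end -- contains

print (contains ("Price: $5.00 (approx.)", "$5.00")) --> true
print (contains ("Price: $5x00 (approx.)", "$5.00")) --> false

-- Without the "plain" flag the dot is a pattern character and matches the "x"
print (string.find ("Price: $5x00 (approx.)", "5.00")) --> 9 12
```

The last line shows why the flag matters: an innocent-looking search string can quietly match things the user never typed.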
Returning captured strings
If you set up "captures" (see below under "Patterns") the captured string(s) are also returned:
print (string.find ("mend both legs at once", "(l..s)")) --> 11 14 legs
print (string.find ("sword hits Nick", "(%a+) hits (%a+)")) --> 1 15 sword Nick
If you are mainly interested in what was captured (if anything) rather than where it is, you can use a dummy variable (like _ ) to discard the columns and simply retrieve the captured data:
_, _, what = string.find ("You are struck (glancing)", "(%b())")
print (what) --> (glancing)
Note for Lua 5.1
Under Lua 5.1, you can use string.match which only returns the matching text (and not the columns), so this example could be written as:
what = string.match ("You are struck (glancing)", "(%b())")
print (what) --> (glancing)
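A few more string.match cases may help, shown here as a small sketch that assumes Lua 5.1 or later. With several captures you get one result per capture, with no captures you get the whole match, and with no match you get nil.

```lua
-- Assumes Lua 5.1 or later, where string.match is available.

-- Two captures: two results
attacker, victim = string.match ("sword hits Nick", "(%a+) hits (%a+)")
print (attacker, victim) --> sword Nick

-- No captures in the pattern: the whole matching text is returned
print (string.match ("mend both legs at once", "l..s")) --> legs

-- No match at all: nil, just like string.find
print (string.match ("mend both legs at once", "goats")) --> nil
```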
Patterns
Before I move on to find-and-replace, let's look at what else a pattern can contain. We can use pattern-matching expressions inside find and replace calls; however, these are Lua patterns, not the regular expressions MUSHclient users are accustomed to.
print (string.find ("mend both legs at once", "l..s")) --> 11 14
In this example the "." character matches any single character.
Now let's try a repeated sequence:
print (string.find ("balls bells bills", "b.+s")) --> 1 17
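A problem here is that we don't necessarily want the match to span the entire line. This is called a "greedy" match, as it matched as much as it could. By using "-" instead of "+" we get a non-greedy match, which stops at the first "s" it can. (Note that "-" means 0 or more repetitions, whereas "+" means 1 or more.)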
print (string.find ("balls bells bills", "b.-s")) --> 1 5
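The difference tends to matter most when pulling out something between delimiters. A small sketch, assuming Lua 5.1 or later for string.match (the sample strings are my own):

```lua
line = "<b>bold</b> text"

-- Greedy: runs on to the LAST ">" in the line
print (string.match (line, "<(.+)>")) --> b>bold</b

-- Non-greedy: stops at the FIRST ">" it reaches
print (string.match (line, "<(.-)>")) --> b

-- "-" allows zero repetitions, "+" needs at least one
print (string.find ("bs", "b.-s")) --> 1 2
print (string.find ("bs", "b.+s")) --> nil
```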
The standard patterns you can search for are:
. --- (a dot) represents all characters.
%a --- represents all letters.
%c --- represents all control characters.
%d --- represents all digits.
%l --- represents all lowercase letters.
%p --- represents all punctuation characters.
%s --- represents all space characters.
%u --- represents all uppercase letters.
%w --- represents all alphanumeric characters.
%x --- represents all hexadecimal digits.
%z --- represents the character with representation 0.
Important - the uppercase versions of the above represent the complement of the class. eg. %U represents everything except uppercase letters, %D represents everything except digits.
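To see a few of these classes at work, here is a short sketch. It assumes Lua 5.1 or later for string.match, and the sample strings are made up.

```lua
s = "Room 42 is dark"

print (string.find (s, "%d+"))           --> 6 7      (one or more digits)
print (string.match (s, "%u%l+"))        --> Room     (uppercase letter, then lowercase letters)
print (string.match (s, "%S+"))          --> Room     (everything up to the first space)
print (string.match ("42abc", "%D+"))    --> abc      (everything that is not a digit)
```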
There are some "magic characters" (such as %) that have special meanings. These are:
^ $ ( ) % . [ ] * + - ?
If you want to use those in a pattern (as themselves) you must precede them by a % symbol.
eg. %% would match a single %
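If you need to build a pattern out of text you don't control, one approach is to escape every magic character first. The sketch below uses string.gsub (described further down) to do that. The name escape_pattern is my own, and the %0 in the replacement (meaning "the whole match") assumes Lua 5.1 or later.

```lua
-- escape_pattern is a hypothetical helper, not part of the string library.
-- It puts a % in front of each magic character so the text matches literally.
-- Assumes Lua 5.1 or later, where %0 in the replacement is the whole match.
function escape_pattern (text)
  -- the extra parentheses drop gsub's second result (the count)
  return (string.gsub (text, "[%^%$%(%)%%%.%[%]%*%+%-%?]", "%%%0"))
end -- escape_pattern

print (escape_pattern ("5.00 (approx)")) --> 5%.00 %(approx%)
print (string.find ("cost 5.00 (approx)", escape_pattern ("5.00 (approx)"))) --> 6 18
```

Unlike the "plain" argument to string.find, an escaped string can be combined with other pattern pieces, such as anchors or captures.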
As with normal MUSHclient regular expressions you can build your own pattern classes by using square brackets, eg.
[abc] ---> matches a, b or c
[a-z] ---> matches lowercase letters a to z (same as %l in the default "C" locale)
[^abc] ---> matches anything except a, b or c
[%a%d] ---> matches all letters and digits
[%a%d_] ---> matches all letters, digits and underscore
[%[%]] ---> matches square brackets (had to escape them with %)
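Here are a few home-made classes in use, as a quick sketch (Lua 5.1 or later assumed for string.match, and the sample strings are my own). Note that a dot inside square brackets is just a dot.

```lua
-- A letter or underscore, followed by any number of letters, digits or underscores
print (string.match ("  _count9 = 10", "[%a_][%w_]*")) --> _count9

-- Digits and dots: inside [ ] the dot loses its "any character" meaning
print (string.match ("x = 3.14 + y", "[%d.]+")) --> 3.14

-- Anything that is not a space and not an equals sign
print (string.match ("key = value", "[^%s=]+")) --> key
```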
The repetition characters are:
+ ---> 1 or more repetitions (greedy)
* ---> 0 or more repetitions (greedy)
- ---> 0 or more repetitions (non greedy)
? ---> 0 or 1 repetition only
The standard "anchor" characters apply:
^ ---> anchor to start of subject string
$ ---> anchor to end of subject string
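Using both anchors together forces the pattern to account for the whole string, which is useful for validating input. The sketch below also shows a common whitespace-trimming idiom. The function name trim is my own, and string.match assumes Lua 5.1 or later.

```lua
-- Whole string must be digits, nothing else
print (string.find ("12345", "^%d+$")) --> 1 5
print (string.find ("123a5", "^%d+$")) --> nil

-- trim is a hypothetical helper: strips leading and trailing spaces.
-- Assumes Lua 5.1 or later for string.match.
function trim (s)
  return string.match (s, "^%s*(.-)%s*$")
end -- trim

print ("[" .. trim ("   mend both legs   ") .. "]") --> [mend both legs]
```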
You can also use round brackets to specify "captures", similar to normal MUSHclient regular expressions:
You see (.*) here
Here, whatever matches (.*) becomes the first capture.
You can also refer to matched substrings (captures) later on in an expression:
print (string.find ("You see dogs and dogs", "You see (.*) and %1")) --> 1 21 dogs
print (string.find ("You see dogs and cats", "You see (.*) and %1")) --> nil
This example shows how you can look for a repetition of a word matched earlier, whatever that word was ("dogs" in this case).
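Another place this comes in handy is matching quoted text, where the closing quote has to be the same kind as the opening one. A small sketch, with a sample sentence of my own:

```lua
s = [[he said "it's fine" and left]]

-- First capture: the opening quote (single or double).
-- %1 then insists on the same quote character to close.
print (string.find (s, [[(["'])(.-)%1]])) --> 9 19 " it's fine
```

The apostrophe inside the quoted text is skipped over, because the opening quote was a double quote and %1 will only accept another double quote.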
As a special case, an empty capture, written (), captures the current position in the string (a number) rather than any text. eg.
print (string.find ("You see dogs and cats", "You .* ()dogs .*")) --> 1 21 9
What this is saying is that the word "dogs" starts at column 9.
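You can use more than one of these position captures in the same pattern. As a quick sketch, this puts one on each side of the word:

```lua
_, _, from, to = string.find ("You see dogs and cats", "()dogs()")
print (from, to)     --> 9 13
print (type (from))  --> number
```

The second position is the column just after the word, so "dogs" itself occupies columns 9 to 12.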
Finally you can look for nested "balanced" things (such as parentheses) by using %b, like this:
print (string.find ("I see a (big fish (swimming) in the pond) here",
"%b()")) --> 9 41
After %b you put 2 characters, which indicate the start and end of the balanced pair. If it finds a nested version it keeps processing until we are back at the top level. In this case the matching string was "(big fish (swimming) in the pond)".
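A short sketch of %b with a capture, and with a different pair of delimiters (the sample strings are my own):

```lua
s = "call f (a, g (b)) now"

-- Capture the balanced text, then strip the outer brackets with string.sub
_, _, args = string.find (s, "(%b())")
print (args)                      --> (a, g (b))
print (string.sub (args, 2, -2))  --> a, g (b)

-- Any two characters can be the pair, eg. curly braces
print (string.find ("x = {1, {2, 3}}", "%b{}")) --> 5 15
```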
string.gsub
The simple use of gsub is to replace one thing by another, eg.
print (string.gsub ("nick eats fish", "fish", "chips")) --> nick eats chips 1
The "1" at the end is the 2nd result returned from gsub, which tells us how many substitutions it did. eg.
print (string.gsub ("fish eats fish", "fish", "chips")) --> chips eats chips 2
Of course, since the matching string can be a pattern we can do something like replace all vowels with a dot:
print (string.gsub ("nick eats fish", "[AEIOUaeiou]", ".")) --> n.ck ..ts f.sh 4
Here we see that 4 vowels have been replaced.
We can also set up a capture (with round brackets) and refer to the captured data in the replacement string:
print (string.gsub ("nick eats fish", "([AEIOUaeiou])", "(%1)")) --> n(i)ck (e)(a)ts f(i)sh 4
In this case we are putting all vowels into brackets.
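With more than one capture you can rearrange the pieces, referring to them as %1, %2 and so on in the replacement string. A small sketch with made-up data:

```lua
-- Swap the two sides of each key=value pair
print (string.gsub ("name=nick, height=100", "(%a+)=(%w+)", "%2=%1")) --> nick=name, 100=height 2

-- Reorder a date (the hyphens are escaped because "-" is a magic character)
print (string.gsub ("2024-01-31", "(%d+)%-(%d+)%-(%d+)", "%3/%2/%1")) --> 31/01/2024 1
```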
We can ignore the resulting string, and simply use string.gsub to count things for us:
_, n = string.gsub ("nick eats fish", "[AEIOUaeiou]", "")
print (n) --> 4
In this case we use the short variable name "_" as a dummy variable, and concentrate on the 2nd returned result "n", which is the count of substitutions.
Replacement function
Next we can pass a function to gsub rather than a simple string. In this case the function is called for each matched instance in the source string. Starting simply:
function f (s)
  print ("found " .. s)
end -- f
string.gsub ("Nick is taking a walk today", "%a+", f)
Output
found Nick
found is
found taking
found a
found walk
found today
In the above example I am searching for one or more alphabetic characters (words in other words), and for each one found the function "f" is called, which prints the found word.
Since Lua supports anonymous inline functions the above example can be written more concisely:
string.gsub ("Nick is taking a walk today", "%a+",
  function (s)
    print ("found " .. s)
  end
)
This has the same output, but saves having to define a function "f" in advance.
Given this capability, we can start getting fancy, by doing a lookup inside the supplied function. For each call of the function, you are expected to return a replacement value for the matching string. So I will make an example that does a table lookup, and replaces "nice" by "windy", and "walk" by "stroll".
replacements = {
  ["nice"] = "windy",
  ["walk"] = "stroll",
}
s = "a nice long walk"
result = string.gsub (s, "%a+",
  function (str)
    return replacements [str] or str
  end
)
print (result) --> a windy long stroll
This example looks up in the replacements table, and if a match is found returns that, otherwise (by using the short-circuit boolean evaluation) returns the original string instead.
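The function doesn't have to do a lookup; it can calculate the replacement as well. As a rough sketch, this one doubles every number it finds (the sample sentence is my own):

```lua
result = string.gsub ("3 apples and 10 pears", "%d+",
  function (n)
    -- n arrives as a string, so convert it, do the sum, and convert back
    return tostring (tonumber (n) * 2)
  end
)
print (result) --> 6 apples and 20 pears
```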
Note for Lua 5.1
Under Lua 5.1 you can simply provide a table of target/replacement strings, so the gsub could be written more neatly like this:
result = string.gsub (s, "%a+", replacements)
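The table form works nicely for simple template substitution. In this sketch (Lua 5.1 or later assumed, names and data made up) the pattern has a capture, so the captured name, without the dollar sign, is used as the table key. A name that isn't in the table is left as it was.

```lua
-- Assumes Lua 5.1 or later, where gsub accepts a table as the replacement.
vars = { name = "Nick", weapon = "sword" }

result = string.gsub ("$name wields a $weapon and a $shield", "%$(%a+)", vars)
print (result) --> Nick wields a sword and a $shield
```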
Replacement count
Finally we can supply a fourth argument, the maximum number of replacements we want done (which might be one, of course). For example:
print (string.gsub ("I see a see saw", "see", "view")) --> I view a view saw 2
In this case all instances of "see" have become "view". Now let's limit it to the first one:
print (string.gsub ("I see a see saw", "see", "view", 1)) --> I view a see saw 1
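The count works with a replacement function too. As a quick sketch, passing string.upper as the function and limiting it to one replacement capitalises just the first letter:

```lua
-- string.upper is called with the first letter found, and only that one
print (string.gsub ("nick eats fish", "%a", string.upper, 1)) --> Nick eats fish 1

-- Only the first two commas are changed
print (string.gsub ("a,b,c,d", ",", ";", 2)) --> a;b;c,d 2
```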
string.gfind
Finally, gfind offers a way of looping over a source string and doing something with each matching instance, assuming we don't want to actually modify the string.
For example, to take a string and build each word in it into a table:
words = {}
for w in string.gfind ("nick takes a stroll", "%a+") do
  table.insert (words, w)
end -- for
tprint (words) -- tprint is not standard Lua; it is a table-printing helper shipped with MUSHclient
Output
1=nick
2=takes
3=a
4=stroll
Effectively gfind is an iterator that can be used in a for loop.
By using captures we can return more than one thing from gfind, so we can write something like this which might be used to decode configuration parameters:
config = {}
for key, value in string.gfind ("name=nick, height=100",
                                "(%a+)=([%a%d]+)") do
  config [key] = value
end -- for
tprint (config)
Output
height=100
name=nick
Note for Lua 5.1
The function string.gfind has been renamed string.gmatch under Lua 5.1.
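If a script has to cope with either name, one possible approach is to pick whichever function exists. This is only a sketch: the variable name gmatch is my own choice, and plain print calls stand in for tprint so that it runs outside MUSHclient.

```lua
-- Use string.gmatch where it exists, otherwise fall back to string.gfind
gmatch = string.gmatch or string.gfind

words = {}
for w in gmatch ("nick takes a stroll", "%a+") do
  table.insert (words, w)
end -- for
print (table.concat (words, ",")) --> nick,takes,a,stroll

config = {}
for key, value in gmatch ("name=nick, height=100", "(%a+)=([%a%d]+)") do
  config [key] = value
end -- for
print (config.name, config.height) --> nick 100
```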
- Nick Gammon
www.gammon.com.au, www.mushclient.com