
string.find

Summary

Searches a string for a pattern

Prototype

st, en, cap1, cap2, cap3 = string.find (str, pattern, index, plain)


Description

Finds the first match of "pattern" in "str", starting at position "index". The pattern is a Lua pattern, which is similar to, but simpler than, a regular expression (see Patterns below).

The starting position (index) is optional, and defaults to 1 (the start of the string). A negative index counts back from the end of the string (-1 is the last character).

If found, returns the start and end position, and any captures as additional results.

If not found, returns nil.

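If "plain" is true, "pattern" is treated as plain text rather than as a pattern, so magic characters such as "." and "%" have no special meaning. (The "plain" argument is optional, and defaults to false. To supply it, you must also supply "index".)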
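As a sketch of the return values described above, here is string.find applied to a made-up subject string, showing plain positions, captures, the "plain" flag (which requires "index" to be given), a negative starting index and a failed search. The positions in the comments apply to this particular string only.

```lua
local subject = "price: $5.00 (approx.)"

-- "%d+" is one or more digits; only the start and end positions come back
local st, en = string.find (subject, "%d+")
print (st, en)  --> 9 9

-- Captures are returned after the start and end positions
local st2, en2, whole, frac = string.find (subject, "(%d+)%.(%d+)")
print (st2, en2, whole, frac)  --> 9 12 5 00

-- As a pattern, "(approx.)" is a capture, and "." matches any character
local st3, en3, cap = string.find (subject, "(approx.)")
print (st3, en3, cap)  --> 15 21 approx.

-- With plain = true the same text is matched literally, brackets and all.
-- "index" (here 1) has to be given so that "plain" can be passed.
local st4, en4 = string.find (subject, "(approx.)", 1, true)
print (st4, en4)  --> 14 22

-- A negative index counts back from the end: -8 starts at position 15
local st5, en5 = string.find (subject, "%a+", -8)
print (st5, en5)  --> 15 20

-- No match gives nil
print (string.find (subject, "euro"))  --> nil
```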




Also see:


  • string.match which operates in a similar way, but does not return the start and end positions
  • string.gmatch which iterates over a string, allowing you to take action on each match (eg. on each word)
  • string.gsub which lets you make replacements on matching elements (for example, replace one word with another, or make certain things all upper-case)
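To contrast these functions, here is a rough sketch running the same pattern ("%a+", a run of letters) through string.find and each of the three functions above on one sample sentence. It only uses standard Lua string functions; the variable names are illustrative.

```lua
local line = "the quick brown fox"

-- string.find: positions of the first match
local fst, fen = string.find (line, "%a+")
print (fst, fen)  --> 1 3

-- string.match: the matched text itself (or its captures)
local first = string.match (line, "%a+")
print (first)  --> the

-- string.gmatch: iterate over every match
local words = {}
for w in string.gmatch (line, "%a+") do
  table.insert (words, w)
end
print (#words)  --> 4

-- string.gsub: replace each match; it also returns the number of replacements
local upper, count = string.gsub (line, "%a+", string.upper)
print (upper, count)  --> THE QUICK BROWN FOX 4
```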




Patterns

The standard patterns you can search for (mostly character classes) are:


.  --- (a dot) represents all characters.
%a --- all letters.
%c --- all control characters.
%d --- all digits.
%l --- all lowercase letters.
%p --- all punctuation characters.
%s --- all space characters (space, tab, newline, etc.).
%u --- all uppercase letters.
%w --- all alphanumeric characters.
%x --- all hexadecimal digits.
%z --- the character with hex representation 0x00 (null).
%% --- a single '%' character.
%1 --- the text matched by capture 1 (a back-reference; see Captures).
%2 --- the text matched by capture 2 (and so on).
%f[s] --- frontier: a transition from not in set 's' to in set 's'.
%b() --- balanced nested pair ( ... ( ... ) ... )


Important! The uppercase versions of the letter classes above (%a, %c, %d, %l, %p, %s, %u, %w, %x, %z) represent the complement of the class. For example, %U represents everything except uppercase letters, and %D represents everything except digits.

Also important! If you are using string.find (or string.match, etc.) in MUSHclient inside "send to script" in a trigger or alias, the % sign has a special meaning there: it identifies wildcards (for example, %1 is wildcard 1). Thus the % signs in the pattern must be doubled or they won't work properly (so use %%d instead of %d in "send to script"). This does not apply if you are scripting in a script file, because wildcard expansion is not done there.
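A few of the classes above, and one complement, applied to an invented line of text. The code is written as it would appear in a script file; inside "send to script" each % would need to be doubled, as described above.

```lua
local s = "Room 42: a Dark Cave"

local d1, d2 = string.find (s, "%d+")       -- one or more digits
print (d1, d2)  --> 6 7

local p1, p2 = string.find (s, "%p")        -- first punctuation character
print (p1, p2)  --> 8 8

-- An uppercase letter followed by lowercase letters, searching from position 2
local w1, w2 = string.find (s, "%u%l+", 2)
print (w1, w2)  --> 12 15

local n1, n2 = string.find (s, "%D+")       -- complement: a run of non-digits
print (n1, n2)  --> 1 5
```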



Magic characters

There are some "magic characters" (such as %) that have special meanings. These are:


^ $ ( ) % . [ ] * + - ? 


If you want to use those in a pattern (as themselves) you must precede them by a % symbol.

eg. %% would match a single % (also see note above about "send to script")

In practice, it is safe to put % in front of any non-alphanumeric character. If in doubt, put a % in front of a special character.
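Because % in front of any non-alphanumeric character is safe, one common approach to searching for arbitrary text is to escape all punctuation first. The helper below, escape_pattern, is a hypothetical name (it is not part of Lua or MUSHclient); it relies on string.gsub, where "%0" in the replacement stands for the whole match. The last lines show that an unescaped trailing % is rejected as a malformed pattern.

```lua
-- Hypothetical helper (not part of Lua): put a % in front of every
-- punctuation character, so the text is matched literally as a pattern.
local function escape_pattern (text)
  -- In the replacement, "%%" is a literal % and "%0" is the whole match.
  -- The extra brackets keep only the first result (gsub also returns a count).
  return (string.gsub (text, "%p", "%%%0"))
end

local subject = "You have 10.5% more gold (approx)"

local escaped = escape_pattern ("10.5%")
print (escaped)  --> 10%.5%%

local st, en = string.find (subject, escaped)
print (st, en)  --> 10 14

-- Unescaped, the trailing % makes a malformed pattern, which raises an error
local ok = pcall (string.find, subject, "10.5%")
print (ok)  --> false
```

If you only need a literal search, passing "plain" as true (eg. string.find (subject, "10.5%", 1, true)) does the same job; escaping is useful when the literal text is only part of a larger pattern.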



Quotes and backslashes

The arguments to string.find (and string.match, etc.) are just normal Lua strings. Thus, to put a backslash or quote inside such a string you still need to "escape" it with a backslash in the usual way.

eg. string.find (str, "\\") -- find a single backslash
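The backslash escapes are processed by Lua when it reads the string literal, before string.find ever sees the pattern. Inside a pattern the backslash has no special meaning (% is the pattern escape character). A short sketch with invented strings:

```lua
-- "\\" inside ordinary quotes is a single backslash character
local path = "C:\\Games\\mush.exe"
print (#path)  --> 17

local b1, b2 = string.find (path, "\\")
print (b1, b2)  --> 3 3

-- Quotes are escaped with a backslash in the same way
local said = "He said \"hello\""
local q1, q2, word = string.find (said, "\"(%a+)\"")
print (q1, q2, word)  --> 9 15 hello
```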



Sets

You can build your own pattern classes (sets) by using square brackets, eg.


[abc] ---> matches a, b or c
[a-z] ---> matches lowercase letters (same as %l)
[^abc] ---> matches anything except a, b or c
[%a%d] ---> matches all letters and digits
[%a%d_] ---> matches all letters, digits and underscore
[%[%]] ---> matches square brackets (had to escape them with %)


You can use character classes such as %a or %d inside a set. Other characters (like periods and question marks) are simply themselves, apart from the few magic ones listed below.

You can specify a range of characters inside a set by using simple characters (not classes like %a) separated by a hyphen. For example, [A-Z] or [0-9]. These can be combined with other things. For example [A-Z0-9] or [A-Z,.].

The end-points of a range must be given in ascending order. That is, [A-Z] would match upper-case letters, but [Z-A] would not match anything.

A hyphen at the start or end of a set is itself (matches a hyphen).

You can negate a set by starting it with a "^" symbol, thus [^0-9] is everything except the digits 0 to 9. The negation applies to the whole set, so [^%a%d] would match anything except letters or digits. Anywhere except the first position of a set, the "^" symbol is simply itself.
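Some of the set rules above, applied to a made-up string (the string and variable names are only for illustration):

```lua
local s = "Item_42-b: ok?"

-- Letters, digits and underscore
local a1, a2 = string.find (s, "[%a%d_]+")
print (a1, a2)  --> 1 7

-- Negated set: the first character that is not a letter or digit
local n1, n2 = string.find (s, "[^%a%d]")
print (n1, n2)  --> 5 5

-- A range of digits
local d1, d2 = string.find (s, "[0-9]+")
print (d1, d2)  --> 6 7

-- A hyphen at the end of a set is itself; searching from position 8
local h1, h2 = string.find (s, "[a-z-]+", 8)
print (h1, h2)  --> 8 9

-- A range given in descending order matches nothing
print (string.find (s, "[Z-A]"))  --> nil
```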

Inside a set (that is a sequence delimited by square brackets) the only "magic" characters are:


] ---> to end the set, unless preceded by %
% ---> to introduce a character class (like %a), or to escape a magic character (like "]")
^ ---> in the first position only, to negate the set (eg. [^A-Z])
- ---> between two characters, to specify a range (eg. [A-F])


Thus, inside a set, characters like "." and "?" are just themselves.
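A sketch of the characters that are, and are not, special inside a set, using an invented sentence:

```lua
local s = "Really? Yes. [done]"

-- Inside a set, "." and "?" are just themselves
local p1, p2 = string.find (s, "[.?]")
print (p1, p2)  --> 7 7

-- Square brackets have to be escaped with % inside a set
local b1, b2 = string.find (s, "[%[%]]")
print (b1, b2)  --> 14 14

-- Outside a set they are escaped the same way; (.-) captures the contents
local c1, c2, inner = string.find (s, "%[(.-)%]")
print (c1, c2, inner)  --> 14 19 done
```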



Repetition

The repetition characters, which can follow a single character, class or set (but not a capture), are:


+  ---> 1 or more repetitions (greedy)
*  ---> 0 or more repetitions (greedy)
-  ---> 0 or more repetitions (non greedy)
?  ---> 0 or 1 repetition only


A "greedy" match will match as many characters as possible; a non-greedy one will match as few as possible.
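To see the difference between greedy and non-greedy repetition, and between * and +, here is a small sketch with made-up input. Note that a pattern which can match zero characters succeeds with an empty match, reported as an end position one less than the start.

```lua
local html = "<b>bold</b> and <i>italic</i>"

-- Greedy: ".*" runs to the last ">" in the string
local g1, g2 = string.find (html, "<.*>")
print (g1, g2)  --> 1 29

-- Non-greedy: ".-" stops at the first ">" that lets the match succeed
local n1, n2 = string.find (html, "<.->")
print (n1, n2)  --> 1 3

-- "?" makes the "u" optional
local c1, c2 = string.find ("colour", "colou?r")
print (c1, c2)  --> 1 6

-- "*" accepts zero repetitions, so this is an empty match at position 1
local e1, e2 = string.find ("abc", "%d*")
print (e1, e2)  --> 1 0

-- "+" needs at least one digit
print (string.find ("abc", "%d+"))  --> nil
```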



Anchor to start and/or end of string

The standard "anchor" characters apply:


^  ---> anchor to start of subject string (must be the very first character of the pattern)
$  ---> anchor to end of subject string   (must be the very last character of the pattern)


For example:


^You see     ---> string must start with "You see"
experience$  ---> string must end with "experience"
^Tick$       ---> string must be exactly "Tick" with no other characters
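A sketch of the anchors with string.find, using an invented line of MUD output. The last case reflects how string.find treats "^" when a starting index is given: the match is anchored at that index rather than at position 1.

```lua
local line = "You see a sword here"

local a1, a2 = string.find (line, "^You see")
print (a1, a2)  --> 1 7

-- "see" is in the string, but not at the start
print (string.find (line, "^see"))  --> nil

local e1, e2 = string.find (line, "here$")
print (e1, e2)  --> 17 20

-- Both anchors: the whole string must match
print (string.find ("Tick tock", "^Tick$"))  --> nil

-- With a starting index, "^" anchors at that index instead
local i1, i2 = string.find (line, "^see", 5)
print (i1, i2)  --> 5 7
```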




Captures

You can also use round brackets to specify "captures":


You see (.*) here


Here, whatever matches (.*) becomes the first capture.
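A short sketch of captures returned by string.find (the sample lines are invented). When captures are nested, they are numbered in the order of their opening brackets, and they are always returned as strings.

```lua
local line = "You see a large rat here"
local st, en, what = string.find (line, "You see (.*) here")
print (st, en, what)  --> 1 24 a large rat

-- Capture 1 is the outer brackets, then 2 and 3 are the inner ones
local s2, e2, whole, hp, maxhp = string.find ("HP: 120/200", "((%d+)/(%d+))")
print (s2, e2, whole, hp, maxhp)  --> 5 11 120/200 120 200

-- Captures are strings; convert them if you need numbers
print (tonumber (hp) + 1)  --> 121
```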

You can also refer to matched substrings (captures) later in the same pattern, by using %1, %2 and so on:


print (string.find ("You see dogs and dogs", "You see (.*) and %1")) --> 1 21 dogs
print (string.find ("You see dogs and cats", "You see (.*) and %1")) --> nil


This example shows how you can look for a repetition of a word matched earlier, whatever that word was ("dogs" in this case).

As a special case, an empty capture "()" returns the current position in the string (a number) rather than a substring. eg.


print (string.find ("You see dogs and cats", "You .* ()dogs .*")) --> 1 21 9


What this is saying is that the word "dogs" starts at column 9.

There is a limit of 32 captures that can be returned.



Balanced sequences

You can also look for nested "balanced" things (such as parentheses) by using %b, like this:


print (string.find ("I see a (big fish (swimming) in the pond) here",
       "%b()"))  --> 9 41


After %b you put 2 characters, which indicate the start and end of the balanced pair. If it finds a nested pair, it keeps counting until it is back at the top level. In this case the matching string was "(big fish (swimming) in the pond)".
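Since string.find accepts a starting index, %b can be used in a loop to collect every top-level balanced group in turn. A minimal sketch with an invented expression string:

```lua
local text = "f(a, g(b)) + h(c)"
local groups = {}
local pos = 1

while true do
  local st, en = string.find (text, "%b()", pos)
  if not st then
    break  -- no more balanced groups
  end
  table.insert (groups, string.sub (text, st, en))
  pos = en + 1  -- continue searching after this group
end

print (#groups, groups[1], groups[2])  --> 2 (a, g(b)) (c)

-- An unbalanced group does not match
print (string.find ("(open", "%b()"))  --> nil
```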



Frontier patterns

A "frontier" (or boundary) pattern asserts a transition from one set of characters to another (eg. non-letters to letters, or non-digits to digits). This is useful for matching whole words: for example, finding "log" but not "blog" or "logging".

A frontier is specified as %f[set] and matches on a transition from not-in-set to in-set. For example, to match "log" on its own:


print (string.find ("There is a log here", "%f[%a]log%f[%A]"))   --> 12 14 
print (string.find ("There is a blog here", "%f[%a]log%f[%A]"))  --> nil
print (string.find ("There is logging here", "%f[%a]log%f[%A]")) --> nil


The first frontier ("%f[%a]") matches on the transition from not-letters to letters. The second frontier ("%f[%A]") matches on letters to not-letters. Effectively this gives you a word boundary match.
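The frontier idea can be wrapped in a small function to find a whole word repeatedly. find_word below is a hypothetical helper, not a built-in; it assumes the word contains only letters, since any magic characters in it would be interpreted as part of the pattern.

```lua
-- Hypothetical helper: find "word" as a whole word, starting at "init".
-- Assumes "word" contains only letters (no magic characters).
local function find_word (s, word, init)
  return string.find (s, "%f[%a]" .. word .. "%f[%A]", init)
end

local s = "the theme of the other"
local starts = {}
local pos = 1

while true do
  local st, en = find_word (s, "the", pos)
  if not st then break end
  table.insert (starts, st)
  pos = en + 1
end

-- "theme" and "other" are skipped because they are not the whole word
print (#starts, starts[1], starts[2])  --> 2 1 14
```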



Examples


print (string.find ("the quick brown fox", "quick")) --> 5 9
print (string.find ("the quick brown fox", "(%a+)")) --> 1 3 the
print (string.find ("the quick brown fox", "(%a+)", 10)) --> 11 15 brown
print (string.find ("the quick brown fox", "fruit")) --> nil
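One thing string.find gives you that string.gmatch does not is the position of each match. A sketch that walks through the example sentence above word by word, using the end position of each match as the next starting index:

```lua
local s = "the quick brown fox"
local spans = {}
local pos = 1

while true do
  local st, en, word = string.find (s, "(%a+)", pos)
  if not st then break end
  table.insert (spans, word .. "@" .. st .. "-" .. en)
  -- "%a+" never matches an empty string, so en >= st and the search
  -- always moves forward. A pattern that can match empty (like "%a*")
  -- would need pos = st + 1 when en < st, to avoid looping forever.
  pos = en + 1
end

print (table.concat (spans, " "))  --> the@1-3 quick@5-9 brown@11-15 fox@17-19
```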


See Also ...

Lua functions

string.byte - Converts a character into its ASCII (decimal) equivalent
string.char - Converts ASCII codes into their equivalent characters
string.dump - Converts a function into binary
string.format - Formats a string
string.gfind - Iterate over a string (obsolete in Lua 5.1)
string.gmatch - Iterate over a string
string.gsub - Substitute strings inside another string
string.len - Return the length of a string
string.lower - Converts a string to lower-case
string.match - Searches a string for a pattern
string.rep - Returns repeated copies of a string
string.reverse - Reverses the order of characters in a string
string.sub - Returns a substring of a string
string.upper - Converts a string to upper-case

Topics

Lua base functions
Lua bc (big number) functions
Lua bit manipulation functions
Lua coroutine functions
Lua debug functions
Lua io functions
Lua LPEG library
Lua math functions
Lua os functions
Lua package functions
Lua PCRE regular expression functions
Lua script extensions
Lua string functions
Lua syntax
Lua table functions
Lua utilities
Regular Expressions
Scripting
Scripting callbacks - plugins

(Help topic: lua=string.find)


Information and images on this site are licensed under the Creative Commons Attribution 3.0 Australia License unless stated otherwise.