Gammon Forum

string matching

It is now over 60 days since the last post. This thread is closed.


Posted by Nobody   (38 posts)  Bio
Date Tue 10 Oct 2006 07:29 AM (UTC)
Message
I'm probably being ridiculously stupid, but is there any way to ignore case when using Lua's built-in string routines?

Posted by Nick Gammon   Australia  (23,099 posts)  Bio   Forum Administrator
Date Reply #1 on Tue 10 Oct 2006 09:53 AM (UTC)
Message
Well, a simple way is to simply convert the strings to upper (or lower) case first.

eg.

a = "nick"
b = "NICK"

a = a:upper()
b = b:upper()

print (a == b) --> true

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Nobody   (38 posts)  Bio
Date Reply #2 on Wed 11 Oct 2006 02:16 AM (UTC)
Message
Yeah, I already thought of that. The problem, though, is that I'm doing substring finding, and substring matching and replacing, and I don't want to screw up the original texts that I'm using for comparing and contrasting.

Looks like I just have to write my own Lua functions that can handle this properly. :-\

Posted by Nick Gammon   Australia  (23,099 posts)  Bio   Forum Administrator
Date Reply #3 on Wed 11 Oct 2006 04:50 AM (UTC)
Message
Well, if you want to see if a string matches another, case-insensitive, do this:


s = "Nick Gammon is here"
print (string.match (s:upper (), "NICK"))  --> NICK


This hasn't changed the source string. Or if you want to pull out the matching stuff, but you aren't sure what it is:


s = "I see Nick Gammon is here"
st, en = string.find (s:upper (), "N..K")  --> find where it matches
print (string.sub (s, st, en))  --> Nick


Here I forced the string to upper case, did an upper case match, found the start and end column, and then pulled out the original string from the source string.
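
The same trick can be stretched to cover replacing as well. What follows is only a minimal sketch: replace_nocase is a made-up helper name (it is not part of Lua or MUSHclient), it treats the search text as plain text rather than a pattern, and it assumes single-byte text so that the lower-cased copy has the same columns as the original.

```lua
-- replace_nocase is a hypothetical helper, not a standard Lua function.
-- It searches a lower-cased copy of the text, then rebuilds the result
-- from the ORIGINAL text so that its capitalization is preserved.
local function replace_nocase (text, needle, fn)
  if needle == "" then return text end   -- an empty search would loop forever
  local lowered, target = text:lower (), needle:lower ()
  local out, pos = {}, 1
  while true do
    local st, en = lowered:find (target, pos, true)  -- true = plain find, no pattern magic
    if not st then break end
    out [#out + 1] = text:sub (pos, st - 1)   -- untouched text before the match
    out [#out + 1] = fn (text:sub (st, en))   -- the match, in its original case
    pos = en + 1
  end
  out [#out + 1] = text:sub (pos)             -- whatever is left over
  return table.concat (out)
end

local result = replace_nocase ("A long sword. Another a LONG SWORD.", "a long sword",
                 function (m) return m .. " [10d10]" end)
print (result)  --> A long sword [10d10]. Another a LONG SWORD [10d10].
```

The callback receives each match exactly as it appeared in the source, so the caller decides what to put back.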

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Nobody   (38 posts)  Bio
Date Reply #4 on Wed 11 Oct 2006 07:44 AM (UTC)
Message
While I appreciate your efforts, it's not going to work out that simply. Some of it will, some of it won't, and it's the stuff that won't that's going to cause me a lot of problems.

I've further noticed that the string functions in Lua are pretty limited and disappointing. For example, when using string.find with a regular expression, how do I access the results and the data?

ie. string.find("A long sword", "A long (.*)$")

I understand the return value will be 3 items, the start position, the end position, and the data within the brackets. What I want is a way to reference the data within the brackets.

Now, the simple solution is to simply give it to 3 variables, like so:

nil, nil, item = string.find(...)

This seems like a painful and convoluted way of doing it.

I was looking through Google for more help when I stumbled across a very interesting site (well, CPAN) that showed how to access Lua variables and scripts from within Perl. This is an ideal solution for me, because I already have Perl scripts that can do all the processing I want. I'm trying to load a lot of data into a Lua table, and I can do it in Perl easily (but Lua sux for handling text - sorry, but it does). If I could load the data into a Perl hash, convert it into a Lua table (within Perl) and then dump those tables to a text file which can be read in (in binary form) by a pure Lua script, it'd save me about 3 years of work getting Lua to behave how I want. Is there some kind of marshalling code for Lua tables that would work?

Posted by Nick Gammon   Australia  (23,099 posts)  Bio   Forum Administrator
Date Reply #5 on Wed 11 Oct 2006 08:16 AM (UTC)
Message
It isn't quite that bad. To answer your first question, you use string.match which is new in Lua 5.1:


a = string.match ("A long sword", "A long (.*)$")
print (a) --> sword


Or, if you are using an older version of Lua, for some reason:


_, _, a = string.find ("A long sword", "A long (.*)$")
print (a) --> sword


In this case the variable "_" is a temporary that you don't care about the value of.

I can't believe that you will have huge problems getting Lua to do what you want. For instance, I just did the new spellchecker for MUSHclient using Lua.

If you give a concrete example of what you are trying to do, it might help.

The string matching is really quite powerful. Take this for example:


a, b = string.match ("A long sword", "A (.*) (.*)$")
print (a) --> long
print (b) --> sword


Here you are getting two matches back from a single string.match.
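
Captures can also pick a line apart into fields in one step. This is just a sketch on a made-up line; the "Affects ... by ..." layout and the variable names are assumptions for illustration only.

```lua
-- The input line and its layout are invented for this example.
local line = "Affects HITROLL by +1000"

-- %u+ captures a run of upper-case letters, [+-]%d+ a signed number;
-- the colon after "Affects" is optional.
local stat, amount = string.match (line, "^Affects:?%s+(%u+) by ([+-]%d+)$")

print (stat)               --> HITROLL
print (tonumber (amount))  --> 1000
```

If the line does not fit the pattern, both variables come back as nil, which makes it easy to test and fall through to the next case.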


- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Nobody   (38 posts)  Bio
Date Reply #6 on Wed 11 Oct 2006 09:54 AM (UTC)
Message
Thanks again, Nick. Learning a new language is always frustrating.

At the moment, I'm doing 2 things. The first is replacing items with the items plus their stats:

a long sword.

gets changed to:

a long sword [10d10 +1000hp]

kind of thing. I have a database of items which I'm loading into a Lua table (this loading is actually the cause of a lot of the problems - more on that later). This database has "a long sword" as its description, and it matches this against the lines from the MUD. Where it fails is "A long sword", which is why I asked about case-less matching. The reason I didn't want to use the stuff you mentioned above is that I wanted to maintain case (and not have to change all the stuff in my database to upper or lower case). A solution (which I haven't been able to figure out) would be some kind of regexp that checks for a string but ignores the case of the first letter. I haven't been able to think of one, but regexps aren't really my strong point.

The second problem related to matching text is in loading the database files. Example database file:

Item 'sword of doom'
Damage: 10d10
Affects HITROLL by +1000
Flags: BLINDNESS
Causes: Disease

Item 'shield of stoppage'
Gives +100 ARMOUR
Flags: AWESOMENESS

Using Lua to load this datafile, I end up with a long chain of lines like this:

_,_,a = string.find(line, "Item '(.*)'$")
if (a) then
  item["title"] = a
end

Now, that is to fill in 1 part of a database (the title). I'd need a similar section for damage, flags, and so on. I added them up, and there are about 30 possible things that can be included in the database file, which means I'll have to daisy chain about 20 of them one after the other like that. And the complicated part is when they have multiple lines of Affects (for example) and I have to check to see I'm not overwriting anything before putting it into a secondary affects section. It just seems like a horribly convoluted way of doing it.

If you can think of any way of improving and/or streamlining this, please let me know.

Thanks.

Posted by Nick Gammon   Australia  (23,099 posts)  Bio   Forum Administrator
Date Reply #7 on Wed 11 Oct 2006 08:52 PM (UTC)
Message
It isn't Lua that is the problem. You just need to think through a neat way of doing it. Here is an example that solves that problem for you. In a single pass it processes that sort of file, building it into an in-memory database (a Lua table). Here is the code (I ran it in the Immediate window of MUSHclient):


-- the database
db = {}

-- attribute names that may appear more than once per item
local multiples = {}

for _, v in ipairs {
    "Affects", "SomeOtherThing"
            } do multiples [v] = true end

-- the item currently being filled in (nil until the first Item line is seen)
local item

function add_line (s)
  local name = string.match (s, "^Item%s+'(.*)'$")

  -- if not nil, we have started a new one
  if name then
    item = { Item = name }  -- new table of attributes
    db [name:lower()] = item -- add to database
    return
  end -- found new item
  
  -- get stuff like: Damage: 10d10 (the colon is optional)
  local attribname, attrib = string.match (s, "^(%w+):?%s+(.*)$")
  
  -- skip blank lines, and attributes that appear before any Item line
  if not attrib or not item then
    return
  end -- nothing to add
  
  -- things like affects need a secondary table as there may be multiples
  
  if multiples [attribname] then
    item [attribname] = item [attribname] or {}  -- make a sub table
    table.insert (item [attribname], attrib)     -- add to table
  else
    item [attribname] = attrib
  end -- if

end -- add_line

for line in io.lines ("test.txt") do
  add_line (line)
end -- for


-- test it

require "tprint"

print (string.rep ("-", 50))
tprint (db)



You need to change the filename from "test.txt" to whatever it is.

Your database looks a bit strange, I wasn't sure if that was typos or not. I tried it on this:


Item 'sword of doom'
Damage: 10d10
Affects: HITROLL by +1000
Affects: DAMAGE by -50
Flags: BLINDNESS
Causes: Disease

Item 'shield of stoppage'
Gives: +100 ARMOUR
Affects: HP by +30
Affects: MANA by -50
Affects: ARMOR by -100
Flags: AWESOMENESS


Note that in my case I put a colon after everything (except the Item line). You had it after Damage and Flags but not Affects and Gives. If I guessed wrong, just change the regexp slightly. Actually I did that now, I made the colon optional.

Anyway, the code "knows" about things that might have multiples, see the "multiples" table above.

For those things it builds a sub-table, which can have any number of entries.

Running that on that test data gives these results:


"sword of doom":
  "Flags"="BLINDNESS"
  "Item"="sword of doom"
  "Affects":
    1="HITROLL by +1000"
    2="DAMAGE by -50"
  "Causes"="Disease"
  "Damage"="10d10"
"shield of stoppage":
  "Flags"="AWESOMENESS"
  "Gives"="+100 ARMOUR"
  "Item"="shield of stoppage"
  "Affects":
    1="HP by +30"
    2="MANA by -50"
    3="ARMOR by -100"


I have indexed each item by the lower-case name, so given an item, you can always look it up by forcing the item name to lower case. For example:


tprint (db [string.lower ("SWORD OF DOOM")])

--> gives

"Flags"="BLINDNESS"
"Item"="sword of doom"
"Affects":
  1="HITROLL by +1000"
  2="DAMAGE by -50"
"Causes"="Disease"
"Damage"="10d10"


The original name (with whatever its capitalization is) is also inside the table under "Item".
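
As a rough sketch of how a table shaped like this might be used to tack the stats onto an item name: the describe function, and the choice to show only Damage and Affects, are assumptions for illustration, and the small db here is typed in by hand so the example runs on its own.

```lua
-- A hand-made table in the same shape as the parsed results above.
local db = {
  ["sword of doom"] = {
    Item    = "sword of doom",
    Damage  = "10d10",
    Affects = { "HITROLL by +1000", "DAMAGE by -50" },
  },
}

-- describe is a hypothetical helper: it looks the item up by lower-case
-- name and appends whatever stats it finds, leaving the name as typed.
local function describe (db, name)
  local item = db [name:lower ()]
  if not item then return name end          -- unknown item: leave it alone
  local parts = {}
  if item.Damage then parts [#parts + 1] = item.Damage end
  for _, affect in ipairs (item.Affects or {}) do
    parts [#parts + 1] = affect
  end
  if #parts == 0 then return name end
  return name .. " [" .. table.concat (parts, ", ") .. "]"
end

print (describe (db, "Sword of Doom"))
--> Sword of Doom [10d10, HITROLL by +1000, DAMAGE by -50]
```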

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Nobody   (38 posts)  Bio
Date Reply #8 on Thu 12 Oct 2006 11:07 AM (UTC)
Message
Thanks for the code. I haven't implemented it yet (been busy all today) but I think I can step through it and follow what's happening.

The thing I wanted to show with my wacky item database was that the lines in it aren't always uniform. Some are, but a lot aren't, and they aren't always going to have the 'predictable' parts (if you get my meaning). In fact, some of them (instead of having the stats of the item) give a little message that hints at its prowess, but which could otherwise contain anything (including multiple lines), and I was trying to show that a program which parses this file needs to be able to handle these things.
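
One way free-form lines like that might be coped with, as a sketch only: treat any line that is not a recognizable "Name: value" pair as a note attached to the current item. The parse function, the Notes field and the sample lines are all made up for illustration, and requiring a colon is an assumption about what counts as a predictable line.

```lua
-- parse is a hypothetical function; it takes a list of lines rather than a
-- file so that the example runs on its own.
local function parse (lines)
  local db, item = {}, nil
  for _, s in ipairs (lines) do
    local name = s:match ("^Item%s+'(.*)'$")
    if name then
      item = { Item = name, Notes = {} }   -- start a new item
      db [name:lower ()] = item
    elseif item and s:match ("%S") then    -- skip blank lines
      -- a capitalized word followed by a colon counts as an attribute
      local key, value = s:match ("^(%u%l+):%s+(.*)$")
      if key then
        item [key] = value
      else
        table.insert (item.Notes, s)       -- anything else is kept as a note
      end
    end
  end
  return db
end

local db = parse {
  "Item 'orb of hints'",
  "Flags: GLOW",
  "It hums with a strange power.",
  "Only the brave should wield it.",
}
print (db ["orb of hints"].Flags)    --> GLOW
print (#db ["orb of hints"].Notes)   --> 2
```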

I'll probably try and implement it tomorrow when I have time, and if I have further difficulties I'll be sure to ask here. Thanks again for the code.

Incidentally, did you have any thoughts on my first problem (the case-less matching of text and replacing)?

Posted by Tsunami   USA  (204 posts)  Bio
Date Reply #9 on Thu 12 Oct 2006 07:17 PM (UTC)
Message
I was having a similar problem a little while ago. Given a chunk of text, I wanted to insert linebreak characters every x (80) characters, but preserving words. I found a way around my problem which didn't necessitate having to do this, but before I found it, I was still looking into Lua string manipulation a bit, and accomplishing something like this seemed quite hard.
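
For reference, a minimal sketch of that kind of word-preserving wrap: the wrap function is a made-up name, it collapses runs of whitespace into single spaces, and it lets a single word longer than the width overflow rather than splitting it.

```lua
-- wrap is a hypothetical helper: it breaks text into lines of at most
-- `width` characters without splitting words.
local function wrap (text, width)
  width = width or 80
  local lines, line = {}, ""
  for word in text:gmatch ("%S+") do
    if line == "" then
      line = word                               -- first word on the line
    elseif #line + 1 + #word <= width then
      line = line .. " " .. word                -- still fits, with a space
    else
      lines [#lines + 1] = line                 -- line is full, start another
      line = word
    end
  end
  if line ~= "" then lines [#lines + 1] = line end
  return table.concat (lines, "\n")
end

print (wrap ("the quick brown fox jumps", 10))
--> the quick
--> brown fox
--> jumps
```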

Posted by Nick Gammon   Australia  (23,099 posts)  Bio   Forum Administrator
Date Reply #10 on Thu 12 Oct 2006 11:27 PM (UTC)
Message
Quote:

... did you have any thoughts on my first problem (the case-less matching of text and replacing)


It depends on what you are trying to do. I gave an example near the top of this thread of doing a string.find on the strings converted to upper-case, this tells you if they match or not. If you want the exact matching string (before the case conversion) you can get the columns, like in my example.

Using regular expressions, you can match on "Door" or "door" by matching on: "[Dd]oor".
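
That "[Dd]oor" idea can be generated rather than typed by hand. The following is a sketch under assumptions: first_letter_nocase is a made-up helper, it escapes punctuation so that item names are matched literally, and it only relaxes the case of the very first letter.

```lua
-- first_letter_nocase is a hypothetical helper that turns a plain item
-- name into a Lua pattern which ignores the case of its first letter.
local function first_letter_nocase (text)
  -- put % in front of punctuation so things like "." match literally
  local escaped = text:gsub ("%p", "%%%0")
  -- replace a leading letter with a two-character set, e.g. a -> [Aa]
  return (escaped:gsub ("^%a", function (c)
            return "[" .. c:upper () .. c:lower () .. "]"
          end))
end

local pattern = first_letter_nocase ("a long sword")
print (pattern)                                         --> [Aa] long sword
print (string.find ("A long sword lies here.", pattern))  --> 1 12
```

The rest of the name still has to match exactly, so "a Long sword" would not be found with this pattern.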


- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Onoitsu2   USA  (248 posts)  Bio
Date Reply #11 on Fri 13 Oct 2006 08:36 AM (UTC)
Message

    function nocase (s)
      s = string.gsub(s, "%a", function (c)
            return string.format("[%s%s]", string.lower(c),
                                           string.upper(c))
          end)
      return s
    end -- nocase

    print(nocase("Hi there!"))
      -->  [hH][iI] [tT][hH][eE][rR][eE]!


I found that on a website I ran across. It builds the case-insensitive pattern for you: instead of sending the result to the print function, as in the example above, you store it in a variable and then use that as the pattern in the string functions.
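
A short usage sketch, with the function repeated so it runs on its own. The sample line and the stats text are made up, and note that nocase does not escape pattern characters such as "." or "-", so this assumes names made of letters and spaces.

```lua
-- same nocase function as above, written compactly
local function nocase (s)
  return (s:gsub ("%a", function (c)
            return "[" .. c:lower () .. c:upper () .. "]"
          end))
end

local line = "You see A Long Sword here."
local pattern = nocase ("a long sword")

-- find the item regardless of case, then pull out the original text
local st, en = line:find (pattern)
print (line:sub (st, en))   --> A Long Sword

-- %0 in the replacement stands for whatever was actually matched
local annotated = line:gsub (pattern, "%0 [10d10 +1000hp]")
print (annotated)           --> You see A Long Sword [10d10 +1000hp] here.
```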

The nocase function is from the online version of the first edition of:
'Programming in Lua' by Roberto Ierusalimschy

Hope that helps a little :)
You can look at it at http://www.lua.org/pil/

Laterzzz,
Onoitsu2

Posted by Mahony   (27 posts)  Bio
Date Reply #12 on Mon 08 Sep 2014 08:39 PM (UTC)
Message
I'm trying to search a string and I need to ignore case, like grep -i in Linux does. So when I'm looking for dog or DOg I would like to find dog, Dog, DOG...
Is it possible with string.match somehow? A simpler way than writing/using a function like nocase...
Thank you

Posted by Nick Gammon   Australia  (23,099 posts)  Bio   Forum Administrator
Date Reply #13 on Mon 08 Sep 2014 08:48 PM (UTC)
Message
Force the entire string to upper or lower case and then search that. Make a copy if you don't want to destroy the original.

eg.


target = "I see a DoG here"

print (string.match (target:lower (), "dog"))


Since we force "target" to be lower case, then we just search for lower-case "dog".
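
If it helps, that can be wrapped up in a small function. This is a sketch only: find_nocase is a made-up name, it does a plain-text search (no pattern characters), and it assumes single-byte text so the columns in the lower-cased copy line up with the original.

```lua
-- find_nocase is a hypothetical helper: it returns the start and end
-- columns of the match, plus the matching text as it appears in the original.
local function find_nocase (haystack, needle)
  -- lower-case copies of both; 1 = start column, true = plain search
  local st, en = haystack:lower ():find (needle:lower (), 1, true)
  if st then
    return st, en, haystack:sub (st, en)
  end
  return nil
end

local target = "I see a DoG here"
print (find_nocase (target, "DOg"))   --> 9 11 DoG
print (target)                        --> I see a DoG here  (unchanged)
```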

- Nick Gammon

www.gammon.com.au, www.mushclient.com

The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).


Information and images on this site are licensed under the Creative Commons Attribution 3.0 Australia License unless stated otherwise.