Search FAQ

Gammon Forum

Notice: Any messages purporting to come from this site telling you that your password has expired, or that you need to verify your details, confirm your email, resolve issues, making threats, or asking for money, are spam. We do not email users with any such messages. If you have lost your password you can obtain a new one by using the password reset link.
 Entire forum ➜ MUSHclient ➜ Tips and tricks ➜ Algorithm for sorting strings more "naturally"

Algorithm for sorting strings more "naturally"

Posting of new messages is disabled at present.

Refresh page


Posted by Nick Gammon   Australia  (23,122 posts)  Bio   Forum Administrator
Date Sat 07 Jun 2008 01:02 AM (UTC)

Amended on Sat 07 Jun 2008 01:58 AM (UTC) by Nick Gammon

Message
Here's a nifty idea I found on this web page:

http://www.davekoelle.com/alphanum.html

The idea is to solve the problem of sorting lists into a way that humans are going to read. For example, filenames, or lists of parts.

Using standard sorting, a list of file names might be sorted like this:


Z9A.doc
z1.doc
z10.doc
z100.doc
z101.doc
z102.doc
z11.doc
z12.doc
z13.doc
z14.doc
z15.doc
z16.doc
z17.doc
z18.doc
z19.doc
z2.doc
z20.doc
z3.doc
z4.doc
z5.doc
z6.doc
z7.doc
z8.doc
z9.doc


Whilst this is in "string" sort order, it doesn't make a heap of sense, because for one thing the capitalized file name is a long way from the lower-case file name, and for another, files z1.doc and z2.doc are not next to each other.

Using the algorithm below (saved as a Lua module) you get more natural results:


z1.doc
z2.doc
z3.doc
z4.doc
z5.doc
z6.doc
z7.doc
z8.doc
z9.doc
Z9A.doc
z10.doc
z11.doc
z12.doc
z13.doc
z14.doc
z15.doc
z16.doc
z17.doc
z18.doc
z19.doc
z20.doc
z100.doc
z101.doc
z102.doc


The sort function works by breaking each string into "chunks" of either strings or numbers and then comparing each chunk correctly (numeric compare for numbers, string compare for strings).

To use it, save between the lines as alphanum.lua in your MUSHclient Lua directory, and then use it as per the example below.


-- alphanum.lua
--
-- Adapted somewhat from: http://www.davekoelle.com/files/alphanum.lua
-- Also see: http://www.davekoelle.com/alphanum.html
--
-- Implements a sort function that does a more "human readable" sort order.
-- It breaks the sort strings into "chunks" and then compares each one naturally,
-- depending on whether it is a string or a number (eg. z9.doc compares less than z20.doc)
-- It also does a case-insensitive compare (so "nick" and "Nick" come out together).

--[[
Example:

require "alphanum"

t={
"z18.doc","z19.doc","z2.doc","z16.doc","z17.doc",
"z1.doc","z101.doc","z102.doc","z11.doc","z12.doc",
"z13.doc","z14.doc","z15.doc","z20.doc","z3.doc",
"z4.doc","z5.doc","z6.doc","z10.doc","z100.doc",
"z7.doc","z8.doc","z9.doc", "Z9A.doc",
}

table.sort(t, alphanum)

for i=1, #t do
   print(t[i])
end

--]]


local function chunkString(str)
local c = {}
  for a, b in str:gmatch("(%d*)(%D*)") do
   if a ~= "" then c[#c+1] = tonumber(a) end
   if b ~= "" then c[#c+1] = b end
  end
  return c
end

function alphanum (a, b)
   local achunks = chunkString(a)
   local bchunks = chunkString(b)
   
   for i = 1, math.min (#achunks, #bchunks) do
      local as, bs = achunks [i], bchunks [i]
      
      -- if one is a string, make them both strings
      if type (as) == "string" or type (bs) == "string"  then 
        as = (tostring (as)):upper ()
        bs = (tostring (bs)):upper ()
      end -- at least one is a string
      -- if they are equal, move onto the next chunk
      if as ~= bs then 
        return as < bs 
      end
   end
   
   -- still equal? the one with fewer chunks compares less
   return #achunks < #bchunks
end

return alphanum

- Nick Gammon

www.gammon.com.au, www.mushclient.com
Top

Posted by Nick Gammon   Australia  (23,122 posts)  Bio   Forum Administrator
Date Reply #1 on Sat 07 Jun 2008 01:27 AM (UTC)

Amended on Sat 07 Jun 2008 02:34 AM (UTC) by Nick Gammon

Message
The problem with the above version is it is a little inefficient - the "chunking" is carried out each time a comparison is made, which can be something like 800 times for a table of 70 items.

This version below, which needs to be called slightly differently, does the chunking once only, and then looks up the chunked information in a temporary table.

Note that you now need to pass the table down as an argument to alphanum, so it can build the temporary table of chunks (line in bold).


-- alphanum.lua
--
-- Adapted somewhat from: http://www.davekoelle.com/files/alphanum.lua
-- Also see: http://www.davekoelle.com/alphanum.html
--
-- Implements a sort function that does a more "human readable" sort order.
-- It breaks the sort strings into "chunks" and then compares each one naturally,
-- depending on whether it is a string or a number (eg. z9.doc compares less than z20.doc)
-- It also does a case-insensitive compare (so "nick" and "Nick" come out together).

--[[
Example:

require "alphanum"

t={
"z18.doc","z19.doc","z2.doc","z16.doc","z17.doc",
"z1.doc","z101.doc","z102.doc","z11.doc","z12.doc",
"z13.doc","z14.doc","z15.doc","z20.doc","z3.doc",
"z4.doc","z5.doc","z6.doc","z10.doc","z100.doc",
"z7.doc","z8.doc","z9.doc", "Z9A.doc",
}

table.sort(t, alphanum (t))

for i=1, #t do
   print(t[i])
end

--]]

function alphanum (t)
  assert (type (t) == "table", "Must pass table to be sorted to alphanum")
  
  local function chunkString(str)
  local c = {}
  for a, b in tostring (str):gmatch("(%d*)(%D*)") do
     if a ~= "" then c[#c+1] = tonumber(a) end
     if b ~= "" then c[#c+1] = b end
    end
    return c
  end

  local chunks = {}
  -- build temporary table of the keys
  for i=1, #t do
   chunks [t [i]] = chunkString (t [i])
  end

  return function (a, b) -- return our sort comparison function
  
   -- lookup chunked information from previously-built table
   local achunks = chunks [a]
   local bchunks = chunks [b]
   
   for i = 1, math.min (#achunks, #bchunks) do
      local as, bs = achunks [i], bchunks [i]
      
      -- if one is a string, make them both strings
      if type (as) == "string" or type (bs) == "string"  then 
        as = (tostring (as)):upper ()
        bs = (tostring (bs)):upper ()
      end -- at least one is a string

      -- if they are equal, move onto the next chunk
      if as ~= bs then 
        return as < bs 
      end  -- if
   end  -- for each chunk
   
   -- still equal? the one with fewer chunks compares less
   return #achunks < #bchunks

  end  -- sort function

end -- alphanum

return alphanum

- Nick Gammon

www.gammon.com.au, www.mushclient.com
Top

The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).

To show them in your local time you can join the forum, and then set the 'time correction' field in your profile to the number of hours difference between your location and UTC time.


10,712 views.

Posting of new messages is disabled at present.

Refresh page

Go to topic:           Search the forum


[Go to top] top

Information and images on this site are licensed under the Creative Commons Attribution 3.0 Australia License unless stated otherwise.