I recently upgraded my network storage system (where you store files on a network drive rather than on individual PCs). The advantage of such a system is that you can access a file (e.g. a photo) from any PC on the network, whether it runs Mac, Windows or Linux.
However, during the upgrade I have been copying tens of thousands of files from the old folders to the new ones, and occasionally (after, say, an hour of copying) the copy fails with some sort of error message.
When this happens the annoying thing is not knowing which files have been copied, and which ones still need copying, without manually inspecting hundreds of directories.
Hence, a quick Lua utility was born. (I know, there are no doubt many nice GUI utilities that will do this, but where is the fun if you can't re-invent the wheel from time to time?)
I wanted this to run under stand-alone Lua (that is, not under MUSHclient), so the first thing was to make a "scan the directory" utility. This reads a disk directory and returns a table with one entry for each file or folder in it.
This utility already exists in the MUSHclient source:
http://www.gammon.com.au/scripts/doc.php?lua=utils.readdir
Thus, I got the source and pulled out the relevant bits. I also left in the code to calculate MD5 hashes, in case one day I wanted to actually hash each file and check they were absolutely identical.
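If the hashing idea is ever followed up, one thing to keep in mind is that the md5 function in the C code pushes the digest as 16 raw bytes, which will not print nicely in a report. A minimal sketch of converting such a binary string to hex follows. The name to_hex is made up for illustration and is not part of the library, and the sample bytes are not a real digest.

```lua
-- Convert a binary string (such as a raw 16-byte MD5 digest) to lowercase hex.
-- to_hex is a hypothetical helper, not part of lua_utils.
local function to_hex (s)
  return (s:gsub (".", function (c)
    return string.format ("%02x", c:byte ())
  end))
end

-- four arbitrary bytes, just to show the conversion
print (to_hex ("\0\1\254\255"))  --> 0001feff
```

With something like this, two files could be compared by hashing the contents of each and comparing the hex strings, at the cost of reading every file in full.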
The directory scanner was placed in a file lua_utils.c as follows:
// To compile: gcc -mno-cygwin -shared -o lua_utils.dll lua_utils.c md5.c -llua
#define LUA_BUILD_AS_DLL
#define LUA_LIB
#include "lua.h"
#include "lauxlib.h"
#include "md5.h"
#include <io.h>
#include <errno.h>
typedef unsigned char UC;
// MD5 128-bit hashing algorithm
// see: http://www.cr0.net:8040/code/crypto/md5/
static int utils_md5 (lua_State *L)
{
  unsigned char digest [16];
  // get text to hash
  size_t textLength;
  const char * text = luaL_checklstring (L, 1, &textLength);
  md5_context ctx;
  md5_starts (&ctx);
  md5_update (&ctx, (UC *) text, textLength);
  md5_finish (&ctx, digest);
  // digest is raw binary (16 bytes), so push it with an explicit length
  lua_pushlstring (L, (const char *) digest, sizeof digest);
  return 1; // number of result fields
} // end of utils_md5
// make number table item
static void MakeNumberTableItem (lua_State *L, const char * name, const double n)
{
  lua_pushstring (L, name);
  lua_pushnumber (L, n);
  lua_rawset(L, -3);
}
// make boolean table item (only stored when true, so a missing key reads as nil in Lua)
static void MakeBoolTableItem (lua_State *L, const char * name, const int b)
{
  if (b)
  {
    lua_pushstring (L, name);
    lua_pushboolean (L, b != 0);
    lua_rawset(L, -3);
  }
}
static int getdirectory (lua_State *L)
{
  // get directory name (eg. C:\mushclient\*.doc)
  size_t dirLength;
  const char * dirname = luaL_checklstring (L, 1, &dirLength);
  struct _finddatai64_t fdata;
  // the search handle is pointer-sized, so an int could truncate it on 64-bit Windows
  intptr_t h = _findfirsti64 (dirname, &fdata); // get handle
  if (h == -1) // no good?
  {
    lua_pushnil (L);
    switch (errno)
    {
      case EINVAL: lua_pushliteral (L, "Invalid filename specification"); break;
      default: lua_pushliteral (L, "File specification could not be matched"); break;
    }
    return 2; // return nil, error message
  }
  lua_newtable(L); // table of entries
  do
  {
    lua_pushstring (L, fdata.name); // file name (will be key)
    lua_newtable(L); // table of attributes
    // inside this new table put the file attributes
    MakeNumberTableItem (L, "size", (double) fdata.size);
    if (fdata.time_create != -1) // except FAT
      MakeNumberTableItem (L, "create_time", fdata.time_create);
    if (fdata.time_access != -1) // except FAT
      MakeNumberTableItem (L, "access_time", fdata.time_access);
    MakeNumberTableItem (L, "write_time", fdata.time_write);
    MakeBoolTableItem (L, "archive", fdata.attrib & _A_ARCH);
    MakeBoolTableItem (L, "hidden", fdata.attrib & _A_HIDDEN);
    MakeBoolTableItem (L, "normal", fdata.attrib & _A_NORMAL);
    MakeBoolTableItem (L, "readonly", fdata.attrib & _A_RDONLY);
    MakeBoolTableItem (L, "directory", fdata.attrib & _A_SUBDIR);
    MakeBoolTableItem (L, "system", fdata.attrib & _A_SYSTEM);
    lua_rawset(L, -3); // set key of table item (ie. file name)
  } while (_findnexti64 ( h, &fdata ) == 0);
  _findclose (h);
  return 1; // one table of entries
} // end of getdirectory
// table of operations
static const struct luaL_reg utilslib [] =
{
  {"md5", utils_md5},
  {"readdir", getdirectory},
  {NULL, NULL}
};
// register library
LUALIB_API int luaopen_utils(lua_State *L)
{
  luaL_register (L, "utils", utilslib);
  return 1;
}
The comment on the first line shows what to type under Cygwin to compile this file and get a DLL.
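Judging from the C code above, utils.readdir returns a table keyed by file name, where each value is a table of attributes, and the boolean flags (such as directory) are only present when set. The sketch below walks a table of that shape. The sample entries are invented so that it runs without the DLL, and split_entries is a made-up name for illustration.

```lua
-- Invented sample in the shape getdirectory builds
-- (real values would come from utils.readdir)
local entries = {
  ["."]         = { size = 0,    write_time = 1202256000, directory = true },
  [".."]        = { size = 0,    write_time = 1202256000, directory = true },
  ["photos"]    = { size = 0,    write_time = 1202256000, directory = true },
  ["notes.txt"] = { size = 1234, write_time = 1202256000, archive = true },
}

-- Split the names into folders and files, skipping the "." and ".." entries.
-- A file has no "directory" key at all, so attr.directory is nil (false) for it.
local function split_entries (t)
  local folders, files = {}, {}
  for name, attr in pairs (t) do
    if name ~= "." and name ~= ".." then
      if attr.directory then
        table.insert (folders, name)
      else
        table.insert (files, name)
      end
    end
  end
  table.sort (folders)
  table.sort (files)
  return folders, files
end

local folders, files = split_entries (entries)
print ("folders:", table.concat (folders, ", "))
print ("files:", table.concat (files, ", "))
```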
Next we need the actual Lua utility to scan the directories and report on what it finds:
-- Directory scanner
-- Author: Nick Gammon
-- Date: 6 February 2008
ORIGINAL_ROOT = "x:/"
ORIGINAL_PATH = ""
COPY_ROOT = "y:/"
COPY_PATH = ORIGINAL_PATH
RESULTS_FILE = "results.txt"
assert (package.loadlib ("lua_utils.dll", "luaopen_utils")) ()
-- root is not stored (eg. z:/)
-- path is directory under root (eg. documents)
-- store is table to put results in
function process_dir (root, path, store)
  local filecount, foldercount, bytes = 0, 1, 0 -- we have one folder here
  print (" -->", root .. path, "...")
  -- don't add slash to empty name
  local path_with_slash = path
  if path ~= "" then
    path_with_slash = path .. "/"
  end -- if slash needed
  local t = assert (utils.readdir (root .. path_with_slash .. "*"))
  for k, v in pairs (t) do
    if k ~= "." and k ~= ".." then
      -- recurse if directory
      if v.directory then
        local a, b, c = process_dir (root, path_with_slash .. k, store)
        filecount = filecount + a
        foldercount = foldercount + b
        bytes = bytes + c
      else
        store [path_with_slash .. k] = v.size
        filecount = filecount + 1
        bytes = bytes + v.size
      end -- not directory
    end -- not special directories
  end -- each file
  return filecount, foldercount, bytes
end -- function process_dir
function show_table (t, heading, f)
  f:write (string.rep ("-", 70), "\n")
  f:write (heading, "\n")
  f:write "\n"
  if next (t) == nil then
    f:write " (none)\n"
  else
    for k, v in ipairs (t) do
      f:write (" " .. v .. "\n")
    end -- for loop
  end -- if empty
  f:write "\n"
end -- show_table
local tstart = os.time ()
local original = {}
local copy = {}
-- do original files
local original_count, original_folders, original_size = process_dir (ORIGINAL_ROOT, ORIGINAL_PATH, original)
-- do my supposed copy
local copy_count, copy_folders, copy_size = process_dir (COPY_ROOT, COPY_PATH, copy)
-- check all OK
local not_in_copy = {}
local not_in_original = {}
local wrong_size = {}
for k, v in pairs (original) do
  if copy [k] then
    -- check size same
    if v ~= copy [k] then
      table.insert (wrong_size, k)
    end -- wrong size
    -- remove from both tables - this file exists in both places
    original [k] = nil
    copy [k] = nil
  else
    -- found in original list but not in the copy
    table.insert (not_in_copy, k)
  end
end -- for loop
-- any left over were in the copy but weren't in the original
for k, v in pairs (copy) do
  table.insert (not_in_original, k)
end -- for loop
print "\n\nScanning done.\n\n"
print ("Original file count =", original_count)
print ("Original folder count =", original_folders)
print ("Bytes in original files =", original_size)
print ("Copy file count =", copy_count)
print ("Copy folder count =", copy_folders)
print ("Bytes in copies =", copy_size)
print ("\nDifference in file count =", original_count - copy_count)
print ("\nDifference in folder count =", original_folders - copy_folders)
print ("\nDifference in file sizes =", original_size - copy_size, "(bytes)")
print "\n\nSorting ...\n\n"
-- get in order to make scanning easier
table.sort (not_in_copy)
table.sort (not_in_original)
table.sort (wrong_size)
-- file for the results
local f = io.output (RESULTS_FILE)
f:write ("Analysis of original directory: ", ORIGINAL_ROOT, ORIGINAL_PATH, "\n")
f:write ("Original directory had ",
         original_count, " files, ",
         original_folders, " folders, ",
         original_size, " bytes.\n\n")
f:write ("Compared to copy directory: ", COPY_ROOT, COPY_PATH, "\n")
f:write ("Copy directory had ",
         copy_count, " files, ",
         copy_folders, " folders, ",
         copy_size, " bytes.\n")
print "\n\nAnalyzing...\n\n"
show_table (not_in_copy , "Files not in the copy (" .. COPY_ROOT .. COPY_PATH .. "):", f)
show_table (not_in_original , "Files not in the original (" .. ORIGINAL_ROOT .. ORIGINAL_PATH .. "):", f)
show_table (wrong_size , "Files which are different sizes:", f)
f:close () -- close that file now
print ("Done. Results in file:", RESULTS_FILE)
local tend = os.time ()
print ("Time taken for scan = " .. os.difftime (tend , tstart) .. " second(s).")
I saved this as dirscan.lua. (To use it, just type: lua dirscan.lua)
There are a few constants in upper case at the start of this file. These control what directories are scanned. The "root" ones (ORIGINAL_ROOT and COPY_ROOT) are intended to be the parts of the file system that will be different (eg. x:/somedir/somefile and y:/somedir/somefile). In this case the directory and file names are the same, but the x: and y: indicate we are looking at different drives.
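The next part (ORIGINAL_PATH) is the directory to start in, relative to the root (eg. "documents", "music", "photos" etc.). COPY_PATH is set to the same value, so both scans start in the same relative directory.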
Finally RESULTS_FILE is the name of the file to write results to. A file is used in case the output is so lengthy it scrolls off the screen and disappears.
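To make the role of the root and path constants concrete, here is a small sketch of how they are joined into the wildcard that gets handed to the directory reader, mirroring the logic in process_dir. The function name make_spec is made up for this illustration.

```lua
-- Build the wildcard passed to the directory reader, as process_dir does:
-- root, then the path (with a trailing slash unless it is empty), then "*".
-- make_spec is a hypothetical name used only for this illustration.
local function make_spec (root, path)
  local path_with_slash = path
  if path ~= "" then
    path_with_slash = path .. "/"
  end
  return root .. path_with_slash .. "*"
end

print (make_spec ("x:/", ""))           --> x:/*
print (make_spec ("x:/", "documents"))  --> x:/documents/*
print (make_spec ("y:/", "documents"))  --> y:/documents/*
```

Only the root differs between the last two lines, which is why the stored file names (everything after the root) can be compared directly between the original and the copy.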
What the utility does is first scan ORIGINAL_ROOT/ORIGINAL_PATH and build a list of every file in it, recursing whenever it hits a subdirectory. This is stored in a table (original), keyed by the file's path relative to the root, with the file size as the value.
Then it scans COPY_ROOT/COPY_PATH and builds a second table (copy) in the same way. During each scan it counts the files and folders and totals the file sizes.
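The recursive scan can be tried out without the DLL (or a real disk) by passing the directory reader in as a parameter. The sketch below does that with an invented in-memory "disk". The names fake_disk, fake_readdir and scan are assumptions for the example; the counting logic follows process_dir.

```lua
-- Invented in-memory "disk": wildcard spec -> entries, in the shape readdir returns
local fake_disk = {
  ["x:/*"] = {
    ["a.txt"]  = { size = 10 },
    ["photos"] = { size = 0, directory = true },
  },
  ["x:/photos/*"] = {
    ["cat.jpg"] = { size = 500 },
    ["dog.jpg"] = { size = 700 },
  },
}

local function fake_readdir (spec)
  return fake_disk [spec] or {}
end

-- Same idea as process_dir, but with the directory reader passed in
local function scan (readdir, root, path, store)
  local filecount, foldercount, bytes = 0, 1, 0  -- this folder counts as one
  local prefix = (path ~= "") and (path .. "/") or ""
  for name, attr in pairs (readdir (root .. prefix .. "*")) do
    if name ~= "." and name ~= ".." then
      if attr.directory then
        -- recurse, then add the subfolder's totals to ours
        local a, b, c = scan (readdir, root, prefix .. name, store)
        filecount, foldercount, bytes = filecount + a, foldercount + b, bytes + c
      else
        store [prefix .. name] = attr.size  -- key is the path relative to root
        filecount = filecount + 1
        bytes = bytes + attr.size
      end
    end
  end
  return filecount, foldercount, bytes
end

local store = {}
local files, folders, bytes = scan (fake_readdir, "x:/", "", store)
print (files, folders, bytes)     --> 3  2  1210
print (store ["photos/cat.jpg"])  --> 500
```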
Once both scans are finished it goes through the table of original files (by name) and checks that each one is in the "copy" table. It also checks the file size is the same.
The names of files that are not present in the copy are saved in another table (ready for sorting later on). Each matching file is deleted from both tables, ready for a check on which files are present in the copy but not the original.
Then a second scan is done of the copy table to see if some files are in the copy directory tree but not the original.
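The comparison described above is easy to follow on a tiny example. This sketch runs the same two passes over invented data (relative path mapped to size in bytes), so the file names and sizes here are purely illustrative.

```lua
-- Invented sample data: relative path -> size in bytes
local original = { ["a.txt"] = 10, ["photos/cat.jpg"] = 500, ["photos/dog.jpg"] = 700 }
local copy     = { ["a.txt"] = 10, ["photos/cat.jpg"] = 450, ["extra.tmp"] = 1 }

local not_in_copy, not_in_original, wrong_size = {}, {}, {}

-- first pass: everything in the original should also be in the copy
for name, size in pairs (original) do
  if copy [name] then
    if size ~= copy [name] then
      table.insert (wrong_size, name)
    end
    -- present on both sides, so drop it from both tables
    original [name] = nil
    copy [name] = nil
  else
    table.insert (not_in_copy, name)
  end
end

-- second pass: whatever is left in copy had no counterpart in the original
for name in pairs (copy) do
  table.insert (not_in_original, name)
end

print (table.concat (not_in_copy, ", "))      --> photos/dog.jpg
print (table.concat (not_in_original, ", "))  --> extra.tmp
print (table.concat (wrong_size, ", "))       --> photos/cat.jpg
```

Setting existing entries to nil while looping with pairs is allowed in Lua (only adding new keys during the traversal is a problem), which is what makes the "delete from both tables" step safe.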
Finally a report is written, showing which files are only on one side or the other, or are the wrong sizes. Each list is sorted into alphabetical order, so files from the same directory end up next to each other, making it obvious if a whole lot of files from a single directory are missing.
I found this a quick way of verifying a copy had been done without missing anything, or in the case of an error message, to work out where to start copying from.
- Nick Gammon
www.gammon.com.au, www.mushclient.com