Posted by Nick Gammon, Australia (Forum Administrator)

Introduction
Recently I've found that locating a particular file amongst the hundreds (or thousands) of source files, plugins, world files, Lua files, and so on, can be quite tedious.
There are times when I want to find something like a mapper but can't remember its exact name. Or, if I can remember the name, I can't remember where I put the file.
Some editors (such as Crimson Editor) have a "find in files" capability, but this tends to return a lot of results (try searching for the word "define" in C source). And even if the results are reasonably relevant you have to open one file after another to find the one you really want. Plus you can't do things like "I want two words which are fairly close together".
Hence was born this plugin, the "source scanner".
Installation
Grab the latest copy from here:
https://github.com/nickgammon/plugins/blob/master/Source_scanner.xml
(To download, right-click on the Raw button on the GitHub page, and save the file as Source_scanner.xml in your plugins directory.)
You may need to install the windows_utils.dll file available from here:
http://www.gammon.com.au/files/mushclient/lua5.1_extras/windows_utils.zip
That is used to open the text editor and bring its window to the front.
What it does
The source (file) scanner has two modes of operation.
- Scan a directory tree, indexing all files matching certain file types (eg. C files, XML files, Lua files) based on the file suffix. This takes a few seconds.
- Query the resulting database for a boolean match. This is pretty fast.
The scanning process builds an SQLite3 database using the FTS (Full Text Search) type of table. This is the same general approach (a full-text index) that search engines use to let you find one or more words amongst many documents.
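To make the "index first, query afterwards" idea concrete, here is a toy inverted index in plain Lua. It is only a sketch of the general technique (each word maps to the set of files containing it), not how SQLite's FTS tables are implemented, and build_index and search_all are names made up for this illustration. File contents are passed in as strings so that the example runs anywhere.

```lua
-- Toy inverted index: maps each lower-cased word to the set of files containing it.
-- build_index and search_all are hypothetical names, not part of the plugin.

function build_index (files)
  local index = {}
  for name, text in pairs (files) do
    -- treat runs of ASCII letters/digits as words, ignoring case
    for word in string.gmatch (string.lower (text), "%w+") do
      index [word] = index [word] or {}
      index [word] [name] = true
    end -- for each word
  end -- for each file
  return index
end -- build_index

-- return a sorted list of the files containing every word given (implied AND)
function search_all (index, ...)
  local words = { ... }
  local results = {}
  local first = index [string.lower (words [1] or "")] or {}
  for name in pairs (first) do
    local ok = true
    for i = 2, #words do
      local set = index [string.lower (words [i])]
      if not (set and set [name]) then
        ok = false
        break
      end -- if word not in this file
    end -- for each remaining word
    if ok then
      table.insert (results, name)
    end -- if all words found
  end -- for each candidate file
  table.sort (results)
  return results
end -- search_all

local index = build_index {
  ["a.lua"] = "SetVariable ('target', name)",
  ["b.lua"] = "print (GetVariable ('target'))",
  ["c.xml"] = "setvariable and getvariable",
}

print (table.concat (search_all (index, "setvariable"), ", "))                --> a.lua, c.xml
print (table.concat (search_all (index, "setvariable", "getvariable"), ", ")) --> c.xml
```

Building the index is the slow part (every file has to be read once), whereas a query only looks words up in a table, which is why searching feels so much quicker than scanning.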
As an example, first I index my Plugins folder (by typing "index"):
I get a confirmation:
Loaded 38 files in 0.1 seconds.
I try searching for "setvariable" ...
This shows that files containing setvariable (the match is not case-sensitive) are listed by name, along with a snippet highlighting the searched-for word.
The snippet is very handy because you can quickly see which file you really want to edit. If you click on the file name (in blue) it opens the appropriate file in your desired editor (in my case, Crimson Editor).
Boolean searches
The full power comes from being able to specify boolean operations (eg. AND and OR).
Examples (from the help):
cat AND dog --> both words
cat dog --> both words, the "AND" is implied
fish OR bicycle --> one or the other
cat NOT food --> one word but not the other
bite NEAR me --> one near the other (within 10 words)
disk NEAR/3 drive --> one within 3 words of the other
"trouble brewing" --> exact phrase
chip* --> prefix query, matches chip, chips, chipping etc.
fish NOT (bacon OR eggs) --> brackets can be used to clarify groupings
The words AND / OR / NEAR / NOT must be in upper case or they just match those words literally.
Note: words with underscores (eg. BUFFER_LENGTH) should be quoted, because otherwise they are treated as two separate words.
find name <wildcard> --> filter on file name, not contents
Here is another example, showing looking through some Arduino source files for the word "spi" near the word "transfer" and also with the word "begin" in the file:
f spi NEAR transfer begin
You can see that all the words we are looking for are highlighted.
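For anyone wondering what a NEAR test amounts to, here is a rough sketch in plain Lua that checks a single piece of text. It assumes that "within N words" means "at most N words in between", which is my reading of the SQLite NEAR/N operator; is_near is a made-up helper and not something the plugin or SQLite provides.

```lua
-- Sketch of a NEAR test on one piece of text (ASCII only).
-- is_near is a hypothetical helper; max_between is the number of words
-- allowed between the two search words (assumed to be 10 if omitted).

function is_near (text, word1, word2, max_between)
  max_between = max_between or 10
  word1, word2 = string.lower (word1), string.lower (word2)

  -- record the position of each occurrence of the two words
  local pos1, pos2 = {}, {}
  local position = 0
  for word in string.gmatch (string.lower (text), "%w+") do
    position = position + 1
    if word == word1 then table.insert (pos1, position) end
    if word == word2 then table.insert (pos2, position) end
  end -- for each word

  -- any pair close enough, in either order, counts as a match
  for _, p1 in ipairs (pos1) do
    for _, p2 in ipairs (pos2) do
      local gap = math.abs (p1 - p2) - 1   -- words in between
      if gap >= 0 and gap <= max_between then
        return true
      end -- if close enough
    end -- for each position of word2
  end -- for each position of word1
  return false
end -- is_near

local line = "the disk in the first drive is full"
print (is_near (line, "disk", "drive", 3))   --> true  (3 words in between)
print (is_near (line, "disk", "drive", 2))   --> false
```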
Configuration
Near the start of the plugin are various configuration options. For example, you can search for files with two or more words in them, or one word but not another one. You can also just search for file names, if you know the name, but can't remember what directory you put it into.
You probably want to change this option:
-- file types, separate by spaces, commas, semicolons, whatever.
-- We assume suffixes are alphanumeric
SUFFIXES = "cpp,c,h,xml,lua"
That controls what file types are included in the database. You might add "html" for example, for web pages. Or "mcl" for "MUSHclient worlds".
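As a rough idea of how a suffix list like that can be used, here is a small sketch in plain Lua. The function names make_suffix_set and wanted_file are invented for this example and the plugin's own code may differ; the only things taken from the configuration are the SUFFIXES string and the assumption that suffixes are alphanumeric.

```lua
SUFFIXES = "cpp,c,h,xml,lua"

-- Turn the suffix list into a lookup table, eg. { cpp = true, c = true, ... }
-- Any non-alphanumeric character (space, comma, semicolon ...) acts as a separator.
-- make_suffix_set and wanted_file are hypothetical names.
function make_suffix_set (suffixes)
  local set = {}
  for suffix in string.gmatch (suffixes, "%w+") do
    set [string.lower (suffix)] = true
  end -- for each suffix
  return set
end -- make_suffix_set

-- true if the file name ends in one of the wanted suffixes (case ignored)
function wanted_file (filename, set)
  local suffix = string.match (filename, "%.(%w+)$")
  if not suffix then
    return false   -- no suffix at all
  end -- if
  return set [string.lower (suffix)] == true
end -- wanted_file

local set = make_suffix_set (SUFFIXES)
print (wanted_file ("Source_scanner.xml", set))   --> true
print (wanted_file ("MAIN.CPP", set))             --> true
print (wanted_file ("notes.txt", set))            --> false
```

Adding "html" or "mcl" to SUFFIXES simply adds another key to that table, so nothing else needs to change.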
Another useful option to change is the number of tokens shown in a snippet. For example, changing from the default of 7 to 20, like this:
-- number of tokens to display around the snippet
SNIPPET_SIZE = -20
Now the same search as before shows a lot more detail around the chosen words:
That has its good and bad points. Good to see more detail, bad because it takes more room in the output window.
If you aren't using Crimson Editor, you could change these lines:
-- viewer program
TEXT_VIEWER = "C:\\Program Files\\Crimson Editor\\cedt.exe"
EDITOR_WINDOW_NAME = "Crimson Editor"
to:
TEXT_VIEWER = "C:\\Windows\\notepad.exe"
EDITOR_WINDOW_NAME = "Notepad"
Or to some other text editor of your choice.
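One thing to watch when typing in a path of your own: in a quoted Lua string the backslash is an escape character, so each backslash in a Windows path has to be doubled (otherwise the \n in \notepad would be read as a newline). A minimal sketch of the two usual ways of writing such a path, using the Notepad location as the example:

```lua
-- In an ordinary quoted string each backslash has to be doubled ...
TEXT_VIEWER = "C:\\Windows\\notepad.exe"

-- ... or the path can go in a long string, where backslashes are taken literally
local same_path = [[C:\Windows\notepad.exe]]

print (TEXT_VIEWER)                 --> C:\Windows\notepad.exe
print (TEXT_VIEWER == same_path)    --> true
```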
Indexing
Indexing is done by typing the alias "index" which brings up a directory picker (illustrated above). Navigate to the "top" folder you want to index and click OK. If you look at the status bar of MUSHclient you will see each file name as it is indexed. This should only take a few seconds.
If you index, any existing data is discarded. This lets you re-index for searching a totally different folder, or take into account changes you might have made to your source.
Search for name
If your first search word is "name" then it looks in the file names rather than the file contents, eg. "find name <wildcard>" as listed in the help above.
If you happen to want to search for files with "name" in them you could always quote "name".
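Roughly speaking, the test for a name search could look like the following. This is a sketch only and the real plugin may well do it differently; query_mode is a made-up name, and treating "name" as lower-case only is an assumption.

```lua
-- Hypothetical sketch: decide whether a query is a file-name search.
-- Returns "name" plus the rest of the line, or "contents" plus the whole line.
function query_mode (query)
  -- the word "name" on its own at the start, followed by something to look for
  local rest = string.match (query, "^%s*name%s+(.+)$")
  if rest then
    return "name", rest
  end -- if
  return "contents", query
end -- query_mode

local mode, what = query_mode ("name mapper*")
print (mode)    --> name
print (what)    --> mapper*

-- with quotes around it, the first word is no longer the bare word "name"
mode, what = query_mode ('"name" mapper')
print (mode)    --> contents
```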
How word searching works
The FTS algorithm used by SQLite3 indexes by words (not partial words). However the tokenization breaks words apart at punctuation, so that, for example BUFFER_SIZE is really considered two words, BUFFER and SIZE.
You can work around that somewhat by quoting words like BUFFER_SIZE (eg. "BUFFER_SIZE").
[EDIT] Found a way of avoiding that problem, see next post.
You can also search for word prefixes, that is SETVAR* would match SETVARIABLE. However you can't find the middle of a word (eg. *VAR* won't work as expected).
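The behaviour described above can be imitated in a few lines of Lua. This is only an approximation for ASCII text (letters and digits make up a word, anything else ends it, and case is ignored), not the tokenizer SQLite actually runs, and tokenize and matches_prefix are names invented for the illustration.

```lua
-- Rough imitation of how words are split up for indexing (ASCII only):
-- anything that is not a letter or digit ends a word, and case is ignored.
-- tokenize and matches_prefix are hypothetical names.

function tokenize (text)
  local words = {}
  for word in string.gmatch (string.lower (text), "%w+") do
    table.insert (words, word)
  end -- for each word
  return words
end -- tokenize

-- does "prefix*" match this word?  (only the start of a word can match)
function matches_prefix (word, prefix)
  word, prefix = string.lower (word), string.lower (prefix)
  return string.sub (word, 1, #prefix) == prefix
end -- matches_prefix

print (table.concat (tokenize ("char buf [BUFFER_SIZE];"), " "))  --> char buf buffer size
print (matches_prefix ("SetVariable", "SETVAR"))   --> true
print (matches_prefix ("SetVariable", "VAR"))      --> false
```

Since BUFFER_SIZE ends up as the two words "buffer" and "size" next to each other, quoting it turns the search into a phrase search for those two adjacent words, which is why the quoting trick helps.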
File types
This (fairly simple) plugin is designed for text files, not Word files, PowerPoint presentations, PDF files and so on. It just reads the files in as pure text, so it is best suited to .C, .CPP, .H, .XML, .LUA, .TXT and similar file types.
The MUSHclient world files are just straight text, so you could index those if you wanted to (.MCL files).
- Nick Gammon
www.gammon.com.au, www.mushclient.com