Spell checker improvements?
It is now over 60 days since the last post. This thread is closed.
Posted by
| David Haley
USA (3,881 posts) Bio
|
Date
| Reply #30 on Wed 11 Oct 2006 12:16 AM (UTC) |
Message
|
Quote:
One thing that bugs the heck out of me is that most won't let you specify, in the cases of completely new worlds, what type of word it is. For example, you might want it to be smart enough to tell that 'ing' is a valid addon for 'frack', but not 'smurf' and more to the point, that 'smurf' is 's', not 'es' when plural, not to mention the simple fact that it is something that "should" be allowed to be plural. This always has bugged me about user dictionaries.
At this point you are no longer talking about a spell-checker, but about a linguistics model for the language. That's a whole different beast. Yes, it's annoying to have to add both the singular and plural forms of an unusual word, but I don't want the spell-checker playing guessing games about plurals, either. And if it's going to ask me, for every word I add, to define the word's linguistic properties, e.g. its plural form, that would probably be enough of a bother for me to want to just add the plural manually when I need to.
Nick, out of curiosity, are you using the tries in this new version? Or straight Lua tables? |
David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone
http://david.the-haleys.org | Top |
|
Posted by
| Nick Gammon
Australia (23,102 posts) Bio
Forum Administrator |
Date
| Reply #31 on Wed 11 Oct 2006 01:21 AM (UTC) |
Message
| I am using straight tables, because my timings showed that if I indexed by sound (thus getting thousands of tries) there was no real space saving. |
- Nick Gammon
www.gammon.com.au, www.mushclient.com | Top |
|
Posted by
| Nick Gammon
Australia (23,102 posts) Bio
Forum Administrator |
Date
| Reply #32 on Wed 11 Oct 2006 01:24 AM (UTC) |
Message
| Thesaurus
I have an experimental thesaurus going now. I found an open-source thesaurus file, and by reading that into a large Lua table, can look up synonyms, e.g. for "garbage":
1 bilge
2 bilgewater
3 bones
4 carrion
5 chaff
6 crap
7 culm
8 deadwood
9 debris
10 detritus
11 dishwater
12 ditchwater
13 draff
14 dregs
15 dross
16 dust
17 filings
18 filth
19 gash
20 hogwash
21 husks
22 junk
23 kelter
24 leavings
25 lees
26 litter
27 muck
28 offal
29 offscourings
30 orts
31 parings
32 potsherds
33 rags
34 raspings
35 refuse
36 riffraff
37 rubbish
38 rubble
39 scourings
40 scrap iron
41 scraps
42 scum
43 scurf
44 sewage
45 sewerage
46 shards
47 shavings
48 slack
49 slag
50 slop
51 slops
52 slough
53 stubble
54 sweepings
55 swill
56 tares
57 trash
58 wastage
59 waste
60 waste matter
61 wastepaper
62 weeds
The thesaurus took 4 seconds to load, not too bad I suppose if you needed it badly enough.
Of course, for something like that, loading the whole thing into a database would probably be sensible, as it can just sit there until you need to look something up. Perhaps the same could be said for the spellchecker too, but since that would get worked more heavily, a database might be slow. |
- Nick Gammon
www.gammon.com.au, www.mushclient.com | Top |
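The lookup described in the post above (read a thesaurus file into one large table, then look a word up) might be sketched as follows. This is only an illustration in Python, not the Lua code the post refers to. The file format (one comma-separated line per headword, headword first) and the names `load_thesaurus` and `synonyms` are assumptions, and the sample line reuses a few of the synonyms listed above.

```python
import io

# Hypothetical sample in an ASSUMED format: one entry per line,
# headword first, then its synonyms, all comma-separated.
SAMPLE = "garbage,trash,bilge,junk,chaff,debris,refuse,rubbish,waste\n"


def load_thesaurus(lines):
    """Build a dict mapping each headword to a sorted list of synonyms."""
    table = {}
    for line in lines:
        fields = [field.strip() for field in line.split(",") if field.strip()]
        if not fields:
            continue  # skip blank lines
        headword, words = fields[0], fields[1:]
        table[headword.lower()] = sorted(words)
    return table


def synonyms(table, word):
    """Return the synonyms for word, or an empty list if it is unknown."""
    return table.get(word.lower(), [])


# A real file would be opened with open(path); StringIO keeps the sketch
# self-contained.
thesaurus = load_thesaurus(io.StringIO(SAMPLE))
for number, word in enumerate(synonyms(thesaurus, "garbage"), start=1):
    print(number, word)
```

Once the table is built, each lookup is a single dictionary access, so the cost is paid almost entirely at load time, which matches the trade-off the post describes.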
|
Posted by
| Zeno
USA (2,871 posts) Bio
|
Date
| Reply #33 on Wed 11 Oct 2006 02:08 AM (UTC) |
Message
| The new spellchecker acts pretty odd. For example, the old one would see "testa" and suggest "test" first which is exactly what I wanted.
The new one takes "testa" and suggests "Tuesday" first. "test" isn't even on the list, you have to scroll down to see it. |
Zeno McDohl,
Owner of Bleached InuYasha Galaxy
http://www.biyg.org | Top |
|
Posted by
| Nick Gammon
Australia (23,102 posts) Bio
Forum Administrator |
Date
| Reply #34 on Wed 11 Oct 2006 02:41 AM (UTC) |
Message
| Yes, I need some sort of algorithm for "edit distance". Given a list of suggested words, it would be helpful to order them into the most likely ones first. |
- Nick Gammon
www.gammon.com.au, www.mushclient.com | Top |
|
Posted by
| Nick Gammon
Australia (23,102 posts) Bio
Forum Administrator |
Date
| Reply #35 on Wed 11 Oct 2006 04:37 AM (UTC) |
Message
| I found an "edit distance" algorithm which will be incorporated in version 3.82.
Now if you enter "testa" it suggests, in this order:
testy
test
taste
tasty
theist
teased
testier
doest
twist
dust
toasty
dusty
toast
Tuesday
I also used the edit distance to omit words that were more than 4 away from the original, although you could change that number. That omits some of the more bizarre suggestions. |
- Nick Gammon
www.gammon.com.au, www.mushclient.com | Top |
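For readers unfamiliar with the technique, here is a minimal Python sketch of an edit-distance filter like the one described above: compute the Levenshtein distance from the typed word to each candidate, drop candidates more than 4 edits away, and list the nearest first. The names `edit_distance` and `within_cutoff` are hypothetical, and comparing in lower case is an assumption; the post does not say how the client treats capital letters.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions or substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or free if equal
            ))
        previous = current
    return previous[-1]


def within_cutoff(word, candidates, cutoff=4):
    """Keep candidates at most `cutoff` edits from word, nearest first.

    Lower-casing both sides is an assumption made for this sketch.
    """
    scored = [(edit_distance(word.lower(), c.lower()), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0])  # stable: ties keep input order
    return [c for distance, c in scored if distance <= cutoff]


print(within_cutoff("testa", ["Tuesday", "testy", "test", "taste", "garbage"]))
```

With this cutoff, "garbage" is dropped and "Tuesday" moves to the end of the list, behind the closer matches.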
|
Posted by
| David Haley
USA (3,881 posts) Bio
|
Date
| Reply #36 on Wed 11 Oct 2006 07:17 AM (UTC) |
Message
| How does the new distance algorithm determine proximity? Looks like you're not using what is called 'edit distance', which is aka Levenshtein distance, I think. (It's where you add one point for a replacement, deletion or addition of a letter.) That's the basic algorithm that the trie uses.
Quote:
I am using straight tables, because of my timings that showed, if I indexed by sound (thus getting thousands of tries) that there was no real space saving.
Oh. The trie wasn't really supposed to be used by the thousands. Rather the idea was to have a single trie that contains a whole bunch of words. Admittedly this doesn't help you, unless the trie recursive walks incorporate whatever word distance algorithm you are using.
Also the original suggestion for the trie was for the auto-complete feature, for which all you really want is a compact way of representing a lot of words, which tries are (usually) pretty good at. |
David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone
http://david.the-haleys.org | Top |
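To illustrate the auto-complete use of a trie mentioned above, here is a small Python sketch: words that share a prefix share nodes, so finding every completion means walking down to the prefix and collecting the words below it. This is not the trie implementation discussed in the thread; `TrieNode`, `insert` and `complete` are hypothetical names.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # letter -> TrieNode
        self.is_word = False  # True if the path to this node spells a word


def insert(root, word):
    """Add word to the trie, creating nodes only where the path is new."""
    node = root
    for letter in word:
        node = node.children.setdefault(letter, TrieNode())
    node.is_word = True


def complete(root, prefix):
    """Return every stored word starting with prefix, alphabetically."""
    node = root
    for letter in prefix:
        node = node.children.get(letter)
        if node is None:
            return []  # nothing in the trie starts with this prefix
    results = []
    stack = [(node, prefix)]
    while stack:
        current, spelled = stack.pop()
        if current.is_word:
            results.append(spelled)
        for letter, child in current.children.items():
            stack.append((child, spelled + letter))
    return sorted(results)


root = TrieNode()
for entry in ["test", "testy", "testier", "taste", "toast"]:
    insert(root, entry)
print(complete(root, "tes"))
```

The three words beginning with "test" share their first four nodes, which is where the compactness comes from when a dictionary has many common prefixes.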
|
Posted by
| Nick Gammon
Australia (23,102 posts) Bio
Forum Administrator |
Date
| Reply #37 on Wed 11 Oct 2006 07:28 AM (UTC) |
Message
| A slight tweaking makes it put words of the same edit distance in alphabetic order, giving this result for "testa":
test
testy
taste
tasty
doest
dust
dusty
teased
testier
theist
toast
toasty
Tuesday
twist
|
- Nick Gammon
www.gammon.com.au, www.mushclient.com | Top |
|
Posted by
| Nick Gammon
Australia (23,102 posts) Bio
Forum Administrator |
Date
| Reply #38 on Wed 11 Oct 2006 07:30 AM (UTC) |
Message
|
Quote:
How does the new distance algorithm determine proximity? Looks like you're not using what is called 'edit distance', which is aka Levenshtein distance, I think.
Glad you asked. :)
I am using the Levenshtein Distance Algorithm.
See: http://www.merriampark.com/ldcpp.htm
Why do you not think I am using it? |
- Nick Gammon
www.gammon.com.au, www.mushclient.com | Top |
|
Posted by
| David Haley
USA (3,881 posts) Bio
|
Date
| Reply #39 on Wed 11 Oct 2006 08:55 AM (UTC) |
Message
| Oh; well, you were getting results that looked pretty good, and in my experience, Levenshtein doesn't always give good results. :-)
As a note, the trie implements Levenshtein edit distance recursively, by using an intelligent walk of the tree, and so should be fairly fast. I'm not sure how you're getting the list of words, but you might want to compare speeds with the trie. |
David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone
http://david.the-haleys.org | Top |
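One common way to combine a trie with Levenshtein distance, which may or may not match the implementation referred to above, is to carry one row of the distance table down each branch and abandon a branch as soon as every entry in the row exceeds the limit. The Python sketch below assumes a nested-dict trie with "$" as an end-of-word marker (so stored words must not contain "$"); all names are hypothetical.

```python
def build_trie(words):
    """Nested dicts; the key "$" marks the end of a stored word."""
    root = {}
    for word in words:
        node = root
        for letter in word:
            node = node.setdefault(letter, {})
        node["$"] = word
    return root


def fuzzy_search(root, target, max_distance):
    """Return sorted (distance, word) pairs within max_distance of target."""
    results = []
    first_row = list(range(len(target) + 1))  # distances from the empty prefix
    for letter, child in root.items():
        if letter != "$":
            _walk(child, letter, target, first_row, max_distance, results)
    return sorted(results)


def _walk(node, letter, target, previous_row, max_distance, results):
    # Extend the Levenshtein table by one row for this trie letter.
    row = [previous_row[0] + 1]
    for col in range(1, len(target) + 1):
        cost = 0 if target[col - 1] == letter else 1
        row.append(min(
            row[col - 1] + 1,              # skip a letter of target
            previous_row[col] + 1,         # skip this trie letter
            previous_row[col - 1] + cost,  # substitute, or free if equal
        ))
    if "$" in node and row[-1] <= max_distance:
        results.append((row[-1], node["$"]))
    # Prune: row minimums never decrease further down the branch.
    if min(row) <= max_distance:
        for next_letter, child in node.items():
            if next_letter != "$":
                _walk(child, next_letter, target, row, max_distance, results)


trie = build_trie(["test", "testy", "taste", "toast", "dust", "banana"])
print(fuzzy_search(trie, "testa", 2))
```

Because words with a common prefix share the rows computed for that prefix, the work is done once per trie node, not once per word, and the pruning step skips whole subtrees such as everything under "b".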
|
Posted by
| Nick Gammon
Australia (23,102 posts) Bio
Forum Administrator |
Date
| Reply #40 on Wed 11 Oct 2006 10:00 AM (UTC) |
Message
| The first pass pulls out words that match the metaphone; the second pass effectively discards ones too far away in edit distance; then they are displayed in edit distance order. |
- Nick Gammon
www.gammon.com.au, www.mushclient.com | Top |
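The two passes described above can be sketched in Python as follows. Note the assumptions: `sound_key` is a crude stand-in for a phonetic code and is NOT the Metaphone algorithm, the cutoff of 4 and the alphabetical tie-break are taken from the earlier posts, comparison is done in lower case, and every name here is hypothetical.

```python
from collections import defaultdict


def sound_key(word):
    """Crude phonetic stand-in (NOT real Metaphone): lower-case, treat
    d as t, drop vowels plus h, w and y, and collapse repeated letters."""
    key = []
    for letter in word.lower():
        if letter in "aeiouhwy" or not letter.isalpha():
            continue
        letter = "t" if letter == "d" else letter
        if not key or key[-1] != letter:
            key.append(letter)
    return "".join(key)


def edit_distance(a, b):
    """Levenshtein distance, computed one table row at a time."""
    row = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        new_row = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            new_row.append(min(row[j] + 1, new_row[j - 1] + 1, row[j - 1] + cost))
        row = new_row
    return row[-1]


def build_index(words):
    """Group dictionary words by their sound key."""
    index = defaultdict(list)
    for word in words:
        index[sound_key(word)].append(word)
    return index


def suggest(index, word, cutoff=4):
    # Pass 1: only words that share the misspelling's sound key.
    candidates = index.get(sound_key(word), [])
    # Pass 2: score by edit distance and discard anything too far away.
    scored = [(edit_distance(word.lower(), c.lower()), c.lower(), c)
              for c in candidates]
    kept = sorted(entry for entry in scored if entry[0] <= cutoff)
    # Nearest first; equal distances fall back to alphabetical order.
    return [original for _, _, original in kept]


index = build_index(["test", "testy", "taste", "Tuesday", "twist",
                     "dust", "toast", "banana"])
print(suggest(index, "testa"))
```

The phonetic pass keeps the candidate list short, so the comparatively expensive edit-distance calculation only runs on words that already sound similar.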
|
Posted by
| David Haley
USA (3,881 posts) Bio
|
Date
| Reply #41 on Wed 11 Oct 2006 03:48 PM (UTC) |
Message
| Oh, that explains it, then. :) I was pretty sure that your results were too good to be just normal edit distance. |
David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone
http://david.the-haleys.org | Top |
|
Posted by
| WizardsEye
(24 posts) Bio
|
Date
| Reply #42 on Sun 28 Mar 2010 11:54 PM (UTC) |
Message
| Sorry to have questions on this old subject, but was this ever implemented? I'm horrid at spelling and if it could do some automatic changing for me, that would be GREAT! | Top |
|
Posted by
| Nick Gammon
Australia (23,102 posts) Bio
Forum Administrator |
Date
| Reply #43 on Mon 29 Mar 2010 03:39 AM (UTC) |
Message
| The current spellchecker in the client uses this algorithm. |
- Nick Gammon
www.gammon.com.au, www.mushclient.com | Top |
|
This is page 3 of a 3-page thread.