
Gammon Forum


Spell checker improvements?


Posted by David Haley   USA  (3,881 posts)  Bio
Date Reply #30 on Wed 11 Oct 2006 12:16 AM (UTC)
Message
Quote:
One thing that bugs the heck out of me is that most won't let you specify, in the cases of completely new words, what type of word it is. For example, you might want it to be smart enough to tell that 'ing' is a valid addon for 'frack', but not 'smurf' and more to the point, that 'smurf' is 's', not 'es' when plural, not to mention the simple fact that it is something that "should" be allowed to be plural. This always has bugged me about user dictionaries.
At this point you are no longer talking about a spell-checker, but about a linguistics model for the language. That's a whole different beast. Yes, it's annoying to have to add both the singular and plural forms of an unusual word, but I don't want the spell-checker playing guessing games about plurals, either. And if it's going to ask me, for every word I add, to define the word's linguistic properties, e.g. its plural form, that would probably be enough of a bother for me to want to just add the plural manually when I need to.




Nick, out of curiosity, are you using the tries in this new version? Or straight Lua tables?

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org

Posted by Nick Gammon   Australia  (23,122 posts)  Bio   Forum Administrator
Date Reply #31 on Wed 11 Oct 2006 01:21 AM (UTC)
Message
I am using straight tables, because my timings showed that if I indexed by sound (thus getting thousands of tries) there was no real space saving.

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Nick Gammon   Australia  (23,122 posts)  Bio   Forum Administrator
Date Reply #32 on Wed 11 Oct 2006 01:24 AM (UTC)
Message
Thesaurus

I have an experimental thesaurus going now. I found an open-source thesaurus file, and by reading that into a large Lua table, I can look up synonyms, e.g. for "garbage":


1	bilge
2	bilgewater
3	bones
4	carrion
5	chaff
6	crap
7	culm
8	deadwood
9	debris
10	detritus
11	dishwater
12	ditchwater
13	draff
14	dregs
15	dross
16	dust
17	filings
18	filth
19	gash
20	hogwash
21	husks
22	junk
23	kelter
24	leavings
25	lees
26	litter
27	muck
28	offal
29	offscourings
30	orts
31	parings
32	potsherds
33	rags
34	raspings
35	refuse
36	riffraff
37	rubbish
38	rubble
39	scourings
40	scrap iron
41	scraps
42	scum
43	scurf
44	sewage
45	sewerage
46	shards
47	shavings
48	slack
49	slag
50	slop
51	slops
52	slough
53	stubble
54	sweepings
55	swill
56	tares
57	trash
58	wastage
59	waste
60	waste matter
61	wastepaper
62	weeds


The thesaurus took 4 seconds to load, not too bad I suppose if you needed it badly enough.

Of course, for something like that, loading the whole thing into a database would probably be sensible, as it can just sit there until you need to look something up. Perhaps the same could be said for the spellchecker too, but that would get worked more heavily, so it might be slow.
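
To illustrate the database idea, here is a minimal sketch in Python rather than the client's Lua. It assumes the thesaurus has already been parsed into (word, synonym) pairs. The table name `synonyms`, the helper names, and the use of SQLite are all assumptions made for the example, not anything taken from the actual thesaurus file or from MUSHclient.

```python
import sqlite3


def build_thesaurus_db(path, entries):
    """Store (word, synonym) pairs in a SQLite database (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS synonyms (word TEXT, synonym TEXT)")
    # The index lets a lookup go straight to one word without scanning the table.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_word ON synonyms (word)")
    conn.executemany("INSERT INTO synonyms VALUES (?, ?)", entries)
    conn.commit()
    return conn


def lookup(conn, word):
    """Return the synonyms stored for word, in alphabetical order."""
    rows = conn.execute(
        "SELECT synonym FROM synonyms WHERE word = ? ORDER BY synonym",
        (word.lower(),),
    )
    return [row[0] for row in rows]


# A few of the synonyms listed above, held in an in-memory database.
conn = build_thesaurus_db(
    ":memory:",
    [("garbage", s) for s in ["junk", "rubbish", "trash", "debris"]],
)
print(lookup(conn, "garbage"))
```

With a file path instead of ":memory:", the load cost would be paid once when the database is built, and each later lookup would only touch the rows for one word.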

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Zeno   USA  (2,871 posts)  Bio
Date Reply #33 on Wed 11 Oct 2006 02:08 AM (UTC)
Message
The new spellchecker acts pretty odd. For example, the old one would see "testa" and suggest "test" first which is exactly what I wanted.

The new one takes "testa" and suggests "Tuesday" first. "test" isn't even on the list, you have to scroll down to see it.

Zeno McDohl,
Owner of Bleached InuYasha Galaxy
http://www.biyg.org

Posted by Nick Gammon   Australia  (23,122 posts)  Bio   Forum Administrator
Date Reply #34 on Wed 11 Oct 2006 02:41 AM (UTC)
Message
Yes, I need some sort of algorithm for "edit distance". Given a list of suggested words, it would be helpful to order them into the most likely ones first.

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Nick Gammon   Australia  (23,122 posts)  Bio   Forum Administrator
Date Reply #35 on Wed 11 Oct 2006 04:37 AM (UTC)
Message
I found an "edit distance" algorithm which will be incorporated in version 3.82.

Now if you enter "testa" it suggests, in this order:


testy
test
taste
tasty
theist
teased
testier
doest
twist
dust
toasty
dusty
toast
Tuesday


I also used the edit distance to omit words that were more than 4 away from the original, although you could change that number. That omits some of the more bizarre suggestions.

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by David Haley   USA  (3,881 posts)  Bio
Date Reply #36 on Wed 11 Oct 2006 07:17 AM (UTC)
Message
How does the new distance algorithm determine proximity? Looks like you're not using what is called 'edit distance', also known as Levenshtein distance, I think. (It's where you add one point for each replacement, deletion or addition of a letter.) That's the basic algorithm that the trie uses.

Quote:
I am using straight tables, because my timings showed that if I indexed by sound (thus getting thousands of tries) there was no real space saving.
Oh. The trie wasn't really supposed to be used by the thousands. Rather the idea was to have a single trie that contains a whole bunch of words. Admittedly this doesn't help you, unless the trie recursive walks incorporate whatever word distance algorithm you are using.

Also the original suggestion for the trie was for the auto-complete feature, for which all you really want is a compact way of representing a lot of words, which tries are (usually) pretty good at.

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org

Posted by Nick Gammon   Australia  (23,122 posts)  Bio   Forum Administrator
Date Reply #37 on Wed 11 Oct 2006 07:28 AM (UTC)
Message
A slight tweaking makes it put words of the same edit distance in alphabetic order, giving this result for "testa":


test
testy
taste
tasty
doest
dust
dusty
teased
testier
theist
toast
toasty
Tuesday
twist
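
For anyone curious how such an ordering can come about, here is a rough sketch in Python, not the client's actual Lua code. It scores each candidate with the classic Levenshtein distance (one point per inserted, deleted or replaced letter), drops anything more than 4 away, and sorts by distance and then alphabetically. Comparing in lower case is an assumption on my part, and the names `levenshtein` and `rank_suggestions` are made up for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: each insertion, deletion or substitution costs 1."""
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def rank_suggestions(word, candidates, max_distance=4):
    """Drop candidates further than max_distance from word, then sort by
    distance, breaking ties alphabetically (ignoring case)."""
    target = word.lower()
    scored = [(levenshtein(target, c.lower()), c.lower(), c) for c in candidates]
    return [c for d, _, c in sorted(scored) if d <= max_distance]


# Candidates in the order of the earlier, unsorted suggestion list.
candidates = ["testy", "test", "taste", "tasty", "theist", "teased", "testier",
              "doest", "twist", "dust", "toasty", "dusty", "toast", "Tuesday"]
ranked = rank_suggestions("testa", candidates)
print(ranked)
```

Under those assumptions the sketch yields the same order as the list above: "test" and "testy" at distance 1, "taste" and "tasty" at distance 2, and the rest at distance 3.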


- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Nick Gammon   Australia  (23,122 posts)  Bio   Forum Administrator
Date Reply #38 on Wed 11 Oct 2006 07:30 AM (UTC)
Message
Quote:

How does the new distance algorithm determine proximity? Looks like you're not using what is called 'edit distance', also known as Levenshtein distance, I think.


Glad you asked. :)

I am using the Levenshtein Distance Algorithm.

See: http://www.merriampark.com/ldcpp.htm

Why do you not think I am using it?

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by David Haley   USA  (3,881 posts)  Bio
Date Reply #39 on Wed 11 Oct 2006 08:55 AM (UTC)
Message
Oh; well, you were getting results that looked pretty good, and in my experience, Levenshtein doesn't always give good results. :-)

As a note, the trie implements Levenshtein edit distance recursively, by using an intelligent walk of the tree, and so should be fairly fast. I'm not sure how you're getting the list of words, but you might want to compare speeds with the trie.
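
As a rough illustration of that kind of walk (a generic sketch in Python, not the actual trie code discussed in this thread), each step down the trie adds one row to the edit-distance table, and a branch is abandoned as soon as every entry in its row exceeds the limit. The names `TrieNode`, `insert` and `search` are made up for the example.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # next letter -> TrieNode
        self.word = None    # set on nodes that end a stored word


def insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.word = word


def search(root, target, max_distance):
    """Return (word, distance) pairs for stored words within max_distance."""
    results = []
    first_row = list(range(len(target) + 1))
    for ch, child in root.children.items():
        _walk(child, ch, target, first_row, max_distance, results)
    return sorted(results, key=lambda pair: (pair[1], pair[0]))


def _walk(node, ch, target, previous_row, max_distance, results):
    # Extend the edit-distance table by one row for the letter ch.
    row = [previous_row[0] + 1]
    for j in range(1, len(target) + 1):
        row.append(min(
            row[j - 1] + 1,                               # insertion
            previous_row[j] + 1,                          # deletion
            previous_row[j - 1] + (target[j - 1] != ch),  # substitution
        ))
    if node.word is not None and row[-1] <= max_distance:
        results.append((node.word, row[-1]))
    # Prune: if every entry exceeds the limit, no descendant can match.
    if min(row) <= max_distance:
        for next_ch, child in node.children.items():
            _walk(child, next_ch, target, row, max_distance, results)


root = TrieNode()
for w in ["test", "testy", "taste", "tasty", "toast", "twist"]:
    insert(root, w)
matches = search(root, "testa", 2)
print(matches)
```

Because words sharing a prefix share the rows computed for that prefix, the table work is done once per trie node rather than once per word, which is where the speed would come from.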

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org

Posted by Nick Gammon   Australia  (23,122 posts)  Bio   Forum Administrator
Date Reply #40 on Wed 11 Oct 2006 10:00 AM (UTC)
Message
The first pass pulls out words that match the metaphone; the second pass effectively discards ones too far away in edit distance; then they are displayed in edit distance order.

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by David Haley   USA  (3,881 posts)  Bio
Date Reply #41 on Wed 11 Oct 2006 03:48 PM (UTC)
Message
Oh, that explains it, then. :) I was pretty sure that your results were too good to be just normal edit distance.

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org

Posted by WizardsEye   (24 posts)  Bio
Date Reply #42 on Sun 28 Mar 2010 11:54 PM (UTC)
Message
Sorry to have questions on this old subject, but was this ever implemented? I'm horrid at spelling and if it could do some automatic changing for me, that would be GREAT!

Posted by Nick Gammon   Australia  (23,122 posts)  Bio   Forum Administrator
Date Reply #43 on Mon 29 Mar 2010 03:39 AM (UTC)
Message
The current spellchecker in the client uses this algorithm.

- Nick Gammon

www.gammon.com.au, www.mushclient.com


Information and images on this site are licensed under the Creative Commons Attribution 3.0 Australia License unless stated otherwise.