Register forum user name Search FAQ

Gammon Forum

Notice: Any messages purporting to come from this site telling you that your password has expired, or that you need to verify your details, confirm your email, resolve issues, making threats, or asking for money, are spam. We do not email users with any such messages. If you have lost your password you can obtain a new one by using the password reset link.

Due to spam on this forum, all posts now need moderator approval.

 Entire forum ➜ SMAUG ➜ SMAUG coding ➜ String Hashing and load_helps

String Hashing and load_helps

It is now over 60 days since the last post. This thread is closed.     Refresh page


Posted by Boborak   USA  (228 posts)  Bio
Date Fri 24 Jan 2003 04:07 AM (UTC)
Message
While skimming through db.c I noticed that the helpfile keywords and the text itself was being loaded into the hash tables. My understanding of hashtables is that they are only truly benefitical if the string is allocated frequently. Like a mob's name. Is there a particular reason why the help file keywords and the text is loaded into the hash tables? Afterall, what is likelyhood that those particular strings would be allocated again?

As a test I used the 'memory hash' command to get these stats:

(w/ help hashing)
Hash strings allocated: 11246 Total links : 56616
String bytes allocated: 1475140 Bytes saved : 802275
Unique (wasted) links : 7388 Hi-Link count: 27932

(wo/ hashing)
Hash strings allocated: 9823 Total links : 55276
String bytes allocated: 1084862 Bytes saved : 813839
Unique (wasted) links : 5965 Hi-Link count: 28006

I not only decreased my unique links but I increased my saved bytes. Anyone have a good reason why this shouldn't be done?

All I changed was this bit of code in load_helps:

pHelp->level = fread_number( fp );
pHelp->keyword = fread_string_nohash( fp );
if ( pHelp->keyword[0] == '$' )
break;
pHelp->text = fread_string_nohash( fp );

Changing both 'fread_string_nohash' from their original 'fread_string'
Top

Posted by Meerclar   USA  (733 posts)  Bio
Date Reply #1 on Fri 24 Jan 2003 05:18 AM (UTC)
Message
The benefit with helpfiles would be in processor load and response times. Usually helpfiles are meant to be a near instant response, which they will be if theyre already loaded into the hash tables. Otherwise there is a potentially sizeable delay in the lookup responses.

Meerclar - Lord of Cats
Coder, Builder, and Tormenter of Mortals
Stormbringer: Rebirth
storm-bringer.org:4500
www.storm-bringer.org
Top

Posted by Boborak   USA  (228 posts)  Bio
Date Reply #2 on Fri 24 Jan 2003 05:49 AM (UTC)
Message
Explain "loaded". Regardless of if they are being loaded into the hashtables or not, they are not being re-read from disk. The only thing the string hashtables do is prevent duplicates, and potentially speed the memory allocation process. This should have little to NO affect on the cpu at all. In fact it could possibly improve it, considering there are less unique links to weed through on each hashtable allocation. If a particular string only needs to be allocated once (for example a players 'normally' unique password) then hashing is a waste due to the small but no less significant overhead of the hashing struct. However something like the name of a clan, which is likely allocated many times over in all sorts of places benefit from string hashing due to the fact that the memory space is already allocated. Correct me if I'm wrong someone, but it seems pretty clear to me that's how things work.
Top

Posted by Samson   USA  (683 posts)  Bio
Date Reply #3 on Fri 24 Jan 2003 12:55 PM (UTC)
Message
Well, I grew curious upon seeing this and decided to see what kind of difference it made in overal allocation. So here's my results on my CVS port:

Helpfiles hashed:

Hash statistics:
Hash strings allocated: 9665 Total links : 36910
String bytes allocated: 1511238 Bytes saved : 300044
Unique (wasted) links : 7586 Hi-Link count: 10294

samson 19563 0.6 1.1 11516 8712 pts/3 RN 04:50 0:01 ../src/afkmud 9500


Helpfiles NOT hashed:

Hash statistics:
Hash strings allocated: 7308 Total links : 34484
String bytes allocated: 841334 Bytes saved : 316439
Unique (wasted) links : 5270 Hi-Link count: 10290

samson 19535 0.6 1.1 11500 8740 pts/3 RN 04:53 0:01 ../src/afkmud 9500

The last line in each is the system usage each boot up generated. As you can see, no real difference in that, but the hash table usage is significantly improved by not hashing your helps.

As for lookup times, I wasn't able to see a difference in that either. Ram is ram, you'll hardly be able to perceive a nanosecond level difference :)
Top

Posted by Boborak   USA  (228 posts)  Bio
Date Reply #4 on Fri 24 Jan 2003 05:55 PM (UTC)
Message
After this, I looked around at other things that may be similar. The only other thing that I've thought of changing are player bios/descs. They are usually quite unique yet are being loaded into the hashtables as well. Though I think you could probably live with that overhead, which works out to about 16 additional bytes per connected player, for hashing.
Top

Posted by Meerclar   USA  (733 posts)  Bio
Date Reply #5 on Fri 24 Jan 2003 07:24 PM (UTC)
Message
Something to remember with the hashing tables.... the original code was written to support 1000s of users on a *nix platform for weeks at a time with ease. With that much userload, having the helps referenced in the hashing tables actually makes sense, but for the 20-30 most muds seem to cap at for userload anymore it doesnt make a whole lot of difference.

Meerclar - Lord of Cats
Coder, Builder, and Tormenter of Mortals
Stormbringer: Rebirth
storm-bringer.org:4500
www.storm-bringer.org
Top

Posted by Nick Gammon   Australia  (23,140 posts)  Bio   Forum Administrator
Date Reply #6 on Fri 24 Jan 2003 09:25 PM (UTC)
Message
Quote:

While skimming through db.c I noticed that the helpfile keywords and the text itself was being loaded into the hash tables. My understanding of hashtables is that they are only truly benefitical if the string is allocated frequently. Like a mob's name. Is there a particular reason why the help file keywords and the text is loaded into the hash tables? Afterall, what is likelyhood that those particular strings would be allocated again?

...

Explain "loaded". Regardless of if they are being loaded into the hashtables or not, they are not being re-read from disk. The only thing the string hashtables do is prevent duplicates, and potentially speed the memory allocation process. This should have little to NO affect on the cpu at all.


You have the wrong end of the stick here. Hash tables are not intended to save memory or speed up the allocation process, and preventing duplicates is merely a side-effect.

The point of hashing, where you have many of something (eg. helps) is to speed up subsequent finding of the hashed data. For example the helps consist of over 1,000 topics, without hashing you would require a linear seach, requiring an average of 500 string compares to find a particular one.

However with hashed helps you would hash the word you are looking for (an arithmetic calculation that should take few CPU cycles), then index straight into the hash table to find the target help.

In other words, you are trading off memory usage for extra speed. Remember these programs were written in the days when CPUs were fairly slow.

I would not expect them to be re-read from disk - the purpose of hashing is not to speed the allocation process, but the search process.

- Nick Gammon

www.gammon.com.au, www.mushclient.com
Top

Posted by Boborak   USA  (228 posts)  Bio
Date Reply #7 on Fri 24 Jan 2003 11:12 PM (UTC)
Message
I understand that. I understand the use of modulus to index the array of links. I understand that 100 links is far quicker to whip through, then 100,000 links. What I don't understand is how exactly is smaug utilizing this for individual strings and ultimately help files? I can clearly see how the hash tables are utilized for mobs, rooms, commands and objects and the like, and I clearly understand that not hashing those items would not be smart. But take this loop for example(act_info.c):

for ( pHelp = first_help; pHelp; pHelp = pHelp->next )

This loop.. regardless of string hashing or not, it still loops through all help files and still compares the arguments with each keyword, using str_cmp (via is_name). So where's the indexing coming in? It's not.

I realize I may have made my explanation of 'hash tables' a bit vauge. I'm specifically talking about the hashing of individual strings. Specifically the use of the functions in 'hashstr.c'. How is this being utilized to create a 'faster' search? If that was so, every comparison of 'if string = anotherstring' would have to utilize a modulus value of sorts. My understanding of the hashtable utilized for strings gives no benefits beyond that of memory saving and prevention of duplicates. Take a look for yourself.
Top

Posted by Boborak   USA  (228 posts)  Bio
Date Reply #8 on Fri 24 Jan 2003 11:19 PM (UTC)
Message
I should also point out my original post was in regards to string hashing only, and the fact that the keywords and such have no need to be in a hashtable unless it was utilized differently.
Top

Posted by Nick Gammon   Australia  (23,140 posts)  Bio   Forum Administrator
Date Reply #9 on Sat 25 Jan 2003 01:08 AM (UTC)
Message
(laughs) - Yes, I see it now. :)

I made the assumption - and we all know how you shouldn't make assumptions - that if they hashed the helps they would have done a hashed lookup.

I withdraw my comments in regards to the SMAUG help, although my general explanation was good for why you would, in general, hash the keys for a large list.

Seems like the SMAUG implementors read about how useful hash tables were, but didn't quite understand how to use them.

I also agree there isn't much point to hashing the help text itself, can't see how useful that is.

- Nick Gammon

www.gammon.com.au, www.mushclient.com
Top

Posted by Boborak   USA  (228 posts)  Bio
Date Reply #10 on Sat 25 Jan 2003 01:13 AM (UTC)
Message
Heh. I kinda thought that's what had happened. I knew I wasn't *THAT* crazy ;-) Anywho, I've actually decided that I may go ahead and hash them properly and see what kind of difference it may make. Although it may not really be perceivable. It may actually do what was originally intended.
Top

Posted by Boborak   USA  (228 posts)  Bio
Date Reply #11 on Sat 25 Jan 2003 02:30 AM (UTC)
Message
Also, if you notice, they ARE using the hashed strings, but they are using them to see if another link to that pointer should be made, or if a new one should be created. In regards to the functions within hashstr.c I don't think they ever intended that particular set of functions to speed 'searching' beyond it's own uses.
Top

Posted by Nick Gammon   Australia  (23,140 posts)  Bio   Forum Administrator
Date Reply #12 on Sat 25 Jan 2003 10:32 PM (UTC)
Message
OK, it is a form of reference counting, I suppose there is some sense to that.

- Nick Gammon

www.gammon.com.au, www.mushclient.com
Top

The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).

To show them in your local time you can join the forum, and then set the 'time correction' field in your profile to the number of hours difference between your location and UTC time.


25,781 views.

It is now over 60 days since the last post. This thread is closed.     Refresh page

Go to topic:           Search the forum


[Go to top] top

Information and images on this site are licensed under the Creative Commons Attribution 3.0 Australia License unless stated otherwise.