Self-managed hashed strings
|
It is now over 60 days since the last post. This thread is closed.
Posted by
| David Haley
USA (3,881 posts) Bio
|
Date
| Sun 13 Feb 2005 02:39 AM (UTC) |
Message
| I've seen a lot of trouble caused by hash strings in SMAUG. The main issue is that there is no type-safety between hashed strings and non-hashed strings, so if you mismatch create/dispose/stralloc/strfree, consequences can be disastrous.
A while ago, I wrote some C++ classes to fix this issue. Basically, it's just a small library to implement managed hashed strings. You assign a value to the string, and it automatically takes care of entering it into the hash table, or only incrementing reference count if it's already present.
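A minimal sketch of the idea just described, not the library's actual code: assigning a value either enters it into a table or only bumps a reference count. The names StringPool and PooledString are hypothetical, and std::unordered_map stands in for whatever hash table the real library uses.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical pool: maps each distinct string to its reference count.
class StringPool {
public:
    // Enter the string, or only bump its count if it is already present.
    const std::string* acquire(const std::string& s) {
        auto it = table_.emplace(s, 0).first;
        ++it->second;
        return &it->first;  // element addresses stay valid across rehashing
    }
    // Drop one reference; erase the entry once nobody uses it.
    void release(const std::string* p) {
        assert(p != nullptr);
        auto it = table_.find(*p);
        if (it != table_.end() && --it->second == 0) table_.erase(it);
    }
    std::size_t size() const { return table_.size(); }
    std::size_t refCount(const std::string& s) const {
        auto it = table_.find(s);
        return it == table_.end() ? 0 : it->second;
    }
private:
    std::unordered_map<std::string, std::size_t> table_;
};

// Hypothetical handle: acquire and release are tied to object lifetime,
// so create/dispose calls cannot be mismatched by hand.
class PooledString {
public:
    PooledString(StringPool& pool, const std::string& s)
        : pool_(&pool), str_(pool.acquire(s)) {}
    PooledString(const PooledString& o)
        : pool_(o.pool_), str_(o.pool_->acquire(*o.str_)) {}
    PooledString& operator=(const PooledString& o) {
        const std::string* next = o.pool_->acquire(*o.str_);  // acquire first: safe on self-assignment
        pool_->release(str_);
        pool_ = o.pool_;
        str_ = next;
        return *this;
    }
    ~PooledString() { pool_->release(str_); }
    const std::string& str() const { return *str_; }
private:
    StringPool* pool_;
    const std::string* str_;
};
```

Two PooledString objects built from equal text share one table entry, and the entry disappears when the last handle is destroyed.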
While it's not quite presentable for public use at the moment, it would be very easy to make it so. Is anybody interested in this? It's in C++ so it wouldn't be useful to most SMAUG coders unless they feel like moving to C++ - a good thing to do even if you don't use this code - but I know there are at least a few people who do program in C++. Let me know if you're interested and I'll put in a nice little package. :) |
David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone
http://david.the-haleys.org | Top |
|
Posted by
| Greven
Canada (835 posts) Bio
|
Date
| Reply #1 on Sun 13 Feb 2005 02:52 AM (UTC) |
Message
| I'm fairly certain that I have a clean use of STRALLOC/str_dup, however I would love to have a copy to install. I EVENTUALLY plan to release my code, and if new coders are using it, it would be a great thing to have. Even some safety if I'm not paying attention, God knows that's a regular occurrence :) |
Nobody ever expects the spanish inquisition!
darkwarriors.net:4848
http://darkwarriors.net | Top |
|
Posted by
| Zeno
USA (2,871 posts) Bio
|
Date
| Reply #2 on Sun 13 Feb 2005 03:49 AM (UTC) |
Message
| I'd love to use it, I plan on converting to C++ before we go beta. As you can tell, I've had some problems. ;) |
Zeno McDohl,
Owner of Bleached InuYasha Galaxy
http://www.biyg.org | Top |
|
Posted by
| David Haley
USA (3,881 posts) Bio
|
Date
| Reply #3 on Sun 13 Feb 2005 07:28 AM (UTC) |
Message
| Alrighty. :) I'll package it up and post it in a day or two. |
David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone
http://david.the-haleys.org | Top |
|
Posted by
| Nick Cash
USA (626 posts) Bio
|
Date
| Reply #4 on Sun 13 Feb 2005 07:51 AM (UTC) |
Message
| Just to add to these posts, I'd like to see it as well. While I don't plan to move my MUD over to C++, I do other things in C++. This could definitely come in handy :) |
~Nick Cash
http://www.nick-cash.com | Top |
|
Posted by
| Samson
USA (683 posts) Bio
|
Date
| Reply #5 on Sun 13 Feb 2005 04:29 PM (UTC) |
Message
| I too would be interested in seeing this as I am also in the process of converting to C++ and would love nothing more than to say goodbye to STRALLOC/str_dup :) | Top |
|
Posted by
| David Haley
USA (3,881 posts) Bio
|
Date
| Reply #6 on Mon 14 Feb 2005 07:01 AM (UTC) |
Message
| OK, I'm almost done reworking all of this. I wrote the code about two years ago and I've learned an awful lot since. I've reorganized a lot of this code, namely to separate the conceptual notion of a shared string from a hash-table implementation - the hash table is now just a subclass of the shared string manager. This way, you can use whatever implementation of the string manager you want, if you don't like the hash table for some reason. In principle, you could even use a linked-list implementation, but that'd be kind of silly...
I've also used a neat little trick with templates, so that you can create shared strings that use different managers but without having to set a string's manager - it sets itself automatically based on its type.
Basically, you do something like this:
StringManager * gSharedStrManager;
typedef SharedString<&gSharedStrManager> shared_str;

int main()
{
    gSharedStrManager = new HashTable;
    shared_str s = "hello";
    shared_str s2 = "there";
    shared_str s3 = "hello";
    // at this point, there are only two entries in the string manager
    return 0;
}
Of course, if you ever allocate a shared_str without having created its manager, you'll be up a creek without a paddle. :-)
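For readers without the library at hand, here is a self-contained sketch of how such a template trick can work: the address of a global manager pointer is passed as a non-type template argument, so every string of that type finds its manager without storing it. Manager, MapManager, Shared and gManager are hypothetical stand-ins, not the library's classes, and a std::map replaces the hash table to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical manager interface.
class Manager {
public:
    virtual ~Manager() {}
    virtual const std::string* add(const std::string& s) = 0;
    virtual void remove(const std::string* s) = 0;
    virtual std::size_t entries() const = 0;
};

// One possible implementation; any container with stable element addresses works.
class MapManager : public Manager {
public:
    const std::string* add(const std::string& s) override {
        auto it = table_.emplace(s, 0).first;
        ++it->second;
        return &it->first;
    }
    void remove(const std::string* s) override {
        auto it = table_.find(*s);
        if (it != table_.end() && --it->second == 0) table_.erase(it);
    }
    std::size_t entries() const override { return table_.size(); }
private:
    std::map<std::string, std::size_t> table_;
};

// The template argument is the address of a global pointer to the manager.
template <Manager** ManagerPtr>
class Shared {
public:
    Shared(const char* s) : str_(manager().add(s)) {}
    Shared(const Shared& o) : str_(manager().add(*o.str_)) {}
    Shared& operator=(const Shared& o) {
        const std::string* next = manager().add(*o.str_);
        manager().remove(str_);
        str_ = next;
        return *this;
    }
    ~Shared() { manager().remove(str_); }
    const std::string& str() const { return *str_; }
private:
    static Manager& manager() {
        // Guards against using a string before its manager exists.
        assert(*ManagerPtr != nullptr && "create the manager first");
        return **ManagerPtr;
    }
    const std::string* str_;
};

Manager* gManager = nullptr;           // must be set before the first string is made
typedef Shared<&gManager> shared_str;  // the manager is now part of the type
```

A second global pointer and a second typedef would give a distinct string type backed by a different manager, with no per-object manager field.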
I plan on having a preview (read: undoc'ed) version in a day or two and a documented version a day or so after that. |
David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone
http://david.the-haleys.org | Top |
|
Posted by
| David Haley
USA (3,881 posts) Bio
|
Date
| Reply #7 on Sat 19 Feb 2005 01:05 AM (UTC) |
Message
| I haven't forgotten about this - I've just been busy with midterms + deadline at work. I'll have it up shortly... |
David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone
http://david.the-haleys.org | Top |
|
Posted by
| Zeno
USA (2,871 posts) Bio
|
Date
| Reply #8 on Sat 19 Feb 2005 01:21 AM (UTC) |
Message
| No need to rush, I don't need it anytime soon. ;) |
Zeno McDohl,
Owner of Bleached InuYasha Galaxy
http://www.biyg.org | Top |
|
Posted by
| David Haley
USA (3,881 posts) Bio
|
Date
| Reply #9 on Wed 23 Feb 2005 06:55 PM (UTC) |
Message
| Here is that preview version I was talking about. I only have Visual Studio build files at the moment but it should be pretty easy to stick it into a Unix project.
http://david.the-haleys.org/tmp/shared-str-v0_9.zip
I will be uploading a more complete version with Unix makefiles, documentation etc. shortly. It'll also include a more complete testing package. In the meantime, comments, criticism or suggestions would be most appreciated. :) |
David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone
http://david.the-haleys.org | Top |
|
Posted by
| David Haley
USA (3,881 posts) Bio
|
Date
| Reply #10 on Sun 27 Feb 2005 11:17 PM (UTC) |
Message
| I am changing the license to a slightly modified BSD license. I will update the license in the 1.0 release, which will be when I finish the documentation. |
David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone
http://david.the-haleys.org | Top |
|
Posted by
| Raz
(32 posts) Bio
|
Date
| Reply #11 on Thu 03 Mar 2005 12:51 AM (UTC) |
Message
| Umm...
From sharedstr_manager.h
size_t refCount_; //!< How many times this string is shared
//Among other similar instances
This will break under a conforming compiler. size_t is in the namespace std under C++. You are not allowed to use it without the appropriate scoping.
virtual void dumpTable(std::ostringstream & os) const = 0; // for debugging: dump whole table to os.
Why? Why not just simply use a std::ostream? It still allows a std::ostringstream to be passed to it.
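A short sketch of that point: std::ostringstream derives from std::ostream, so a function written against the base class accepts both a string stream and std::cout. The free function dumpTable and its vector argument are assumptions for illustration, not the library's member function.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical dump routine written against the base class std::ostream.
void dumpTable(const std::vector<std::string>& entries, std::ostream& os) {
    for (const std::string& e : entries) os << e << '\n';
}

// A caller that wants the dump as a string can still pass an ostringstream.
std::string dumpToString(const std::vector<std::string>& entries) {
    std::ostringstream oss;
    dumpTable(entries, oss);  // binds to std::ostream& through the base class
    assert(oss.good());
    return oss.str();
}
```

The same dumpTable can be called with std::cout or a std::ofstream without any change.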
From sharedstr_hastable.h:
protected:
//...
inline size_t hash(const std::string & str) const;
Which will break if a subclass ever tries to use this function. Do not declare a function inline unless you intend to have the code readily available for all files.
And then I turn to your class in shardstr.hpp.
Essentially you have provided a very strange interface for the programmer. You force the programmer to retain the manager variables that he or she creates. Personally, I do not find this very appealing. A better solution would be to have a more class-based solution where each class has an internal static storage so your shared strings can instantiate many instances of the class but all the instances would still reference the same shared strings. This is very similar to the STL allocator design. |
-Raz
C++ Wiki: http://danday.homelinux.org/dan/cppwiki/index.php | Top |
|
Posted by
| David Haley
USA (3,881 posts) Bio
|
Date
| Reply #12 on Thu 03 Mar 2005 04:49 AM (UTC) |
Message
| Thank you for your comments, Raz.
Quote: This will break under a conforming compiler. size_t is in the namespace std under C++. You are not allowed to use it without the appropriate scoping.
size_t is not in the std namespace under c++! It is defined in the global namespace in std.io which is included via including the string header file.
Quote: Why? Why not just simply use a std::ostream?
Because I wasn't thinking when I wrote that. :-)
Quote: Which will break if a subclass ever tries to use this function. Do not declare a function inline unless you intend to have the code readily available for all files.
Umm... what?
For starters, there is no reason for a subclass to reimplement the hash function.
Secondly, what do you mean, it will break it?
Quote: Essentially you have provided a very strange interface for the programmer. You force the programmer to retain the manager variables that he or she creates. Personally, I do not find this very appealing.
What if you want to keep track of the manager publicly to access its statistics? The whole point of this template argument was precisely to keep track of the manager variables - which, incidentally, you only have to store once in a typedef.
Quote: A better solution would be to have a more class-based solution where each class has an internal static storage so your shared strings can instantiate many instances of the class but all the instances would still reference the same shared strings. This is very similar to the STL allocator design.
What if you wanted to have different kinds of shared strings (using e.g. different hash functions), depending on the specific kind of strings you are sharing? |
David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone
http://david.the-haleys.org | Top |
|
Posted by
| Raz
(32 posts) Bio
|
Date
| Reply #13 on Fri 04 Mar 2005 12:30 AM (UTC) |
Message
|
Quote: size_t is not in the std namespace under c++!
Wrong. size_t is included within the std namespace.
Quote: It is defined in the global namespace in std.io which is included via including the string header file.
Well, no. It is not defined in cstdio. It is defined in other files, such as cstddef or cstring. Those files may be included by cstdio, but relying on that is not portable.
Anyhow, I'm wrong for other reasons. It seems that C++ kept the borrowed C types available in the global namespace as well as the std namespace, which is something I didn't realize.
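A small compile-time check of where size_t lives, as a sketch. It relies on two things that are safe to assume: the header cstddef declares std::size_t, and the C-compatible header stddef.h declares size_t in the global namespace. Whether cstddef alone also exposes the global name is up to the implementation, which is why both headers are included here. The Entry struct and bump function are hypothetical.

```cpp
#include <cassert>
#include <cstddef>      // declares std::size_t
#include <stddef.h>     // declares ::size_t in the global namespace
#include <type_traits>

// Both names refer to the same type, the type of a sizeof expression.
static_assert(std::is_same<std::size_t, ::size_t>::value,
              "std::size_t and ::size_t are the same type");
static_assert(std::is_same<std::size_t, decltype(sizeof(int))>::value,
              "size_t is the result type of sizeof");

// Hypothetical entry with a reference count, spelled with the qualified name.
struct Entry {
    std::size_t refCount_;
};

// The unqualified name works too, because stddef.h was included.
inline size_t bump(Entry& e) {
    assert(e.refCount_ < static_cast<size_t>(-1));
    return ++e.refCount_;
}
```

Writing std::size_t and including cstddef directly is the spelling that does not depend on what other headers happen to pull in.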
Quote: For starters, there is no reason for a subclass to reimplement the hash function.
I never made such a claim.
Quote: Secondly, what do you mean, it will break it?
Since you declared the hash function inline without providing the definition in a header file (or other suitable file), it would be impossible for potential subclasses to use the hash function. However, this is only a problem if you intended that class to be subclassed (which I think I thought it was).
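A sketch of the arrangement that avoids the problem described here: the inline member is declared in the class and its definition sits in the same header, so any file that includes the header, including one that defines a subclass, sees the body. The class names and the hash formula are hypothetical, and everything is shown in one file for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// --- what would live in the header (hypothetical) ---
class HashTableBase {
protected:
    inline std::size_t hash(const std::string& str) const;
};

// The definition stays in the header next to the declaration, so every
// translation unit that calls hash() has the body available.
inline std::size_t HashTableBase::hash(const std::string& str) const {
    std::size_t h = 5381;
    for (unsigned char c : str) h = h * 33 + c;
    return h;
}

// --- what a subclass in another file could then do ---
class BucketedTable : public HashTableBase {
public:
    std::size_t bucketFor(const std::string& s, std::size_t buckets) const {
        assert(buckets > 0);
        return hash(s) % buckets;  // uses the inherited inline function
    }
};
```

If the definition were moved to a single .cpp file while the declaration kept the inline keyword, calls from other files would typically fail at link time.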
Quote: What if you want to keep track of the manager publicly to access its statistics? The whole point of this template argument was precisely to keep track of the manager variables - which, incidentally, you only have to store once in a typedef.
Your design isn't the only solution to that. You could easily do that with my allocator-like design. The internal static class could keep statistics which the allocator can access when the programmer needs them.
Quote: What if you wanted to have different kinds of shared strings (using e.g. different hash functions), depending on the specific kind of strings you are sharing?
That's the beauty of subclassing: you're not limited to the number of children you make. Each hash function could easily have its own subclass. You could even make the design so generic that minimal typing would be necessary for each subclass. |
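One way the allocator-like alternative could look, as a sketch under assumptions: the hash function is a template parameter, and each instantiation keeps its own static table, so no manager variable has to be created or retained. Raz describes subclassing, and a policy template parameter is swapped in here as a related technique, not his exact proposal. PolicyString and LengthHash are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical shared string whose storage is a static table per Hash type.
template <class Hash = std::hash<std::string> >
class PolicyString {
public:
    PolicyString(const std::string& s) : str_(acquire(s)) {}
    PolicyString(const PolicyString& o) : str_(acquire(*o.str_)) {}
    PolicyString& operator=(const PolicyString& o) {
        const std::string* next = acquire(*o.str_);
        release(str_);
        str_ = next;
        return *this;
    }
    ~PolicyString() { release(str_); }
    const std::string& str() const { return *str_; }
    // Statistics are reachable through the type, without a manager object.
    static std::size_t entries() { return table().size(); }
private:
    typedef std::unordered_map<std::string, std::size_t, Hash> Table;
    // One table per instantiation, created on first use.
    static Table& table() { static Table t; return t; }
    static const std::string* acquire(const std::string& s) {
        typename Table::iterator it = table().emplace(s, 0).first;
        ++it->second;
        return &it->first;
    }
    static void release(const std::string* p) {
        assert(p != nullptr);
        typename Table::iterator it = table().find(*p);
        if (it != table().end() && --it->second == 0) table().erase(it);
    }
    const std::string* str_;
};

// A deliberately poor hash, only to show that a second policy gets its own table.
struct LengthHash {
    std::size_t operator()(const std::string& s) const { return s.size(); }
};
```

The trade-off against a manager passed in from outside is that the hash is fixed at compile time and the table's lifetime is tied to the program, not to an object the programmer controls.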
-Raz
C++ Wiki: http://danday.homelinux.org/dan/cppwiki/index.php | Top |
|
Posted by
| David Haley
USA (3,881 posts) Bio
|
Date
| Reply #14 on Fri 04 Mar 2005 01:33 AM (UTC) |
Message
|
Quote: Wrong. size_t is included within the std namespace.
You were saying that it's in the std namespace and that a 'standards-compliant' compiler would fail. Well, it won't - it's in the global namespace.
Quote: It is not defined in cstdio
It is for the VS header files. For the g++ header files, it is defined via cstddef which includes stddef.h.
Quote: The internal static class could keep statistics which the allocator can access when the programmer needs them.
It seems that you suggest that instead of dragging around a manager, you drag around an allocator. I'm not sure what the gain is, since you felt it 'clumsy' to drag around a manager - which incidentally I disagree with.
Quote: That's the beauty of subclassing: you're not limited to the number of children you make. Each hash function could easily have its own subclass. You could even make the design so generic that minimal typing would be necessary for each subclass.
You're just shifting the problem. Instead of storing the hash function etc. in a manager, you're making the programmer subclass the shared string class and then you use an allocator to deal with it instead. Personally, I would find it a bother to have to subclass off of the shared string all the time, which is one reason why I didn't do it that way. IMHO it is much cleaner to have a single, generic shared string type without need for subclasses where you can plug in a single, generic type of manager, and to use different hash functions all you need to do is call the set-hash-function method on the manager.
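A sketch of what a runtime "set-hash-function" method on a manager might involve. The post does not show the real method, so TableManager, setHashFunction and the bucket layout are assumptions. The detail worth noting is that changing the hash means redistributing the existing entries, and std::list::splice does that without moving the strings, so pointers handed out earlier stay valid.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical manager with a pluggable hash function (no refcounts, for brevity).
class TableManager {
public:
    typedef std::function<std::size_t(const std::string&)> HashFn;

    explicit TableManager(std::size_t bucketCount = 8)
        : buckets_(bucketCount), hash_(std::hash<std::string>()) {
        assert(bucketCount > 0);
    }

    // Swap in a new hash function and redistribute existing entries.
    void setHashFunction(HashFn fn) {
        hash_ = std::move(fn);
        std::vector<std::list<std::string> > old(buckets_.size());
        old.swap(buckets_);
        for (std::list<std::string>& bucket : old) {
            while (!bucket.empty()) {
                std::list<std::string>& dst = bucketFor(bucket.front());
                dst.splice(dst.end(), bucket, bucket.begin());  // node moves, string stays put
            }
        }
    }

    // Return the stored copy of s, inserting it if it is not present yet.
    const std::string* add(const std::string& s) {
        std::list<std::string>& b = bucketFor(s);
        for (const std::string& e : b)
            if (e == s) return &e;
        b.push_back(s);
        return &b.back();
    }

    std::size_t bucketIndex(const std::string& s) const {
        return hash_(s) % buckets_.size();
    }

private:
    std::list<std::string>& bucketFor(const std::string& s) {
        return buckets_[bucketIndex(s)];
    }
    std::vector<std::list<std::string> > buckets_;
    HashFn hash_;
};
```

With this shape, a single string type can be used everywhere and only the manager's configuration differs.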
In any case it seems that this is a matter of personal preference. I'd be curious to hear arguments in the absolute about one being 'better' than the other. I don't think you're silly for wanting to do it that way, I just don't like it. I take it that you don't like my approach either, so I'd like to hear if you feel it's a matter of preference or if you have some kind of argument that one is simply better than the other in the absolute. |
David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone
http://david.the-haleys.org | Top |
|
The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).