Nick Gammon said: So according to that, a 40-bit hash only covers a 20-bit space (1,048,576 items) so perhaps you are right that by increasing to 48 bits we get to hash 2^24 items without too many problems (16,777,216 items).
I would be tempted to use all of the characters from the MD5 hash. "If it's good enough for git..." (git uses full SHA-1 hashes rather than MD5, but the same reasoning applies.) I was going to say that compression would probably help here, but it might not be very useful after all, because hashes are not likely to be very repetitive. Even so, I'm much more worried about unintentional collisions than I am about sending a few more bytes.
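To put rough numbers on the collision worry, here is a small sketch using the standard birthday approximation, p ≈ 1 - exp(-n(n-1)/2N), where n is the number of items and N is the size of the hash space. The function name is made up for illustration, and the model assumes the hash output is uniformly distributed.

```python
import math


def collision_probability(num_items: int, hash_bits: int) -> float:
    """Approximate chance that at least two of num_items share a hash.

    Uses the birthday approximation p ~= 1 - exp(-n(n-1) / (2N)),
    assuming hash values are uniformly distributed over N = 2**hash_bits.
    """
    space = 2.0 ** hash_bits
    exponent = num_items * (num_items - 1) / (2.0 * space)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    # Compare truncated hashes against a full 128-bit MD5.
    for bits in (40, 48, 128):
        for items in (10_000, 1_000_000):
            p = collision_probability(items, bits)
            print(f"{bits:3d} bits, {items:>9,} items: p = {p:.3e}")
```

At 2^20 items in a 40-bit space (or 2^24 items in a 48-bit space) this comes out at roughly 39%, which is why those sizes are about the limit. With the full 128 bits the probability is negligible for any realistic number of items.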
Even with server-side detection of colliding hashes, the detection only covers a single session. A MUD could go through many sessions, with reboots and so on, so I'm not sure that would be enough.
That said, we cannot avoid collisions without removing the hashing entirely. Adding information such as the type divides the hash space into smaller sub-spaces, which lowers the odds, but it's not going to fully solve the problem.
A nice feature of Nick's approach is that it is agnostic to the instance/model difference. You simply have attributes that change a lot and attributes that change less often. As soon as you start talking about instances and models, you must make assumptions about the relationship between these two. And as soon as you make these assumptions, you considerably reduce portability to other MUDs.
Here is an idea that can help with sanity checking. You can include a (small) checksum of some sort in both the volatile and mostly-static sections. This checksum would need to be constructed from truly immutable attributes, because it has to be the same for all objects that share the hashed attributes. So, you might build it from something like the object type and the file on disk that it comes from.
Now, when you receive the hashed attributes, you simply compare the checksums. If they're equal, you can be pretty darn sure you got the right thing. If they're unequal, you know immediately that you had a hash collision.
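A rough sketch of what that comparison might look like on the receiving side. Everything here is hypothetical: the names `attribute_hash`, `identity_checksum`, `AttributeCache` and `HashCollision` are invented for illustration, the 48-bit truncation and the 16-bit CRC are just example sizes, and the type/file pair stands in for whatever truly immutable attributes a given MUD has.

```python
import hashlib
import zlib


class HashCollision(Exception):
    """Raised when a cached hash is found but its checksum disagrees."""


def attribute_hash(spec: str, bits: int = 48) -> str:
    """Truncated MD5 (hex) of the attribute specification string."""
    return hashlib.md5(spec.encode("utf-8")).hexdigest()[: bits // 4]


def identity_checksum(obj_type: str, source_file: str) -> int:
    """Small checksum over attributes that are assumed never to change."""
    data = f"{obj_type}\0{source_file}".encode("utf-8")
    return zlib.crc32(data) & 0xFFFF  # keep only 16 bits


class AttributeCache:
    """Client-side cache of mostly-static attributes, keyed by hash."""

    def __init__(self) -> None:
        self._entries = {}  # hash code -> (checksum, attributes)

    def store(self, hash_code: str, checksum: int, attributes: dict) -> None:
        self._entries[hash_code] = (checksum, attributes)

    def lookup(self, hash_code: str, checksum: int):
        """Return cached attributes, None if unknown, or raise on mismatch."""
        entry = self._entries.get(hash_code)
        if entry is None:
            return None  # never seen: ask the server for the full attributes
        cached_checksum, attributes = entry
        if cached_checksum != checksum:
            # Same hash, different immutable identity: a detected collision.
            raise HashCollision(hash_code)
        return attributes
```

The volatile message would carry both the hash and the checksum. A `None` result means the client has to request the full attributes, and a `HashCollision` means the cached entry cannot be trusted for this object.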
Note the difference from simply cutting up the hash space into sub-spaces based on type. In this case, we have a more active collision detection method. In order to have an undetected collision, three things must happen at the same time:
1- The two attribute specification strings must hash to the same code.
2- The two things must have the same type.
3- The two things must come from the same file on disk.
Yes, we're not truly fixing the problem, but we're introducing much more active collision detection, and by verifying those two extra things (type and storage medium address) we're pinning down much more precisely where these things come from.