[Home] [Downloads] [Search] [Help/forum]

Gammon Software Solutions forum

See www.mushclient.com/spam for dealing with forum spam. Please read the MUSHclient FAQ!

[Folder]  Entire forum
-> [Folder]  MUSHclient
. -> [Folder]  General
. . -> [Subject]  How do I make an Autocleric?

Home  |  Users  |  Search  |  FAQ
Username:
Register forum user name
Password:
Forgotten password?
(New message)
Subject: How do I make an Autocleric?
Name:
Your forum user name.
Register forum user name
Password:
Your forum password.
Forgotten password?
Message:
Message to be posted (in English, please).
Forum codes:
Check this if your message uses 'forum codes' or templates (auto-detected for new posts).
Forum codes Templates

Save this message ...


Subject review (reverse sequence)

Pages: 1  2 3  

Posted by David Haley   USA  (3,881 posts)  [Biography] bio   Moderator
Date Wed 07 Mar 2007 05:40 AM (UTC)  quote  ]
Message
Oh, that is interesting. I guess it's what makes the most sense, too. :-)

I'm guessing that lua_Number is closer to 8 bytes, since the default number size is a double.

I wonder if their hashing function simply operates on a sequence of bytes. After all, it appears that all of their values are 4 to 8 bytes (the pointer sizes depend on 32 vs 64 bit).

One of these days, I'd like to really look through the Lua implementation and see what it's like under the hood...

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org
[Go to top] top

Posted by Nick Gammon   Australia  (18,772 posts)  [Biography] bio   Forum Administrator
Date Wed 07 Mar 2007 03:39 AM (UTC)  quote  ]
Message
Interestingly, I think you will find that all Lua types can be readily hashed because they are stored internally as a union. See the source for this:


/*
** Union of all Lua values
*/
typedef union {
  GCObject *gc;
  void *p;
  lua_Number n;
  int b;
} Value;



Depending on the size of lua_Number, this value is going to be a small size (say around 4 bytes, maybe slightly more). Lua internally stores strings in such a way that identical strings are stored in the same place. Thus if the *address* of two strings is the same, the strings are the same. Similarly for tables, which are really a pointer to the table. And, userdata is simply an address. Once you take the type of data into account, in addition to the value, you have a small field that needs to be hashed to index into a table.

- Nick Gammon

www.gammon.com.au, www.mushclient.com
[Go to top] top

Posted by David Haley   USA  (3,881 posts)  [Biography] bio   Moderator
Date Sat 03 Mar 2007 12:13 PM (UTC)  quote  ]
Message
They use a hash function, not an ordering. I'm guessing that since Lua has a relatively small number of pure data types, it's possible to code in advance the hash function for all those types. But right, a straight comparison is definitely not what they use.

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org
[Go to top] top

Posted by Shaun Biggs   USA  (644 posts)  [Biography] bio
Date Sat 03 Mar 2007 08:08 AM (UTC)  quote  ]
Message
Quote:
This is indeed a pretty interesting design decision on their part, mainly because it requires that whatever you are storing be sortable/comparable. If you're inserting pointers, then everything is fine because you can compare pointer addresses. String objects also come with the comparison operator (lexicographical comparison). But if you're trying to key on a new data structure (or store it in a set) you need to have your own comparison.

I'm pretty sure the Lua developers came up with on their own. Comparison by strings is definitely not what is going on, because you can have functions as keys and values for tables in Lua.

It is much easier to fight for one's ideals than to live up to them.
[Go to top] top

Posted by David Haley   USA  (3,881 posts)  [Biography] bio   Moderator
Date Fri 02 Mar 2007 09:13 PM (UTC)  quote  ]
Message
Quote:
Interestingly, the Standard Template Library (STL), when storing similar things (in a map or set), does not actually use hashes but a B-Tree, which has the interesting property that regardless of the order in which you put things, they come out in sequence (alphabetic order for strings).
This is indeed a pretty interesting design decision on their part, mainly because it requires that whatever you are storing be sortable/comparable. If you're inserting pointers, then everything is fine because you can compare pointer addresses. String objects also come with the comparison operator (lexicographical comparison). But if you're trying to key on a new data structure (or store it in a set) you need to have your own comparison.

Of course, you have the same problem with hash functions, in that it's unclear how you would go about hashing an unknown data structure without additional information about it.

So I'm guessing the btree choice had to do not only with wanting the properties of binary trees (or not), but also with how to actually implement a binary tree versus a hash table. As soon as you stop storing pointers you need to worry about your "keying" function, and it's probably easier to define an ordering on things than to define a hash function -- because, in part, a hash function has to be much more carefully crafted in order to distribute elements more or less equally over the buckets and not have (too many) collisions.

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org
[Go to top] top

Posted by Nick Gammon   Australia  (18,772 posts)  [Biography] bio   Forum Administrator
Date Fri 02 Mar 2007 07:55 PM (UTC)  quote  ]
Message
Oh, and how is the Autocleric going?

- Nick Gammon

www.gammon.com.au, www.mushclient.com
[Go to top] top

Posted by Nick Gammon   Australia  (18,772 posts)  [Biography] bio   Forum Administrator
Date Fri 02 Mar 2007 07:52 PM (UTC)  quote  ]

Amended on Fri 02 Mar 2007 07:53 PM (UTC) by Nick Gammon

Message
Quote:

... all anyone had to do, which Nick did quite handilly, is explain that a hash was being used, not a linear address space ...


I don't really want to get bogged down here with claims and counter claims about what people know, or don't know.

The forum is intended to be an informational and educational tool, and thus I am happy to clarify things like that.

It is easy - if you have come from a history, as I have, of using simpler languages (if you can call assembler simple) - to assume that language implementors still do things the same way as they always have. Which in a sense they do, deep down.

You can see from the fact that what look like arrays in Lua, but support negative subscripts, must really use something else.

Interestingly, the Standard Template Library (STL), when storing similar things (in a map or set), does not actually use hashes but a B-Tree, which has the interesting property that regardless of the order in which you put things, they come out in sequence (alphabetic order for strings).

If you expected the keys to be hashed you would be surprised by that.

It shows that taking an interest in the implementation details can clarify some confusion, when the behaviour of the overall algorithm is not quite what you expect.

- Nick Gammon

www.gammon.com.au, www.mushclient.com
[Go to top] top

Posted by David Haley   USA  (3,881 posts)  [Biography] bio   Moderator
Date Fri 02 Mar 2007 07:15 PM (UTC)  quote  ]
Message
Quote:
but also the fact that, while sections of memory might go unused, the chunks don't "necessarilly", I assume, need to be fixed sizes, so you use "less" space for data that doesn't need fixed sizes in it. Unless I am completely wrong about that.
Well... :-)

What you're really talking about here is the difference between pointers and storing the actual data. If I had an array of pointers, (hint: a list of buckets) then I don't suffer from data elements that are larger than others in the sense that I don't have to preallocate all necessary space.

But this has nothing to do with hash tables per se.

One advantage of hash tables over arrays regarding storage efficiency is that if you have a very sparse array you are wasting lots of space. E.g., if all you have is a[1], a[50], a[102] then it would be wasteful to store than in an array of size 102 because you'd have 99 empty slots. If you had a hash table with a reasonably sized (or dynamically growing) bucket list you would be using a lot less.

Quote:
Though, I admit I don't know if hashes help to solve that problem or not, but it just seems reasonable that, to save on space, you could/would do something like that.
Yes, it's a reasonable thing to do, which is what I was referring to just above here regarding pointers. Still it is not a property of hash tables and is rather the difference between storing pointers to data versus the actual data.

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org
[Go to top] top

Posted by Shadowfyr   USA  (1,774 posts)  [Biography] bio
Date Fri 02 Mar 2007 07:09 PM (UTC)  quote  ]
Message
Actually, I think I got it wrong. It wasn't 64k, but... Well, it had to do with the limitations of the hardware. Each "bank" was only 0000-FFFF, or "640k", but you generally only really had access to one bank of the two, and a number of memory sections where dedicated to video display, the stack, the softswitches used to control the hardware and the permanent ROM functions. That left you with 3-4 usable "sections", non-connected with each other, none of them bigger than about maybe 300k? And some of that space had to be taken up by the debugger and reload code for the Pascal system when you exited your application.

As for hashes. Sorry, partly meant lookup speed, but also the fact that, while sections of memory might go unused, the chunks don't "necessarilly", I assume, need to be fixed sizes, so you use "less" space for data that doesn't need fixed sizes in it. Unless I am completely wrong about that. Needing to pre-allocate something like 200 strings, all of them 255 bytes in length, because you can't just store the lookup address (and maybe length), then use only the space needed to store *that* specific string.... Gets rather inefficient in terms of memory use. Though, I admit I don't know if hashes help to solve that problem or not, but it just seems reasonable that, to save on space, you could/would do something like that.

main {
__if (Schrodinger_Cat is Alive or version >= "XP"){
____if version = "Vista" then Performance /= Number_of_Cores;
____call Functional_Code();}
__else
____call Crash_Windows();}
[Go to top] top

Posted by Shaun Biggs   USA  (644 posts)  [Biography] bio
Date Fri 02 Mar 2007 06:06 PM (UTC)  quote  ]

Amended on Fri 02 Mar 2007 07:44 PM (UTC) by Nick Gammon

Message
Quote:
Your extra explanations was hardly necessary though, nor was the email I got from Shaun, which also covered the same thing. BTW, how are you supposed to correctly reply to those, since the dang things reply address ends up being the "forum" instead of the senders?

There is a spot if you click on someone's name which says "Send an email to _____" which is what I used to email you in the first place. Sorry about dealing with things like that, but I was trying to avoid this whole thread, as I said in the email. Nick said he didn't want the whole 0 vs. 1 thing cluttering up his boards when there was already a link to a very long discussion on Lua's site.

Quote:
Oh, and I am well aware of what linked lists are too. I tried, unsuccessfully, one time to make one that could be both sorted and randomized at the same time. I strongly suspect that the useless Pascal compiler I was using was taking up too much memory on the machine and the 64k limit on the old Apple IIs was only compounding the problem. It *ran*, but large parts of the list got "lost" somehow... Wasn't my best success story. lol

I did the same in Pascal during high school on my 286.. how I miss that 12mhz and a whole meg of ram and HUGE 40 mb hard drive. Anyway, I wound up having to make a doubled list. I had the data, the next and previous. Then a record that contained the id and the next and previous for that set. It worked really well until I attempted to delete something, where it would reboot my computer. I never could figure out where I was going wrong.

As for the 64k limit, I believe that was a Pascal issue. Pascal was designed as a teaching language, and has a few oddities because of that. If I remember right, there was a limit in how large a source file could be. You simply have to break your code up into smaller files. I forget what that the syntax for that is, but when you want to access the extra file, you put it in your Uses section. Doing that, I made a game based around the doors game "the Pit" for my CS final my senior year which took up around 150k of source, and then several data files. Had I actually known known some of the things I know, it would probably have been about half that size.

It is much easier to fight for one's ideals than to live up to them.
[Go to top] top

Posted by David Haley   USA  (3,881 posts)  [Biography] bio   Moderator
Date Fri 02 Mar 2007 10:20 AM (UTC)  quote  ]
Message
Quote:
Most languages tend to require, even with strings, which often have fixed limits on there allowed length, that the discrete "chunks" be a known size, so that the index number is "actually" a multiplier to the linear location of the *actual* memory address of the data.
You are compounding high-level language semantics and implementation details. C or assembly working a certain way means next to nothing about how other languages present the data to you and what they require chunks or what-have-you to look like.

I don't understand how can you make claims about what "most languages [...] require" when you only know a few small number to begin with (and not the most representative languages, to boot) much less implementation details of e.g. interpreted languages. It's like your claim about "most languages" considering it an error to have expressions as statements. You really shouldn't go out on limbs like that...

Quote:
I freely admit I didn't know the implimentation differed from what "most" languages consider to be "arrays". So sue me.
Well, I'm not sure what basis you have to be making claims about "most" languages to begin with... so yes, I will in fact sue you. :-) You have a tendency to make claims about things outside the scope of what you truly know, if you don't mind me saying so.

Quote:
In other words, while ones like Lua uses hashes, because they are more efficient in some ways at storage,
Hashes are not used for storage efficiency. In fact they are not necessarily the most storage-efficient data structure! Hashes are used for the extremely fast look-up time. (Of course, assuming that your hashing algorithm is good, and that you have enough buckets and/or few enough elements to avoid collision.)

Quote:
BTW, how are you supposed to correctly reply to those, since the dang things reply address ends up being the "forum" instead of the senders?
I thought those emails had as a return value the sender's email address; at least, the ones I received did. In any case you can probably just use the forum's email facility to send an email back, if you care to do so.

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org
[Go to top] top

Posted by Shadowfyr   USA  (1,774 posts)  [Biography] bio
Date Fri 02 Mar 2007 09:17 AM (UTC)  quote  ]
Message
You don't need to explain it more. As I said in my own later post, in cases where you are using hash systems, the index is just a place holder. If its numeric you can, in theory, get a ordered list out of it. But it could just as easilly be colors, sizes, types of bugs, or some sequence of letters, like a-z, aa-az, ba-bz, etc. The "starting" symbol is completely irrelevant.

What I meant about the machine level stuff is "linear" tables, where the index is some specific number of records from the "start" of the memory its sitting in. Most languages tend to require, even with strings, which often have fixed limits on there allowed length, that the discrete "chunks" be a known size, so that the index number is "actually" a multiplier to the linear location of the *actual* memory address of the data. In other words, while ones like Lua uses hashes, because they are more efficient in some ways at storage, though maybe more costly in terms of some overhead, (a bit enough hash will be bigger than the processor can keep in its L1 or L2 caches for fast retrieval, for example), most languages have not used liked lists or hashes for arrays, but instead structured them like:

0000: [data chunk #1, fixed size, two bytes]
0002: [data chunk #2, fixed size, two bytes]
0004: [data chunk #3, fixed size, two bytes]
0006: [data chunk #4, fixed size, two bytes]

The location is literally n * 2 bytes into the arrays memory in such cases. The 3 position is either actually #4, or its (n-1) * 2, depending on if you use 1 or 0 to start the indexing.

See, all anyone had to do, which Nick did quite handilly, is explain that a hash was being used, not a linear address space. I freely admit I didn't know the implimentation differed from what "most" languages consider to be "arrays". So sue me. Your extra explanations was hardly necessary though, nor was the email I got from Shaun, which also covered the same thing. BTW, how are you supposed to correctly reply to those, since the dang things reply address ends up being the "forum" instead of the senders?

Oh, and I am well aware of what linked lists are too. I tried, unsuccessfully, one time to make one that could be both sorted and randomized at the same time. I strongly suspect that the useless Pascal compiler I was using was taking up too much memory on the machine and the 64k limit on the old Apple IIs was only compounding the problem. It *ran*, but large parts of the list got "lost" somehow... Wasn't my best success story. lol

main {
__if (Schrodinger_Cat is Alive or version >= "XP"){
____if version = "Vista" then Performance /= Number_of_Cores;
____call Functional_Code();}
__else
____call Crash_Windows();}
[Go to top] top

Posted by David Haley   USA  (3,881 posts)  [Biography] bio   Moderator
Date Thu 01 Mar 2007 10:41 PM (UTC)  quote  ]
Message
Quote:
Dang modern programmers.. Don't know a damn thing about how the machine actually works... ;) lol
Dang newbies, don't know a damn thing about how languages are actually implemented internally or any of the formal reasoning for why things work the way they do. *wry smile*

Nick has started explaining this and I will explain it some more. If your array is implemented as a hash table, then the key is for all intents and purposes not a number, even if it is "0" or "1" etc.

The implementation of a hash table will convert your key, using some magic, to a hash value. That hash value will then give you the bucket of the hash table. For simplicity's sake we will assume that there are no collisions in the hash table.

Now, there is no guarantee that the bucket assigned to key "1" by the hash table is the number "1" or "0" or any such thing. For all you know the bucket could be 792.

So, starting your array by zero or one makes absolutely no difference in terms of performance from the Lua programmer's side. You still have to go through the hash function in C, which will convert the key to a bucket. The bucket, of course, is a memory address, which can then be dereferenced immediately.

Therefore, I maintain my original statement and refute your response.

Your mistake is to have not understood the difference between a language's semantics, its implementation in some programming language e.g. C, and finally the machine code underlying it all. You did not realize that the implementation of a data structure can change, sometimes dramatically, the procedure to look up a given key. And finally I think you did not sufficiently mark the difference between a table key and an index into memory. The two are quite different, especially in a higher-level language like Lua the intention of which is to remove oneself from implementation details and establish a formal semantics independent of machine-level specifics.

And finally, I think you were a little premature in your supposition that I know nothing about how things actually work at the machine level. :-) Writing some stuff in hex and assembly doesn't necessarily make your point any better, especially when by focusing on the leaf, you missed the tree, much less the whole forest, as it were.



Besides, as Nick pointed out, even if you were translating numbers to indices -- as Lua does for the special part of the table that Nick mentioned -- the cost of subtracting one from a number is, for almost everybody and for almost all purposes, completely irrelevant when compared to other costs. I can think of rather few cases when you would actually care about squeezing every last machine instruction's worth of time out of your program, and in most of those cases, you would be somewhat silly to be doing it in Lua to begin with.



Anyhow. Enough of that (for now at least).
Quote:
On the other hand, to count how many people are attending the party (as far as we know), then a simple test for "anything that is not 'no' or 'nil'" gives us the answer. This is effectively what the Lua "if" test does.


That's a very good point Nick. It's true that in almost all cases, a value of "I don't know" is basically the same as "not yes". And usually, that is what we want. What it comes down to is the nature of the question asked.

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org
[Go to top] top

Posted by Nick Gammon   Australia  (18,772 posts)  [Biography] bio   Forum Administrator
Date Thu 01 Mar 2007 08:37 PM (UTC)  quote  ]
Message
Quote:

Lua doesn't start arrays with 0 either ...


Following on from this, really Lua doesn't start arrays anywhere in particular, as I have shown, you can have negative subscripts.

It is probably more accurate to say that some Lua table access functions, for example table.foreachi, and the ipairs function, work by accessing item 1 onwards in a table, even if other values (like 0, -10, -2343) exist.

They are defined to do that, in other words, the function does that by definition, not because the table "starts at one".

Here is an excerpt from the Lua reference manual (see http://www.lua.org/manual/5.1/manual.html#5.1):


ipairs (t)

Returns three values: an iterator function, the table t, and 0, so that the construction

     for i,v in ipairs(t) do body end

will iterate over the pairs (1,t[1]), (2,t[2]), ···, up to the first integer key absent from the table.  


The ipairs function does not claim to start at the "start of" the table, but simple starts at index 1, and so on, until it finds a missing integer key.

- Nick Gammon

www.gammon.com.au, www.mushclient.com
[Go to top] top

Posted by Nick Gammon   Australia  (18,772 posts)  [Biography] bio   Forum Administrator
Date Thu 01 Mar 2007 08:31 PM (UTC)  quote  ]
Message
Quote:

... on the machine code level, its still starting with 0 ...


I think this is a matter of definition, a bit. Say I have a table like this:


t = {}
t ["nick"] = 10
t ["gammon"] = 20

table.foreach (t, print)


The keys are "nick" and "gammon". Which one does the table "start" with? It actually prints "gammon / nick" when I run it, so I suppose you can say it "starts" with gammon, but I think this is an idiosyncracy of the hashing algorithm.

The fact is, that in Lua (and various other constructs, for example some container types in STL), there is no linear array, and therefore no "first" entry in the classical sense (that is, the sense of "index by zero").

If you make a linked list in C, the first entry is the first item in the list, but in terms of memory storage, may be occupying a higher memory address than subsequent items.

- Nick Gammon

www.gammon.com.au, www.mushclient.com
[Go to top] top

The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).

To show them in your local time you can join the forum, and then set the 'time correction' field in your profile to the number of hours difference between your location and UTC time.


6,472 views.

This is page 2, subject is 3 pages long:  [Previous page]  1  2 3  [Next page]

[Reply to this subject]  Reply to this subject   [New subject]  Start a new subject   [Refresh] Refresh page

Go to topic:           Search the forum


[Go to top] top

[Home]

Written by Nick Gammon - 5K

Comments to: Gammon Software support
[RH click to get RSS URL] Forum RSS feed ( http://www.gammon.com.au/rss/forum.xml )

[Best viewed with any browser - 2K]    [Internet Contents Rating Association (ICRA) - 2K]    [Web site powered by FutureQuest.Net]