Gammon Forum

Something is corrupt

Posted by Zeno   USA  (2,871 posts)  Bio
Date Thu 27 May 2004 03:51 AM (UTC)

Amended on Thu 27 May 2004 03:56 AM (UTC) by Zeno

Message
Okay, now with all these odd problems that deal with null values, etc., I'm starting to wonder whether something is seriously wrong with my MUD (see these posts: http://www.gammon.com.au/forum/bbshowpost.php?bbsubject_id=4240 and http://www.gammon.com.au/forum/bbshowpost.php?bbsubject_id=4134 )

This is the latest problem:


#0  0x0812d32e in weather_update () at update.c:2464
2464                    if(d->connected == CON_PLAYING &&
(gdb) bt
#0  0x0812d32e in weather_update () at update.c:2464
#1  0x0812c54d in update_handler () at update.c:2022
#2  0x080a9d66 in game_loop () at comm.c:690
#3  0x080a965d in main (argc=8, argv=0xbfffdf10) at comm.c:304
#4  0x42015967 in __libc_start_main () from /lib/i686/libc.so.6
(gdb) print d
$1 = (struct descriptor_data *) 0x8359548
(gdb) print d->port
$2 = 1184
(gdb) print d->connected
$1 = 0
(gdb)
(gdb) print d->character
$2 = (struct char_data *) 0x0


It's null again...

Here's the (stock) code:

        for(d = first_descriptor; d; d = d->next)
        {
                WEATHER_DATA *weath;

                if(d->connected == CON_PLAYING &&
                        IS_OUTSIDE(d->character) &&
                        !NO_WEATHER_SECT(d->character->in_room->sector_type) &&
                        IS_AWAKE(d->character))
                {
                        weath = d->character->in_room->area->weather;
                        if(!weath->echo)
                                continue;
                        set_char_color(weath->echo_color, d->character);
                        ch_printf(d->character, weath->echo);
                }
        }


It's quite odd... Does any of this relate to the other problems? Right now, I'm just wondering why it crashed with this...

Zeno McDohl,
Owner of Bleached InuYasha Galaxy
http://www.biyg.org

Posted by David Haley   USA  (3,881 posts)  Bio
Date Reply #1 on Thu 27 May 2004 04:54 AM (UTC)
Message
Nah, it just means that somewhere along the line your descriptors are getting corrupted. Note that every time so far, it's been desc->character that is null. That means that somewhere, it's being set to null but shouldn't be, or isn't being set to something when it should be. Unfortunately however these are among the hardest to find. :-)

What I would do is set up some kind of sanity check function, that you run very often. This is a divide and conquer technique for your debugging. Basically, you write a function that checks the descriptors, and makes sure that they're all valid and correct. You call this function all over the place, and that way you narrow down where the error is: e.g. between these two calls, something broke.

That's the best suggestion I can give, other than examining everything you might have changed that has to do with descriptors, and using a step-by-step debugger like gdb. Frankly though, with a project the size of a MUD I think you'd be better off using the sanity check method. Just remember to take your sanity checks out when you're done.

Optionally, the 'good' way of doing it is the following:
#ifdef SANITY_CHECK
  DoSanityCheck();
#endif

Then, add to your makefile -DSANITY_CHECK, recompile, run, solve problem, then remove the define, then recompile. That way if something breaks again, you still have all the sanity checks in there, but you can turn them off when you don't need them so as to increase performance of your MUD.
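As a minimal sketch of that toggle, the check can be hidden behind a macro that expands to nothing unless SANITY_CHECK is defined. The names SANITY(), sanity_checks_run and suspect_code() are hypothetical, and DoSanityCheck() is only a placeholder body here; a real one would inspect the descriptor list.

```c
/* Hypothetical counter so the effect of the toggle is observable. */
int sanity_checks_run = 0;

/* Placeholder check: a real version would walk the descriptors. */
void DoSanityCheck(void)
{
    sanity_checks_run++;
}

/* Build with -DSANITY_CHECK to enable the checks;
 * without it, every SANITY() call compiles to a no-op. */
#ifdef SANITY_CHECK
#define SANITY() DoSanityCheck()
#else
#define SANITY() ((void)0)
#endif

/* Hypothetical stand-in for a piece of game code being instrumented. */
void suspect_code(void)
{
    SANITY();
    /* ... code under suspicion goes here ... */
    SANITY();
}
```

With this shape the call sites stay in the source permanently, and only the compiler flag changes between a debugging build and a normal one.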

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org

Posted by Zeno   USA  (2,871 posts)  Bio
Date Reply #2 on Thu 27 May 2004 05:13 AM (UTC)
Message
The only thing I have added to do with descriptors is an update function... But it doesn't change the descriptor.

void check_updates( DESCRIPTOR_DATA *d, CHAR_DATA *ch )
{
    char updatebuf[MAX_STRING_LENGTH];

        sprintf( updatebuf, "\n\rChecking for character updates...\n\r");
        write_to_buffer( d, updatebuf, 0 );

        if(!IS_SET(ch->pcdata->flags, PCFLAG_DEADLY))
        {
          sprintf( updatebuf, "Update found. Beginning update.\n\r");
          write_to_buffer( d, updatebuf, 0 );
          SET_BIT(ch->pcdata->flags, PCFLAG_DEADLY);
          sprintf( updatebuf, "Done.\n\r\n\r");
          write_to_buffer( d, updatebuf, 0 );
        }
        else
        {
          sprintf( updatebuf, "None found.\n\r\n\r");
          write_to_buffer( d, updatebuf, 0 );
        }
     return;
}


That, and hotboot... Hmm, some other snippets too. I'll check over those.

Zeno McDohl,
Owner of Bleached InuYasha Galaxy
http://www.biyg.org

Posted by Greven   Canada  (835 posts)  Bio
Date Reply #3 on Thu 27 May 2004 07:13 AM (UTC)

Amended on Thu 27 May 2004 07:17 AM (UTC) by Greven

Message
So you know, you shouldn't have to use sprintf to write to updatebuf if you are not using any variables in the string. I believe that sprintf should need 3 inputs, as the declaration for it is
Quote:
int sprintf(char *str, const char *format, ...);
The "..." can work as null, but as I understand it, it's safer to do something like
        sprintf( updatebuf, "%s", "\n\rChecking for character updates...\n\r");
        write_to_buffer( d, updatebuf, 0 );
However, since it's only a static line, you should be able to
write_to_buffer( d, "\n\rChecking for character updates...\n\r", 0 );
This is minor, but useful in that it kills a couple lines of function calls. If I'm wrong about the ..., someone will correct me, I'm sure ;)

As to the real problem, however, should it happen again, try "print *d" to see all the values of the structure. It may be the right type, but that doesn't necessarily mean it's filled with meaningful data. One possibility that comes to mind is a memory leak somewhere that is crossing into your descriptor data, but may be fairly unconnected. In stock smaugFUSS, d->connected = 0 should not cause any sort of crash, as
Quote:
typedef enum
{
CON_PLAYING, CON_GET_NAME,
playing is supposed to be 0. This may also mean that at some point d->next is not being set properly, or that perhaps the link call is being corrupted.

Couple questions then:

1. Does this only happen after copyover/warmboot/hotboot?
2. Does this occur for any specific players, or is it random?
3. Will this only happen after a person quits/dies?

The first one may mean an error in the linking back in at the end of copyover_recover. The second may mean that someone specific is doing something, and you may want to log them to try to find a pattern to their work. The third may mean that someone is not getting disconnected properly, and the doubly linked list for descriptors is corrupt, rather than the data itself.

Hope that helps, though it is a little off topic.

Nobody ever expects the spanish inquisition!

darkwarriors.net:4848
http://darkwarriors.net

Posted by David Haley   USA  (3,881 posts)  Bio
Date Reply #4 on Thu 27 May 2004 04:49 PM (UTC)
Message
It's perfectly fine to not put arguments into the '...' as long as you don't actually specify any vars in your string.

For instance it's perfectly and absolutely fine to write: printf("hello world");
In fact, this is faster than: printf("hello world", 0); simply because it's one less argument to push onto the stack.
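A short sketch of the distinction, using snprintf so the output can be inspected. The helper names format_fixed and format_text are hypothetical. The first case is the one described above: a literal format with no conversions needs no extra arguments. The second covers text that is only known at run time (as with weath->echo in the stock weather code), where passing it through "%s" keeps any '%' in the data from being read as a conversion.

```c
#include <stdio.h>

/* A literal format with no conversions: no extra arguments are needed. */
int format_fixed(char *buf, size_t size)
{
    return snprintf(buf, size, "Checking for character updates...\n");
}

/* Run-time text: pass it as data via "%s", never as the format itself,
 * so a '%' inside the text is copied instead of interpreted. */
int format_text(char *buf, size_t size, const char *text)
{
    return snprintf(buf, size, "%s", text);
}
```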

Quote:
One possibility that comes to mind is a memory leak somewhere that is crossing into your descriptor data, but may be fairly unconnected.
Memory leaks don't change data; a memory leak is simply when you allocate memory, then lose a pointer to it without freeing it. e.g.

int * a = new int;
*a = 4;
a = new int; // we just lost the pointer to the old int
delete a;
// 4 bytes leaked

In other words leaks don't change things, they just lose memory. Overflows would change stuff, but that typically only happens with arrays (or strings which are basically arrays).

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org

Posted by Greven   Canada  (835 posts)  Bio
Date Reply #5 on Thu 27 May 2004 05:50 PM (UTC)
Message
My apologies, it was late, and I was thinking overflow but wrote memory leak (what with the overflow "leaking" into the next bit of data). If, for example, d->host or something was overflowed by an improper strncpy or strncat (perhaps it's declared as MAX_INPUT_LENGTH, but the call used MAX_STRING_LENGTH or the like), the write would run past the end of d->host into whatever memory follows it. strncpy in particular pads the rest of the requested length with null bytes, which might set everything in the following data to 0, as it appears to be so far.
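The null padding is easy to observe without overflowing anything. The sketch below assumes a hypothetical 16-byte field (HOST_LEN) and a hypothetical helper count_padding; it fills the buffer with a marker byte, copies with the correct size, and counts how many trailing bytes strncpy zeroed. Note that strncpy is the function that pads; strncat appends at most n characters plus a terminator and does not pad. If the size argument were larger than the real array, those same zero bytes would land in whatever follows it.

```c
#include <string.h>

#define HOST_LEN 16  /* hypothetical size, standing in for the real field */

/* Returns how many bytes after the copied text strncpy set to '\0'. */
size_t count_padding(const char *src)
{
    char host[HOST_LEN];
    size_t len = strlen(src);
    size_t zeros = 0;
    size_t i;

    memset(host, 'X', sizeof host);   /* mark every byte so padding is visible */
    strncpy(host, src, sizeof host);  /* correct size: stays inside host */

    for (i = len; i < sizeof host; i++)
        if (host[i] == '\0')
            zeros++;
    return zeros;
}
```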

Nobody ever expects the spanish inquisition!

darkwarriors.net:4848
http://darkwarriors.net

Posted by Zeno   USA  (2,871 posts)  Bio
Date Reply #6 on Thu 27 May 2004 07:13 PM (UTC)
Message
Hmm, I can't really answer any of your questions, Greven. I do a hotboot every time I'm on, since I'm making changes, but I don't know if the crash happens after or before a hotboot (since there are auto reboots).

I'm pretty sure it's random players, although I can't really check if I can't print ch.

Hmm, it did happen in my other post after someone died. But both, I assume.

Zeno McDohl,
Owner of Bleached InuYasha Galaxy
http://www.biyg.org

Posted by David Haley   USA  (3,881 posts)  Bio
Date Reply #7 on Thu 27 May 2004 07:35 PM (UTC)
Message
My suggestion would be to look at the hotboot code. Don't forget the sanity checks. Make sure everything is exactly as it should be after a hotboot. It would be worth checking to see if the MUD ever dies *before* you do a hotboot.

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org

Posted by Zeno   USA  (2,871 posts)  Bio
Date Reply #8 on Thu 27 May 2004 08:37 PM (UTC)
Message
Yeah, I've just added the sanity check to some places. I'll have to see about checking if it crashes after or before a hotboot. I'll leave it up for a few days, without doing a hotboot.

Zeno McDohl,
Owner of Bleached InuYasha Galaxy
http://www.biyg.org

Posted by Zeno   USA  (2,871 posts)  Bio
Date Reply #9 on Fri 28 May 2004 10:54 PM (UTC)
Message
Nothing yet, except for a crash because I took out the tail skill.

Ksilyan, would you explain SanityCheck for me? I've never even heard of it, and I can't find anything on it.

Zeno McDohl,
Owner of Bleached InuYasha Galaxy
http://www.biyg.org

Posted by David Haley   USA  (3,881 posts)  Bio
Date Reply #10 on Fri 28 May 2004 11:19 PM (UTC)
Message
It's a technique, not really a function. SMAUG has some of it from time to time. The basic idea is that you loop through your structures, and make sure everything looks the way it's supposed to.

An example sanity check in SMAUG is when it loops through linked lists, making sure that me->next->prev == me. If you find an entry that is not like that, you know there is a problem.

In your case it would be a little harder, but basically, you would want to check that for every character, if ch->desc, then ch->desc->character == ch. (I think.)
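A rough sketch of such a check, written from the descriptor side. The structs below are minimal stand-ins, not SMAUG's real definitions (which carry many more fields), and check_descriptors is a hypothetical name. It counts three kinds of problem: a broken next/prev link, a descriptor that is CON_PLAYING but has no character (the symptom in the gdb output above), and a character whose desc does not point back at its descriptor.

```c
#include <stddef.h>

/* Minimal stand-ins for SMAUG's types; only the fields the check needs. */
typedef struct char_data CHAR_DATA;
typedef struct descriptor_data DESCRIPTOR_DATA;

enum { CON_PLAYING, CON_GET_NAME };

struct char_data {
    DESCRIPTOR_DATA *desc;
};

struct descriptor_data {
    DESCRIPTOR_DATA *next;
    DESCRIPTOR_DATA *prev;
    CHAR_DATA *character;
    int connected;
};

/* Walk the list and count broken invariants; 0 means it looks sane. */
int check_descriptors(const DESCRIPTOR_DATA *first)
{
    const DESCRIPTOR_DATA *d;
    int problems = 0;

    for (d = first; d; d = d->next) {
        if (d->next && d->next->prev != d)
            problems++;                 /* list linkage is broken */
        if (d->connected == CON_PLAYING && !d->character)
            problems++;                 /* playing, but no character attached */
        if (d->character && d->character->desc != d)
            problems++;                 /* back-pointer does not match */
    }
    return problems;
}
```

In the MUD itself this would be called with first_descriptor. Returning a count rather than aborting is a design choice for the sketch; logging each problem as it is found would work just as well.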

Then, you put these function calls all over the place to narrow down your problem. With a moderate amount of cleverness, using macros, you could even make function calls that specify the line and file at which the function was called.

e.g.

#define DO_CHECK SanityCheck(__FILE__, __LINE__)

void SanityCheck(const char * filename, int lineNum)
{
  printf("SanityCheck, file %s line %d\n", filename, lineNum);

  // check char structures here, do whatever you need to do
}


Then to call the function, just type DO_CHECK; and it should automatically call the function with the correct arguments: the filename (e.g. act_wiz.c) and line number at which you called it.

Using this, you basically enter 'search and destroy' mode, in which you narrow things down. Eventually you will narrow down to a small region in which something breaks. I like to call this 'divide and conquer' even if that term is used more in other contexts... anyways. :) Does this all make more sense now? The basic idea is that you run checks and make sure things are normal, and you narrow it down until:
CheckWorksHere;
SomeCodeHappensHere;
CheckNoLongerWorksHere;


Ideally you keep on doing that until you narrow it down to just one line. Of course generally you the human can figure out where it's breaking before getting down to just one line, but it can help isolate the problem.
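One way to make the bracketing explicit is to have the check remember where it last passed, so a failure reports both ends of the suspect region. This is only a sketch: world_is_sane is a hypothetical flag standing in for a real invariant test, and SanityCheckAt is a hypothetical variant of the function above that returns a result instead of just printing.

```c
#include <stdio.h>

/* Hypothetical invariant: stands in for "the descriptor list is consistent". */
int world_is_sane = 1;

/* Location of the most recent check that passed. */
const char *last_ok_file = "(none)";
int last_ok_line = 0;

/* Returns 1 if the invariant holds, 0 otherwise.
 * On failure, reports the failing call and the last call that passed. */
int SanityCheckAt(const char *file, int line)
{
    if (world_is_sane) {
        last_ok_file = file;
        last_ok_line = line;
        return 1;
    }
    fprintf(stderr, "Sanity check failed at %s:%d; last passed at %s:%d\n",
            file, line, last_ok_file, last_ok_line);
    return 0;
}

#define DO_CHECK SanityCheckAt(__FILE__, __LINE__)
```

The code that broke things then lies somewhere between the two reported locations, and adding more DO_CHECK calls inside that region narrows it further.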

Unfortunately however it seems to be a rather rare thing, so you might have to run your MUD intensively for a while to make it crash. Do you have any way to reliably reproduce the crash?

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org

Posted by Zeno   USA  (2,871 posts)  Bio
Date Reply #11 on Fri 28 May 2004 11:30 PM (UTC)
Message
Ah, and here I am thinking it's a Smaug function, no wonder I couldn't find it. Anyways, what crash? The odd ones that have null descriptors, etc.? No, not at all. I can reproduce the crash with the tail skill, but not with the null parts.

Zeno McDohl,
Owner of Bleached InuYasha Galaxy
http://www.biyg.org

Posted by David Haley   USA  (3,881 posts)  Bio
Date Reply #12 on Fri 28 May 2004 11:35 PM (UTC)
Message
Well, in that case you should just put sanity checks in semi-random places - where you think things might be going wrong - and just see if it helps. It can't hurt, in any case. Might slow things down a bit, but it might help things. :)

Have you checked the hotboot yet? Those are always suspicious...

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org

Posted by Zeno   USA  (2,871 posts)  Bio
Date Reply #13 on Fri 28 May 2004 11:52 PM (UTC)
Message
*nod* I've been doing reboots instead of hotboots lately. No null crashes like before, yet.

Yeah, already put a sanity check in the update function.

Zeno McDohl,
Owner of Bleached InuYasha Galaxy
http://www.biyg.org

Posted by David Haley   USA  (3,881 posts)  Bio
Date Reply #14 on Fri 28 May 2004 11:57 PM (UTC)
Message
If there's been no crash yet, that makes me much more suspicious about hotboot. Is it a standard snippet or have you made changes to it?

David Haley aka Ksilyan
Head Programmer,
Legends of the Darkstone

http://david.the-haleys.org

The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).


Information and images on this site are licensed under the Creative Commons Attribution 3.0 Australia License unless stated otherwise.