Baffled by seemingly-mysterious segfault error(s)
It is now over 60 days since the last post. This thread is closed.
Posted by
| Lmclarke
(26 posts) Bio
|
Date
| Sun 04 Mar 2012 08:44 PM (UTC) Amended on Mon 05 Mar 2012 12:17 AM (UTC) by Lmclarke
|
Message
| Alright, in and of itself this shouldn't be too terribly strange, but the problem seems to be narrowed down to a single player who crashes the game every time he shows up. It isn't his pfile: the same error occurs every time he connects, whether to his own pfile, the Guest pfile, random new pfiles, etc. Others and I can connect to the same pfiles with no difficulty.
Troubleshooting that has already been done:
- As mentioned above, I checked and doublechecked that it wasn't an error with his pfile. No matter which pfile he signs into, the game crashes with the same error. Other users, including myself, can use the same pfiles without issue.
- Just in case, I deleted his pfile and remade it. The same thing occurred: I can sign into it without issue. He signs into it, we crash.
- His client is MUSHclient, which is the most commonly used client on our game. It seems highly unlikely to be his client because we (almost) all use it.
- He is not inputting any unusual commands, and he can occasionally stay logged in long enough to have a brief conversation. The timing seems fairly random as to when the error decides to occur, but it always occurs eventually.
- He can sign into the game from The Mud Connector's FMud in-browser client without any problems, and the game does not crash. That route goes through the mudconnector.com proxy. However, having him use a free proxy through MUSHclient does not have the same effect: he connects through the proxy and the same error occurs. Only the in-browser FMud client on TMC seems immune.
I am a fairly new coder, so this is a little outside my knowledge. I did what I could to try to pin the problem down, but the results are difficult for me to process, let alone work out what I should do with them.
This is what I get in gdb when it crashes:
Program received signal SIGSEGV, Segmentation fault.
0x00259bb9 in strcat () from /lib/libc.so.6
Backtrace:
(gdb) bt
#0 0x00259bb9 in strcat () from /lib/libc.so.6
#1 0x0809a76a in read_from_buffer (d=0xb70460e4) at comm.c:1397
#2 0x080995f6 in game_loop_unix (control=8) at comm.c:835
#3 0x08098fc3 in main (argc=2, argv=0xbfffd834) at comm.c:439
The portions of comm.c referenced in the backtrace:
Line 1397 specified in context below
if ((signed char) 255 > d->inbuf > (signed char) 250)
{
    i++;
    strcat(telbuf, " ");
    if (IS_TELOPT(d->inbuf))
        strcat(telbuf, telopts[(unsigned char)d->inbuf]); /* Line 1397 */
    else
    {
        sprintf(buf, "(%d)", d->inbuf);
        strcat(telbuf, buf);
    }
}
Line 835 specified in context below
if (d->character != NULL && d->character->wait > 0)
{
    --d->character->wait;
    continue;
}
read_from_buffer (d); /* Line 835 */
if (d->incomm[0] != '\0')
{
    d->fcommand = TRUE;
    stop_idling (d->character);
Line 439 specified in context below
#if defined(unix)
    if (!fCopyOver)
        control = init_socket (port);
    boot_db (fCopyOver);
    sprintf (log_buf, "The Requiem has docked at port %d.", port);
    log_string (log_buf, CHANNEL_LOG_NORMAL);
    game_loop_unix (control); /* Line 439 */
    close (control);
#endif
Continued... | Top |
|
Posted by
| Lmclarke
(26 posts) Bio
|
Date
| Reply #1 on Sun 04 Mar 2012 08:46 PM (UTC) Amended on Sun 04 Mar 2012 08:50 PM (UTC) by Lmclarke
|
Message
| Valgrind leak check (standard):
Write_to_descriptor: Bad file descriptor
Write_to_descriptor: Bad file descriptor
Write_to_descriptor: Bad file descriptor
Write_to_descriptor: Bad file descriptor
Write_to_descriptor: Bad file descriptor
==28441== HEAP SUMMARY:
==28441== in use at exit: 16,661,251 bytes in 95 blocks
==28441== total heap usage: 122 allocs, 27 frees, 16,669,134 bytes allocated
==28441==
==28441== LEAK SUMMARY:
==28441== definitely lost: 0 bytes in 0 blocks
==28441== indirectly lost: 0 bytes in 0 blocks
==28441== possibly lost: 0 bytes in 0 blocks
==28441== still reachable: 16,661,251 bytes in 95 blocks
==28441== suppressed: 0 bytes in 0 blocks
==28441== Reachable blocks (those to which a pointer was found) are not shown.
==28441== To see them, rerun with: --leak-check=full --show-reachable=yes
==28441==
==28441== For counts of detected and suppressed errors, rerun with: -v
==28441== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 13 from 8)
Valgrind leak check (full):
New_descriptor: accept: Socket operation on non-socket
New_descriptor: accept: Socket operation on non-socket
New_descriptor: accept: Socket operation on non-socket
New_descriptor: accept: Socket operation on non-socket
New_descriptor: accept: Socket operation on non-socket
New_descriptor: accept: Socket operation on non-socket
New_descriptor: accept: Socket operation on non-socket
New_descriptor: accept: Socket operation on non-socket
New_descriptor: accept: Socket operation on non-socket
New_descriptor: accept: Socket operation on non-socket
New_descriptor: accept: Socket operation on non-socket
New_descriptor: accept: Socket operation on non-socket
New_descriptor: accept: Socket operation on non-socket
New_descriptor: accept: Socket operation on non-socket
New_descriptor: accept: Socket operation on non-socket
New_descriptor: accept: Socket operation on non-socket
New_descriptor: accept: Socket operation on non-socket
New_descriptor: accept: Socket operation on non-socket /* There were about 10 dozen more of these */
==1702==
==1702== HEAP SUMMARY:
==1702== in use at exit: 16,661,251 bytes in 95 blocks
==1702== total heap usage: 123 allocs, 28 frees, 16,669,486 bytes allocated
==1702==
==1702== LEAK SUMMARY:
==1702== definitely lost: 0 bytes in 0 blocks
==1702== indirectly lost: 0 bytes in 0 blocks
==1702== possibly lost: 0 bytes in 0 blocks
==1702== still reachable: 16,661,251 bytes in 95 blocks
==1702== suppressed: 0 bytes in 0 blocks
==1702== Reachable blocks (those to which a pointer was found) are not shown.
==1702== To see them, rerun with: --leak-check=full --show-reachable=yes
==1702==
==1702== For counts of detected and suppressed errors, rerun with: -v
==1702== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 13 from 8)
ANY help on this is so terribly appreciated. I feel like overturning my desk. The lines of code in comm.c that popped up in gdb haven't been changed in years, and looking at them reveals nothing to my relatively untrained eye. We're going on 52 hours without stability and I'm no closer to figuring out what's wrong. :/ | Top |
|
Posted by
| Zeno
USA (2,871 posts) Bio
|
Date
| Reply #2 on Sun 04 Mar 2012 11:18 PM (UTC) |
Message
| Which is line 1397? Can you print all variables on that line? |
Zeno McDohl,
Owner of Bleached InuYasha Galaxy
http://www.biyg.org | Top |
|
Posted by
| Lmclarke
(26 posts) Bio
|
Date
| Reply #3 on Mon 05 Mar 2012 12:14 AM (UTC) |
Message
| Line 1397:
strcat(telbuf, telopts[(unsigned char)d->inbuf]);
| Top |
|
Posted by
| Nick Gammon
Australia (23,133 posts) Bio
Forum Administrator |
Date
| Reply #4 on Mon 05 Mar 2012 06:22 AM (UTC) |
Message
| Get him to make a new world file from scratch. Just enter the MUD address/port. See if it still happens.
I suspect something along the lines of, he changed his terminal type to something longer than you are allowing for. |
- Nick Gammon
www.gammon.com.au, www.mushclient.com | Top |
|
Posted by
| Lmclarke
(26 posts) Bio
|
Date
| Reply #5 on Mon 05 Mar 2012 06:33 PM (UTC) Amended on Mon 05 Mar 2012 06:36 PM (UTC) by Lmclarke
|
Message
| Bah. I thought that worked. Seemed stable for a little while, there.
Suddenly:
Program received signal SIGINT, Interrupt.
0x001cb7f2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) bt
#0 0x001cb7f2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x0038902d in ___newselect_nocancel () from /lib/libc.so.6
#2 0x08099991 in game_loop_unix (control=8) at comm.c:949
#3 0x08098fc3 in main (argc=2, argv=0xbfffd834) at comm.c:439
The backtrace is different this time...
EDIT:
Alright, I started us back up and logged in. I was the only person online this time and it crashed.
Program received signal SIGINT, Interrupt.
0x001cb7f2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#0 0x001cb7f2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x002b502d in ___newselect_nocancel () from /lib/libc.so.6
#2 0x08099991 in game_loop_unix (control=8) at comm.c:949
#3 0x08098fc3 in main (argc=2, argv=0xbfffd834) at comm.c:439
| Top |
|
Posted by
| Twisol
USA (2,257 posts) Bio
|
Date
| Reply #6 on Mon 05 Mar 2012 06:46 PM (UTC) |
Message
| I don't claim to be an expert on Linux, but - SIGINT? That's not something that just happens like a SIGSEGV, someone had to actively send that signal to the process. Did you or someone else with shell access to the server interrupt the process (such as by hitting Ctrl+C from the shell, or via the 'kill' program)?
If that's not the case - and again, not a Linux expert - someone might have unauthorized access to the server. Run 'w' from the shell to see if there are any other users online. You may want to change the passwords on your shell accounts, too. |
'Soludra' on Achaea
Blog: http://jonathan.com/
GitHub: http://github.com/Twisol | Top |
|
Posted by
| Lmclarke
(26 posts) Bio
|
Date
| Reply #7 on Mon 05 Mar 2012 07:19 PM (UTC) |
Message
| Hmm. I am the only person with shell access. I didn't kill the process or use the kill command, and didn't Ctrl+C. I checked who though, and found this, which seemed odd:
myusername pts/4 NOT myIP 09:14 3:57m 0.01s 0.01s -bash
myusername pts/8 myIP 11:01 2:14m 0.01s 0.01s python ./flashpolicyd.py --file=flashpolicy.xml --por
myusername pts/21 myIP 11:35 40:35 0.15s 0.00s login -- myusername
myusername pts/26 myIP 13:16 0.00s 0.00s 0.00s w
| Top |
|
Posted by
| Nick Gammon
Australia (23,133 posts) Bio
Forum Administrator |
Date
| Reply #8 on Mon 05 Mar 2012 08:23 PM (UTC) |
Message
| A search for "dl_sysinfo_int80" reveals about 20,000 hits.
However your problem seems to have changed. Now you are getting SIGINT not SIGSEGV. |
- Nick Gammon
www.gammon.com.au, www.mushclient.com | Top |
|
Posted by
| Zeno
USA (2,871 posts) Bio
|
Date
| Reply #9 on Wed 07 Mar 2012 04:13 AM (UTC) |
Message
| You get a SIGINT even without debugging it at all?
If so, check your MUD code for any SIGINT use. Sometimes copyover-crashes send specific signals, but I don't know if SIGINT is one of them. |
Zeno McDohl,
Owner of Bleached InuYasha Galaxy
http://www.biyg.org | Top |
|
The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).