malformed bucket chain in Tcl_DeleteHashEntry
Jim Wilcoxson
2002-01-16 03:53:12 UTC
Our server is crashing 1-5 times per day with this error in the server log:

malformed bucket chain in Tcl_DeleteHashEntry

Does anyone have a good strategy or starting point for debugging this?
Running the server with gdb and waiting for the crash isn't really an
option because I'd have to watch it the whole day, plus once it
breakpoints our production server would be down.

We're using AS 3.4 and Tcl 7.x. It's faster than 8.x for us, and more
compatible with 2.3.3's Tcl.

Any suggestions are welcome.
Jim
Mike Hoegeman
2002-01-16 06:51:44 UTC
Post by Jim Wilcoxson
malformed bucket chain in Tcl_DeleteHashEntry
The above means that Tcl is trying to delete a hash table entry that, for some
reason, is no longer in its table.

If I had to make an off-the-cuff guess, you've probably got a malloc arena
corruption or a double free of some kind happening. If you have C extensions,
that would be the first place to look. But it could just be a subtle
7.x threading bug, in which case you'd have to break out Purify or Electric Fence.

I know of people having this problem in Tcl 7.6 when using the rename command
under certain situations.
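
To illustrate why a double delete shows up as this particular message, here is a simplified sketch of a chained hash table. It is not Tcl's actual source: the names (Entry, Table, table_insert, table_delete) and the fixed bucket count are made up for the example. The idea is that each entry remembers which bucket it belongs to, and deletion walks that bucket's chain looking for the entry. If the walk reaches the end of the chain without finding it, the entry was already removed or the memory was overwritten. The sketch returns -1 in that case; a real implementation might abort with a message like the one in the log.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_BUCKETS 4   /* arbitrary size chosen for the sketch */

/* Hypothetical, simplified entry. It records the head slot of its own chain. */
typedef struct Entry {
    struct Entry *next;     /* next entry in the same bucket chain */
    struct Entry **bucket;  /* address of the chain head this entry lives under */
    int key;
} Entry;

typedef struct Table {
    Entry *buckets[NUM_BUCKETS];
} Table;

/* Start with every chain empty. */
void table_init(Table *t)
{
    size_t i;
    assert(t != NULL);
    for (i = 0; i < NUM_BUCKETS; i++) {
        t->buckets[i] = NULL;
    }
}

/* Link a caller-owned entry at the head of the chain selected by its key. */
void table_insert(Table *t, Entry *e, int key)
{
    assert(t != NULL && e != NULL);
    e->key = key;
    e->bucket = &t->buckets[(unsigned int)key % NUM_BUCKETS];
    e->next = *e->bucket;
    *e->bucket = e;
}

/*
 * Unlink e from the chain it claims to be on.
 * Returns 0 on success, or -1 if the walk reaches the end of the chain
 * without finding e (the "malformed bucket chain" situation).
 */
int table_delete(Entry *e)
{
    Entry **link;
    assert(e != NULL);
    for (link = e->bucket; *link != NULL; link = &(*link)->next) {
        if (*link == e) {
            *link = e->next;   /* splice e out of the chain */
            e->next = NULL;
            return 0;
        }
    }
    return -1;
}
```

Deleting the same entry twice makes the second call return -1, because the first call already spliced it out of the chain. The entries here are caller-owned so that the second call is safe to run. In a server where the entry's memory has been freed and reused, the same walk reads garbage, which is why a C extension that frees or overwrites memory it does not own is a plausible suspect.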
Jim Wilcoxson
2002-01-16 13:47:30 UTC
Thanks for the help. I'll review the C extensions that we only use on this
particular server since I think it's the only one that crashes with this error.

RE: rename - we use this command when the server boots but not while it's
running. Do you know if crashes occur when the rename function is running,
or is the fact that we have renamed procedures enough to cause a crash
later at some point?

Thanks,
Jim
Mike Hoegeman
2002-01-16 17:05:41 UTC
The bug reports I saw re: rename had crash dumps indicating that the crash happened
while the rename was being done, not at some later time.

-mike
