curl-library
splay related crashes in rtorrent, bzflag
Date: Sun, 16 Jul 2006 02:59:57 +0200
I'm seeing the same kind of crashes in rtorrent as reported by David Vuorio
on bzflag with libcurl3{-dbg}-7.15.4-1. (Debian bug #375076) It happens
twice or so each day so the cost of rehashing files forced me to look closer
at the curl code.
The usual crash:
(gdb) bt
#0 0x028796ce in Curl_splay (i=1152823333, t=0x18e539c) at splay.c:49
#1 0x02879815 in Curl_splaygetbest (i=1152823333, t=0x18e539c,
removed=0xbffff4cc) at splay.c:205
#2 0x02873fbe in curl_multi_perform (multi_handle=0x2c01040,
running_handles=0xbffff530) at multi.c:916
#3 0x0004516c in core::CurlStack::perform (this=0x2c01008) at
curl_stack.cc:69
#4 0x0003e59c in core::PollManagerSelect::poll (this=0x2c01000,
timeout={m_time = 107664}) at poll_manager_select.cc:89
#5 0x00007af9 in main (argc=1, argv=0xbffff840) at main.cc:255
I found that when Curl_splayremovebyaddr returns an error, nothing actually
acts on it in my build, so I replaced the error returns with asserts
(checking "t != remove" rather than comparing the keys) and got this
backtrace:
On what used to be "if(t->key != remove->key) return 2;"
#0 0x8fe03e5a in __dyld__ZN4dyld14bindLazySymbolEPK11mach_headerPm ()
#1 0x8fe1377f in __dyld_stub_binding_helper_interface ()
#2 0x00000000 in ?? ()
#3 0x02875051 in Curl_expire (data=0x2e94000, milli=0) at multi.c:1308
#4 0x0286444b in Curl_done (connp=0x2d88bfc, status=CURLE_OK) at url.c:4048
#5 0x028744c9 in multi_runsingle (multi=0x2d010d0, easy=0x2d88bf0,
running_handles=0xbfffe910) at multi.c:875
#6 0x0287495c in curl_multi_perform (multi_handle=0x2d010d0,
running_handles=0xbfffe910) at multi.c:948
#7 0x000451d0 in core::CurlStack::perform (this=0x2d01098) at
curl_stack.cc:69
#8 0x0003e600 in core::PollManagerSelect::poll (this=0x2d01090,
timeout={m_time = 77165}) at poll_manager_select.cc:89
#9 0x00007b5d in main (argc=1, argv=0xbfffec24) at main.cc:255
frame 3:
(gdb) print multi.easy
$12 = {
next = 0x0,
prev = 0x0,
easy_handle = 0x44b95236,
easy_conn = 0x2e94000,
state = CURLM_STATE_INIT,
result = CURLE_OK,
msg = 0x0,
msg_num = 0,
sockstate = {
socks = {0, 0, 0, -1, 0, 306, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0},
action = 0
}
}
(gdb) print data
$13 = (struct SessionHandle *) 0x2e94000
(gdb) print data->state.expiretime
$4 = {
tv_sec = 1152995894,
tv_usec = 4410
}
(gdb) print data->state.timenode
$5 = {
smaller = 0x0,
larger = 0x0,
same = 0x0,
key = 1152995894,
payload = 0x2e94000
}
(gdb) print multi->timetree
$9 = (struct Curl_tree *) 0x0
Since Curl_expire clears expiretime and the easy_conn looks OK, I don't think
the SessionHandle was previously removed. Checking the two Curl_splaygetbest
calls in multi.c, I found that they don't clear expiretime, which as far as I
can see could cause the first Curl_splayremovebyaddr in Curl_expire to be
called on an already removed node.
The attached patch clears the timer, and I'm hoping it will solve my
problem. Though perhaps the higher layers are supposed to make sure
Curl_expire doesn't get called.
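
To make the suspected sequence concrete, here is a minimal sketch of it. It is not libcurl code: the struct and function names (node, handle, remove_by_addr, pop_best, expire_now, simulate) are made up for illustration, and a plain linked list stands in for the splay tree. It only models the state seen in the gdb output above, where multi->timetree is 0x0 while data->state.expiretime is still set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not libcurl's real types or functions. */
struct node {
    struct node *next;   /* linked list standing in for the splay tree */
    long key;
};

struct handle {
    long expiretime;     /* 0 means "no timer set" */
    struct node timenode;
};

/* Unlink `n` from the list; return 0 on success, 2 if it is not linked. */
static int remove_by_addr(struct node **root, struct node *n)
{
    struct node **p;
    for (p = root; *p; p = &(*p)->next) {
        if (*p == n) {
            *p = n->next;
            n->next = NULL;
            return 0;
        }
    }
    return 2;
}

/* Pop the first node, as happens when a timeout fires. */
static struct node *pop_best(struct node **root)
{
    struct node *n = *root;
    if (n) {
        *root = n->next;
        n->next = NULL;
    }
    return n;
}

/* Cancel a timer: only touches the tree if expiretime says one is set. */
static int expire_now(struct node **root, struct handle *h)
{
    int rc = 0;
    if (h->expiretime) {
        rc = remove_by_addr(root, &h->timenode);
        h->expiretime = 0;
    }
    return rc;
}

/* Returns what expire_now() reports after the timer has already fired. */
int simulate(int clear_on_pop)
{
    struct handle h = { 1152995894L, { NULL, 1152995894L } };
    struct node *root = &h.timenode;           /* timer is queued */
    struct node *fired = pop_best(&root);      /* node leaves the tree */

    assert(fired == &h.timenode && root == NULL);

    if (clear_on_pop)
        h.expiretime = 0;                      /* the idea behind the patch */

    return expire_now(&root, &h);              /* removal result */
}
```

In this model simulate(0) returns the error code 2, because the stale expiretime sends expire_now after a node that is no longer linked, while simulate(1) returns 0 since the cleared timer means no removal is attempted.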
Having looked at the splay code, I think the implementation could have been
cleaner and more robust.
Rakshasa
- application/octet-stream attachment: clear_expiretime.diff