Bugs item #1431750, was opened at 2006-02-14 20:59
Message generated for change (Comment added) made by bagder
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=100976&aid=1431750&group_id=976
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: libcurl
Group: crash
>Status: Closed
Resolution: Fixed
Priority: 8
Submitted By: xbx (xbx___)
Assigned to: Daniel Stenberg (bagder)
Summary: crash when dns slow
Initial Comment:
I have a strange setup with the dns taking a long time
to answer.
I run my app that make periodic http(s) requests.
Everything goes fine, but after a while (that is, after
maybe 10 successfull requests to the same host) I have
this crash:
#0 0xb7fade01 in Curl_connecthost (conn=0x8a73094,
remotehost=0x0,
sockconn=0x8a7312c, addr=0xbfaefdc4,
connected=0xbfaefe07 "")
at connect.c:789
#1 0xb7f9ed04 in ConnectPlease (conn=0x8a73094,
hostaddr=0x0,
connected=0xbfaefe07 "") at url.c:2017
#2 0xb7fa19f3 in SetupConnection (conn=0x8a73094,
hostaddr=0x0,
protocol_done=0xbfaefe88 "") at url.c:3670
#3 0xb7fa1b08 in Curl_connect (data=0x8ab48bc,
in_connect=0x8b02798,
asyncp=0xbfaefe89 "", protocol_done=0xbfaefe88 "")
at url.c:3729
#4 0xb7faf002 in curl_multi_perform
(multi_handle=0x8aeac1c,
running_handles=0xbfaeff34) at multi.c:422
[....]
(notice the hostaddr=0x0 part)
I'm using libcurl 7.15.1, with this configure options:
./configure --prefix=/usr/ --disable-gopher
--disable-ldap --disable-dict --disable-telnet
--enable-nonblocking --enable-debug
--disable-crypto-auth --enable-ares --disable-ipv6
I could further investigate, but I'd have to know which
way to look..
More precisely,
code = CreateConnection(data, in_connect, &dns, asyncp);
returns:
code == OK
dns == null
asyncp == false
Is the problem that 'code' shouldn't be 'ok', or that
the program shouldn't crash when dns == Null?
----------------------------------------------------------------------
>Comment By: Daniel Stenberg (bagder)
Date: 2006-02-23 22:31
Message:
Logged In: YES
user_id=1110
Argh.
1) Sorry for being sloppy and thanks for correcting me. Yes
of course that check should be done like that!
2) Well, yes, but I added an empty macro in the hostip.h
header in case libcurl is built without c-ares
----------------------------------------------------------------------
Comment By: xbx (xbx___)
Date: 2006-02-23 22:15
Message:
Logged In: YES
user_id=1452265
just did "cvs update", the fix is in! but...
1) in multi.c:249, you'd need
if (easy->easy_conn)
Curl_done(&easy->easy_conn, easy->result);
instead of
Curl_done(&easy->easy_conn, easy->result);
(otherwise it crashes for me, for instance when a request
completes successfully)
2) in url.c
didn't you say you needed to warp
ares_cancel(data->state.areschannel);
around #ifdef USE_ARES / #endif ?
----------------------------------------------------------------------
Comment By: xbx (xbx___)
Date: 2006-02-23 12:30
Message:
Logged In: YES
user_id=1452265
No!
thank *you* for the hard work and the lib!
----------------------------------------------------------------------
Comment By: Daniel Stenberg (bagder)
Date: 2006-02-22 22:55
Message:
Logged In: YES
user_id=1110
Lovely!
If you tell me your name, I'll make sure you give you proper
credits for your hard work in the changelog.
----------------------------------------------------------------------
Comment By: xbx (xbx___)
Date: 2006-02-22 19:56
Message:
Logged In: YES
user_id=1452265
seems to cure my crash...
great!
----------------------------------------------------------------------
Comment By: Daniel Stenberg (bagder)
Date: 2006-02-22 18:30
Message:
Logged In: YES
user_id=1110
Right. I guess it just didn't trigger for me since it
resolved the name "too fast". What about this additional
patch. It takes a little extra #ifdef magic to be really
nice for non-ares cases but I first need to know if this
cures your problem!
diff -u -r1.492 url.c
--- lib/url.c 21 Feb 2006 07:46:41 -0000 1.492
+++ lib/url.c 22 Feb 2006 17:28:58 -0000
@@ -3999,6 +3999,10 @@
Curl_pgrsDone(conn); /* done with the operation */
+ /* for ares-using, make sure all possible outstanding
requests are properly
+ cancelled before we proceed */
+ ares_cancel(data->state.areschannel);
+
/* if data->set.reuse_forbid is TRUE, it means the
libcurl client has
forced us to close this no matter what we think.
----------------------------------------------------------------------
Comment By: xbx (xbx___)
Date: 2006-02-22 16:04
Message:
Logged In: YES
user_id=1452265
I tried it.
1) without the patch, it crashes as usual.
2) with the line added, it crashes earlier in Curl_done
because easy_conn == Null
3) with this patch added to multi.c:
if (easy->easy_conn)
Curl_done(&easy->easy_conn, easy->result);
it actually triggers when the programs decides to abort the
request after a tieout.
and it supresses the crash from the bug report.
However, there still is the crash a bit later when the ares
request completes and calls the callbak with destroyed
pointers..
with this stack:
#0 0xb7f067a9 in addrinfo_callback (arg=0x8b19e3c,
status=0, addr=0x8a89894) at hostasyn.c:126
#1 0xb7f0685a in Curl_addrinfo4_callback (arg=0x8b19e3c,
status=0, hostent=0x8a89894) at hostasyn.c:161
#2 0xb7f0c97a in end_hquery (hquery=0x8a9f26c, status=0,
host=0x8a89894) at ares_gethostbyname.c:187
#3 0xb7f0c8eb in host_callback (arg=0x8a9f26c, status=0,
abuf=0xbfb4504b "\221\233\201\200", alen=128)
at ares_gethostbyname.c:169
#4 0xb7f0a735 in qcallback (arg=0x8ae3444, status=0,
abuf=0xbfb4504b "\221\233\201\200", alen=128)
at ares_query.c:111
#5 0xb7f0a3ba in end_query (channel=0x8a9caec,
query=0x8b198c4, status=0,
abuf=0xbfb4504b "\221\233\201\200", alen=128) at
ares_process.c:696
#6 0xb7f09b2b in process_answer (channel=0x8a9caec,
abuf=0xbfb4504b "\221\233\201\200", alen=128,
whichserver=1, tcp=0, now=1140620161) at ares_process.c:375
#7 0xb7f0990c in read_udp_packets (channel=0x8a9caec,
read_fds=0xbfb45338, now=1140620161)
at ares_process.c:289
#8 0xb7f09349 in ares_process (channel=0x8a9caec,
read_fds=0xbfb45338, write_fds=0xbfb452b8)
at ares_process.c:93
#9 0xb7f06225 in Curl_is_resolved (conn=0x8b306ec,
dns=0xbfb45410) at hostares.c:146
#10 0xb7f03594 in curl_multi_perform
(multi_handle=0x89c8e2c, running_handles=0xbfb454c4) at
multi.c:453
[...]
arg=0x8b19e3c being an invalid pointer.
looks like Curl_done doesn't abort pending ares requests..
----------------------------------------------------------------------
Comment By: Daniel Stenberg (bagder)
Date: 2006-02-22 12:55
Message:
Logged In: YES
user_id=1110
Your sample program segfaults repeatedly.
I could make it stop and behave fine using the patch below,
and I'd be interested to know if this fixes the "real"
problem you've been seeing. It basically just an added call
to Curl_done() when the handle is removed from the multi
stack and the transfer hasn't completed:
--- lib/multi.c 11 Feb 2006 22:35:17 -0000 1.69
+++ lib/multi.c 22 Feb 2006 11:53:35 -0000
@@ -245,6 +245,8 @@
Curl_easy_addmulti(easy->easy_handle, NULL); /* clear
the association
to this
multi handle */
+ Curl_done(&easy->easy_conn, easy->result);
+
/* make the previous node point to our next */
if(easy->prev)
easy->prev->next = easy->next;
----------------------------------------------------------------------
Comment By: Daniel Stenberg (bagder)
Date: 2006-02-21 10:57
Message:
Logged In: YES
user_id=1110
Splendid! I'll get working on this as soon as I can.
----------------------------------------------------------------------
Comment By: xbx (xbx___)
Date: 2006-02-21 10:17
Message:
Logged In: YES
user_id=1452265
Got a repro for you!
What is probably happening is that my program gets tired of
waiting for a reply and aborts the request, by doing a
curl_multi_remove_handle().
(from the manual: "Removing a handle while being used, will
effectively halt all transfers in progress." )
However, you get the crash from the initial bug report if
you do this with latest cvs:
int test(char *URL)
{
CURL *c;
CURLM *m;
CURLMcode res;
int running=1;
curl_global_init(CURL_GLOBAL_ALL);
c = curl_easy_init();
curl_easy_setopt(c, CURLOPT_URL, "http://www.google.com");
m = curl_multi_init();
res = curl_multi_add_handle(m, c);
do {
res = curl_multi_perform(m, &running);
} while (res==CURLM_CALL_MULTI_PERFORM);
curl_multi_remove_handle(m, c);
curl_easy_setopt(c, CURLOPT_URL, "http://www.google.com");
res = curl_multi_add_handle(m, c);
while (running) {
res = curl_multi_perform(m, &running);
if (running <= 0) {
fprintf(stderr, "nothing left running.\n");
break;
}
}
curl_multi_remove_handle(m, c);
curl_easy_cleanup(c);
curl_multi_cleanup(m);
return res;
}
(obviously, my program doesn't do smething as dumb, but when
then dns takes ages to respond, I guess this is equivalent.)
----------------------------------------------------------------------
Comment By: Daniel Stenberg (bagder)
Date: 2006-02-17 15:14
Message:
Logged In: YES
user_id=1110
Yes, it looks like the first transfer's ares resolve remains
"outstanding" when it stops, and the callback for it then
gets called when a subsequent transfer is running.
The only place in the code that stops outstanding ares
resolves is when a DNS timeout is handled, so I guess the
first transfer doesn't bail out because of this. Can you
figure out why/how it completes and why there seems to be a
outstanding resolve left pending?
I don't think this is a c-ares bug. I believe it is a
libcurl bug that is related to how libcurl uses c-ares.
Also, the ares_cancel() function get gets called in the DNS
timeout case (and only then) cancels _all_ outstanding ares
resolves which just can't be the right thing to do.
----------------------------------------------------------------------
Comment By: xbx (xbx___)
Date: 2006-02-17 10:42
Message:
Logged In: YES
user_id=1452265
What I imagine is the following scenario:
- a request is made, a connection (1) is created (but with
no sockets yet), and a dns request is made via ares.
- some time after that, an other similar request is made.
curl reuses (1) because it seems to fit. But at that point
it the socket is invalid, so it either
crashes (the initial bug report),
or, with the "fix" in SocketIsDead(), it notices it,
closes the connection (1) and starts a new one (2).
- after some more time, the first dns querry completes
successfully, the callback is called, with the connection
(1) as parameter which has been destructed -> crash with
callstack from message 2006-02-16 15:38 .
So, according to me the bug isn't in ares, but in curl: when
it closes connections because SocketIsDead() (which, I think
is a network failure recovery code; It shouldn't happen in
normal cases), it should cancel ares request as well (so
that the callback with destructed pointer isn't used).
And it shouldn't even be in the case where it think it can
reuse the connection, as it is still being worked on by a
previous request. -> that's what I tired to fix (with
success) by saying "connections with uninitialised sockets
cannot be reused". but I don't know if it's the correct fix.
(yes I'm doing simultatenous transfers with mutli.)
----------------------------------------------------------------------
Comment By: Daniel Stenberg (bagder)
Date: 2006-02-17 09:06
Message:
Logged In: YES
user_id=1110
The most interesting in that back trace is this:
#2 0xb7ed72c9 in addrinfo_callback (arg=0x8ae1a2c,
status=0, addr=0x8ae1fd8)
Status zero means OK, so that was a successful resolve
callback! The question is why does this callback get sent if
the connection is no longer OK.
Are you doing several simultanoues transfers with the multi
interface? I can see a problem in the code involving c-ares,
DNS timeouts and simultaneous transfers...
----------------------------------------------------------------------
Comment By: xbx (xbx___)
Date: 2006-02-16 15:38
Message:
Logged In: YES
user_id=1452265
yes that's what I said, it does crash in
addrinfo_callback(). With this callstack:
#0 0xb7ed35a5 in Curl_hash_add (h=0x21, key=0x8a824cc
"\001:-1209571840",
key_len=14, p=0x8b159cc) at hash.c:153
#1 0xb7eb0443 in Curl_cache_addr (data=0x8ae1c68,
addr=0x8b1580c,
hostname=0xb7e765d8 "\001", port=-1209571840) at
hostip.c:328
#2 0xb7ed72c9 in addrinfo_callback (arg=0x8ae1a2c,
status=0, addr=0x8ae1fd8)
at hostasyn.c:129
#3 0xb7ed7342 in Curl_addrinfo4_callback (arg=0x8ae1a2c,
status=0,
hostent=0x8ae1fd8) at hostasyn.c:161
#4 0xb7edaa73 in end_hquery (hquery=0x8b2c708,
status=Variable "status" is not available.
)
at ares_gethostbyname.c:187
#5 0xb7edaf27 in host_callback (arg=0x8b2c708, status=0,
abuf=0xbff164f7 ":v\201\200", alen=128) at
ares_gethostbyname.c:169
#6 0xb7ede63c in qcallback (arg=0x8a6b630, status=Variable
"status" is not available.
) at ares_query.c:111
#7 0xb7ed9cad in end_query (channel=0x8af0778,
query=0x8b22c20, status=Variable "status" is not available.
)
at ares_process.c:693
#8 0xb7eda1bd in process_answer (channel=0x8af0778,
abuf=0xbff164f7 ":v\201\200", alen=128, whichserver=1,
tcp=0,
now=1140100006) at ares_process.c:373
#9 0xb7eda73f in ares_process (channel=0x8af0778,
read_fds=0xbff167b8,
write_fds=0xbff16738) at ares_process.c:287
#10 0xb7ed6d0d in Curl_is_resolved (conn=0x8b27e6c,
dns=0xbff16890)
at hostares.c:146
#11 0xb7ed40bd in curl_multi_perform (multi_handle=0x8ad8f4c,
running_handles=0xbff16944) at multi.c:449
[...]
in order to reproduce it, I don't know if it's easy. (I'm
avoiding to change anything/reboot my computer because I'm
afraid to lose the situation..)
but I think the dns querries are very slow because the first
entry is 127.0.0.1 which is probaly running but not
configured correctly, and then a real dns ip. So I guess it
first try to ask 127.0.0.1 and after a timeout it gets the
answer from the other dns...
----------------------------------------------------------------------
Comment By: Daniel Stenberg (bagder)
Date: 2006-02-16 13:03
Message:
Logged In: YES
user_id=1110
The fact that you made it work by simply not re-using the
connection (leaving the struct around) but not when you
disconnect and kill the struct properly, certainly indicates
that "something" wrongly is trying to use that memory at a
later time.
I suspect it might be the c-ares callback that gets called
when the resolve worked or timed out, because it was somehow
not properly cancelled in a previous request.
Any chance you can temporarily remove your fix and add a
debug output or a break-point in
hostasync.c:addrinfo_callback() and check if this or a
related issue is perhaps the case?
Can you think of a way I could do repeat this problem in my end?
----------------------------------------------------------------------
Comment By: Daniel Stenberg (bagder)
Date: 2006-02-16 00:38
Message:
Logged In: YES
user_id=1110
Ah, I hadn't noticed you use c-ares!
I'll check your details closer tomorrow. Thanks for all the
hard work!
----------------------------------------------------------------------
Comment By: xbx (xbx___)
Date: 2006-02-15 22:00
Message:
Logged In: YES
user_id=1452265
I tried something for:
> - the connection shouldn't be reused
> as it is being initialised rather than finished...
in ConnectionExists(), I added a check so that connections
with (check->sock[FIRSTSOCKET] == CURL_SOCKET_BAD) wouldn't
be reused.
And the program doesn't seem to crash any longer. (can't be
100% sure, because it is a random bug)
But I don't know if this condition is supposed to mean
"being initialised connection" ....
----------------------------------------------------------------------
Comment By: xbx (xbx___)
Date: 2006-02-15 19:17
Message:
Logged In: YES
user_id=1452265
I think I understood what is going on!
I changed the SocketIsDead(), and it did remove the crash.
However the app crashed a bit later, in an ares callback,
because the connection data had been deleted and overwritten.
So I guess that what is happening is that a connection is
being created and waiting for a dns querry before being
fully initialized.
And before the dns querry is over, the connection is reused.
Here the fact that it is "dead" is discovered and the
connection is killed. And later when the callback from ares
is called, the connection has been destructed...
So I guess this is in fact 2 bugs:
- when a connection is destructed, ares calls should be aborted.
- the connection shouldn't be reused as it is being
initialised rather than finished...
but
1) I may be wrong
2) I have no idea how to correct any of these...
----------------------------------------------------------------------
Comment By: Daniel Stenberg (bagder)
Date: 2006-02-15 13:24
Message:
Logged In: YES
user_id=1110
Hm. Yes a first test for you would be to modify the
SocketIsDead() code to check for that situation.
I'm a bit reluctant to simply add that check and be happy,
since I consider it being a problem that the
sock[FIRSTSOCKET] value ended up like this in the first place...
----------------------------------------------------------------------
Comment By: xbx (xbx___)
Date: 2006-02-15 13:17
Message:
Logged In: YES
user_id=1452265
The CURLOPT_VERSBOSE doesn't show anything strange, (expect
that the reuse case isn't triggered as often as I thought..).
however, in ConnectionExists() there is something strange:
bool dead = SocketIsDead(check->sock[FIRSTSOCKET]);
returns False, although check->sock[FIRSTSOCKET]==-1
So I reckon that's why the (CURL_SOCKET_BAD ==
conn->sock[FIRST]) triggers in SetupConnection().
So, maybe SocketIsDead() should check for that case?
----------------------------------------------------------------------
Comment By: Daniel Stenberg (bagder)
Date: 2006-02-15 11:34
Message:
Logged In: YES
user_id=1110
Ah, right. That's NULL if the connection is re-used. So it
looks like it enters the SetupConnection() function and then
some 30 lines into the function the condition
if(CURL_SOCKET_BAD == conn->sock[FIRST])
equals true. And then it wrongly tries to re-connect using
the host address that is now NULL.
Can you figure out how it comes it has marked the socket as
being bad, while still believing the connection is being
re-used fine?
What does a CURLOPT_VERSBOSE output reveal? Does the
connection just prior to the failing one look fine and claim
to be leaving the connection alive? Does the failing one
claim to retry the connection or just re-use it normally
like other re-used connections?
----------------------------------------------------------------------
Comment By: xbx (xbx___)
Date: 2006-02-15 11:10
Message:
Logged In: YES
user_id=1452265
looks like it's because, url.c::3530
if(conn->bits.reuse) {
is true.
----------------------------------------------------------------------
Comment By: Daniel Stenberg (bagder)
Date: 2006-02-15 10:42
Message:
Logged In: YES
user_id=1110
Right, can you see if you can figure out why
CreateConnection() returns CURLE_OK while still not
returning any address and setting asnyncp to FALSE?
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=100976&aid=1431750&group_id=976
Received on 2006-02-23