Bugs item #1431750, was opened at 2006-02-14 20:59
Message generated for change (Comment added) made by xbx___
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=100976&aid=1431750&group_id=976
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: libcurl
Group: crash
Status: Open
Resolution: None
Priority: 8
Submitted By: xbx (xbx___)
Assigned to: Daniel Stenberg (bagder)
Summary: crash when dns slow
Initial Comment:
I have a strange setup with the dns taking a long time
to answer.
I run my app that make periodic http(s) requests.
Everything goes fine, but after a while (that is, after
maybe 10 successfull requests to the same host) I have
this crash:
#0 0xb7fade01 in Curl_connecthost (conn=0x8a73094,
remotehost=0x0,
sockconn=0x8a7312c, addr=0xbfaefdc4,
connected=0xbfaefe07 "")
at connect.c:789
#1 0xb7f9ed04 in ConnectPlease (conn=0x8a73094,
hostaddr=0x0,
connected=0xbfaefe07 "") at url.c:2017
#2 0xb7fa19f3 in SetupConnection (conn=0x8a73094,
hostaddr=0x0,
protocol_done=0xbfaefe88 "") at url.c:3670
#3 0xb7fa1b08 in Curl_connect (data=0x8ab48bc,
in_connect=0x8b02798,
asyncp=0xbfaefe89 "", protocol_done=0xbfaefe88 "")
at url.c:3729
#4 0xb7faf002 in curl_multi_perform
(multi_handle=0x8aeac1c,
running_handles=0xbfaeff34) at multi.c:422
[....]
(notice the hostaddr=0x0 part)
I'm using libcurl 7.15.1, with this configure options:
./configure --prefix=/usr/ --disable-gopher
--disable-ldap --disable-dict --disable-telnet
--enable-nonblocking --enable-debug
--disable-crypto-auth --enable-ares --disable-ipv6
I could further investigate, but I'd have to know which
way to look..
More precisely,
code = CreateConnection(data, in_connect, &dns, asyncp);
returns:
code == OK
dns == null
asyncp == false
Is the problem that 'code' shouldn't be 'ok', or that
the program shouldn't crash when dns == Null?
----------------------------------------------------------------------
>Comment By: xbx (xbx___)
Date: 2006-02-16 15:38
Message:
Logged In: YES
user_id=1452265
yes that's what I said, it does crash in
addrinfo_callback(). With this callstack:
#0 0xb7ed35a5 in Curl_hash_add (h=0x21, key=0x8a824cc
"\001:-1209571840",
key_len=14, p=0x8b159cc) at hash.c:153
#1 0xb7eb0443 in Curl_cache_addr (data=0x8ae1c68,
addr=0x8b1580c,
hostname=0xb7e765d8 "\001", port=-1209571840) at
hostip.c:328
#2 0xb7ed72c9 in addrinfo_callback (arg=0x8ae1a2c,
status=0, addr=0x8ae1fd8)
at hostasyn.c:129
#3 0xb7ed7342 in Curl_addrinfo4_callback (arg=0x8ae1a2c,
status=0,
hostent=0x8ae1fd8) at hostasyn.c:161
#4 0xb7edaa73 in end_hquery (hquery=0x8b2c708,
status=Variable "status" is not available.
)
at ares_gethostbyname.c:187
#5 0xb7edaf27 in host_callback (arg=0x8b2c708, status=0,
abuf=0xbff164f7 ":v\201\200", alen=128) at
ares_gethostbyname.c:169
#6 0xb7ede63c in qcallback (arg=0x8a6b630, status=Variable
"status" is not available.
) at ares_query.c:111
#7 0xb7ed9cad in end_query (channel=0x8af0778,
query=0x8b22c20, status=Variable "status" is not available.
)
at ares_process.c:693
#8 0xb7eda1bd in process_answer (channel=0x8af0778,
abuf=0xbff164f7 ":v\201\200", alen=128, whichserver=1,
tcp=0,
now=1140100006) at ares_process.c:373
#9 0xb7eda73f in ares_process (channel=0x8af0778,
read_fds=0xbff167b8,
write_fds=0xbff16738) at ares_process.c:287
#10 0xb7ed6d0d in Curl_is_resolved (conn=0x8b27e6c,
dns=0xbff16890)
at hostares.c:146
#11 0xb7ed40bd in curl_multi_perform (multi_handle=0x8ad8f4c,
running_handles=0xbff16944) at multi.c:449
[...]
in order to reproduce it, I don't know if it's easy. (I'm
avoiding to change anything/reboot my computer because I'm
afraid to lose the situation..)
but I think the dns querries are very slow because the first
entry is 127.0.0.1 which is probaly running but not
configured correctly, and then a real dns ip. So I guess it
first try to ask 127.0.0.1 and after a timeout it gets the
answer from the other dns...
----------------------------------------------------------------------
Comment By: Daniel Stenberg (bagder)
Date: 2006-02-16 13:03
Message:
Logged In: YES
user_id=1110
The fact that you made it work by simply not re-using the
connection (leaving the struct around) but not when you
disconnect and kill the struct properly, certainly indicates
that "something" wrongly is trying to use that memory at a
later time.
I suspect it might be the c-ares callback that gets called
when the resolve worked or timed out, because it was somehow
not properly cancelled in a previous request.
Any chance you can temporarily remove your fix and add a
debug output or a break-point in
hostasync.c:addrinfo_callback() and check if this or a
related issue is perhaps the case?
Can you think of a way I could do repeat this problem in my end?
----------------------------------------------------------------------
Comment By: Daniel Stenberg (bagder)
Date: 2006-02-16 00:38
Message:
Logged In: YES
user_id=1110
Ah, I hadn't noticed you use c-ares!
I'll check your details closer tomorrow. Thanks for all the
hard work!
----------------------------------------------------------------------
Comment By: xbx (xbx___)
Date: 2006-02-15 22:00
Message:
Logged In: YES
user_id=1452265
I tried something for:
> - the connection shouldn't be reused
> as it is being initialised rather than finished...
in ConnectionExists(), I added a check so that connections
with (check->sock[FIRSTSOCKET] == CURL_SOCKET_BAD) wouldn't
be reused.
And the program doesn't seem to crash any longer. (can't be
100% sure, because it is a random bug)
But I don't know if this condition is supposed to mean
"being initialised connection" ....
----------------------------------------------------------------------
Comment By: xbx (xbx___)
Date: 2006-02-15 19:17
Message:
Logged In: YES
user_id=1452265
I think I understood what is going on!
I changed the SocketIsDead(), and it did remove the crash.
However the app crashed a bit later, in an ares callback,
because the connection data had been deleted and overwritten.
So I guess that what is happening is that a connection is
being created and waiting for a dns querry before being
fully initialized.
And before the dns querry is over, the connection is reused.
Here the fact that it is "dead" is discovered and the
connection is killed. And later when the callback from ares
is called, the connection has been destructed...
So I guess this is in fact 2 bugs:
- when a connection is destructed, ares calls should be aborted.
- the connection shouldn't be reused as it is being
initialised rather than finished...
but
1) I may be wrong
2) I have no idea how to correct any of these...
----------------------------------------------------------------------
Comment By: Daniel Stenberg (bagder)
Date: 2006-02-15 13:24
Message:
Logged In: YES
user_id=1110
Hm. Yes a first test for you would be to modify the
SocketIsDead() code to check for that situation.
I'm a bit reluctant to simply add that check and be happy,
since I consider it being a problem that the
sock[FIRSTSOCKET] value ended up like this in the first place...
----------------------------------------------------------------------
Comment By: xbx (xbx___)
Date: 2006-02-15 13:17
Message:
Logged In: YES
user_id=1452265
The CURLOPT_VERSBOSE doesn't show anything strange, (expect
that the reuse case isn't triggered as often as I thought..).
however, in ConnectionExists() there is something strange:
bool dead = SocketIsDead(check->sock[FIRSTSOCKET]);
returns False, although check->sock[FIRSTSOCKET]==-1
So I reckon that's why the (CURL_SOCKET_BAD ==
conn->sock[FIRST]) triggers in SetupConnection().
So, maybe SocketIsDead() should check for that case?
----------------------------------------------------------------------
Comment By: Daniel Stenberg (bagder)
Date: 2006-02-15 11:34
Message:
Logged In: YES
user_id=1110
Ah, right. That's NULL if the connection is re-used. So it
looks like it enters the SetupConnection() function and then
some 30 lines into the function the condition
if(CURL_SOCKET_BAD == conn->sock[FIRST])
equals true. And then it wrongly tries to re-connect using
the host address that is now NULL.
Can you figure out how it comes it has marked the socket as
being bad, while still believing the connection is being
re-used fine?
What does a CURLOPT_VERSBOSE output reveal? Does the
connection just prior to the failing one look fine and claim
to be leaving the connection alive? Does the failing one
claim to retry the connection or just re-use it normally
like other re-used connections?
----------------------------------------------------------------------
Comment By: xbx (xbx___)
Date: 2006-02-15 11:10
Message:
Logged In: YES
user_id=1452265
looks like it's because, url.c::3530
if(conn->bits.reuse) {
is true.
----------------------------------------------------------------------
Comment By: Daniel Stenberg (bagder)
Date: 2006-02-15 10:42
Message:
Logged In: YES
user_id=1110
Right, can you see if you can figure out why
CreateConnection() returns CURLE_OK while still not
returning any address and setting asnyncp to FALSE?
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=100976&aid=1431750&group_id=976
Received on 2006-02-16