curl-library
Some Sites Don't Time Out
Date: Tue, 4 Mar 2008 15:46:32 -0500
Hey there,
I've been using libcurl successfully for about a week in some new code, but I'm having a strange problem: some sites just don't time out. Here's a gdb dump of where it gets stuck (it would run all day if I let it).
One domain my crawler is trying to fetch is abroeiu.com, which gets stuck while connecting instead of timing out.
(gdb) info threads
3 Thread 1094719840 (LWP 10069) 0x0000003cb1ebd9a2 in poll () from /lib64/tls/libc.so.6
2 Thread 1084229984 (LWP 10068) 0x0000003cb1ebd9a2 in poll () from /lib64/tls/libc.so.6
* 1 Thread 182900061440 (LWP 10065) 0x0000003cb1e8f7d5 in __nanosleep_nocancel () from /lib64/tls/libc.so.6
(gdb) thread 2
[Switching to thread 2 (Thread 1084229984 (LWP 10068))]#0 0x0000003cb1ebd9a2 in poll () from /lib64/tls/libc.so.6
(gdb) bt
#0 0x0000003cb1ebd9a2 in poll () from /lib64/tls/libc.so.6
#1 0x0000002a958968ea in Curl_socket_ready (readfd=-1, writefd=10, timeout_ms=1896113078) at select.c:218
#2 0x0000002a9588f2f2 in waitconnect (sockfd=Variable "sockfd" is not available.
) at connect.c:200
#3 0x0000002a9588fab8 in singleipconnect (conn=0x5415d0, ai=Variable "ai" is not available.
) at connect.c:766
#4 0x0000002a9588ff63 in Curl_connecthost (conn=0x5415d0, remotehost=0x55de80, sockconn=0x5416e8, addr=0x40a00040, connected=0x40a0004f) at connect.c:894
#5 0x0000002a95883042 in SetupConnection (conn=0x5415d0, hostaddr=0x55de80, protocol_done=0x40a000bf) at url.c:2633
#6 0x0000002a95884d23 in Curl_async_resolved (conn=0x5415d0, protocol_done=Variable "protocol_done" is not available.
) at url.c:4390
#7 0x0000002a9588dea7 in Curl_perform (data=0x526180) at transfer.c:2279
#8 0x0000000000403b55 in process_url (CTX=0x514d90, url=0x514df0) at phishd.c:536
#9 0x0000000000403f74 in process_site (ptr=Variable "ptr" is not available.
) at phishd.c:432
#10 0x0000003cb290610a in start_thread () from /lib64/tls/libpthread.so.0
#11 0x0000003cb1ec68c3 in clone () from /lib64/tls/libc.so.6
#12 0x0000000000000000 in ?? ()
(gdb)
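The timeout_ms value in frame #1 is a clue. Assuming (this is not visible in the trace itself) that libcurl converts the connect timeout from seconds to milliseconds and that the result ends up narrowed to the 32-bit int timeout_ms parameter, a "timeout" that is really a stack address in the 0x40a0xxxx range, the same range the pointer arguments in frames #4 and #5 point into, turns into a poll() wait of roughly 1.9 billion milliseconds. The sketch below reproduces only that arithmetic; seconds_to_ms_narrowed is a hypothetical helper, not a libcurl function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of libcurl): converts a timeout given in
   seconds to milliseconds using 64-bit arithmetic, then keeps only the
   low 32 bits, modelling an assumed narrowing to a 32-bit int. */
static int32_t seconds_to_ms_narrowed(uint64_t seconds)
{
    assert(seconds <= UINT64_MAX / 1000u); /* the 64-bit product must be exact */
    uint64_t ms = seconds * 1000u;
    uint32_t low = (uint32_t)ms;            /* drop the high 32 bits */

    /* Map the low 32 bits to a signed value explicitly (two's-complement
       style) instead of relying on an implementation-defined cast. */
    if (low <= (uint32_t)INT32_MAX)
        return (int32_t)low;
    return (int32_t)((int64_t)low - 4294967296LL);
}
```

Under these assumptions, seconds_to_ms_narrowed(15) gives 15000, while seconds_to_ms_narrowed(0x40a00000) gives 1895825408 ms, about 22 days, which is the same order of magnitude as the 1896113078 in the trace.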
Here is the code I am using to invoke libcurl. Any suggestions to get
this working would be appreciated:
CURL *curl;
CURLcode res;
struct curl_slist *slist = NULL;
long one = 1;
long max_redirs = 10;
long curl_timeout = 30;
long curl_connect_timeout = 15;
curl = curl_easy_init();
if (!curl) {
    LOG(LOG_CRIT, ERR_CURL_INIT_FAIL, strerror(errno));
    return EINVAL;
}
slist = curl_slist_append(slist,
    "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)");
slist = curl_slist_append(slist, "Cache-Control: max-age=0");
slist = curl_slist_append(slist, "Accept-Language: en-us,en;q=0.5");
slist = curl_slist_append(slist, "Accept-Encoding: ");
slist = curl_slist_append(slist,
    "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7");
slist = curl_slist_append(slist,
    "Accept: text/xml,application/xml,application/xhtml+xml,text/html;"
    "q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5");
curl_easy_setopt(curl, CURLOPT_URL, url->url);
curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, &one);
curl_easy_setopt(curl, CURLOPT_SSL_VERIFYHOST, &one);
curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1);
curl_easy_setopt(curl, CURLOPT_MAXREDIRS, &max_redirs);
curl_easy_setopt(curl, CURLOPT_NOSIGNAL, &one);
curl_easy_setopt(curl, CURLOPT_TCP_NODELAY, &one);
curl_easy_setopt(curl, CURLOPT_AUTOREFERER, 1);
curl_easy_setopt(curl, CURLOPT_HTTPHEADER, slist);
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_callback);
curl_easy_setopt(curl, CURLOPT_WRITEDATA, &buffer);
curl_easy_setopt(curl, CURLOPT_TIMEOUT, &curl_timeout);
curl_easy_setopt(curl, CURLOPT_CONNECTTIMEOUT, &curl_connect_timeout);
res = curl_easy_perform(curl);
Jonathan
Received on 2008-03-04