curl-and-php

RE: CURLE_COULDNT_CONNECT on some valid sites

From: Liu Shan Shui <me_at_lx.sg>
Date: Sat, 20 Jun 2009 16:16:19 +0800

Hi Daniel,

 

Is it *always* the same few sites that don't work? If so, try calling
file_get_contents('URL') on those sites, and ping/telnet them from your
server. If those fail as well, the cause of the problem most likely lies in
your server's network configuration (firewall, DNS, etc.) rather than in
cURL.
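
A minimal sketch of those checks in PHP, in case it helps. The function
names probe_host() and diagnose() are made up for illustration, and the
HTTP step assumes allow_url_fopen is enabled on the server. It tries DNS,
then a plain TCP connect (roughly what "telnet host 80" does), then
file_get_contents(), and reports the first layer that fails:

```php
<?php
// Hypothetical helper: probe one host at the DNS, TCP and HTTP layers.
function probe_host($host, $port = 80, $timeout = 10)
{
    $result = array('dns' => false, 'tcp' => false, 'http' => false);

    // gethostbyname() returns its argument unchanged when the lookup fails,
    // so a literal IP address is treated as already resolved.
    $ip = gethostbyname($host);
    $result['dns'] = ($ip !== $host)
        || filter_var($host, FILTER_VALIDATE_IP) !== false;

    if ($result['dns']) {
        // Plain TCP connect, without any HTTP on top.
        $sock = @fsockopen($ip, $port, $errno, $errstr, $timeout);
        if ($sock !== false) {
            $result['tcp'] = true;
            fclose($sock);
        }
    }

    if ($result['tcp']) {
        // Returns false on failure (by default also on HTTP error statuses).
        $ctx = stream_context_create(
            array('http' => array('timeout' => $timeout))
        );
        $body = @file_get_contents('http://' . $host . '/', false, $ctx);
        $result['http'] = ($body !== false);
    }

    return $result;
}

// Hypothetical helper: name the first layer that failed, or 'ok'.
function diagnose($result)
{
    if (!$result['dns']) {
        return 'dns';
    }
    if (!$result['tcp']) {
        return 'tcp';
    }
    if (!$result['http']) {
        return 'http';
    }
    return 'ok';
}
```

For example, diagnose(probe_host('www.domain.com')) returning 'dns' or
'tcp' would point at the server's network setup, while 'ok' would suggest
looking at the cURL options instead.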

 

With regards,

Liu Shan Shui

me_at_lx.sg

"Life would be much easier if I had the source code." - Anonymous

 

From: curl-and-php-bounces_at_cool.haxx.se
[mailto:curl-and-php-bounces_at_cool.haxx.se] On Behalf Of Daniel Marshall
Sent: Friday, June 19, 2009 12:23 AM
To: curl-and-php_at_cool.haxx.se
Subject: CURLE_COULDNT_CONNECT on some valid sites

 

I've searched the mailing list and haven't been able to find a solution or
a reason for my problem. I've also tried a laundry list of curlopt settings
to get a result, with no luck.

So, the problem is: I have curl installed on about 25 servers and have
tested from all of them with the same results, so I assume the problem lies
in the setup of specific target domains combined with my curl options.
An example of one of the servers:
Server: Apache/1.3.37 (Unix) PHP/5.1.6 mod_auth_passthrough/1.8
mod_log_bytes/1.2 mod_bwlimited/1.4 FrontPage/5.0.2.2635.SR1.2
mod_ssl/2.8.28 OpenSSL/0.9.7a
libcurl/7.15.3 zlib/1.2.1.2

Basically, what it comes down to is this: I have a list of sites (~75000)
whose status I need to check.
To do this, I send a curl request to the index page, grab the HTTP status
(ideally 200) and cache an imprint of the page.
This works fine in most cases, but for a couple of domains I need to extend
the timeout from 5s to 10s to avoid timing out, and then I get a
CURLE_COULDNT_CONNECT error instead.
Now, if I visit the site in a browser, I connect with no problems and see
the page.
Also, the page is http://www.domain.com; I connect with curl to the exact
same page, with no https and no redirection that I am aware of.
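
For reference, the check described above could be sketched roughly as
follows. This is an illustration, not the code actually in use: the names
check_site() and build_status_record() are hypothetical, md5() stands in
for whatever "imprint" is really cached, and the connect timeout is set
separately from the overall timeout so that a slow connect can be told
apart from a slow transfer.

```php
<?php
// Hypothetical helper: turn the raw outcome of one transfer into a record.
function build_status_record($errno, $error, $httpCode, $body)
{
    $completed = ($errno === 0);
    return array(
        'ok'      => $completed && $httpCode === 200,
        'errno'   => $errno,     // 7 is CURLE_COULDNT_CONNECT, 28 a timeout
        'error'   => $error,
        'status'  => $httpCode,  // 0 when no HTTP response was received
        'imprint' => $completed ? md5($body) : null,
    );
}

// Hypothetical helper: fetch the index page and summarise the result.
function check_site($url, $connectTimeout = 5, $totalTimeout = 10)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    // Limit on the connect phase only.
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $connectTimeout);
    // Limit on the whole request, connect phase included.
    curl_setopt($ch, CURLOPT_TIMEOUT, $totalTimeout);

    $body = curl_exec($ch);

    return build_status_record(
        curl_errno($ch),
        curl_error($ch),
        (int) curl_getinfo($ch, CURLINFO_HTTP_CODE),
        $body === false ? '' : $body
    );
}
```

A call such as check_site('http://www.domain.com') would then give a
record whose 'errno' and 'error' fields show whether a failure happened
while connecting or later in the transfer.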

I am setting the following options:

            $header = array
            (
                "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5",
                "Cache-Control: max-age=0",
                "Connection: keep-alive",
                "Keep-Alive: 300",
                "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7",
                "Accept-Language: en-us,en;q=0.5",
                "Pragma: ", // browsers keep this blank.
            );

            $ch = curl_init(); // handle creation, not shown in the original post

            curl_setopt( $ch, CURLOPT_URL, 'http://www.domain.com' ); // http, no trailing slash
            curl_setopt( $ch, CURLOPT_HTTPHEADER, $header );
            curl_setopt( $ch, CURLOPT_USERAGENT, 'Googlebot/2.1 (+http://www.google.com/bot.html)' );
            curl_setopt( $ch, CURLOPT_REFERER, 'http://www.google.com' );
            curl_setopt( $ch, CURLOPT_ENCODING, 'gzip,deflate' );
            curl_setopt( $ch, CURLOPT_RETURNTRANSFER, 1 );
            curl_setopt( $ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_0 );
            curl_setopt( $ch, CURLOPT_FOLLOWLOCATION, 1 );
            curl_setopt( $ch, CURLOPT_MAXREDIRS, 10 );
            curl_setopt( $ch, CURLOPT_AUTOREFERER, 1 );
            curl_setopt( $ch, CURLOPT_SSL_VERIFYPEER, FALSE );

            curl_setopt( $ch, CURLOPT_TIMEOUT, 10 );

As mentioned, this works fine on 99% of what I try, but it fails on a few
sites, even though I have no problems reaching them with a web browser.

Any ideas on possible causes, or on curlopts I should change or add to get
this working?

_______________________________________________
http://cool.haxx.se/cgi-bin/mailman/listinfo/curl-and-php
Received on 2009-06-20