
curl-and-php

CURLE_COULDNT_CONNECT on some valid sites

From: Daniel Marshall <dashiva.lunatic_at_gmail.com>
Date: Thu, 18 Jun 2009 11:22:50 -0500

I've searched the mailing list and haven't been able to find a
solution or a reason for my problem. I've also tried a laundry list of
CURLOPT settings to get a result, with no luck.

So, the problem is: I have curl installed on about 25 servers and have
tested from all of them with the same results, so I assume the problem lies
in how specific target domains are set up, combined with my curl options.
An example of one of the servers:
Server: Apache/1.3.37 (Unix) PHP/5.1.6 mod_auth_passthrough/1.8
mod_log_bytes/1.2 mod_bwlimited/1.4 FrontPage/5.0.2.2635.SR1.2
mod_ssl/2.8.28 OpenSSL/0.9.7a
libcurl/7.15.3 zlib/1.2.1.2

Basically, what it comes down to is this: I have a list of about 75,000
sites whose status I need to check.
To do this, I send a curl request to each site's index page, grab the
HTTP status (ideally 200), and cache an imprint of the page.
This works fine in most cases, but for a couple of domains a 5s timeout is
not enough: the request times out. When I extend the timeout to 10s, I get a
CURLE_COULDNT_CONNECT error instead.
If I visit the same site in a browser, I connect with no problems and see
the page.
The page is http://www.domain.com, and I connect with curl to the exact same
URL, with no https and no redirection that I am aware of.
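
For illustration, the check described above might look roughly like the sketch below. The function names (classify_errno, check_site), the separate connect timeout, and the use of md5() as the page "imprint" are assumptions made for this example and are not taken from the original code. The numeric values are libcurl's documented error codes.

```php
<?php
// Hypothetical helper: map a curl_errno() value to a coarse label.
function classify_errno($errno)
{
    switch ($errno) {
        case 0:  return 'ok';
        case 6:  return 'dns_failure';     // CURLE_COULDNT_RESOLVE_HOST
        case 7:  return 'connect_failure'; // CURLE_COULDNT_CONNECT
        case 28: return 'timeout';         // CURLE_OPERATION_TIMEDOUT
        default: return 'other_error';
    }
}

// Hypothetical helper: fetch one URL and report what happened.
// The md5() "imprint" is an assumption about what gets cached.
function check_site($url, $connectTimeout = 5, $totalTimeout = 10)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $connectTimeout); // connection phase only
    curl_setopt($ch, CURLOPT_TIMEOUT, $totalTimeout);          // whole transfer

    $body  = curl_exec($ch);
    $errno = curl_errno($ch);

    $result = array(
        'url'       => $url,
        'status'    => classify_errno($errno),
        'errno'     => $errno,
        'error'     => curl_error($ch),
        'http_code' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'imprint'   => ($body === false) ? null : md5($body),
    );
    curl_close($ch);
    return $result;
}
```

Calling check_site('http://www.domain.com') would return an array whose 'status' field separates a failed connection (errno 7) from an overall timeout (errno 28), which is the distinction at issue here. Keeping CURLOPT_CONNECTTIMEOUT separate from CURLOPT_TIMEOUT is a design choice of this sketch: the former limits only the connection phase, the latter the whole operation.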

I am setting the following options:

            $header = array
            (
                "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5",
                "Cache-Control: max-age=0",
                "Connection: keep-alive",
                "Keep-Alive: 300",
                "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7",
                "Accept-Language: en-us,en;q=0.5",
                "Pragma: ", // browsers keep this blank.
            );

            curl_setopt( $ch, CURLOPT_URL, 'http://www.domain.com' ); // http, no trailing slash
            curl_setopt( $ch, CURLOPT_HTTPHEADER, $header );
            curl_setopt( $ch, CURLOPT_USERAGENT, 'Googlebot/2.1 (+ http://www.google.com/bot.html)' );
            curl_setopt( $ch, CURLOPT_REFERER, 'http://www.google.com' );
            curl_setopt( $ch, CURLOPT_ENCODING, 'gzip,deflate' );
            curl_setopt( $ch, CURLOPT_RETURNTRANSFER, 1 );
            curl_setopt( $ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_0 );
            curl_setopt( $ch, CURLOPT_FOLLOWLOCATION, 1 );
            curl_setopt( $ch, CURLOPT_MAXREDIRS, 10 );
            curl_setopt( $ch, CURLOPT_AUTOREFERER, 1 );
            curl_setopt( $ch, CURLOPT_SSL_VERIFYPEER, FALSE );

            curl_setopt( $ch, CURLOPT_TIMEOUT, 10 );

As mentioned, this works fine on 99% of the sites I try, but it fails on a
few, even though I have no problems reaching them with a web browser.
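
One way to gather more detail on the few failing domains is to turn on libcurl's verbose log and record the timing fields from curl_getinfo(). The sketch below is only a diagnostic aid under stated assumptions: the helper names (format_timing, debug_fetch) and the log path are hypothetical, and the 10-second timeout simply mirrors the setting above.

```php
<?php
// Hypothetical helper: summarise timing fields returned by curl_getinfo().
function format_timing($info)
{
    return sprintf(
        'dns=%.3fs connect=%.3fs total=%.3fs http=%d',
        $info['namelookup_time'],
        $info['connect_time'],
        $info['total_time'],
        $info['http_code']
    );
}

// Hypothetical helper: fetch a URL with the verbose log appended to $logPath.
function debug_fetch($url, $logPath)
{
    $log = fopen($logPath, 'a');
    if ($log === false) {
        return 'could not open log file';
    }

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    curl_setopt($ch, CURLOPT_VERBOSE, 1);   // log connection attempts and headers
    curl_setopt($ch, CURLOPT_STDERR, $log); // write that log to the file

    curl_exec($ch);
    $summary = curl_errno($ch) . ' ' . curl_error($ch) . ' '
             . format_timing(curl_getinfo($ch));

    curl_close($ch);
    fclose($log);
    return $summary;
}
```

For example, debug_fetch('http://www.domain.com', '/tmp/curl-debug.log') returns a one-line summary and leaves the verbose trace in the log file. A connect time of zero alongside a non-zero DNS time would suggest that name resolution succeeded but the TCP connection never completed, though that reading is a hint and not a definitive diagnosis.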

Any ideas on possible causes, or on curlopts I should change or add to get
these requests to succeed?

Received on 2009-06-18