curl-and-php
Re: curl not scraping Microsoft.com (something wrong?)
Date: Fri, 15 Apr 2005 14:07:31 -0400 (EDT)
Problem solved, thanks to Kirk!!
Here is Kirk's code that seems to work for most websites. (Perhaps this
should be added to the examples page?)
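<?php
$ch = curl_init();

// Request headers copied from a real browser; some sites serve different
// content (or nothing useful) when these are missing.
$header = array();
$header[] = "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
$header[] = "Accept-Language: en-us,en;q=0.5";
$header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
$header[] = "Keep-Alive: 300";
$header[] = "Pragma:";  // no value after the colon: removes the header if curl would send it by default

curl_setopt($ch, CURLOPT_URL,
    "http://yourdomainhere.com/Search.aspx?action=search");
curl_setopt($ch, CURLOPT_USERAGENT,
    "Mozilla/4.73 [en] (X11; U; Linux 2.2.15 i686)");
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookies.txt");   // save cookies the site sets
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookies.txt");  // send saved cookies back on later requests
curl_setopt($ch, CURLOPT_HEADER, 1);          // include the response headers in the output
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);  // return the page instead of printing it
curl_setopt($ch, CURLOPT_TIMEOUT, 300);       // give up after 300 seconds
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);  // follow redirects
curl_setopt($ch, CURLOPT_ENCODING, "");       // accept any encoding curl supports (gzip, deflate)
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);

$string = curl_exec($ch);
if ($string === false) {
    echo "curl error: " . curl_error($ch);
}
curl_close($ch);
echo $string;
?>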
On Thu, 14 Apr 2005, Kirk Hedden wrote:
>
> >
> >I'm sure you have had many successes and can do this in the blink of an eye.
> >But as a novice, I've only been able to retrieve 1 out of 3 sites.
> >
> >Perhaps I missed where on your website it explains precisely *what* needs
> >to be copied from the headers and *how* to do it in curl, in a
> >step-by-step fashion. I'm not expecting a cookie cutter solution, but it
> >shouldn't be some mystical process, either.
> >
> >Any help will be much appreciated!
>
> I don't know why I did this, but I ran the URL through my code and it
> worked, so I looked to see what I was doing that you weren't.
>
> You need to set the CURLOPT_COOKIEJAR option.
>
> cURL is not an HTTP primer; it's an HTTP tool. If you don't know HTTP,
> you'll have a hard time using it. The docs aren't the best, but the info is
> there.
>
> Best,
> Kirk
>
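Kirk's fix comes down to turning on curl's cookie handling. Below is a minimal sketch, assuming PHP 8 with the curl extension; the helper names scraper_options() and fetch() are hypothetical, not part of PHP. CURLOPT_COOKIEJAR tells libcurl where to write cookies when the handle is cleaned up, and according to the libcurl documentation setting it also enables cookies for the session, so a cookie set by a redirect response is sent on the follow-up request. CURLOPT_COOKIEFILE reads the same file back, so a later run starts with the cookies saved by an earlier one.

```php
<?php
// Minimal sketch (assumes PHP 8 + the curl extension). scraper_options()
// and fetch() are hypothetical helper names, not part of PHP or libcurl.

/** Options for a request that keeps cookies between requests and runs. */
function scraper_options(string $url, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_COOKIEJAR      => $cookieFile, // write cookies here when the handle is cleaned up
        CURLOPT_COOKIEFILE     => $cookieFile, // read cookies from here (a missing file is fine)
        CURLOPT_FOLLOWLOCATION => true,        // redirects often set the cookies the next page needs
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
        CURLOPT_ENCODING       => '',          // accept any encoding libcurl supports
        CURLOPT_TIMEOUT        => 300,         // seconds
    ];
}

/** Fetch $url and return the body, throwing on a transfer error. */
function fetch(string $url, string $cookieFile): string
{
    $ch = curl_init();
    curl_setopt_array($ch, scraper_options($url, $cookieFile));
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('curl error: ' . curl_error($ch));
    }
    // In PHP 8 the handle is freed, and the cookie jar written, when $ch
    // goes out of scope at the end of this function.
    return $body;
}
```

For example, fetch("http://yourdomainhere.com/Search.aspx?action=search", "/tmp/cookies.txt") sends any cookies saved by earlier runs and stores new ones. An absolute path for the jar avoids depending on the script's current directory, which the relative "cookies.txt" in the code above does.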
Received on 2005-04-15