curl-library
Can't download URL via libcurl but can using curl
Date: Mon, 29 Mar 2010 21:11:34 +0000 (GMT)
Hi,
I'm trying to use libcurl to download the RSS feed from Google News. The
default feed given to me is
http://news.google.com/news?pz=1&cf=all&ned=uk&hl=en&topic=h&num=3&output=rss
When I try to get it using libcurl the server gives me an HTML web page (200
response). In CURLOPT_VERBOSE mode I get:
* About to connect() to news.google.com port 80 (#0)
* Trying 74.125.79.99... * Connected to news.google.com (74.125.79.99) port 80 (#0)
> GET /news?pz=1&cf=all&ned=uk&hl=en&topic=h&num=3&output=rss HTTP/1.1
User-Agent: myapplication/1.0
Host: news.google.com
Accept: */*
< HTTP/1.1 200 OK
< Content-Type: text/html; charset=UTF-8
I can get the RSS feed with the curl command-line tool without any problems:
curl -i -A "myapplication/1.0" "http://news.google.com/news?pz=1&cf=all&ned=uk&hl=en&topic=h&num=3&output=rss"
This gives XML (also a 200 response).
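Since both responses come back as 200, the status code cannot tell the two cases apart; the Content-Type header can. The following is a minimal sketch of such a check in C. The helper name media_type_is is hypothetical, and it assumes the header value has already been obtained, for example with curl_easy_getinfo() and CURLINFO_CONTENT_TYPE after the transfer. I do not know which content type the feed is served with, so the sketch only tests for text/html.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if content_type names the given media type.
 * The comparison is case-insensitive and ignores any parameters that follow
 * the media type, such as "; charset=UTF-8". Returns 0 otherwise. */
int media_type_is(const char *content_type, const char *media_type)
{
    size_t i;

    if (content_type == NULL || media_type == NULL)
        return 0;

    /* Compare character by character; a shorter content_type fails here
     * because its terminating NUL will not match. */
    for (i = 0; media_type[i] != '\0'; i++) {
        if (tolower((unsigned char)content_type[i]) !=
            tolower((unsigned char)media_type[i]))
            return 0;
    }

    /* The media type must end here or be followed by parameters. */
    return content_type[i] == '\0' || content_type[i] == ';' ||
           content_type[i] == ' ';
}
```

With the header from the verbose output above, media_type_is("text/html; charset=UTF-8", "text/html") returns 1, which would flag the response as the HTML page rather than the feed.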
You'll notice that Google doesn't like to be scraped, hence setting a user
agent string. I'm thinking that they detect my application as a scraper so
they serve me the HTML. Another possibility is that the URL is not formed
properly. My application is passing &amp; to libcurl instead of &.
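If the URL really does reach the application with its ampersands HTML-escaped (as &amp; rather than a bare &), one way to test that theory is to unescape them before the string is handed to CURLOPT_URL. This is only a sketch under that assumption: the function name unescape_ampersands is hypothetical, it handles only the &amp; entity, and it works on a writable C string rather than on the string class used in my code.

```c
#include <string.h>

/* Hypothetical helper: replace every "&amp;" in url with "&", in place.
 * The result is never longer than the input, so no allocation is needed.
 * Returns the number of replacements made. */
int unescape_ampersands(char *url)
{
    static const char entity[] = "&amp;";
    const size_t entity_len = sizeof(entity) - 1;
    char *src = url;   /* read position */
    char *dst = url;   /* write position, never ahead of src */
    int replaced = 0;

    while (*src != '\0') {
        if (strncmp(src, entity, entity_len) == 0) {
            *dst++ = '&';
            src += entity_len;
            replaced++;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return replaced;
}
```

A non-zero return value would confirm that the URL was escaped when it reached this point; a return value of 0 would rule this explanation out.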
The curl tool can get the XML using this URL and the same user agent
string as my application so I don't see why I can't get it.
I tried looking at the output from curl using --libcurl but can't see any
reason why my application is different.
Here is the code I am using:
handle = curl_easy_init();
// Set up options
curl_easy_setopt(handle, CURLOPT_URL, url.ascii());
#if DEBUG
curl_easy_setopt(handle, CURLOPT_VERBOSE, 1L);
#endif
curl_easy_setopt(handle, CURLOPT_USERAGENT, useragent.ascii());
curl_easy_setopt(handle, CURLOPT_TIMEOUT, (long)timeout);
if(!proxy.isEmpty())
    curl_easy_setopt(handle, CURLOPT_PROXY, proxy.ascii());
curl_easy_setopt(handle, CURLOPT_FOLLOWLOCATION, 1L);
It's probably a case of not seeing the wood for the trees.
What am I doing wrong?
Received on 2010-03-29