curl-and-php
Re: problem accessing sites
Date: Wed, 13 Mar 2002 09:37:51 +0100 (MET)
On Tue, 12 Mar 2002 theexperts_at_allprodirect.com wrote:
> I am using curl to pull data from other sites' web pages. However, some
> sites are not working. curl is using cookies and http authentication
> successfully. However, the sites in question return 301 or 302 status
> codes. If curl is set to follow the redirection links, it goes to wrong
> pages, as the links provided are not correct.
>
> It seems as if the sites are returning 3xx error codes and false links to
> prevent access. But how would they know who to block access to? Somehow the
> browser gets the right pages. What is different about the request coming
> from curl?
The usual: referer, user-agent and other headers. Even the order of the
headers *could* be used to detect differences.
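To make that concrete, here is a rough Python sketch that compares two sets of
request headers and reports what is missing, what differs and whether the
order differs. The header values and the function names (parse_headers,
diff_headers) are made up for illustration; they are not taken from any real
capture of the site in question.

```python
def parse_headers(raw):
    """Parse 'Name: value' lines into an ordered list of (name, value) pairs."""
    pairs = []
    for line in raw.strip().splitlines():
        name, sep, value = line.partition(":")
        if sep:
            pairs.append((name.strip(), value.strip()))
    return pairs


def diff_headers(browser, client):
    """Compare two ordered header lists, matching names case-insensitively."""
    b = {name.lower(): value for name, value in browser}
    c = {name.lower(): value for name, value in client}
    # Headers the browser sends that the other client does not send at all.
    missing = [name for name in b if name not in c]
    # Headers both send, but with different values.
    different = [name for name in b if name in c and b[name] != c[name]]
    # Relative order of the headers that both sides send.
    shared_b = [n.lower() for n, _ in browser if n.lower() in c]
    shared_c = [n.lower() for n, _ in client if n.lower() in b]
    return {
        "missing": missing,
        "different": different,
        "order_differs": shared_b != shared_c,
    }


# Hypothetical example data, NOT real captures.
BROWSER = parse_headers("""
Host: www.foo.com
User-Agent: ExampleBrowser/1.0
Accept: */*
Referer: http://www.foo.com/index.html
""")
CLIENT = parse_headers("""
User-Agent: ExampleClient/1.0
Host: www.foo.com
Accept: */*
""")

report = diff_headers(BROWSER, CLIENT)
print(report)
```

Any header listed as missing or different is a candidate for what the server
keys on.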
> The only difference I can find is the IP.
They are unlikely to do IP filtering if they are a public site.
> I am wondering if they block access to Class C IP addresses such as my web
> host has, but not Class A IP addresses such as my ISP has. Does anyone know
> if this technique is used? I thought maybe ISPs usually have Class A
> addresses while web hosting providers usually have Class C addresses.
> Anyone know?
Nothing is impossible, they might do whatever they see fit, but I consider
it highly unlikely that they would shut off such large parts of the world.
Why would they prevent people from accessing their site based on the source
IP?
> Either that or maybe curl isn't handling it right.
That's also a possibility, but I don't think that is the most likely reason
either.
> Like maybe it's not receiving the whole response.
It is quite simple to check that for yourself: telnet to the site's port 80,
enter the HTTP request by hand and look at the complete response.
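If typing into telnet is awkward, the same check can be scripted. The
following is only a sketch in Python: build_request and fetch_raw are names
invented here, and the request is a bare-bones GET with "Connection: close"
so that the server ends the connection when the response is complete.

```python
import socket


def build_request(host, path="/", extra_headers=()):
    """Return the raw text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
    ]
    lines += [f"{name}: {value}" for name, value in extra_headers]
    # Each line ends with CRLF; an empty line terminates the header block.
    return "\r\n".join(lines) + "\r\n\r\n"


def fetch_raw(host, request, port=80, timeout=10):
    """Send the request over a plain TCP socket and return every byte received."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


request = build_request("www.foo.com", "/login.html")
print(request)
# To actually send it (this needs network access):
# print(fetch_raw("www.foo.com", request).decode("latin-1"))
```

The raw bytes that come back include the status line, all headers and the
body exactly as sent, which is what you want when checking whether the whole
response arrives.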
> HTTP/1.1 302 Found
> Date: Tue, 12 Mar 2002 23:20:29 GMT
> Server: Apache/1.3.22 Ben-SSL/1.44 (Unix) AuthMySQL/2.20 PHP/4.1.2
> X-Powered-By: PHP/4.1.2
> Location: http://www.foo.com/login.html
> Transfer-Encoding: chunked
> Content-Type: text/html
>
> Plus a small html page.
Looks like a perfectly valid response.
I'd suggest that you run your network sniffer program (tcpdump, ethereal,
whatever) to figure out exactly what your browser sends that works, and then
you make your curl request as similar to that as possible.
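One way to do that, sketched in Python: take the request text the sniffer
captured from the browser and turn each header into a -H option for the curl
command line. The function name curl_command and the captured request below
are hypothetical; a real capture will contain more headers, and some (cookies
for instance) may need to be refreshed rather than copied.

```python
import shlex


def curl_command(captured_request, scheme="http"):
    """Build a curl command line that replays the headers of a captured request."""
    lines = captured_request.strip().splitlines()
    method, path, _version = lines[0].split()
    host = None
    headers = []
    for line in lines[1:]:
        if not line.strip():  # blank line: end of the header block
            break
        name, _, value = line.partition(":")
        name, value = name.strip(), value.strip()
        if name.lower() == "host":
            host = value  # the host goes into the URL instead of a -H option
        else:
            headers.append(f"{name}: {value}")
    if host is None:
        raise ValueError("captured request has no Host header")
    args = ["curl", "-v"]
    if method != "GET":
        args += ["-X", method]
    for header in headers:
        args += ["-H", header]
    args.append(f"{scheme}://{host}{path}")
    # Quote each argument so the line can be pasted into a shell.
    return " ".join(shlex.quote(arg) for arg in args)


# Hypothetical capture, NOT a real browser request.
CAPTURED = (
    "GET /login.html HTTP/1.1\r\n"
    "Host: www.foo.com\r\n"
    "User-Agent: ExampleBrowser/1.0\r\n"
    "Referer: http://www.foo.com/\r\n"
    "\r\n"
)

command = curl_command(CAPTURED)
print(command)
```

Start with all the browser's headers and remove them one at a time until the
response changes; that tells you which one the site actually checks.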
-- Daniel Stenberg -- curl groks URLs -- http://curl.haxx.se/
Received on 2002-03-13