curl-users

RE: help to download web pages

From: Wesley Sgroi <sgroiwes_at_cox.net>
Date: Wed, 17 Dec 2003 10:19:58 -0700

Kjell, Thanks for responding.

I added the option: curl_setopt ($ch, CURLOPT_USERAGENT, $user_agent);
where $user_agent is "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;
.NET CLR 1.0.3705)".
I also tried a couple other modifications for $user_agent which are
commented out in my code below.

However, I am still only receiving the frameset page back, which for some
reason is not printed to the screen by my instruction: echo $result;

I am able to view the returned information by viewing the source in my web
browser. It displays only the frameset shown below. The body of the returned
page states that my browser does not support frames.
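
One plausible reason nothing shows up on screen is that the browser treats the
echoed response as markup and tries to load the relative frame URLs from the
server running the script instead of from cars.com. A minimal sketch, assuming
the goal is only to inspect what cURL returned; escape_for_display is a
hypothetical helper name, and $sample stands in for $result:

```php
<?php
// Hypothetical helper (not part of the original script): escape fetched HTML
// so the browser shows it as text instead of interpreting it as a frameset.
function escape_for_display(string $html): string
{
    return '<pre>' . htmlspecialchars($html, ENT_QUOTES, 'UTF-8') . '</pre>';
}

// Stand-in for the $result string returned by curl_exec().
$sample = '<frame name="top" src="/search/results.jhtml?aff=national">';
echo escape_for_display($sample), "\n";
```

In the script below, this would mean replacing the final echo $result; with
echo escape_for_display($result);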

Any further help you can provide would be appreciated.

Thank you,
Wesley Sgroi

==========================
[html response using curl]
============================================================================
<html>
        <head>
                <title>cars.com Search Results</title>
        </head>

        <frameset rows="*,0" cols="*" bordercolor="#CCCCCC" frameborder="NO">

                  <frame name="top" scrolling="AUTO" noresize target="middle"

                         src="/search/used/cc/standard/results/multiple/search_results.jhtml?aff=national&src=">

                  <frame name="preview"
                         src="/search/used/cc/standard/results/single/initial_preview.jhtml?aff=national&cid="
                                                scrolling="auto">

                <noframes>
                        <body>
                          <p>This page uses frames, but your browser doesn't support them.</p>
                        </body>
                </noframes>

        </frameset>
</html>
============================================================================

[code]
//--------------------------------------------------------------------------
<?php
// Download the result of a search at cars.com.
// The search is run by sending an HTTP GET request to their form handler,
// following redirection, and downloading the resulting page.
// The URL it retrieves, according to 'print_r( curl_getinfo($ch) )', is
// correct; however, it only downloads the HTML frameset, not the pages
// loaded inside the frames.

$url = "http://www.cars.com/search/used/cc/standard/process/search.jhtml?mknm=BMW&mdnm=325&zc=85251&rd=30";
//$user_agent = "Mozilla/3.0 (Win95; I)";
//$user_agent = "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)";
$user_agent = $_SERVER['HTTP_USER_AGENT'];
print "User agent: " . $user_agent . "\n";

$ch = curl_init(); // initialize curl handle
curl_setopt ($ch, CURLOPT_USERAGENT, $user_agent);
// Tried setting cookies based on a cURL mailing list archive response.
curl_setopt($ch, CURLOPT_COOKIEJAR, "cook");
curl_setopt($ch, CURLOPT_COOKIEFILE, "cook");
curl_setopt($ch, CURLOPT_URL, $url); // set URL to fetch (HTTP GET)
curl_setopt($ch, CURLOPT_FAILONERROR, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);// allow redirects
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1); // return into a variable
curl_setopt($ch, CURLOPT_TIMEOUT, 10); // give up after 10 seconds

$result = curl_exec($ch); // run the whole process

print "<pre>";
print_r(curl_getinfo($ch));
echo "\n\ncURL error number:" .curl_errno($ch);
echo "\n\ncURL error:" . curl_error($ch);
print "</pre>";

curl_close($ch);

// remove the cookie jar
//unlink("cook") or die("Can't unlink cook");

echo $result;

?>
//--------------------------------------------------------------------------
[/code]
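
Since the response above is only a frameset, one way forward may be to pull the
src of each frame out of it and request those URLs in turn. The following is a
minimal sketch under stated assumptions, not a tested fix for cars.com:
extract_frame_srcs and resolve_frame_url are hypothetical helper names, the
regular expression only handles quoted src attributes, and the resolver only
covers absolute URLs, root-relative paths, and simple relative paths.

```php
<?php
// Hypothetical helper: collect the src attribute of every <frame> tag.
// The \b after "frame" keeps <frameset> from matching.
function extract_frame_srcs(string $html): array
{
    preg_match_all('/<frame\b[^>]*?\bsrc\s*=\s*["\']([^"\']*)["\']/is', $html, $m);
    return $m[1];
}

// Hypothetical helper: resolve a frame src against the URL it came from.
// This is a simplification, not a full RFC 3986 resolver.
function resolve_frame_url(string $base, string $src): string
{
    if (preg_match('#^https?://#i', $src)) {
        return $src; // already absolute
    }
    $parts  = parse_url($base);
    $origin = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $origin .= ':' . $parts['port'];
    }
    if ($src !== '' && $src[0] === '/') {
        return $origin . $src; // root-relative, as in the response above
    }
    // Plain relative path: resolve against the directory of the base path.
    $dir = isset($parts['path']) ? preg_replace('#/[^/]*$#', '/', $parts['path']) : '/';
    return $origin . $dir . $src;
}

// Sample input shaped like the frameset in this message.
$base = 'http://www.cars.com/search/used/cc/standard/process/search.jhtml?mknm=BMW';
$html = '<frameset rows="*,0">'
      . '<frame name="top" src="/search/used/cc/standard/results/multiple/search_results.jhtml?aff=national&src=">'
      . '<frame name="preview" src="/search/used/cc/standard/results/single/initial_preview.jhtml?aff=national&cid=">'
      . '</frameset>';

foreach (extract_frame_srcs($html) as $src) {
    echo resolve_frame_url($base, $src), "\n";
}
```

In the script above, the base would more safely be the URL cURL ended up at
after redirects, i.e. curl_getinfo($ch, CURLINFO_EFFECTIVE_URL) read before
curl_close($ch). Each resolved URL could then be set with CURLOPT_URL on the
same handle, so the cookie jar is reused, and fetched with another curl_exec().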

Thank you for your help,
Wesley Sgroi

-----Original Message-----
From: curl-users-admin_at_lists.sourceforge.net
[mailto:curl-users-admin_at_lists.sourceforge.net] On Behalf Of Kjell Ericson
Sent: Tuesday, December 16, 2003 2:43 PM
To: curl-users_at_lists.sourceforge.net
Subject: Re: help to download web pages

On Tue, 16 Dec 2003, Peter Ping wrote:

> Tried to download a web page via curl and was not successful. Viewed the
> downloaded source code and found "This web page uses frames, but your
> browser doesn't support them." even though I had already specified the
> "in-frame" page. Could you please help me solve the problem?

Try to identify yourself as a browser:

 curl -A "Mozilla/3.0 (Win95; I)" http://www.framesite.net/page.html

I've used this technique sometimes myself, and I have also used it to hide
that I'm curling pages (some webmasters don't like it when you sniff their
content).

  // Kjell


Received on 2003-12-17