
curl-library

Re: is it possible to curl browser rendered pages ?

From: Ralph Mitchell <ralphmitchell_at_gmail.com>
Date: Tue, 8 Dec 2015 23:58:08 -0500

On Tue, Dec 8, 2015 at 6:05 PM, <home_at_node.de> wrote:

> If I try to curl this page, then I get only a few lines of source code
> inside curl instead of the whole page with content.
>
> Are there special parameters needed to get the whole page source with curl?
>
> this page as an example, does not load the whole source with curl:
>
> http://www.immobilienscout24.de/anbieter/suchen/Baden-Wuerttemberg/Baden-Baden?geocodeid=1276001002&focustype=2&order=ALPHABETICAL
>
>
Actually, curl does give you the whole source. If you bring that page up
in a browser and then right-click and view the source, you'll see exactly the
same HTML source that curl hands you. What the browser does, and curl does
not, is then scan that HTML for tags that reference other resources (e.g. img
src attributes, stylesheet hrefs, etc.) and go back to fetch those, possibly
from multiple other sites. The browser also executes any JavaScript it
receives, which can generate additional page content as well (e.g. via
document.write()). Lather, rinse, repeat, until there are no more tags to
process.
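To make the difference concrete, here is a minimal Python sketch of just the first step a browser takes after receiving the HTML: scanning it for tags that reference other resources and resolving those references against the page URL. It uses only the standard library (html.parser, urllib.parse). The tag/attribute table, the function name find_subresources, and the example.com URLs are illustrative assumptions, not anything curl or a browser actually exposes. It deliberately does not execute JavaScript, so anything a script would add via document.write() stays invisible, just as it does with curl.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical, illustrative table of tag -> attribute pairs that commonly
# point at subresources. Not exhaustive: srcset, CSS url(), and requests made
# by JavaScript at runtime are all ignored.
SUBRESOURCE_ATTRS = {
    "img": "src",
    "script": "src",
    "iframe": "src",
    "link": "href",  # stylesheets, icons, preloads
}


class SubresourceFinder(HTMLParser):
    """Collect absolute URLs of subresources referenced by an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags like <img ... /> via the
        # default handle_startendtag implementation.
        wanted = SUBRESOURCE_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative references against the page URL.
                self.urls.append(urljoin(self.base_url, value))


def find_subresources(html, base_url):
    """Return the subresource URLs a browser would go on to fetch."""
    finder = SubresourceFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.urls


if __name__ == "__main__":
    page = """<html><head>
<link rel="stylesheet" href="/css/site.css">
<script src="app.js"></script>
</head><body><img src="https://cdn.example.com/logo.png"></body></html>"""
    for url in find_subresources(page, "https://www.example.com/listing/"):
        print(url)
```

Each URL this prints is something you would have to hand to curl yourself; a browser fetches them automatically, repeats the process on whatever comes back, and runs the scripts on top of that.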

Curl doesn't attempt this recursive HTML interpretation; it just fetches
whatever URLs you hand it.

Ralph Mitchell

-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette: http://curl.haxx.se/mail/etiquette.html
Received on 2015-12-09

In reply to: home_at_node.de: "is it possible to curl browser rendered pages ?"
