curl-users

Re: using curl to download files where the url does not end in file.ext

From: Alessandro Vesely <vesely_at_tana.it>
Date: Thu, 13 Sep 2007 09:24:59 +0200

scatterp wrote:
>
>
> ----- Original Message ----
> From: Dan Fandrich <dan_at_coneharvesters.com>
>
> Chances are that is because your browser already has some cookies set, due
> to logging in to the site previously, or posting information previously.

What Dan said is correct.

>
>>>> Dan
> --
> Hi Dan
>
> here is the live headers output
> using lynx i get the same scenario it works fine no cookies.

Not quite: with cookies disabled one can get the second form,
but not the CSV from the latter. More details follow.

> https://adwords.google.com/select/TrafficEstimatorSandbox?
>
> GET /select/TrafficEstimatorSandbox? HTTP/1.1
> Cookie:

So this header shows you didn't disable cookies.
You may want to install the Web Developer extension,
or some other add-on that lets you disable cookies on the fly...

> S=awfe=v2VaGeH0uW0:awfe-efe=v2VaGeH0uW0:gmail= [...]

I'm not familiar with Google's cookie format,
but I guess awfe stands for "AdWords Front End".

(Also, I cannot tell how much private data of yours may leak
when you publish your cookies on mailing lists like this.
For more on that format, see "Hacking Google Print"
http://www.kuro5hin.org/story/2005/3/7/95844/59875 .)

Loading the page above results in an HTML page with a form that,
for the task at hand, may be simplified like so:

  <form action="main" method="POST" name="TESandbox">
  <input type="hidden" name="cmd" value="TrafficEstimatorSandbox">
  <textarea name="keywords" rows="6" cols="40" wrap="off" tabindex="1"></textarea>
  <select name="currency" id="currency" tabindex="" onchange=""><option value="USD" selected>US Dollars (USD $)</option></select>
  <input type="text" name="price" size="8">
  <input type="text" name="budget" size="8">
  <select name="language" tabindex="0" size="7" multiple><option value="*">All Languages</option></select>
  <input type="radio" name="geoselection" value="1" checked onclick="updateTargetingPanels();">
  <select name="country" id="countryList" style="" tabindex="0" onchange="" size="7" multiple><option value="*" selected>All Countries and Territories</option></select>
  <input type="submit" name="continue" value="Continue »" style="font-weight:bold;" onclick="return teSandboxCheckAndNavigate()">
  </form>

> ----------------------------------------------------------
> https://adwords.google.com/select/main
>
> POST /select/main HTTP/1.1
> Host: adwords.google.com
> Content-Type: application/x-www-form-urlencoded
> Content-Length: 142
> cmd=TrafficEstimatorSandbox&keywords=keyword&currency=USD&price=100&budget=&language=*&geoselection=1&country=*&continue=Continue+%C2%A0%C2%BB

The content you posted is obviously derived from the form above.
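
For illustration, the same POST could be sketched with curl roughly as
follows. This is only a sketch: the function name and its three arguments
are made up for the example, the field values are copied from the request
quoted above, and nothing here has been checked against the live site.

```shell
#!/bin/sh
# Sketch of the POST above. The function name and its three arguments are
# hypothetical; the field values are copied from the quoted request.
post_estimate() {
  keywords=$1   # text for the "keywords" textarea
  jar=$2        # file where curl stores the cookies it receives
  page=$3       # file where the returned HTML is saved

  # -c writes received cookies to the jar. Each -d adds form data as-is
  # and curl joins the pieces with "&". --data-urlencode encodes the part
  # after the first "=", so the keywords may contain spaces or newlines.
  curl -s -c "$jar" -o "$page" \
    -d 'cmd=TrafficEstimatorSandbox' \
    --data-urlencode "keywords=$keywords" \
    -d 'currency=USD&price=100&budget=&language=*&geoselection=1&country=*' \
    -d 'continue=Continue+%C2%A0%C2%BB' \
    'https://adwords.google.com/select/main'
}
```

It would be called as, say, post_estimate 'keyword' cookies.txt result.html.
Saving the cookies at this point is an assumption about what the later
request will need.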

> HTTP/1.x 200 OK
> Set-Cookie: I=dPRQ/RQBAAA=.Z3CLBMp2JuTZ/1ep8Ja78A==.2gB79Hpv4g4wioapaZ/8gQ==; Path=/select
> Expires: Thu, 01 Jan 1970 00:00:00 GMT

I could do a similar post with cookies disabled. However, I caught one more cookie:

  HTTP/1.x 200 OK
  Set-Cookie: I=8Tl4/RQBAAA=.9QCH/JbBItRG1yn60m2UCA==.cK9cd+NCkUB4rcNe+dPlJw==; Path=/select
  Set-Cookie: S=awfe=dm12Aogbsys:awfe-efe=dm12Aogbsys; Domain=.google.com; Path=/

> Content-Type: text/html; charset=UTF-8
> Content-Encoding: gzip
> Transfer-Encoding: chunked
> Cache-Control: private
> Date: Thu, 13 Sep 2007 04:51:54 GMT
> Server: GFE/1.3

The served HTML content contains an anchor element:
  <a href="TrafficEstimatorSandbox?mode=csv&amp;resultcacheid=1189666175352">Download as .csv</a>

Notice we are still in the "/select" folder, thus "TrafficEstimatorSandbox"
corresponds to the same "/select/TrafficEstimatorSandbox" location that we
used before. However, if I click it with cookies disabled I don't get a CSV file.
Instead, I get back to the first form, losing any settings I put in it before.
I presume the resultcacheid is not enough to retrieve my settings. Therefore the
server cannot send me a CSV file. Hence, it answers with a new blank form.
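
To make the link concrete, here is a small sketch that pulls the
resultcacheid out of the anchor element with sed and builds the absolute
URL. The function name is made up, and the pattern assumes the id is a
plain run of digits, as in the example above.

```shell
#!/bin/sh
# Print the first resultcacheid found in HTML read from standard input.
# (Hypothetical helper; assumes the id consists of digits only.)
extract_resultcacheid() {
  sed -n 's/.*resultcacheid=\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Example input: the anchor element from the served page.
html='<a href="TrafficEstimatorSandbox?mode=csv&amp;resultcacheid=1189666175352">Download as .csv</a>'
id=$(printf '%s\n' "$html" | extract_resultcacheid)

# The href is relative to /select/, and "&amp;" in HTML is a plain "&"
# in the actual URL.
url="https://adwords.google.com/select/TrafficEstimatorSandbox?mode=csv&resultcacheid=$id"
printf '%s\n' "$url"
```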

My guess is that you need two calls. First, post the data to select/main
and collect the cookies and the cache id. cURL can help you with the
cookies, but you need a different tool to extract the value of resultcacheid
(perl is recommended if you are familiar with it; otherwise you may be better
off using sed, as it is simpler. See http://www.mingw.org/ or look for sed.exe).
Second, get select/TrafficEstimatorSandbox using mode=csv and the
values from the first call.
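
The second call might look something like the sketch below. The function
and argument names are made up, and it assumes the first call stored its
cookies in a file with curl's -c option. Whether those cookies plus the
resultcacheid are really enough is still my guess.

```shell
#!/bin/sh
# Sketch of the second call. Function and argument names are hypothetical.
get_csv() {
  id=$1    # resultcacheid extracted from the first response
  jar=$2   # cookie file written by the first call (curl -c)
  out=$3   # where to save the CSV

  # -b with a file name sends the cookies stored in that file back to
  # the server; -o saves the response body instead of printing it.
  curl -s -b "$jar" -o "$out" \
    "https://adwords.google.com/select/TrafficEstimatorSandbox?mode=csv&resultcacheid=$id"
}
```

For example: get_csv 1189666175352 cookies.txt estimate.csv. If the saved
file turns out to be HTML rather than CSV, the server most likely did not
accept the cookies or the id.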

Good luck
Received on 2007-09-13