curl-users

RE: Curl hangs?... Was: what i'm doing wrong?

From: STEVE STEFANOVICH <STEVE.STEFANOVICH_at_sydneywater.com.au>
Date: Wed, 21 Nov 2001 10:37:37 +1000

Here you go, it's quite simple. You'll need to change the pattern to suit your site. If you can't construct ranges for the curl command, then you don't have to bother with the min/max calculation; just print out a curl command for every link encountered.

Cheers,
Steve

P.S. Thanks for the tip, but -c on its own didn't do the job... curl then doesn't hang and receives all the files, but the "articles" received are just the generic error notification produced by the php script. So how do I do a "careful combination" :) of -b and -D?...

Daniel mate can you help to make the stupid script return the goddam files as it should?... :)

I think I'll write an e-mail to the webmaster and explain what's going on. These errors say the "webmaster has been notified" for every article that failed, so he can't be happy with the flow of e-mails coming in... if he cares at all.

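# vreme.awk: find the lowest and highest article id among the links,
# then print one curl command covering the whole range.
BEGIN {
  max = 0
  min = 9999999
}

{
  # Only the first matching link on each input line is examined.
  where = match($0, /\/cms\/view\.php\?id=[0-9]+/)
  if (where) {
    # Take the last 6 characters of the match (assumes 6-digit ids);
    # "+ 0" forces a numeric rather than a string comparison below.
    num = substr($0, where + RLENGTH - 6, 6) + 0

    if (num > max) max = num
    if (num < min) min = num
  }
}

END {
  print "curl -v -x myproxy:8080 -L -b cookiejar.txt -o \"file_#1.html\" -c cookiejar.txt -A \"Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)\" \"http://www.vreme.com/cms/view.php?print=yes&id=[" min "-" max "]\""
}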
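
For the per-link alternative mentioned at the top of this message (one curl command for every link, no min/max range), a minimal sketch might look like the following. It is not part of the original script: the function name print_curl_commands is hypothetical, the proxy, -v and user-agent options are left out for brevity, and the 17 is simply the length of the "/cms/view.php?id=" prefix, so it has to change along with the pattern.

```shell
# Hypothetical helper: read HTML on stdin and print one curl command
# per distinct article id found in links of the form /cms/view.php?id=N.
print_curl_commands() {
  awk '
    {
      line = $0
      # Loop so that several links on the same input line are all handled.
      while (match(line, /\/cms\/view\.php\?id=[0-9]+/)) {
        # 17 = length of "/cms/view.php?id=", so this keeps only the digits.
        id = substr(line, RSTART + 17, RLENGTH - 17)
        if (!(id in seen)) {
          seen[id] = 1
          printf "curl -L -b cookiejar.txt -c cookiejar.txt -o \"file_%s.html\" \"http://www.vreme.com/cms/view.php?print=yes&id=%s\"\n", id, id
        }
        # Continue scanning after the link that was just handled.
        line = substr(line, RSTART + RLENGTH)
      }
    }'
}
```

Feeding it the saved table of contents (for example print_curl_commands < toc.html, where toc.html is an assumed file name) prints the commands without running them, so they can be checked before being piped to a shell.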

>>> KPRoth_at_MarathonOil.com 11/21/01 12:52am >>>
Not sure if this is the right guess; however it seems that the last
request shown in your output.txt file doesn't send any COOKIE data along
with the request, however the previous request does.

Could this possibly be because you're using both -b and -c? "-c" should
work all by itself, or a careful combination of -b and -D.
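
One way to read the -b / -D combination is as a two-step sequence: -D writes the response headers (including any Set-Cookie lines) to a file, and a later -b can read cookies back from that header dump. The sketch below only prints the two commands instead of running them; the function name print_cookie_steps and the file names headers.txt, first.html and next.html are illustrative assumptions, and whether this is enough for the site in question is not known.

```shell
# Hypothetical dry-run helper: prints the two-step -D / -b sequence.
# $1 = URL of the page that sets the cookies, $2 = URL that needs them.
print_cookie_steps() {
  first_url=$1
  next_url=$2
  # Step 1: -D saves the response headers, Set-Cookie lines included.
  printf 'curl -D headers.txt -o first.html "%s"\n' "$first_url"
  # Step 2: -b reads the cookies back from that header dump and sends them.
  printf 'curl -b headers.txt -o next.html "%s"\n' "$next_url"
}
```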

Out of curiosity, would the contents of your vreme.awk file be something
you could send me (either directly or via the list)? Edited to remove
anything sensitive of course. I'm just not familiar with awk, but I'd
like to use curl to download groups of FTP files, which it sounds like
your example could easily be adjusted to do...

Thanks,
--Kevin

-----Original Message-----
From: STEVE STEFANOVICH [mailto:STEVE.STEFANOVICH_at_sydneywater.com.au]
Sent: Monday, November 19, 2001 8:39 PM
To: curl_at_contactor.se
Subject: Curl hangs?... Was: what i'm doing wrong?

Just for the record, I've managed to get to the table of contents by
using -b *and* -c together.

So I happily awk it, but when I try to get the individual links, curl
stops on the second file with curl.exe using 99% CPU time. The odd
thing is that when I go to the link explicitly I get a page saying there
was an error, but curl just hangs.

The script on the other side dies checking for something not being
returned... but I couldn't find any Javascript or hidden form fields
which need to be passed back.

Can some kind soul check the output and see if there is something that I
can do?...

Cheers,
Steve

>>> nntp_at_iname.com 11/12/01 12:10pm >>>
At 10:28 12-11-2001 +1000, STEVE STEFANOVICH wrote:
>The output that happens in Internet Explorer is that I get the table of
>contents with the links enabled. I've tried to impersonate the browser with
>curl (-a or whatever) and I get the same result (nothing).

It takes more than the agent signature to impersonate a browser. I have
come across a few webpages which use Javascript to prevent people from
using other agents to download a webpage.

>I guess the php script on the other side creates the page dynamically
>based on what's sent back - username/password obviously, but it looks like
>it's something else besides - browser? cookies? What else can I try to
>mimic IE passing that info back?

I suggest making a dump of the session using a sniffer. That should show
you what you need to do to mimic IE.

Regards,
-sm

-----------------------------------------------------------
This message has been scanned by MailSweeper.
-----------------------------------------------------------

-----------------------------------------------------------
This e-mail is solely for the use of the intended recipient
and may contain information which is confidential or
privileged. Unauthorised use of its contents is prohibited.
If you have received this e-mail in error, please notify
the sender immediately via e-mail and then delete the
original e-mail.
-----------------------------------------------------------

Received on 2001-11-21