curl-library

Re: Download problem.

From: Daniel Stenberg <daniel_at_haxx.se>
Date: Tue, 6 Nov 2001 09:41:48 +0100 (MET)

On Mon, 5 Nov 2001, Yanick Pelletier wrote:

> I have the following problem:
> - I set the option CURLOPT_NOBODY to 1 so I only retrieve the document
> header.
> - I call curl_easy_perform()
> - I do some validation on the content-type, the redirected page,
> etc...
> - I set the option CURLOPT_NOBODY to 0 to download the document data.
> - I try to download the document data with a custom function by
> calling curl_easy_perform().

Okay, it seems pretty straightforward and correct.

> Here the bug happens.

> curl_easy_perform() calls transfer(). Normally the transfer receives
> the header for the sent request, followed by the document data. In my
> case the transfer() function doesn't receive the header, it receives an
> empty line

Uuuuh. You mean that instead of getting a first line that starts with
"HTTP/1.", it gets a first line that is just a CRLF?

If that really is the case, then the server sends a broken HTTP reply. If the
server sends a correct reply, but this is what curl reads, then curl is
broken.

When I try my curl on the site http://www.lacapitale.com/ I get "HTTP/1.1 200
OK" as the first line in the replies every time... I think this indicates
that something in your use case perhaps makes curl not read the last CRLF of
the first response until you've issued the second request, and then it
wrongly thinks that it belongs to the second one... Could that be the case?
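
To illustrate the suspicion above: if a CRLF left over from the first
response is still unread when the second response arrives, a tolerant reader
could drop such leading bytes before looking for the status line. A minimal
sketch in C, assuming a hypothetical helper skip_leading_crlf() that is not
part of libcurl and is not how transfer.c actually works:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a libcurl function): count the stray CR and LF
   bytes at the start of a receive buffer, so the caller can start parsing
   the status line at buf + returned offset. */
size_t skip_leading_crlf(const char *buf, size_t len)
{
  size_t i = 0;
  assert(buf != NULL);
  while(i < len && (buf[i] == '\r' || buf[i] == '\n'))
    i++;
  return i;
}
```

With a buffer that starts with a leftover CRLF followed by "HTTP/1.1 200 OK",
the returned offset would be 2 and the status line would begin right after
it.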

> (like the last line received in the header), so it switches mode (it
> no longer waits for the header) but the header follows just after. In
> this case the transfer() function loops until it times out (I have set a
> timeout value of 60 secs).

I don't understand this.

If it receives a blank line and considers that to be the end of the headers,
why does it time out just because of that? Why aren't all the headers as well
as the following (actual) body treated as the body and downloaded? In what
way does it loop until it times out without actually reading all the data the
server sends?

> If I remove line 548 in "transfer.c" to force the function to wait
> for the header before going into body retrieving mode, everything works
> well. What do you think about this modification?
>
> line 547 else {
> line 548 header = FALSE;
> line 549 break;
>
> Thanks!

I know why I added that check a hundred years ago, and then the situation was
pretty much reversed:

When you fetch an HTTP page from a server that doesn't return *any* header at
all, we just have to forget about header parsing and go straight into "body
reading" mode. Otherwise it'll treat everything as headers...!

I reckon that sending no header is indeed invalid HTTP, but so is sending an
initial blank line before the first headers... Dealing with the unexpected in
the best possible way is a worthy task. But it needs careful adjustments.
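
As a rough illustration of what handling both cases could look like, here is
a minimal sketch in C. The names first_line and classify_first_line() are
hypothetical and made up for this example; this is not the code in
transfer.c, just one possible way to tell the three situations apart:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical classification of the first line read from the server. */
enum first_line {
  FIRST_STATUS, /* looks like an HTTP status line: parse headers */
  FIRST_BLANK,  /* empty line before any header: keep waiting */
  FIRST_BODY    /* anything else: server sent no headers at all */
};

/* 'line' is one zero-terminated line, including its line ending if any. */
enum first_line classify_first_line(const char *line)
{
  assert(line != NULL);
  if(line[0] == '\0' || !strcmp(line, "\r\n") || !strcmp(line, "\n"))
    return FIRST_BLANK;
  if(!strncmp(line, "HTTP/", 5))
    return FIRST_STATUS;
  return FIRST_BODY;
}
```

A caller could stay in header mode on FIRST_BLANK, parse headers as usual on
FIRST_STATUS, and only switch straight to body reading on FIRST_BODY.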

-- 
    Daniel Stenberg -- curl groks URLs -- http://curl.haxx.se/
Received on 2001-11-06