Re: Write callback function when following HTTP redirections

From: Nicolas Roeser via curl-library <curl-library_at_cool.haxx.se>
Date: Tue, 7 May 2019 16:40:50 +0200

On 2019-04-15 at 22:31+02:00, Daniel Stenberg wrote:
> On Mon, 15 Apr 2019, Nicolas Roeser via curl-library wrote:
>
>> My problem is that I do not know where the boundary between header and
>> body is if the download has been aborted. To make things worse, I have
>> the feeling that it may be difficult to properly detect.
>
> I read your email several times and I can't figure out *why* you need to
> detect that boundary yourself. Why can't you use the different callbacks
> for header and body as then you can simply lean on libcurl's detection
> that it always does?

What I wanted to do at first was to enable CURLOPT_HEADER and stuff all
data received by the write callback into one buffer. After the transfer
completed, I planned to split that buffer into header and body. I wanted
to do it this way mainly because the existing code I am working on did
so.

But after a _lot_ of thinking and some experiments, I see that your
suggestion is *much* better. In the header callback, I will save the
headers which may be needed after the transfer has completed. And I
will disable CURLOPT_HEADER. Then the write callback will receive only
the body of the final response, which is what I want.
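
To make the plan above concrete, here is a minimal sketch of a write
callback that appends each chunk to a growable buffer. The function has
the signature libcurl expects for CURLOPT_WRITEFUNCTION, but `struct
membuf` and `body_cb` are names made up for illustration, and the
registration calls are shown only in a comment so that the block
compiles without libcurl.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Growable byte buffer; hypothetical helper type, not part of libcurl. */
struct membuf {
    char  *data;
    size_t len;
};

/* Same signature as a CURLOPT_WRITEFUNCTION callback. Returning a value
 * different from size * nmemb tells libcurl to abort the transfer. */
static size_t body_cb(char *ptr, size_t size, size_t nmemb, void *userdata)
{
    struct membuf *buf = userdata;
    size_t n = size * nmemb;
    char *grown;

    assert(buf != NULL);
    grown = realloc(buf->data, buf->len + n + 1);
    if (grown == NULL)
        return 0;                  /* out of memory: abort the transfer */
    memcpy(grown + buf->len, ptr, n);
    buf->len += n;
    grown[buf->len] = '\0';        /* keep the buffer usable as a C string */
    buf->data = grown;
    return n;
}

/* With a real easy handle `h`, registration would look roughly like:
 *   curl_easy_setopt(h, CURLOPT_HEADER, 0L);
 *   curl_easy_setopt(h, CURLOPT_FOLLOWLOCATION, 1L);
 *   curl_easy_setopt(h, CURLOPT_WRITEFUNCTION, body_cb);
 *   curl_easy_setopt(h, CURLOPT_WRITEDATA, &body);
 */
```

With CURLOPT_HEADER off, nothing but body data reaches this callback, so
no header/body split is needed afterwards.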

>
>> I would like to clear the receive buffer each time the client starts
>> reading a new resource.
>
> And that is not before you invoke curl? When you ask libcurl to follow a
> redirect, the only body that is sent to the write callback is that of
> the URL that isn't itself a redirect.

Ahh, many thanks for clearing this up! I had not understood that because
I had been looking at the number of downloaded octets reported by the
progress callback. This number is always 0 while headers are processed
(which is OK). When a redirecting resource is read, the callback may
report a higher number (the size of the body of the redirecting
document, even though this is not sent to the write callback). And when
the redirection is followed and processing of the headers of the target
resource starts, the number drops to 0 again.

I had been confused because I had assumed that the number would be
monotonically increasing, and would report the number of octets
processed by the write callback (more or less).

>
>> I first thought that I might disable CURLOPT_HEADER and handle some
>> headers differently from what is done now. But this seems not to help
>> with my problem of identifying when to clear my receive buffer as long
>> as CURLOPT_FOLLOWLOCATION is on.
>
> Do you mean a receive buffer for the *headers* of the final non-redirect
> URL? If so, then I presume you can just detect a 2xx response code and
> take that as start of the last set of headers.

Will implement something along these lines, thanks!
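
A rough sketch of what I have in mind: the header callback is invoked
once per header line, for every response in the redirect chain, so the
stored headers can be cleared whenever a new status line arrives. The
status code of the latest response is then available for the 2xx check.
`struct hdr_state`, `header_cb` and the fixed HDR_MAX limit are
assumptions for illustration only; the signature matches what
CURLOPT_HEADERFUNCTION expects.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define HDR_MAX 8192   /* arbitrary limit chosen for this sketch */

struct hdr_state {
    char   text[HDR_MAX];  /* headers of the most recent response */
    size_t len;
    int    status;         /* status code of the most recent response */
};

/* Same signature as a CURLOPT_HEADERFUNCTION callback. The line passed
 * in is not NUL-terminated, so it is copied before parsing. */
static size_t header_cb(char *line, size_t size, size_t nitems, void *userdata)
{
    struct hdr_state *st = userdata;
    size_t n = size * nitems;

    assert(st != NULL);
    /* A status line opens a new response: forget the previous headers. */
    if (n >= 5 && memcmp(line, "HTTP/", 5) == 0) {
        char tmp[64];
        size_t c = n < sizeof tmp - 1 ? n : sizeof tmp - 1;
        int code = 0;

        memcpy(tmp, line, c);
        tmp[c] = '\0';
        st->len = 0;
        st->status = (sscanf(tmp, "HTTP/%*s %d", &code) == 1) ? code : 0;
    }
    /* Lines that would overflow the buffer are silently dropped here. */
    if (st->len + n < HDR_MAX) {
        memcpy(st->text + st->len, line, n);
        st->len += n;
        st->text[st->len] = '\0';
    }
    return n;   /* anything else would make libcurl abort the transfer */
}
```

After the transfer, `st.text` holds only the headers of the last
response, and `st.status / 100 == 2` tells whether it was a 2xx one.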

>
>> I have a feeling that the write callback function will never be called
>> with data from two HTTP responses at once (that is, will never cross
>> redirections).
>
> I'm not following this. How can there be two HTTP responses at once?

Sorry, I phrased that badly. I meant that it could be called _once_ and
be passed data from _two_ HTTP responses that have _arrived in
succession_ (such as a redirecting response followed by the final
response), that is, a single call handling data that spans two
responses. Anyway, never mind: I now know that the write callback will
receive only the last body, and that I can handle the headers in the
header callback, without CURLOPT_HEADER.

Many thanks again!

-- 
Nico
Nicolas Roeser
kiz – Information Systems Department, Ulm University
Received on 2019-05-07