Point in time during HTTP 3xx response handling with CURLOPT_FOLLOWLOCATION when Location headers are followed

From: Nicolas Roeser via curl-library <curl-library_at_cool.haxx.se>
Date: Sun, 7 Apr 2019 23:38:13 +0200

Hello!

While working on a piece of PHP software which uses curl functions, I have reached a point where I need some feedback. Either I have found a problem in libcurl, or I am not using it the right way, and need a bit of advice. I am writing to _this_ list, because the issue is within libcurl (or how it is used), and not limited to curl functions in PHP.

I have seen the behavior with curl 7.64.1 on Linux and macOS (curl from Homebrew), in a program linked to libcurl directly and in a PHP script which uses the same logic.

What I am trying to do: I would like to use libcurl to download a resource over HTTP. The resource is reached after an HTTP redirect. During the download of the target resource, I would like to limit the number of bytes downloaded. That is, I would like to abort the transfer from a progress function once a certain number of HTTP payload bytes has been downloaded. CURLOPT_XFERINFOFUNCTION is used, and CURLOPT_NOPROGRESS is set to 0L.

I am using a local web server which returns an HTTP 302 response for a GET request to the original URI. This response has a large payload (more than 5 kB) – it may be a loooong message telling users about the redirection. It includes a Location header field with a URI reference to the location of the target resource. When the server receives an HTTP GET request for the _target_ resource, it responds with HTTP 200 and a document which is also larger than 5 kB. (Yes, I know about the libcurl receive buffer and its default size, but am under the impression that the problem is unrelated.)

For testing, I am simply using docs/examples/progressfunc.c, which I have modified a tiny bit: I am adding the following line:

    curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);

A full patch against curl-7_64_1 is attached, but I doubt that the rest of my changes matter much. I have tried different download size limits (STOP_DOWNLOAD_AFTER_THIS_MANY_BYTES).

My testing shows that libcurl does not follow the redirection, but stops processing entirely when the progress function returns nonzero while the body of the 302 response is still being received. Well, we can argue that this is OK, of course. If the message body in the first HTTP response is small enough to stay below the limit, libcurl continues and everything works as intended.

But for now I cannot see why the HTTP response body of a 302 response needs to be read _at all_(*) when there is a usable Location header field. My current stance is that, if CURLOPT_FOLLOWLOCATION is enabled, libcurl should follow the redirection as soon as it has reached the empty line separating the header fields from the message body, that is, when header processing has ended. Does that sound reasonable?

Or am I using the wrong means for accomplishing my goal described above? Do I need to implement redirection following myself?

I have not tried it yet, but I can imagine using a header callback function which monitors the headers for 3xx responses and for Location header fields. As soon as _both_ of them have been seen, I could set a flag which modifies the progress function logic until a new HTTP response is seen (which starts with "HTTP/"). This will still download the first big response payload entirely, won’t it?
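
A minimal, untested-against-a-server sketch of that flag logic in plain C, kept free of libcurl calls so that it can be checked in isolation. The names redirect_state, redirect_state_feed and should_abort are hypothetical and not part of libcurl. The assumption is that redirect_state_feed would be called from a CURLOPT_HEADERFUNCTION callback with each header line (libcurl passes the header lines of all responses, including redirects, and they are not NUL-terminated), and that should_abort would be called from the CURLOPT_XFERINFOFUNCTION callback with its dlnow value. Whether dlnow starts again from zero for the followed request is not assumed here. Note that this only suppresses the abort; it does nothing to avoid receiving the body of the redirect response.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-transfer state; not part of libcurl. */
struct redirect_state {
    int is_3xx;        /* current response has a 3xx status code */
    int saw_location;  /* current response carried a Location header field */
};

/* Case-insensitive check that line[0..len) starts with prefix. */
static int has_prefix_nocase(const char *line, size_t len, const char *prefix)
{
    size_t n = strlen(prefix);
    size_t i;
    if (len < n)
        return 0;
    for (i = 0; i < n; i++) {
        if (tolower((unsigned char)line[i]) != tolower((unsigned char)prefix[i]))
            return 0;
    }
    return 1;
}

/* Feed one header line as a header callback would receive it
   (not NUL-terminated, hence the explicit length). */
static void redirect_state_feed(struct redirect_state *st,
                                const char *line, size_t len)
{
    assert(st != NULL);
    if (len >= 5 && memcmp(line, "HTTP/", 5) == 0) {
        /* A status line starts a new response: reset both flags. */
        const char *sp = memchr(line, ' ', len);
        st->is_3xx = 0;
        st->saw_location = 0;
        /* The status code follows the first space; look at its first digit. */
        if (sp != NULL && (size_t)(sp - line) + 1 < len && sp[1] == '3')
            st->is_3xx = 1;
    } else if (has_prefix_nocase(line, len, "Location:")) {
        st->saw_location = 1;
    }
}

/* Decide what the progress function should return: nonzero aborts the
   transfer. The limit is ignored while a redirect response is in flight. */
static int should_abort(const struct redirect_state *st,
                        long long dlnow, long long limit)
{
    assert(st != NULL);
    if (st->is_3xx && st->saw_location)
        return 0;
    return dlnow > limit;
}
```

In a real program, a pointer to the same redirect_state would presumably be handed to both callbacks via CURLOPT_HEADERDATA and CURLOPT_XFERINFODATA.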

(*) Having thought about it a little more – is this behavior related to keep-alive connections and not wanting to close them/open new ones?

Cheers

-- 
Nico
Nicolas Roeser
kiz – Information Systems Department, Ulm University
Received on 2019-04-07