curl-library
Re: How does libCurl handle Content-Encoding GZIP + partial responses in respect to automatically decoding of compressed content
Date: Tue, 01 May 2007 11:07:55 +0200
Dan Fandrich wrote:
> On Mon, Apr 30, 2007 at 04:53:31PM +0200, Stefan Krause wrote:
>
>> I currently think about the use case where compressed content is
>> requested by the client in ranges of a certain size, so that multiple
>> requests for different ranges are needed to get the complete data. After
>> all the (compressed) data was received, it has to be decompressed.
>>
> [...]
>
>> 3) The client saves the first 100 bytes to temporary storage and
>> requests the next 100 bytes. The client knows about the compressed data,
>> because the Content-Encoding header in the response is present and set
>> to GZIP.
>> 4) After the client has received the 1000 bytes of compressed data, it
>> uncompresses them, which results in the 10000 bytes of (uncompressed)
>> data that are stored on the server.
>>
>
> That's almost right. curl (via zlib) will decompress the data as it
> comes in--it's stored temporarily on the heap, not on disk. If the
> first 100 bytes of compressed data can be decompressed on their own,
> then the application will receive them immediately.
>
Might it theoretically happen that all the compressed data has to be
received and stored on the heap before it can be decompressed? My
application is running in a resource-constrained environment where I
have to keep memory usage at a minimum. I cannot make any assumptions
about the size of the received data, so storing everything on the heap
before decompression might become a nightmare in my environment. I try
to limit the amount of memory used in order to avoid "out of memory"
situations.

From my point of view it is safer to disable automatic decompression
with CURLOPT_HTTP_CONTENT_DECODING and save the received (compressed)
data to disk. After the data has been received completely, I run a
separate decompression job over the compressed data. I have not looked
into the zlib functionality in detail yet, but I have seen that it
supports operating on streams to keep memory usage under control. Is
that OK, or does anyone have a more appropriate solution for
controlling memory usage?
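For the record, a minimal sketch of what I have in mind. The URL and the
temporary file name are made up; I am assuming CURLOPT_ENCODING is the
way to ask the server for gzip, while CURLOPT_HTTP_CONTENT_DECODING set
to 0 keeps libcurl from decoding the body:

#include <stdio.h>
#include <curl/curl.h>

/* Write the raw (still compressed) response body straight to disk. */
static size_t write_to_file(void *ptr, size_t size, size_t nmemb, void *stream)
{
    return fwrite(ptr, size, nmemb, (FILE *)stream);
}

int main(void)
{
    FILE *out = fopen("response.gz", "wb");   /* hypothetical temp file */
    CURL *curl;

    if (!out)
        return 1;

    curl = curl_easy_init();
    if (curl) {
        curl_easy_setopt(curl, CURLOPT_URL, "http://example.com/bigfile");
        /* Ask the server for gzip-compressed content ... */
        curl_easy_setopt(curl, CURLOPT_ENCODING, "gzip");
        /* ... but tell libcurl NOT to decode it; we get the raw bytes. */
        curl_easy_setopt(curl, CURLOPT_HTTP_CONTENT_DECODING, 0L);
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_to_file);
        curl_easy_setopt(curl, CURLOPT_WRITEDATA, out);
        curl_easy_perform(curl);
        curl_easy_cleanup(curl);
    }
    fclose(out);
    /* A separate pass with zlib's inflate()/gzread() could then
       decompress response.gz using a small, fixed-size buffer. */
    return 0;
}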
>
>> And now the questions:
>> 1) How does libCurl deal with compressed partial data (HTTP response
>> code 206)? According to the documentation there is some sort of
>> "automatic" decompression. How does this work in detail when the
>> CURLOPT_WRITEFUNCTION callback is used for data reception?
>>
>
> The app doesn't need to know if the data is compressed or not. libcurl
> will handle the decompression transparently, so the app will only see
> the uncompressed data.
>
Just to clarify the handling of 206 responses for myself: the server
sends a 206 response with a bunch of compressed data and the
Content-Encoding header set to GZIP. That is received by libCurl.
libCurl examines the headers, finds Content-Encoding: gzip and starts
decompression of the data. Then the uncompressed data is given back to
the libCurl client, together with the 206 HTTP status code. The
Content-Encoding header should now be set to identity, otherwise the
now uncompressed data is marked as compressed.

That implies the following, according to my understanding: the data
received with the 206 response can be decoded without any other data
that the server has not sent yet. Otherwise the next part of the data
would have to be requested from the server in order to decompress the
previous HTTP response. That following request would have to be done
automatically by libCurl itself, or the libCurl client would have to be
notified that it has to request the next partial data from the server.
Is that right?
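To make my question concrete, this is roughly how I would issue such a
range request (URL and range are made up; whether the first 100
compressed bytes can be decoded on their own is exactly what I am
unsure about):

#include <stdio.h>
#include <curl/curl.h>

/* With decoding left enabled, this callback should only ever see
   uncompressed bytes, regardless of Content-Encoding on the wire. */
static size_t on_body(void *ptr, size_t size, size_t nmemb, void *userdata)
{
    (void)userdata;
    fwrite(ptr, size, nmemb, stdout);
    return size * nmemb;
}

int main(void)
{
    CURL *curl = curl_easy_init();
    if (curl) {
        curl_easy_setopt(curl, CURLOPT_URL, "http://example.com/bigfile");
        curl_easy_setopt(curl, CURLOPT_ENCODING, "gzip"); /* decoding on */
        curl_easy_setopt(curl, CURLOPT_RANGE, "0-99");    /* first 100 bytes */
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, on_body);
        curl_easy_perform(curl);
        curl_easy_cleanup(curl);
    }
    return 0;
}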
>
>> 2) I have the same use case but in the other direction. Compressed data
>> should be uploaded in parts. Does libCurl provide here also some
>> automation, or does the libcurl client have to do the compression first
>> and send the data in appropriate data segments?
>>
>
> libcurl doesn't handle compressing data on uploads.
>
OK. So here I have to compress the data first (e.g. create a new file
with the compressed data) and then send parts of that new file with
HTTP POST range requests to the server. After the data is uploaded I
delete the compressed file. Is that right?
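Something like this is what I would use for the compression step; the
gzopen/gzwrite convenience API is the part of zlib I have seen so far,
and the file names are made up:

#include <stdio.h>
#include <zlib.h>

/* Compress input.dat into upload.gz with a small fixed buffer,
   so memory stays bounded no matter how large the input is. */
int main(void)
{
    FILE *in = fopen("input.dat", "rb");     /* hypothetical input */
    gzFile out = gzopen("upload.gz", "wb");  /* hypothetical output */
    char buf[4096];
    size_t n;

    if (!in || !out)
        return 1;

    while ((n = fread(buf, 1, sizeof(buf), in)) > 0)
        gzwrite(out, buf, (unsigned)n);

    fclose(in);
    gzclose(out);
    /* upload.gz can now be sent in parts with libcurl, and deleted
       with remove("upload.gz") once the upload succeeds. */
    return 0;
}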
Thanks a lot,
Stefan
Received on 2007-05-01