Re: curl and fallocate

From: Manfred Schwarb <manfred99_at_gmx.ch>
Date: Tue, 04 Feb 2014 23:18:46 +0100

On 04.02.2014 15:46, Manfred Schwarb wrote:
> On 03.02.2014 20:45, Dan Fandrich wrote:
>> On Mon, Feb 03, 2014 at 12:55:53PM +0100, Manfred Schwarb wrote:
>>> while doing parallel downloads with curl, I found that the resulting
>>> files are somewhat fragmented. I.e. doing
>>>
>>> for ((i=1; i<20; i++)); do
>>> curl -o $i.txt http://server/$i.txt &
>>> done
>>>
>>> So it seems curl does not fallocate the resulting files. Is there a
>>> possibility to achieve this? Or are there plans to add a fallocate
>>> option to curl?
>>
>> I haven't heard anyone suggest it so far. There should be very little downside
>> to adding a call to Linux's fallocate(2) with FALLOC_FL_KEEP_SIZE when the size
>> is known. That call preallocates space but keeps the reported file size the
>> same; only when more data is appended to the file does it appear to grow in
>> size. The primary downside is (naturally) that the space is allocated at once,
>> so curl's behaviour w.r.t. handling out-of-disk-space errors would be slightly
>> different. Plus, I don't think fallocated space is automatically freed if a
>> download is aborted or the final file size is less than originally expected,
>> which would result in an unexplained loss of disk space. Another downside is
>> that this call is Linux-specific.
>>
>> The posix_fallocate behaviour is quite different in that the reported file size
>> reaches the maximum right after the call. That makes download resumption
>> impossible as there's no reliable way to find out how much of the file has been
>> downloaded. It's also impossible in the general case to tell post facto whether
>> the file was downloaded successfully.
>>
>> It's probably reasonable to unconditionally add an fallocate call to curl when
>> the size is known, but if only posix_fallocate is available I'd be against
>> calling it without a command-line option.
>>
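
To make the size problem described above concrete, here is a minimal sketch in
Python (used only for illustration; this is not curl code). It assumes a
Linux-like system where Python exposes os.posix_fallocate, and the function
name and its arguments are hypothetical.

```python
import os


def size_after_partial_download(path, expected_size, partial_data):
    """Preallocate with posix_fallocate, write only part of the data,
    and return the file size that fstat() reports afterwards."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # Reserve the whole expected length up front. Unlike fallocate(2)
        # with FALLOC_FL_KEEP_SIZE, this also extends the reported size.
        os.posix_fallocate(fd, 0, expected_size)
        # Simulate a download that stops early.
        os.write(fd, partial_data)
        return os.fstat(fd).st_size
    finally:
        os.close(fd)
```

The reported size equals the expected size as soon as the call returns, so a
resume that relies on the file size cannot tell how much was really downloaded.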
>
> A solution using a command-line option would work for me. On most modern file systems,
> fallocate calls are almost free, and if the application (i.e. curl) calls ftruncate after
> a successful (or cleanly aborted) download, there should be no severe drawbacks.
> [ In case of a resume after a hard abort, one could even crawl back from the end and
> look for the last non-NUL character... Admittedly a hacky solution. ]
>
> Even the use of posix_fallocate would not be too bad as long as curl called
> ftruncate at program termination. Resuming downloads after controlled aborts
> would then still be possible. Only resuming after a hard abort would be hampered.
>
> I tried to use external fallocate(1) calls before executing curl, but this does
> not work of course, as curl truncates the file at open time. So the fallocate(1)
> command does not help.
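
The "crawl back from the end" idea quoted above could look roughly like the
following Python sketch (for illustration only; the function name and the
block_size parameter are hypothetical). It assumes the preallocated tail reads
back as NUL bytes and that the downloaded data was written sequentially from
offset 0. If the payload itself ends in NUL bytes the result is too small, so
it is at best a lower bound for a resume offset.

```python
import os


def offset_after_last_non_nul(path, block_size=65536):
    """Return the offset just past the last non-NUL byte in the file,
    scanning backwards from the end in blocks. Returns 0 if the file
    is empty or contains only NUL bytes."""
    with open(path, "rb") as f:
        end = f.seek(0, os.SEEK_END)
        while end > 0:
            start = max(0, end - block_size)
            f.seek(start)
            block = f.read(end - start)
            # Drop the trailing NULs of this block; anything left over
            # marks the last written byte.
            stripped = block.rstrip(b"\x00")
            if stripped:
                return start + len(stripped)
            end = start
    return 0
```

Resuming from a lower bound only costs re-downloading a few bytes, which is
why the approach is usable at all despite being a hack.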

Another idea:
instead of doing the fallocate(2) inside curl, one could add an option which
omits the O_TRUNC on open and does an ftruncate(2) at program termination.

The user would then be responsible for preallocating the file if desired.
This could also be interesting when you mirror, at a high rate, files whose
contents change but whose length does not. You could then simply write into
the already existing file. Perhaps there are even filesystems or OSes which
optimize away the write operation if the to-be-written and pre-existing hunks
are the same.
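
A minimal sketch of that idea, in Python for illustration only (this is not
an existing curl option, and the function name is hypothetical): open the
target without O_TRUNC, overwrite it from the start, and trim it to the number
of bytes actually written when finished, even if the transfer stops early.

```python
import os


def write_in_place(path, chunks):
    """Write chunks over an existing (possibly preallocated) file without
    truncating it on open, then ftruncate to the bytes actually written."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)  # note: no O_TRUNC
    written = 0
    try:
        try:
            for chunk in chunks:
                offset = 0
                while offset < len(chunk):
                    n = os.write(fd, chunk[offset:])
                    offset += n
                    written += n
        finally:
            # Runs on success and on a controlled abort (exception):
            # cut off whatever preallocated or stale tail remains.
            os.ftruncate(fd, written)
    finally:
        os.close(fd)
    return written
```

The caller can preallocate the file beforehand, e.g. with fallocate(1), and
the preallocated blocks are then reused instead of being thrown away on open.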

Cheers,
Manfred

>
> Thanks,
> Manfred
>
>
>>>>> Dan
>> -------------------------------------------------------------------
>> List admin: http://cool.haxx.se/list/listinfo/curl-users
>> FAQ: http://curl.haxx.se/docs/faq.html
>> Etiquette: http://curl.haxx.se/mail/etiquette.html
>>

Received on 2014-02-04