Re: curl and fallocate

From: Manfred Schwarb <manfred99_at_gmx.ch>
Date: Tue, 04 Feb 2014 15:46:36 +0100

On 03.02.2014 20:45, Dan Fandrich wrote:
> On Mon, Feb 03, 2014 at 12:55:53PM +0100, Manfred Schwarb wrote:
>> while doing parallel downloads with curl, I found that the resulting
>> files are somewhat fragmented. I.e. doing
>>
>> for ((i=1; i<20; i++)); do
>> curl -o $i.txt http://server/$i.txt &
>> done
>>
>> So it seems curl does not fallocate the resulting files. Is there a
>> possibility to achieve this? Or are there plans to add a fallocate
>> option to curl?
>
> I haven't heard anyone suggest it so far. There should be very little down side
> to adding a call to Linux's fallocate(2) with FALLOC_FL_KEEP_SIZE when the size
> is known. That call preallocates space but keeps the reported file size the
> same--only when more data is appended to the file does it appear to grow in
> size. The primary down side is (naturally) that the space is allocated at once,
> so curl's behaviour w.r.t. handling out of disk space errors would be slightly
> different. Plus, I don't think fallocated-space is automatically freed if a
> download is aborted or the final file size is less than originally expected,
> which would result in an unexplained loss of disk space. Another down side is
> that this call is Linux-specific.
>
> The posix_fallocate behaviour is quite different in that the reported file size
> reaches the maximum right after the call. That makes download resumption
> impossible as there's no reliable way to find out how much of the file has been
> downloaded. It's also impossible in the general case to tell post facto whether
> the file was downloaded successfully.
>
> It's probably reasonable to unconditionally add an fallocate call to curl when
> the size is known, but if only posix_fallocate is available I'd be against
> calling it without a command-line option.
>

A solution using a command-line option would work for me. On most modern file systems,
fallocate calls are almost free, and if the application (i.e. curl) calls ftruncate after
a successful (or cleanly aborted) download, there should be no severe drawbacks.
[ For a resume after a hard abort, one could even scan back from the end and
   look for the last non-NUL byte... Admittedly a hacky solution. ]
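
The scan-back idea in the bracketed note could look roughly like the following
Python sketch. It is only an illustration, not anything curl does: the function
name last_non_nul_offset and the chunk_size parameter are made up here, and it
assumes the preallocated tail of the file reads back as NUL bytes.

```python
import os


def last_non_nul_offset(path, chunk_size=64 * 1024):
    """Return the file length up to and including the last non-NUL byte.

    Hypothetical helper: returns 0 for an empty or all-NUL file.
    """
    with open(path, "rb") as f:
        end = f.seek(0, os.SEEK_END)
        # Walk backwards one chunk at a time until a non-NUL byte shows up.
        while end > 0:
            start = max(0, end - chunk_size)
            f.seek(start)
            chunk = f.read(end - start)
            stripped = chunk.rstrip(b"\x00")
            if stripped:
                return start + len(stripped)
            end = start
    return 0
```

If the payload itself ends in NUL bytes, the returned offset is too small. For a
resume that should be harmless, since it only means a few bytes get fetched again.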

Even the use of posix_fallocate would not be too bad as long as curl calls
ftruncate at program termination. Resuming downloads after controlled aborts
would then still be possible. Only resuming after a hard abort would be hampered.
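
To make the preallocate-then-trim pattern concrete, here is a minimal Python
sketch. os.posix_fallocate and os.ftruncate are existing calls in Python's os
module on Unix; everything else is an assumption for illustration: the function
download_preallocated is hypothetical, and its chunks argument (any iterable of
byte strings) merely stands in for the network transfer. This is not how curl is
implemented.

```python
import os


def download_preallocated(path, expected_size, chunks):
    """Write chunks into a preallocated file, then trim it to the bytes received.

    Hypothetical helper; `chunks` stands in for data arriving from the network.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    received = 0
    try:
        if expected_size > 0:
            # Reserve the space up front; the reported size jumps to expected_size.
            os.posix_fallocate(fd, 0, expected_size)
        for chunk in chunks:
            data = bytes(chunk)
            offset = 0
            while offset < len(data):
                written = os.write(fd, data[offset:])
                offset += written
                received += written
    finally:
        # Runs on success and on a controlled abort (an exception), so the
        # reported size matches what was actually received.
        os.ftruncate(fd, received)
        os.close(fd)
    return received
```

After a hard abort (e.g. SIGKILL) the finally block never runs, so the file keeps
its full preallocated size. That is exactly the case where resuming is hampered.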

I tried running fallocate(1) on the target file before executing curl, but of
course this does not work: curl truncates the file at open time, so the
preallocation is lost and the fallocate(1) command does not help.
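
The effect can be reproduced without curl. In the hedged Python sketch below,
os.posix_fallocate stands in for the external fallocate(1) call and a second
open with O_TRUNC stands in for what curl does to its output file as described
above. The function name size_after_reopen is made up for this illustration.

```python
import os


def size_after_reopen(path, prealloc_size, extra_flags):
    """Preallocate a file, reopen it with extra_flags, and return its size."""
    # Step 1: preallocate, as an external tool would before the download.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        os.posix_fallocate(fd, 0, prealloc_size)
    finally:
        os.close(fd)
    # Step 2: reopen the way the downloader would and check the reported size.
    fd = os.open(path, os.O_WRONLY | extra_flags)
    try:
        return os.fstat(fd).st_size
    finally:
        os.close(fd)
```

Calling it with os.O_TRUNC returns 0, i.e. the preallocation is gone, while
passing 0 as extra_flags returns the preallocated size.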

Thanks,
Manfred

> >>> Dan
> -------------------------------------------------------------------
> List admin: http://cool.haxx.se/list/listinfo/curl-users
> FAQ: http://curl.haxx.se/docs/faq.html
> Etiquette: http://curl.haxx.se/mail/etiquette.html
>

Received on 2014-02-04