
Re: Increase in CPU usage in 8.7.1 vs 8.6.0 for rate-limited downloads

From: David Pfitzner via curl-library <curl-library_at_lists.haxx.se>
Date: Thu, 20 Jun 2024 17:28:25 +0930

On Wed, May 15, 2024 at 8:33 PM David Pfitzner <dpfitzner_at_netflix.com>
wrote:

> On Wed, May 15, 2024 at 5:31 PM Daniel Stenberg <daniel_at_haxx.se> wrote:
>
>> On Wed, 15 May 2024, David Pfitzner via curl-library wrote:
>>
>> > Perhaps it would be useful for a user of libcurl to be able to (somehow)
>> > control this tradeoff between rate-limiting accuracy and CPU usage?
>>
>> Perhaps getting more data would be a first step. How big difference in
>> rate-limit accuracy does this commit make in your case?
>
> I have not looked at that carefully, but casually I don't see much
> difference if any. But possibly closer inspection may find a systematic
> difference. I also suspect one may see bigger differences in some regime
> different to what I'm looking at - e.g., smaller files.

Getting back to this: although various things can affect timing, I think
the maximum expected error in the rate-limiting timing mainly corresponds
to the maximum number of bytes which can be read in one call of
lib/transfer.c:readwrite_data(). (That is because rate-limiting is applied
between such calls, not within them.) For these two curl versions, this
turns out to be:

8.6.0: max_error_bytes = min(rate, 10*buffersize)
8.7.1: max_error_bytes = min(rate, buffersize)

(Here rate = max_recv_speed. The first term reflects that we are not
allowed to read more than max_recv_speed bytes; the second reflects that
8.6.0 can do up to maxloops = 10 iterations of the read loop, while 8.7.1
can only do one iteration when rate-limiting is in effect.)

Dividing by the rate to get the corresponding time (in seconds):

8.6.0: max_error_time = min(1s, 10*buffersize/rate)
8.7.1: max_error_time = min(1s, buffersize/rate)

For the curl command-line utility we have buffersize = min(rate, 100KiB),
which gives for this case:

8.6.0: max_error_time = min(1s, 1000KiB/rate)
8.7.1: max_error_time = min(1s, 100KiB/rate)

For a few example rates, the maximum expected error in seconds would be:

rate         8.6.0   8.7.1
 <=100KiB:   1       1
   200KiB:   1       0.5
   500KiB:   1       0.2
  1000KiB:   1       0.1
  2000KiB:   0.5     0.05
  5000KiB:   0.2     0.02
 10000KiB:   0.1     0.01
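
As a sanity check on the arithmetic, the expected values can be reproduced
with a short Python sketch. The function names are mine, for illustration
only, and are not part of curl; the only inputs are the formulas given
above (buffersize = min(rate, 100KiB), and 10 loop iterations for 8.6.0
versus 1 for 8.7.1).

```python
KIB = 1024


def cli_buffersize(rate):
    """Buffer size for the curl tool per the formula above: min(rate, 100KiB)."""
    return min(rate, 100 * KIB)


def max_error_time(rate, buffersize, max_loops):
    """Upper bound on the timing error in seconds: min(1s, max_loops*buffersize/rate)."""
    return min(1.0, max_loops * buffersize / rate)


def expected_table(rates_kib):
    """Return (rate in KiB, bound for 8.6.0, bound for 8.7.1) for each rate."""
    rows = []
    for rate_kib in rates_kib:
        rate = rate_kib * KIB
        buf = cli_buffersize(rate)
        old = max_error_time(rate, buf, max_loops=10)  # 8.6.0: up to 10 iterations
        new = max_error_time(rate, buf, max_loops=1)   # 8.7.1: one iteration
        rows.append((rate_kib, old, new))
    return rows


if __name__ == "__main__":
    for rate_kib, old, new in expected_table([100, 200, 500, 1000, 2000, 5000, 10000]):
        print(f"{rate_kib:>6}KiB  {old:<6g}{new:g}")
```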

To check this, I did some timing experiments with the curl command-line
utility, and measured the following values (taking 95th percentile as
"maximum"):

rate         8.6.0   8.7.1
    50KiB    0.906   0.921
   100KiB    0.898   0.949
   200KiB    0.954   0.439
   500KiB    0.934   0.178
  1000KiB    0.921   0.093
  2000KiB    0.517   0.042
  5000KiB    0.198   0.028
 10000KiB    0.101   0.011

Allowing for timing variability, I would say these match the expected
values above pretty well.

If one is interested in the relative error (that is, the timing error
relative to the total download time), then dividing max_error_time by
(size/rate) gives:

8.6.0: max_relative_error = min(rate/size, 10*buffersize/size)
8.7.1: max_relative_error = min(rate/size, buffersize/size)

Or for the curl command-line utility:

8.6.0: max_relative_error = min(rate/size, 1000KiB/size)
8.7.1: max_relative_error = min(rate/size, 100KiB/size)

Note that for both versions the relative error can be 1 (that is, 100%) if
the size is smaller than the number of bytes which can be read in one go by
readwrite_data(), because in that case we effectively do not rate-limit at
all. (The cutoff size differs between the two versions.) On the other hand,
as the size becomes large, the relative error becomes small (but still
smaller for 8.7.1 than for 8.6.0, for rate > 100KiB).
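
To make the relative-error formulas concrete, here is a small Python
sketch. The function name is hypothetical, and the explicit cap at 1.0 is
my own addition to reflect the "no rate-limiting at all" case described
above; the rest is just the formula min(rate/size, N*buffersize/size) with
N = 10 for 8.6.0 and N = 1 for 8.7.1.

```python
KIB = 1024
MIB = 1024 * KIB


def max_relative_error(rate, size, buffersize, max_loops):
    """Timing error relative to the ideal download time (size/rate).

    Capped at 1.0 (an assumption): for very small files the transfer is
    effectively not rate-limited at all.
    """
    return min(1.0, rate / size, max_loops * buffersize / size)


if __name__ == "__main__":
    rate = 1000 * KIB
    buf = min(rate, 100 * KIB)  # curl tool buffer size, per the formula above
    for size in (50 * KIB, 10 * MIB, 1000 * MIB):
        old = max_relative_error(rate, size, buf, max_loops=10)  # 8.6.0
        new = max_relative_error(rate, size, buf, max_loops=1)   # 8.7.1
        print(f"size={size // KIB}KiB  8.6.0={old:.6f}  8.7.1={new:.6f}")
```

For a 50KiB file both versions give 1 (no effective rate-limiting), while
for large files both values shrink, with 8.7.1 staying ten times smaller at
this rate.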

So what does that all mean? Well, 8.7.1 does indeed have improved
rate-limiting accuracy (at least at high rates) compared to 8.6.0: up to
10 times better. And certainly there may be cases where that improvement is
important. However, in my case I am mostly downloading very large files, and
the accuracy of 8.6.0 was sufficient. So I would still say it could be
useful to have a curl request option which influences that accuracy (and
hence the CPU usage tradeoff).

I have not yet tried adding such an option, but I was thinking one could
perhaps specify the desired accuracy in seconds, and then libcurl would
calculate (based on the rate) the maximum number of bytes to be read in one
go by readwrite_data(), and implement it that way.
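
A rough Python sketch of that calculation follows, purely to illustrate
the idea. Nothing here exists in libcurl: the function name, the
accuracy_s parameter, and the choice of clamping limits are all
assumptions on my part.

```python
def max_bytes_per_read(rate, accuracy_s, buffersize, max_loops=10):
    """Hypothetical: bytes to allow per readwrite_data() call.

    rate        -- max_recv_speed in bytes per second (must be > 0)
    accuracy_s  -- desired maximum timing error in seconds (assumed option)
    buffersize  -- receive buffer size in bytes
    max_loops   -- loop iterations allowed without the limit (10 in 8.6.0)
    """
    if rate <= 0:
        raise ValueError("rate must be positive when rate-limiting")
    # Reading N bytes in one go costs at most N/rate seconds of error,
    # so the target accuracy translates to N = rate * accuracy_s.
    wanted = round(rate * accuracy_s)
    # Never exceed what 8.6.0 could read in one call: min(rate, 10*buffersize).
    upper = min(rate, max_loops * buffersize)
    # Always allow at least one byte so the transfer can make progress.
    return max(1, min(wanted, upper))


if __name__ == "__main__":
    KIB = 1024
    # 2000KiB rate, 100KiB buffer, 0.25s accuracy -> 500KiB per call
    print(max_bytes_per_read(2000 * KIB, 0.25, 100 * KIB) // KIB, "KiB")
```

With a large accuracy value this falls back to the 8.6.0 bound, and with a
small one it reads less per call, at the cost of more calls (and hence
more CPU).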

-- David


Received on 2024-06-20