
Improve speed limits, not based on average speeds anymore #971

Closed
jjk-jacky wants to merge 1 commit

Conversation

jjk-jacky
Contributor

Speed limits (from CURLOPT_MAX_RECV_SPEED_LARGE & CURLOPT_MAX_SEND_SPEED_LARGE)
were applied simply by comparing the limits with the cumulative average speed of
the entire transfer. While this might work at times with good/constant connections,
in other cases it can result in the limits simply being "ignored" for more than
"short bursts" (as stated in the man page).

Consider a download that runs much slower than the limit for some time
(because bandwidth is used elsewhere, the server is slow, whatever the reason);
once things get better, curl would simply ignore the limit until the average
speed (since the beginning of the transfer) reached the limit.
This could make the limit useless for effectively avoiding use of the entire
bandwidth (at least for quite some time).

So instead, we now use a "moving starting point" as reference, and every time at
least as much data as the limit has been transferred, we reset this starting point
to the current position. This gives a good limiting effect that applies to the
"current speed", with instant reactivity (in case of a sudden speed burst).

@bagder
Member

bagder commented Aug 21, 2016

wouldn't it then be easier and smarter to use the data->progress.current_speed number instead? We already have it and it is the transfer speed during the last 5 seconds.

@jjk-jacky
Contributor Author

No, it actually gives different - and less useful - results (and also,
current_speed is a single value, while there could be different limits
set for upload & download, so it would need to be "duplicated"). It was
also my first idea, and what I originally tried.

But things aren't as good then; I believe it has to do with the fact
that current_speed is indeed "time-based". A simple case to illustrate
is: imagine a transfer that stops at some point for a few seconds, the
current speed would then stay constant for 5s (as all data counters
stay the same), and then once we loop in the speeders it drops to 0.
Limiting behavior based on the current speed wasn't performing as hoped.
For limits, it's fine in such a case to simply wait until enough data
has been transferred (however long, or short, that may be) and then
calculate the speed and impose a wait time if needed.

So having a "fixed" starting point to calculate the limit gives much
better results, and then moving said starting point after enough data has
been transferred lets us keep things "current" and not "average" -
which gives the best results.
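
On the "duplicated" per-direction point mentioned above: since CURLOPT_MAX_RECV_SPEED_LARGE and CURLOPT_MAX_SEND_SPEED_LARGE are set independently, the reference point has to exist once per direction. A hypothetical sketch of that state (field names invented for the example, not curl's actual struct):

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical progress state: one "moving starting point" per direction,
 * since download and upload can each have their own limit. */
struct speed_limit_state {
  int64_t dl_limit_bytes;  /* download counter when its window started */
  time_t  dl_limit_start;  /* time when the download window started */
  int64_t ul_limit_bytes;  /* upload counter when its window started */
  time_t  ul_limit_start;  /* time when the upload window started */
};
```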

@bagder
Member

bagder commented Aug 22, 2016

A simple case to illustrate is: imagine a transfer that stops at some point for a few seconds, the current speed would then stay constant for 5s (as all data counters stay the same), and then once we loop in the speeders it drops to 0.

I don't get that. If the transfer pauses, the current_speed will drop rather sharply every second until it reaches zero after 5 seconds. Right?

But I agree that it would need to be duplicated (one for each direction) to be possible to be used for this.

For limits, it's fine in such a case to simply wait until enough data has been transferred (however long, or short, that may be) and then calculate speed and impose wait time if needed.

So let me check if I understand your suggested algorithm (for a single direction).

  1. store a time stamp
  2. wait until the speed limit amount of data has been received
  3. check how long it took and deem how long to wait (if at all), then GOTO 1

Is that what you're proposing?

@jjk-jacky
Contributor Author

A simple case to illustrate is: imagine a transfer that stops at
some point for a few seconds, the current speed would then stay
constant for 5s (as all data counters stay the same), and then once
we loop in the speeders it drops to 0.

I don't get that. If the transfer pauses, the current_speed will
drop rather sharply every second until it reaches zero after 5
seconds. Right?

Well, depends, because the "current speed" is really an average over
the last 5 seconds. It is calculated using the current amount transferred
and the amount transferred 5s ago.

Imagine this, with no transfer happening for the last 5s:

t : speed based on amount of data from t-5
t+1s: speed based on same amount, so same speed
t+2s: still based on same amount, so still same speed
...
t+6s: speed based on amount from t, i.e. the current one, so speed drops to 0

It also wouldn't get to the actual current speed right away. That is,
if after such a "pause" 1M of data comes in within a second,
current_speed wouldn't say 1M/s because it would average it over the
last 5s, so only ~205K/s.

Anyway, because the current speed is in fact the average over the last
5s (which is probably good for what it is, so it doesn't jump around
constantly), it doesn't react as quickly, so it's not
ideal for implementing speed limits.
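
As a quick check of the arithmetic above (a standalone toy, not curl's speeder code): a 1M burst arriving within one second still only reads as about 205K/s when averaged over a trailing 5-second window.

```c
#include <stdio.h>

int main(void)
{
  const double window = 5.0;         /* seconds the trailing average covers */
  const double burst = 1024 * 1024;  /* 1M arriving within the last second */

  printf("instantaneous speed : %.0f bytes/s\n", burst);
  printf("5-second average    : %.1f bytes/s (~205K/s)\n", burst / window);
  return 0;
}
```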

So let me check if I understand your suggested algorithm (for a
single direction).

  1. store a time stamp
  2. wait until the speed limit amount of data has been received
  3. check how long it took and deem how long to wait (if at all), then
    GOTO 1

Is that what you're proposing?

Yes. It's a timestamp & the amount of data received at that point, of
course, but yes, that's it. That way we get instant reactivity and a
calculation based on the "current speed".

FYI, this was inspired by openssh's scp: after trying to use
the current speed and not getting satisfying results I did some looking
around, and scp is a tool that gives me satisfying results with its
limit, so I looked at how they did it, which is basically this.
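
Mapped onto a loop for a single direction, the confirmed scheme would look roughly like this (a sketch under assumptions: now_sec(), recv_some() and sleep_ms() are hypothetical stand-ins for the real transfer loop's clock, read and pause primitives):

```c
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical stand-ins for the real loop's primitives. */
double  now_sec(void);      /* monotonic clock in seconds */
ssize_t recv_some(void);    /* receive whatever is ready, return byte count */
void    sleep_ms(long ms);  /* pause the calling thread */

void limited_receive(int64_t limit /* bytes per second, assumed > 0 */)
{
  int64_t total = 0;             /* bytes received so far */
  int64_t mark_bytes = 0;        /* 1. byte count at the mark... */
  double  mark_time = now_sec(); /*    ...and the timestamp that goes with it */

  for(;;) {
    ssize_t n = recv_some();
    if(n <= 0)
      break;
    total += n;

    if(total - mark_bytes >= limit) {           /* 2. enough data has passed */
      double elapsed = now_sec() - mark_time;   /* 3. how long it took */
      double allowed = (double)(total - mark_bytes) / (double)limit;
      if(elapsed < allowed)                     /*    faster than allowed */
        sleep_ms((long)((allowed - elapsed) * 1000.0));
      mark_bytes = total;                       /* GOTO 1: move the mark */
      mark_time = now_sec();
    }
  }
}
```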

@bagder
Member

bagder commented Aug 22, 2016

Ok, yes that all makes sense. Thanks for explaining. This approach has one additional benefit too: it will support changing the limits at run time much better than curl currently does. That is something that I keep hearing users ask for. I think it could make sense to actually document that fact as well in association with landing this patch.

bagder closed this in 4b86113 on Sep 4, 2016
@bagder
Member

bagder commented Sep 4, 2016

Thanks a lot!
