
speed-limit and Current Speed problems #1556

Closed

Conversation
Conversation

paulharris

Hi there,

This pull request is not complete, but I'm not sure exactly how you want me to share my work.
I imagine you can check out this branch and see the work done so far.

I've worked out how to write a test in your test framework, and I've added 2 tests that begin to demonstrate the problems with the speed-limit calculation.

Daniel has started work on improving the algorithm, but I believe the problem runs deeper, in particular in the current-speed calculation and the speeder[] array.

I have a long-winded email draft that I'm putting my thoughts into, but it's not finished, so I thought I'd submit these 2 tests to the public to get the conversation rolling.

To run them, clone the repo and check out this test branch:

./configure --disable-shared --enable-debug --enable-maintainer-mode
make
make test
cd tests/
./runtests.pl 3000
./runtests.pl 3001

And to see what's going on, in one terminal:

cd tests/ 
tail -f log/http_server.log  &
server/sws --pidfile ".http_server.pid" --logfile "log/http_server.log" --ipv4 --port 8990 --srcdir "."

And in another terminal, run the test that will pass (3001).
It has a low speed limit of an "average" 100 bytes/s over a 6-second window.
Both the 3000 and 3001 tests are configured to pause every second; the sws server sends 200 bytes per pause, so the speed should be 200 bytes/sec.

Note that the "Current" speed can go above 200 bytes/sec, so there is a problem in the math there.

cd tests/
../src/curl --output log/curl3001.out  --include --trace-ascii log/trace3001 --trace-time http://127.0.0.1:8990/3001 --speed-limit 100 --speed-time 6 -F form1=012 

And in another terminal, run the other test (3000). It has a very long command line due to the long POST form data that is delivered first.
Again, it has a low speed limit of an "average" 100 bytes/s, this time over a 60-second window (per the command below).
Again, the sws server will send 200 bytes per pause, so the speed should be 200 bytes/sec.

In this case, the Current speed drops to zero because the initial upload data biases the current-speed calculation, and the speeder[] array updates will ignore all the downloaded data progress until the total downloaded bytes exceeds the total uploaded bytes. Not a good approach.

Note that I don't like the speeder[] array approach much. I think there are problems with the fixed 6-slot history, and with the fact that the speeder[] array can be updated on occasion (by Curl_pgrsUpdate) without Curl_speedcheck() subsequently being called.

But that's part of the long-winded email; I'm going to try to propose a better approach.
If anyone has any ideas, I'm all ears!

Here is the long command for test 3000:

cd tests/
../src/curl --output log/curl3000.out  --include --trace-ascii log/trace3000 --trace-time http://127.0.0.1:8990/3000 --speed-limit 100 --speed-time 60 -F form1=01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789

@coveralls

Coverage Status

Coverage decreased (-0.008%) to 73.233% when pulling 5d905db on paulharris:test-speedcheck into 7fffe97 on curl:master.

@bagder
Member

bagder commented Jun 8, 2017

the initial upload data biases the current speed calculation, and the speeder[] array updates will ignore all the downloaded data progress until the total downloaded bytes is more than the total uploaded bytes. Not a good approach

Nah, that's really stupid. I think the "current speed" should be the maximum speed in either direction for this to work properly. Possibly it should be both directions?

the speeder[] array can be updated on occasions (by Curl_pgrsUpdate) when the Curl_speedcheck() function is not subsequently called

I think we can fix this pretty neatly by moving the call to Curl_speedcheck to happen from within Curl_pgrsUpdate() instead and removing the invocations everywhere else. That should make it harder to do this by mistake. Agree?

bagder added a commit that referenced this pull request Jun 8, 2017
... by keeping track of the current speed for both upload and download
simultaneously and separately, the single "current speed" used
internally is now the highest of the two values.

Reported-by: Paul Harris
Fixes #1556
@bagder
Member

bagder commented Jun 8, 2017

@paulharris: what about my suggested fix in #1559 as a first step?

@paulharris
Author

paulharris commented Jun 8, 2017

I haven't checked what you've done yet, but I wanted to fire off an email while you are online.

I was thinking about how to solve this, and here is where I got...

SPEEDCHECK THRESHOLD

Allow Curl_speedcheck to remember the last total transferred bytes (up+down) and the last "passed threshold time". I don't see why we should use any of the update's transfer speeds at all.
I interpret speedcheck's "average speed within time" to mean "must transfer LIMITxTIME bytes within TIME", so we just need to reset the timer whenever it has transferred enough bytes.
So, if LIMIT=100 bytes/sec, TIME=20 sec, then
THRESHOLD=100x20=2000 bytes must be transferred within a 20 sec window.
You record the start time and the total_transferred at that time.
Each time speedcheck() is called, it checks whether
(TRANSFERRED_NOW - TRANSFERRED_START) >= THRESHOLD
If true, then reset: record the start_time and total_transferred again.
If false, and (NOW - START_TIME) >= TIME, then it's too slow.
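
As a minimal sketch (my own names and simplified types, not curl's actual internals), the check could look like this:

#include <time.h>

/* Hypothetical per-transfer state for the proposed check. */
struct speedcheck_state {
  time_t start_time;     /* when the current window began */
  long long start_bytes; /* up+down byte total at the window start */
};

/* Returns 1 if too slow (abort), 0 otherwise.
   limit is bytes/sec, window is seconds. */
int speedcheck(struct speedcheck_state *s, time_t now,
               long long transferred_now, long limit, long window)
{
  long long threshold = (long long)limit * window;
  if(transferred_now - s->start_bytes >= threshold) {
    /* enough bytes moved: restart the window */
    s->start_time = now;
    s->start_bytes = transferred_now;
  }
  else if(now - s->start_time >= window)
    return 1; /* under THRESHOLD bytes for TIME seconds: too slow */
  return 0;
}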

WHEN TO CALL SPEEDCHECK

Curl_pgrsUpdate is called in about 20 different places, but Curl_speedcheck is called in fewer than 8 places.
I don't know when Curl_speedcheck should be called, but putting it directly in pgrsUpdate might be a significant change.
eg, we probably don't want to be checking for low speeds when negotiating with the proxy or doing name lookups.

CURRENT SPEED

Instead of speeder, perhaps we should just be adding the dl and ul speeds together?
i.e. current_speed = ulspeed + dlspeed?
Why is that not a good approach?
My guess is that ul/dlspeed is a longer-term average, whereas current_speed is supposed to be the speed right now.
In which case, why not use a rate-smoothing-style calculation, like this one I use (edited into a self-contained example)...

#include <algorithm>

// Times and byte counts are plain doubles here to keep the
// example self-contained.
struct RateState {
   double last_time_utc;    // timestamp of the previous update, secs
   double last_ul, last_dl; // byte counters at the previous update
   double window_len;       // effective averaging window, secs
   double ul_rate_average;
   double dl_rate_average;
} state;

void init(double now_utc, double ulnow, double dlnow) {
   state.last_time_utc = now_utc;
   state.last_ul = ulnow;
   state.last_dl = dlnow;
   state.window_len = 0;
   state.ul_rate_average = 0;
   state.dl_rate_average = 0;
}

void update(double now_utc, double ulnow, double dlnow) {
   // ignore updates after a zero (or negative) time gap!
   if (now_utc > state.last_time_utc) {
      static const double max_window_secs = 60;

      double secs = now_utc - state.last_time_utc;
      double ul_rate = (ulnow - state.last_ul) / secs;
      double dl_rate = (dlnow - state.last_dl) / secs;

      // normally, smoothing = 1/200 ie 1 part in X minutes,
      // but our 'secs' could be different to 1...
      state.window_len = std::min(max_window_secs, state.window_len + secs);
      double smoothing = std::min(1.0, secs / state.window_len); // don't let ratio be > 1
      state.ul_rate_average = ul_rate * smoothing + (1 - smoothing) * state.ul_rate_average;
      state.dl_rate_average = dl_rate * smoothing + (1 - smoothing) * state.dl_rate_average;

      state.last_time_utc = now_utc;
      state.last_ul = ulnow;
      state.last_dl = dlnow;
   }
}

It's not perfect and I need to improve it a bit,
but it can be tweaked to better bias towards more recent transfer rates and a smaller total window.

Then we would have a "total dl speed" (the curl dlspeed as it's implemented now)
and a "recent dl speed" (i.e. what speeder[] gets us),
and the total speeds are just the dl/ul speeds added together.

@paulharris
Author

re: @paulharris: what about my suggested fix in #1559 as a first step?
I've commented in #1559 directly. I'll continue my approach on my branch now; let's see where it leads...

Rather than watching the computed current average speed,
which is not as specific as a simple total over a time window.
@paulharris
Author

I've updated this with my suggested implementation.
Note that I have not changed where speedcheck is called, as I don't think I want to call it when it's doing name lookups or the connection is just getting started, or I'm not sure where else...

I'll have to leave that to you to determine what is the best course of action in that regard.

At the moment, I figure it'll "be ok" since it'll "probably" get called as often as required, but I can't be sure.

@paulharris
Author

paulharris commented Jun 9, 2017

Regarding the definition, the "R bytes/sec over T time" is a bit confusing.

If I had a time machine, I'd suggest just using "N bytes over T time".

At the moment, the minimum rate I can specify over a 60-second window is 1 byte/sec.
What if it's a long-poll connection and I want it as low as 1 byte/minute?

Or 100 bytes per 2 minutes? The minimum possible over 2 minutes is 120 bytes (1 byte/sec).

You can see someone had the same confusion here (I'm not involved in this project, I just found it via search):
collectd/collectd#985

Quote:
- rename LowLimitBytesPerSec to LowSpeedLimit.
- make LowSpeedLimit refer to bytes per second rather than bytes per
  Interval.

The reason I'm using the low-speed limit is actually to avoid using timeouts in longpoll connections, while at the same time watching for broken connections that Curl doesn't detect (by design, it seems?).
From: collectd/collectd#752

I've been seeing a problem in some production systems using write_http where cURL hangs forever (and I'm talking weeks) waiting for a request to complete, because the TCP socket according to the client remains open while the server considers it to have been broken by network instability.

Looking at the code (and libCURL docs) it seems that there is no default timeout, and waiting forever is expected behaviour in this case.

I get the impression that this is the only current patch which would address this issue? Is that correct or have I missed some code which introduces some sort of timeout to the HTTP request?

Is the above the correct design, from Curl's point of view?
I see a few reasons for this here: https://stackoverflow.com/questions/14081696/curl-stop-working

Sorry if this is polluting this discussion...

@coveralls

Coverage Status

Coverage increased (+0.004%) to 73.221% when pulling fce8e99 on paulharris:test-speedcheck into 844896d on curl:master.

@paulharris
Author

Another question... see here:
https://forum.kodi.tv/showthread.php?tid=131873

Specifically,

* CURLOPT_LOW_SPEED_LIMIT combined with a restricted buffer will cause unexpected abort of playback in media when a pause or a skip back causes throttling of the transport below 1bps.

In short, they are streaming video. They pause the video (or the buffer becomes full) so they throttle the network speed.
But, the low speed limit is still in place and triggers a connection-lost.

In this scenario, shouldn't the Curl user adjust the low-speed-limit to match the throttled speed desired?

And if so, CAN the Curl user adjust the low-speed-limit as the code currently stands?
I believe if the user changes the thresholds, the code (as I've written it) will simply test against the new thresholds at the next call, and work as expected.

@paulharris
Author

This might be related to the not-always-calling Curl_speedcheck() question...
https://stackoverflow.com/questions/33778157/curl-tftp-transfer-timeout

bagder added a commit that referenced this pull request Jun 9, 2017
@bagder
Member

bagder commented Jun 9, 2017

First, I think we should focus on getting the existing functionality done right. I don't think it is valuable to scan a lot of random misunderstandings or others repeating our bugs before then. Once we believe we have ironed out our current problems, then I think we can start to see what we can do for the remaining problems (if possible).

The low speed timeout was never meant to be the all-covering solution for everyone. It is just one way for applications to kill almost idle connections. Applications can and should still implement their own logic if they want more fancy algorithms.

So, if LIMIT=100 bytes/sec, TIME=20 sec, then THRESHOLD=100x20=2000 bytes must be transferred within a 20 sec window.

Those first 20 seconds are easy - but what happens on the 21st second? Do you restart the counter at second 20 and wait another 20 seconds so it won't abort this connection until second 40?

By checking the speed every second, the situation where 2000 bytes come the first second and is then followed by 19 totally stalled seconds is detected at second 21 where it actually reaches 20 seconds in a row at less than 100 bytes/sec...

@paulharris
Author

paulharris commented Jun 9, 2017 via email

@bagder
Member

bagder commented Jun 9, 2017

I'm not sure what you're suggesting. If we ignore the problem #1559 is trying to fix, what is this new code bringing to libcurl that the existing code doesn't already handle?

I think the current code has several advantages compared to your version, including that the "current speed" is based on the last 5 seconds only. Limiting the speed math to the last N seconds is hard to do without using an array like this.

@paulharris
Author

In short, I think your changes in #1559 are ok,
but the description of the limit doesn't match what it does.
If we change the description then I think it's fine.

N = average num bytes/sec lower limit
T = num seconds to tolerate low speed

The current description implies (to me):

Over T seconds, the connection should be operating with an average speed of N bytes/sec.
That implies that over T seconds, N*T bytes must be transferred, or else it's too slow.

The description should be changed to something more like...

The connection is defined as too slow if its total speed is lower than N bytes/sec averaged over a 5 second window.
A slow connection will be tolerated for T seconds, after which it is aborted with a CURL_OPERATION_TIMEOUT code.

Alternatively: The connection will be aborted (due to low speed) if T seconds pass without 5*N bytes uploaded or downloaded within a 5-second window.

More thoughts

Ok, so the current speed is based on the total transferred in the last 5 seconds (divided by 5).
Each time Curl_pgrsUpdate() is called, the current speed is updated IF at least 1 second has passed.
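
For reference, a windowed average of that general shape could look like the following standalone sketch (my own names; the real speeder[] code differs in its details):

#define SPEED_SLOTS 6 /* six samples span up to five 1-second gaps */

struct speed_window {
  long long totals[SPEED_SLOTS]; /* running byte total at each sample */
  int count;                     /* samples recorded so far */
  int next;                      /* slot the next sample overwrites */
};

/* Record a sample (call at most once per second) and return the
   average speed in bytes/sec across the samples currently held. */
double record_sample(struct speed_window *w, long long total)
{
  int oldest;
  w->totals[w->next] = total;
  w->next = (w->next + 1) % SPEED_SLOTS;
  if(w->count < SPEED_SLOTS)
    w->count++;
  if(w->count < 2)
    return 0.0; /* need two samples to form a rate */
  oldest = (w->next + SPEED_SLOTS - w->count) % SPEED_SLOTS;
  /* bytes moved across (count - 1) one-second intervals */
  return (double)(total - w->totals[oldest]) / (w->count - 1);
}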

Then, Curl_speedcheck() should be called at least once a second (right?)
It is not called every time update is called; we haven't tackled that yet. It might be different for different protocols.
If speedcheck isn't called in one of those seconds, we might miss the connection's one brief moment of being fast enough. I could manufacture an edge case, but I don't think it's required.

@bagder
Member

bagder commented Jun 13, 2017

You seem to be hung up on this, and to disagree that we need more than one second to get an accurate "current speed". I don't think the fact that it uses the last 5 seconds to get that information is something we need to hammer the user with; it is just an implementation detail of how it gets the current speed (I'm concerned that using less would give very unstable numbers and thus instead cause other issues). And it uses less than 5 seconds during the first 5 seconds of a transfer. Therefore, I don't think your suggested updated descriptions improve matters very much.

Then, Curl_speedcheck() should be called at least once a second (right?)

It should. But it is a fairly new requirement, so I won't be surprised if there's some case where it doesn't happen. We should probably add a debug-build condition that checks and warns if that happens. Or have you figured out any situation where you know it happens?

@paulharris
Author

I agree we need more than one second, that's logical.
5 seconds seems fine too. I think that is all good, and better than my longer-term-speed approaches.
And I agree my suggested description is not ideal.

My confusion (and, I think, others') comes from the wording of what an "average speed" is. The current description implies that the average speed over the whole time window should be above the limit,
i.e. for LIMIT=L bytes/sec, the minimum total bytes transferred over T seconds is T * L.

That isn't the case.

The case is that there only needs to be one second in the time window where the "current speed" is above the limit,
i.e. for LIMIT=L bytes/sec, the minimum total bytes transferred over T seconds is L.
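
For example (my numbers): with L = 100 bytes/sec and T = 60 sec, the first reading demands 60 * 100 = 6000 bytes during the window, while in practice one above-limit reading, roughly 5 * 100 = 500 bytes inside some 5-second window (or less, early in the transfer when the window is shorter), resets the timeout and the remaining 55 seconds can be completely idle.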


Regarding whether speedcheck is always called every second: I was thinking of adding some checks, i.e. when pgrsUpdate is called AND the speeder is updated, has speedcheck been called since the last speeder update?
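
Something like this, perhaps (a hypothetical sketch with my own counters and hook points, not actual curl code):

#ifdef DEBUGBUILD
#include <stdio.h>

static int speeder_updates;  /* bumped whenever speeder[] advances */
static int speedcheck_calls; /* synced whenever Curl_speedcheck runs */

/* Hook: call from the spot where the speeder is updated. */
static void debug_on_speeder_update(void)
{
  if(speeder_updates > speedcheck_calls)
    fprintf(stderr, "warning: speeder advanced %d time(s) without an "
            "intervening Curl_speedcheck()\n",
            speeder_updates - speedcheck_calls);
  speeder_updates++;
}

/* Hook: call from Curl_speedcheck(). */
static void debug_on_speedcheck(void)
{
  speedcheck_calls = speeder_updates;
}
#endif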
But I don't see any other tests that would execute long enough to check the various protocols and scenarios, and I am only using HTTP.
Who can run the library/exe with the tests enabled?

bagder added a commit that referenced this pull request Jun 14, 2017
@bagder
Member

bagder commented Jun 16, 2017

Is this PR still valid or does #1567 replace this?

@paulharris
Author

#1567 replaces it, in terms of the commit.
I'd still like to see some change to the description.
Would you like to finish that discussion here or move it to a new issue?

By the way, you are doing a great job with such a complex library.
I have more suggestions/fixes in the pipeline.

@bagder
Member

bagder commented Jun 16, 2017

Would you like to finish that discussion here or move it to a new issue?

Due to the slight shift in focus from this PR, I think that's better taken to a new PR/issue. Makes it easier for everyone to keep track of the exact topic.

you are doing a great job with such a complex library

Thank you!

@bagder
Member

bagder commented Jul 1, 2017

Let's consider this PR done. Open a new one if there are more/remaining things to fix.

@bagder bagder closed this Jul 1, 2017
@paulharris paulharris deleted the test-speedcheck branch July 6, 2017 15:49
@lock lock bot locked as resolved and limited conversation to collaborators Jan 18, 2019