
speed-limit and Current Speed problems #1556

Closed

Conversation
Conversation

paulharris

Hi there,

This pull request is not complete, but I'm not sure exactly how you want me to share my work.
I imagine you can check out this branch and see the work done so far.

I've worked out how to write a test in your test framework, and I've added 2 tests that begin to demonstrate the problems with the speed-limit calculation.

Daniel has started work on improving the algorithm, but I believe the problem runs deeper, in particular in the current-speed calculation and the speeder[] array.

I have a long-winded email draft that I'm putting my thoughts into, but it's not finished, so I thought I'd submit these 2 tests to the public to get the conversation rolling.

To run them, clone the repo and check out this test branch:

./configure --disable-shared --enable-debug --enable-maintainer-mode
make
make test
cd tests/
./runtests.pl 3000
./runtests.pl 3001

And to see what's going on, in one terminal:

cd tests/ 
tail -f log/http_server.log  &
server/sws --pidfile ".http_server.pid" --logfile "log/http_server.log" --ipv4 --port 8990 --srcdir "."

And in another terminal, run the test that will pass (3001).
It has a low speed limit of an "average" 100 bytes/s over a 6-second window.
Both the 3000 and 3001 tests are configured to pause every second; the sws server sends 200 bytes per pause, so the speed should be 200 bytes/sec.

Note that the "Current" speed can go above 200 bytes/sec, so there is a problem in the math there.

cd tests/
../src/curl --output log/curl3001.out  --include --trace-ascii log/trace3001 --trace-time http://127.0.0.1:8990/3001 --speed-limit 100 --speed-time 6 -F form1=012 

And in another terminal, run the other test (3000). It has a very long command line due to the long POST form data that is delivered first.
Again, it has a low speed limit of an "average" 100 bytes/s, this time over a 60-second window (per the command below).
Again, the sws server will send 200 bytes per pause, so the speed should be 200 bytes/sec.

In this case, the Current speed drops to zero because the initial upload data biases the current-speed calculation, and the speeder[] array updates will ignore all the downloaded data progress until the total downloaded bytes exceeds the total uploaded bytes. Not a good approach.

Note that I don't like the speeder[] array approach much. I think there are problems with the fixed 6-slot history, and with the fact that the speeder[] array can be updated on occasion (by Curl_pgrsUpdate) without Curl_speedcheck() subsequently being called.

But that's part of the long-winded email; I'm going to try to propose a better approach.
If anyone has any ideas, I'm all ears!

Here is the long command for test 3000:

cd tests/
../src/curl --output log/curl3000.out  --include --trace-ascii log/trace3000 --trace-time http://127.0.0.1:8990/3000 --speed-limit 100 --speed-time 60 -F form1=01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789

@coveralls

Coverage Status

Coverage decreased (-0.008%) to 73.233% when pulling 5d905db on paulharris:test-speedcheck into 7fffe97 on curl:master.

@bagder
Member

bagder commented Jun 8, 2017

the initial upload data biases the current speed calculation, and the speeder[] array updates will ignore all the downloaded data progress until the total downloaded bytes is more than the total uploaded bytes. Not a good approach

Nah, that's really stupid. I think the "current speed" should be the maximum speed in either direction for this to work properly. Possibly it should be both directions?

the speeder[] array can be updated on occasions (by Curl_pgrsUpdate) when the Curl_speedcheck() function is not subsequently called

I think we can fix this pretty neatly by moving the call to Curl_speedcheck to happen from within Curl_pgrsUpdate() instead and removing the invocations everywhere else. That should make it harder to do this by mistake. Agree?

bagder added a commit that referenced this pull request Jun 8, 2017
... by keeping track of the current speed for both upload and download
simultaneously and separately, the single "current speed" used
internally is now the highest of the two values.

Reported-by: Paul Harris
Fixes #1556
@bagder
Member

bagder commented Jun 8, 2017

@paulharris: what about my suggested fix in #1559 as a first step?

@paulharris
Author

paulharris commented Jun 8, 2017

I haven't checked what you've done yet, but I wanted to fire off an email while you are online.

I was thinking about how to solve this, and here is where I got...

SPEEDCHECK THRESHOLD

Allow Curl_speedcheck to remember the last total transferred bytes (up+down) and the last "passed threshold time". I don't see why we should use any of the update's transfer speeds at all.
I interpret speedcheck's "average speed within time" to mean "must transfer LIMITxTIME bytes within TIME", so we just need to reset the timer whenever it has transferred enough bytes.
So, if LIMIT=100 bytes/sec, TIME=20 sec, then
THRESHOLD=100x20=2000 bytes must be transferred within a 20 sec window.
You record the start time and the total_transferred at that time.
Each time speedcheck() is called, it checks whether
(TRANSFERRED_NOW - TRANSFERRED_START) >= THRESHOLD
If true, then reset: record the start_time and total_transferred again.
If false, and (NOW - START_TIME) >= TIME, then it's too slow.
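
As a minimal sketch (my own names and simplified types, not curl's actual internals), the check could look like this:

#include <time.h>

/* Hypothetical per-transfer state for the proposed check. */
struct speedcheck_state {
  time_t start_time;     /* when the current window began */
  long long start_bytes; /* up+down byte total at the window start */
};

/* Returns 1 if too slow (abort), 0 otherwise.
   limit is bytes/sec, window is seconds. */
int speedcheck(struct speedcheck_state *s, time_t now,
               long long transferred_now, long limit, long window)
{
  long long threshold = (long long)limit * window;
  if(transferred_now - s->start_bytes >= threshold) {
    /* enough bytes moved: restart the window */
    s->start_time = now;
    s->start_bytes = transferred_now;
  }
  else if(now - s->start_time >= window)
    return 1; /* under THRESHOLD bytes for TIME seconds: too slow */
  return 0;
}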

WHEN TO CALL SPEEDCHECK

Curl_pgrsUpdate is called in about 20 different places, but Curl_speedcheck is called in fewer than 8 places.
I don't know when Curl_speedcheck should be called, but putting it directly in pgrsUpdate might be a significant change.
eg, we probably don't want to be checking for low speeds when negotiating with the proxy or doing name lookups.

CURRENT SPEED

Instead of speeder, perhaps we should just be adding the dl and ul speeds together?
i.e. current_speed = ulspeed + dlspeed?
Why is that not a good approach?
My guess is that ul/dlspeed is a longer-term average, whereas current_speed is supposed to be the speed right now.
In which case, why not use a rate-smoothing-style calculation, like this one I use (edited into a self-contained example)...

#include <algorithm>

// Times and byte counts are plain doubles here to keep the
// example self-contained.
struct RateState {
   double last_time_utc;    // timestamp of the previous update, secs
   double last_ul, last_dl; // byte counters at the previous update
   double window_len;       // effective averaging window, secs
   double ul_rate_average;
   double dl_rate_average;
} state;

void init(double now_utc, double ulnow, double dlnow) {
   state.last_time_utc = now_utc;
   state.last_ul = ulnow;
   state.last_dl = dlnow;
   state.window_len = 0;
   state.ul_rate_average = 0;
   state.dl_rate_average = 0;
}

void update(double now_utc, double ulnow, double dlnow) {
   // ignore updates after a zero (or negative) time gap!
   if (now_utc > state.last_time_utc) {
      static const double max_window_secs = 60;

      double secs = now_utc - state.last_time_utc;
      double ul_rate = (ulnow - state.last_ul) / secs;
      double dl_rate = (dlnow - state.last_dl) / secs;

      // normally, smoothing = 1/200 ie 1 part in X minutes,
      // but our 'secs' could be different to 1...
      state.window_len = std::min(max_window_secs, state.window_len + secs);
      double smoothing = std::min(1.0, secs / state.window_len); // don't let ratio be > 1
      state.ul_rate_average = ul_rate * smoothing + (1 - smoothing) * state.ul_rate_average;
      state.dl_rate_average = dl_rate * smoothing + (1 - smoothing) * state.dl_rate_average;

      state.last_time_utc = now_utc;
      state.last_ul = ulnow;
      state.last_dl = dlnow;
   }
}

It's not perfect and I need to improve it a bit,
but it can be tweaked to better bias towards more recent transfer rates and a smaller total window.

Then we would have a "total dl speed" (the curl dlspeed as it's implemented now)
and a "recent dl speed" (i.e. what speeder[] gets us),
and the total speeds are just the dl/ul speeds added together.

@paulharris
Author

re: @paulharris: what about my suggested fix in #1559 as a first step?
I've commented in #1559 directly. I'll continue my approach on my branch now; let's see where it leads...

Rather than watching the computed current average speed,
which is not as specific as a simple total over a time window.
@paulharris
Author

I've updated this with my suggested implementation.
Note that I have not changed where speedcheck is called, as I don't think I want to call it when it's doing name lookups or the connection is just getting started, or I'm not sure where else...

I'll have to leave that to you to determine what is the best course of action in that regard.

At the moment, I figure it'll "be ok" since it'll "probably" get called as often as required, but I can't be sure.

@paulharris
Author

paulharris commented Jun 9, 2017

Regarding the definition, the "R bytes/sec over T time" is a bit confusing.

If I had a time machine, I'd suggest just using "N bytes over T time".

At the moment, the minimum rate I can specify over a 60-second window is 1 byte/sec.
What if it's a long-poll connection and I want it as low as 1 byte/minute?

Or 100 bytes per 2 minutes? The minimum possible over 2 minutes is 120 bytes (1 byte/sec).

You can see someone had the same confusion here (I'm not involved in this project, I just found it via search):
collectd/collectd#985

Quote:
- rename LowLimitBytesPerSec to LowSpeedLimit.
- make LowSpeedLimit refer to bytes per second rather than bytes per
  Interval.

The reason I'm using the low-speed limit is actually to avoid using timeouts in longpoll connections, while at the same time watching for broken connections that Curl doesn't detect (by design, it seems?).
From: collectd/collectd#752

I've been seeing a problem in some production systems using write_http where cURL hangs forever (and I'm talking weeks) waiting for a request to complete, because the TCP socket according to the client remains open while the server considers it to have been broken by network instability.

Looking at the code (and libCURL docs) it seems that there is no default timeout, and waiting forever is expected behaviour in this case.

I get the impression that this is the only current patch which would address this issue? Is that correct or have I missed some code which introduces some sort of timeout to the HTTP request?

Is the above the correct design, from Curl's point of view?
I see a few reasons for this here: https://stackoverflow.com/questions/14081696/curl-stop-working

Sorry if this is polluting this discussion...

@coveralls

Coverage Status

Coverage increased (+0.004%) to 73.221% when pulling fce8e99 on paulharris:test-speedcheck into 844896d on curl:master.

@paulharris
Author

Another question... see here:
https://forum.kodi.tv/showthread.php?tid=131873

Specifically,

* CURLOPT_LOW_SPEED_LIMIT combined with a restricted buffer will cause unexpected abort of playback in media when a pause or a skip back causes throttling of the transport below 1bps.

In short, they are streaming video. They pause the video (or the buffer becomes full) so they throttle the network speed.
But, the low speed limit is still in place and triggers a connection-lost.

In this scenario, shouldn't the Curl user adjust the low-speed-limit to match the throttled speed desired?

And if so, CAN the Curl user adjust the low-speed-limit as the code currently stands?
I believe if the user changes the thresholds, the code (as I've written it) will simply test against the new thresholds at the next call, and work as expected.

@paulharris
Author

This might be related to the not-always-calling Curl_speedcheck() question...
https://stackoverflow.com/questions/33778157/curl-tftp-transfer-timeout

bagder added a commit that referenced this pull request Jun 9, 2017
@bagder
Member

bagder commented Jun 9, 2017

First, I think we should focus on getting the existing functionality done right. I don't think it is valuable to scan a lot of random misunderstandings or others repeating our bugs before then. Once we believe we have ironed out our current problems, then I think we can start to see what we can do for the remaining problems (if possible).

The low speed timeout was never meant to be the all-covering solution for everyone. It is just one way for applications to kill almost idle connections. Applications can and should still implement their own logic if they want more fancy algorithms.

So, if LIMIT=100 bytes/sec, TIME=20 sec, then THRESHOLD=100x20=2000 bytes must be transferred within a 20 sec window.

Those first 20 seconds are easy - but what happens on the 21st second? Do you restart the counter at second 20 and wait another 20 seconds so it won't abort this connection until second 40?

By checking the speed every second, the situation where 2000 bytes come the first second and is then followed by 19 totally stalled seconds is detected at second 21 where it actually reaches 20 seconds in a row at less than 100 bytes/sec...

@paulharris
Author

paulharris commented Jun 9, 2017 via email

@bagder
Member

bagder commented Jun 9, 2017

I'm not sure what you're suggesting. If we ignore the problem #1559 is trying to fix, what is this new code bringing to libcurl that the existing code doesn't already handle?

I think the current code has several advantages compared to your version, including that the "current speed" is based on the last 5 seconds only. Limiting the speed math to the last N seconds is hard to do without using an array like this.

@paulharris
Author

In short, I think your changes in #1559 are ok,
but the description of the limit doesn't match what it does.
If we change the description then I think it's fine.

N = average num bytes/sec lower limit
T = num seconds to tolerate low speed

The current description implies (to me):

Over T seconds, the connection should be operating with an average speed of N bytes/sec.
That implies that over T seconds, N*T bytes must be transferred, or else it's too slow.

The description should be changed to something more like...

The connection is defined as too slow if its total speed is lower than N bytes/sec averaged over a 5 second window.
A slow connection will be tolerated for T seconds, after which it is aborted with a CURL_OPERATION_TIMEOUT code.

Alternatively: The connection will be aborted (due to low speed) if T seconds pass without 5*N bytes uploaded or downloaded within a 5-second window.

More thoughts

Ok, so the current speed is based on the total transferred in the last 5 seconds (divided by 5).
Each time Curl_pgrsUpdate() is called, the current speed is updated IF at least 1 second has passed.
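
For reference, a windowed average of that general shape could look like the following standalone sketch (my own names; the real speeder[] code differs in its details):

#define SPEED_SLOTS 6 /* six samples span up to five 1-second gaps */

struct speed_window {
  long long totals[SPEED_SLOTS]; /* running byte total at each sample */
  int count;                     /* samples recorded so far */
  int next;                      /* slot the next sample overwrites */
};

/* Record a sample (call at most once per second) and return the
   average speed in bytes/sec across the samples currently held. */
double record_sample(struct speed_window *w, long long total)
{
  int oldest;
  w->totals[w->next] = total;
  w->next = (w->next + 1) % SPEED_SLOTS;
  if(w->count < SPEED_SLOTS)
    w->count++;
  if(w->count < 2)
    return 0.0; /* need two samples to form a rate */
  oldest = (w->next + SPEED_SLOTS - w->count) % SPEED_SLOTS;
  /* bytes moved across (count - 1) one-second intervals */
  return (double)(total - w->totals[oldest]) / (w->count - 1);
}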

Then, Curl_speedcheck() should be called at least once a second (right?)
It is not called every time update is called; we haven't tackled that yet. It might be different for different protocols.
If speedcheck isn't called in one of those seconds, we might miss the connection's one brief moment of being fast enough. I could manufacture an edge case, but I don't think it's required.

@bagder
Member

bagder commented Jun 13, 2017

You seem to be hung up on this, and to disagree that we need more than one second to get an accurate "current speed". I don't think the fact that it uses the last 5 seconds to get that information is something we need to hammer the user with; it is just an implementation detail of how it gets the current speed (I'm concerned that using less would give very unstable numbers and thus instead cause other issues). And it uses less than 5 seconds during the first 5 seconds of a transfer. Therefore, I don't think your suggested updated descriptions improve matters very much.

Then, Curl_speedcheck() should be called at least once a second (right?)

It should. But it is a fairly new requirement, so I won't be surprised if there's some case where it doesn't happen. We should probably add a debug-build condition that checks and warns if that happens. Or have you figured out any situation where you know it happens?

@paulharris
Author

I agree we need more than one second, that's logical.
5 seconds seems fine too. I think that is all good, and better than my longer-term-speed approaches.
And I agree my suggested description is not ideal.

My confusion (and, I think, others') comes from the wording of what an "average speed" is. The current description implies that the average speed over the whole time window should be above the limit,
i.e. for LIMIT=L bytes/sec, the minimum total bytes transferred over T seconds is T * L.

That isn't the case.

The case is that there only needs to be one second in the time window where the "current speed" is above the limit,
i.e. for LIMIT=L bytes/sec, the minimum total bytes transferred over T seconds is L.
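
For example (my numbers): with L = 100 bytes/sec and T = 60 sec, the first reading demands 60 * 100 = 6000 bytes during the window, while in practice one above-limit reading, roughly 5 * 100 = 500 bytes inside some 5-second window (or less, early in the transfer when the window is shorter), resets the timeout and the remaining 55 seconds can be completely idle.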


Regarding whether speedcheck is always called every second: I was thinking of adding some checks, i.e. when pgrsUpdate is called AND the speeder is updated, has speedcheck been called since the last speeder update?
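
Something like this, perhaps (a hypothetical sketch with my own counters and hook points, not actual curl code):

#ifdef DEBUGBUILD
#include <stdio.h>

static int speeder_updates;  /* bumped whenever speeder[] advances */
static int speedcheck_calls; /* synced whenever Curl_speedcheck runs */

/* Hook: call from the spot where the speeder is updated. */
static void debug_on_speeder_update(void)
{
  if(speeder_updates > speedcheck_calls)
    fprintf(stderr, "warning: speeder advanced %d time(s) without an "
            "intervening Curl_speedcheck()\n",
            speeder_updates - speedcheck_calls);
  speeder_updates++;
}

/* Hook: call from Curl_speedcheck(). */
static void debug_on_speedcheck(void)
{
  speedcheck_calls = speeder_updates;
}
#endif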
But I don't see any other tests that would execute long enough to check the various protocols and scenarios, and I am only using HTTP.
Who can run the library/exe with the tests enabled?

bagder added a commit that referenced this pull request Jun 14, 2017
@bagder
Member

bagder commented Jun 16, 2017

Is this PR still valid or does #1567 replace this?

@paulharris
Author

#1567 replaces it, in terms of the commit.
I'd still like to see some change to the description.
Would you like to finish that discussion here or move it to a new issue?

By the way, you are doing a great job with such a complex library.
I have more suggestions/fixes in the pipeline.

@bagder
Member

bagder commented Jun 16, 2017

Would you like to finish that discussion here or move it to a new issue?

Due to the slight shift in focus from this PR, I think that's better taken to a new PR/issue. Makes it easier for everyone to keep track of the exact topic.

you are doing a great job with such a complex library

Thank you!

@bagder
Member

bagder commented Jul 1, 2017

Let's consider this PR done. Open a new one if there are more/remaining things to fix.

@bagder bagder closed this Jul 1, 2017
@paulharris paulharris deleted the test-speedcheck branch July 6, 2017 15:49
@lock lock bot locked as resolved and limited conversation to collaborators Jan 18, 2019