Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong Curl_timeleft result leads to connection timeout when CURLOPT_TIMEOUT_MS / CURLOPT_CONNECTTIMEOUT_MS is close to LONG_MAX #5583

Closed
ioancea opened this issue Jun 19, 2020 · 9 comments

Comments

@ioancea
Copy link

ioancea commented Jun 19, 2020

I did this

I reached into this problem by trying to simulate an "infinite" timeout for the connection. Since 0 means 300 seconds, it felt natural to force it to LONG_MAX to achieve the goal of "no connection timeout"

  1. Prerequisites:
  • configure curl without threaded resolver (aka --disable-threaded-resolver) - I have similar requirements as here "./configure --enable-threaded-resolver --disable-rt" doesn't work #3294

  • the system should be under a bit of load but it also reproduces if the program is run through a profiler, like valgrind (it just needs a bit of delay between instructions)

  • set CURLOPT_TIMEOUT_MS or/and CURLOPT_CONNECTTIMEOUT_MS to LONG_MAX (or close enough to it)

  • if curl is compiled on debug, disable signals / alarms by setting CURLOPT_NOSIGNAL to 1L (this is to workaround the DEBUGASSERT from warnless.c:295)

  1. Create a small program which does a simple GET. I've adapted the example from https://curl.haxx.se/libcurl/c/simple.html to include the following 2 lines:
curl_easy_setopt(curl, CURLOPT_NOSIGNAL, 1L);
curl_easy_setopt(curl, CURLOPT_CONNECTTIMEOUT_MS, LONG_MAX);
  1. Run the example with and without valgrind. On my system, with valgrind, the above fails with "Timeout was reached"
$ valgrind --leak-check=full --show-reachable=yes ./curl_example
[...irrelevant valgring messages]
curl_easy_perform() failed: Timeout was reached

I expected the following

The example should work since the timeout is "infinite" (aka LONG_MAX)

curl/libcurl version

Latest release: libcurl/7.70.0, Release-Date: 2020-04-29

operating system

I've run the example on WSL but it can be reproduced on genuine Linux systems as well.

Linux 4.19.84-microsoft-standard #1 SMP Wed Nov 13 11:44:37 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Analysis of the problem

Well, the title of the issue it's a bit misleading because the problem isn't in Curl_timeleft itself but in its parameters. See below my analysis

While checking for the root cause of the issue, I have debugged the following relevant parts:

Why does Curl_timediff return a negative value? The short answer is: because the newer timestamp is older than the older one :). What happens is the following:

I'm not proficient enough in curl's internals to understand why the now timestamp is calculated from outside of the multi_runsingle function. At a 1st glance, it looks natural to me to always use the actual current time to calculate the time left for an operation but i'm sure there must be an explanation for the current design.

Why is this problem only visible with --disable-threaded-resolver? It seems there's a special branch when async is enabled (see https://github.com/curl/curl/blob/curl-7_70_0/lib/multi.c#L1662, w/o the threaded resolver, aka the else branch, rc is set to CURLM_CALL_MULTI_PERFORM which makes the function reloop) which makes the multi_runsingle function to stop its loop right after progress.t_startsingle gets set. Then, at the next run, now will be recalculated so it will correctly end up being newer than progress.t_startsingle.

Questions

  • Is there a workaround for this problem rather than enabling the threaded resolver? From my tests, it would be enough to lower the value (e.g. instead of LONG_MAX, set LOG_MAX / 1000) in order not to easily overflow. Is this assumption correct?

  • Can this issue affect other things too? For example, i noticed the multi_runsingle function also sets the progress.t_startop (here https://github.com/curl/curl/blob/curl-7_70_0/lib/multi.c#L1624) which will be at a later time than now.

@ioancea ioancea changed the title Wrong Curl_timeleft result leads to connection timeout when CURLOPT_TIMEOUT_MS / CURLOPT_CONNECTTIMEOUT_MS is close to LONG_A Wrong Curl_timeleft result leads to connection timeout when CURLOPT_TIMEOUT_MS / CURLOPT_CONNECTTIMEOUT_MS is close to LONG_MAX Jun 19, 2020
@mback2k
Copy link
Member

mback2k commented Jun 19, 2020

Could you please test this against the current master branch? There are quite some changes to the timeout handling pending for the upcoming release next Wednesday. It would be good to know if the changes already fix your issue or if additional work is required.

@mback2k
Copy link
Member

mback2k commented Jun 19, 2020

Also CURLOPT_TIMEOUT_MS should be set to 0 to have no timeout at all.

@tawmoto
Copy link

tawmoto commented Jun 19, 2020

Also CURLOPT_TIMEOUT_MS should be set to 0 to have no timeout at all.

Indeed, but shouldn't LONG_MAX be also a valid value, also?

@bagder
Copy link
Member

bagder commented Jun 19, 2020

It should.

@bagder
Copy link
Member

bagder commented Jun 19, 2020

it looks natural to me to always use the actual current time to calculate the time left for an operation but i'm sure there must be an explanation for the current design.

It's done to reduce the number of calls to the time function. The problem here seems to be the mix of using the previous value and a refreshed value...

@bagder
Copy link
Member

bagder commented Jun 19, 2020

@ioancea do you feel like taking a stab at a PR to fix this?

@ioancea
Copy link
Author

ioancea commented Jun 20, 2020

@mback2k I've just checked it against the current master (aka hash 3f84b3e39a36b408d1a8cc603435a356b39b6c50) and the problem is still there. Btw, setting CURLOPT_TIMEOUT_MS to 0 won't make CURLOPT_CONNECTTIMEOUT_MS "unlimited" but will fallback to its default, aka 300 seconds (that's what I get from here https://github.com/curl/curl/blob/master/lib/connect.c#L192).

@bagder Thanks for the explanation, I'd assumed it has something to do with #5574 :)

The problem here seems to be the mix of using the previous value and a refreshed value...

Yes, since the timestamps that mark the start of operations (i.e. progress.startop and progress.startsingle) are set within the loop of multi_runsingle, the now value (which is calculated before calling multi_runsingle) will always be older than those timestamps as long as execution flow do not leave the function. Thus, before trying a fix for this issue, I would like to understand what's the responsibility of multi_runsingle in terms of when its loop should end and when not. That's because it can be fixed in two main ways and i'm not certain what path to chose: abort the loop after the timestamps are set (so that the next multi_runsingle will have a newer now) OR always use current time (at least for the connection) when it comes to measuring the time left - this will only be a small change here https://github.com/curl/curl/blob/master/lib/connect.c#L233

@bagder
Copy link
Member

bagder commented Jun 21, 2020

Out of curiosity: is there actually a valid real-world use case for a connect timeout longer than 300 seconds?

@bagder
Copy link
Member

bagder commented Jun 22, 2020

I would like to understand what's the responsibility of multi_runsingle in terms of when its loop should end and when not.

There's no clear rule there, but in general we stay in the loop until we run into a EWOULDBLOCK or equivalent.

bagder added a commit that referenced this issue Aug 24, 2020
Setting a timeout to INT_MAX could cause an immediate error to get
returned as timeout because of an overflow when different values of
'now' were used.

This is primarily fixed by having Curl_pgrsTime() return the "now" when
TIMER_STARTSINGLE is set so that the parent function will continue using
that time.

Reported-by: Ionuț-Francisc Oancea
Fixes #5583
@bagder bagder closed this as completed in a2c85bb Aug 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging a pull request may close this issue.

4 participants