Menu

#1298 libcurl will be hang with libevent when timeout

closed-fixed
None
1
2013-12-15
2013-11-08
he qin
No

hi,

I use libcurl & libevent to build a project (like docs/examples/hiperfifo.c) . I found that the easy handle will be hang in some case such as network timeout.

After reading the libcurl code and testing, I found the reason is libcurl control the timeout is not exact.

This is my case:
1. set easy handle timeout is 80ms.
2. add easy handle to multi
3. when multi called the sock_cb, set the event, and set the timeout(80ms)
4. network timeout (easy handle read timeout)
5. libevent tell the curl timeout and curl call multi_socket function.
6. in multi_socket function, add_next_timeout will be called at last few line, and It will clean the timeoutlist for this easy handle because timeout occur. (at this moment, the easy hanle's state is CURLM_STATE_PERFORM)
7. next, it will call multi_runsingle function to process this easy handle, but in multi_runsingle function, Curl_timeleft will be called to check timeout occur or not. in Curl_timeleft it used the (now - t_startsingle). pay attention to this point, this may be 1ms, because Curl_tvdiff is not exact, such as : now(46151672,246084) - t_startsingle(46151672,166088) = 79ms, timeout is 80ms, so 80 - 79 = 1ms.

  1. next, libcurl will be think this easy handle not timeout, but actually timeout occur, and the timeoutlist was cleared, so it will never be woke up. this easy handle will be hang for ever.

I think the logic for this code, I found that the real reason is that
"Curl_pgrsTime(data, TIMER_STARTSINGLE);" be called after "Curl_expire(data, data->set.timeout);" . I try to move the "Curl_pgrsTime(data, TIMER_STARTSINGLE);" before ""Curl_expire(data, data->set.timeout);" , the bad case never occur.

In a word, call the timer before set t_startsingle, will let libevet premature wake up the libcurl to process, and libcurl will think this not timeout.

of course, this bad case will occur fewly, but when this occur, the result will be very terrible.

1 Attachments

Discussion

  • Daniel Stenberg

    Daniel Stenberg - 2013-11-11

    Thanks for the report and all the details. I'm a bit short on time but I intend to look into this asap.

     
  • Daniel Stenberg

    Daniel Stenberg - 2013-11-11
    • assigned_to: Daniel Stenberg
     
  • Daniel Stenberg

    Daniel Stenberg - 2013-11-13

    I don't fully understand what change you're proposing. Can you show us a diff with your suggestion?

     
    • he qin

      he qin - 2013-11-13

      in curl-7.32.0 version:

      delete Line 1039("Curl_pgrsTime(data, TIMER_STARTSINGLE);") in lib/multi.c
      add "Curl_pgrsTime(data, TIMER_STARTSINGLE);" before Line 1304(if(data->set.timeout)) in lib/transfer.c

       
  • Daniel Stenberg

    Daniel Stenberg - 2013-12-02
    • status: open --> open-confirmed
     
  • Daniel Stenberg

    Daniel Stenberg - 2013-12-02

    I suggest we instead fix this by adding the inaccuracy margin to the timeout value to make sure it never fires before the timeout. Like the attached patch.

     
    • he qin

      he qin - 2013-12-03

      I don't think this is a good patch, because it will later than expected. There is so many delay in libcurl, such as in multi_socket function. It is not a friendly for user when the millisecond timeout be set. For example, I set 10 milliseconds timeout , but must wait for 13 milliseconds. On the other hand, I think ahead the setting of TIMER_STARTSINGLE will make no bad effect and no difference for logic.

      How do you think?

       
  • Daniel Stenberg

    Daniel Stenberg - 2013-12-03

    I think you're looking at the problem and solution too narrowly and for your specific case only.

    The reason for the timeout inaccuracy define in the first place is that timers provided by operating systems and event libraries are imprecise. Most timers have a small time window during which it can trigger, which starts slightly before the given timeout. On some systems this is bigger than on others.

    So, we need to handle this case. We have lib/multi.c:multi_socket() run timeouts that "soon" expires, but it isn't working properly since we have fixed limits elsewhere like in your case basically the Curl_timeleft() or similar.

    In your specific case we can possibly adjust the code ever so slightly slightly so that we push some microseconds to make the math then round to 80 instead of 79, but I don't consider that to be a complete and solid fix for the general problem we have with this timeout check.

     
    • he qin

      he qin - 2013-12-05

      OK, I know what you mean. You are right like what you said. Time window must exist. But One more question, can we provide an interface to let user set the time window? Maybe it's more friendly!

       
  • Daniel Stenberg

    Daniel Stenberg - 2013-12-06

    I'm not totally against such an option, but I would prefer that we first explored the ability to figure out a really good default value.

    The big problem with having this as an option is that users will have a really hard time to figure out what to set it to and quite possibly it will even be tricky to test and verify that your value is always working is it should.

    A lot of older systems have 10 ms resolution on most timers. If you write an application that might run on an older system, your app would need to dynamically set this value depending on where it runs!

     
    • he qin

      he qin - 2013-12-11

      OK
      I know what you mean, thank you very much

       
  • Daniel Stenberg

    Daniel Stenberg - 2013-12-15
    • status: open-confirmed --> closed-fixed
     
  • Daniel Stenberg

    Daniel Stenberg - 2013-12-15

    Ok, this change has been pushed now as commit be28223f35. I'll of course be interested in ways so improve the accuracy but this should at least avoid the mentioned bug, at the cost of making the timeout less accurate...

    Case closed for now.