
#1238 progressfunction called too much

Status: closed-fixed
Priority: 5
Updated: 2015-01-11
Created: 2013-06-03
Private: No

(bug reported by Miguel Angel in http://curl.haxx.se/mail/lib-2013-05/0254.html)

I am using pycurl in a small script to download Coursera videos. It works
fine, but the callback I installed to display a progress bar in the console is
called 22188558 times in 70 seconds of execution, and it pegs one core of the
machine the whole time.
Given that the documentation states it should be called "roughly once per
second or sooner", this seems far too often. This is the code I use; it's
Python, but I think it's pretty readable.

::: python 
curl = pycurl.Curl() 
curl.setopt(curl.URL, url) 
if rate_limit is not None: 
    curl.setopt(curl.MAX_RECV_SPEED_LARGE, rate_limit) 
file_store = open(filename, "wb") 
curl.setopt(curl.WRITEDATA, file_store) 
curl.setopt(curl.NOPROGRESS, 0) 
curl.setopt(curl.PROGRESSFUNCTION, curl_progress) 
curl.setopt(curl.FOLLOWLOCATION, 1) 
#cookies 
curl.setopt(curl.COOKIEJAR, cookies_filename) 
curl.setopt(curl.COOKIEFILE, cookies_filename) 
try: 
    curl.perform() 
except: 
    import traceback 
    my_logger.error(u"Error downloading file: %s" % traceback.format_exc()) 
#cleaning 
curl.close() 
file_store.close()

Even when I define curl_progress as a no-op:

::: python
def curl_progress(dl_total, dl_now, ul_total, ul_now):
    return 0

it is still called so many times that it pegs one CPU core anyway.
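
An application-side mitigation, independent of any libcurl fix, is to rate limit the expensive work (the console drawing) inside the callback itself. The sketch below is illustrative only: ThrottledProgress is a hypothetical helper, not part of the original report, and it only makes each call cheap; it does not reduce how often libcurl invokes the callback, which is the actual bug discussed further down.

::: python
import sys
import time

class ThrottledProgress(object):
    """Redraw the console progress display at most once per interval,
    no matter how often libcurl invokes the callback."""

    def __init__(self, interval=1.0):
        self.interval = interval
        self.last_draw = 0.0

    def __call__(self, dl_total, dl_now, ul_total, ul_now):
        now = time.time()
        if dl_total and now - self.last_draw >= self.interval:
            self.last_draw = now
            sys.stderr.write("\r%5.1f%%" % (100.0 * dl_now / dl_total))
            sys.stderr.flush()
        return 0  # returning a non-zero value would abort the transfer

It would be installed with curl.setopt(curl.PROGRESSFUNCTION, ThrottledProgress()) in the script above.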

Discussion

  • Nach M. S.

    Nach M. S. - 2013-06-03

    I can confirm that this occurs, and only when rate limiting for upload/download is in place.

     
  • Nach M. S.

    Nach M. S. - 2013-06-10

    I think I tracked down the bug.
    Please see: http://i.imgur.com/rfDXcYq.png

    I'll go do some tests.

     
  • Nach M. S.

    Nach M. S. - 2013-06-11

    Okay, I ran some tests.

    Before patching:
    On average: 2907 calls for 3.5 seconds.

    After patching:
    On average: 5.5 calls for 3.5 seconds.

     
  • Nach M. S.

    Nach M. S. - 2013-06-11

    On a side point, the progress function's variables are internally:

        curl_off_t size_dl;     /* total expected size */
        curl_off_t size_ul;     /* total expected size */
        curl_off_t downloaded;  /* transferred so far */
        curl_off_t uploaded;    /* transferred so far */

    Yet it is called as:

        result = data->set.fprogress(data->set.progress_client,
                                     (double)data->progress.size_dl,
                                     (double)data->progress.downloaded,
                                     (double)data->progress.size_ul,
                                     (double)data->progress.uploaded);

    Is it really wise to cast these to double?

    I know this can't be changed without breaking backwards compatibility, but perhaps an alternate callback could be defined with non-floating-point variables?
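
    (Editorial note: later libcurl releases added exactly such an alternative, CURLOPT_XFERINFOFUNCTION, whose arguments remain curl_off_t; pycurl exposes it as XFERINFOFUNCTION. It did not exist at the time of this report, and whether a given libcurl/pycurl build supports it is an assumption, so the sketch below is illustrative only.)

        import pycurl

        def xferinfo(dl_total, dl_now, ul_total, ul_now):
            # with XFERINFOFUNCTION the values arrive as plain integers
            # (curl_off_t) rather than doubles
            return 0

        curl = pycurl.Curl()
        curl.setopt(pycurl.URL, "http://example.com/")
        curl.setopt(pycurl.NOPROGRESS, 0)               # enable progress callbacks
        curl.setopt(pycurl.XFERINFOFUNCTION, xferinfo)  # assumes support is present
        curl.perform()
        curl.close()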

     
  • Daniel Stenberg

    Daniel Stenberg - 2013-06-11

    Thanks Nach, but that patch does not actually fix the problem, only the symptom.

    That check has always been like that and it is not the correct way of rate limiting the progress callback. The problem we need to solve can be phrased: Why is the progress-checking function called that often while rate-limited compared to when not rate-limited?

     
  • Nach M. S.

    Nach M. S. - 2013-06-11

    I think I have the answer to your why.

    From multi.c:

    case CURLM_STATE_TOOFAST: /* limit-rate exceeded in either direction */
      /* if both rates are within spec, resume transfer */
      if(Curl_pgrsUpdate(easy->easy_conn)) /*!!Progress Callback parent called*/
        easy->result = CURLE_ABORTED_BY_CALLBACK;
      else
        easy->result = Curl_speedcheck(data, now);
    
      if(( (data->set.max_send_speed == 0) ||
           (data->progress.ulspeed < data->set.max_send_speed ))  &&
         ( (data->set.max_recv_speed == 0) ||
           (data->progress.dlspeed < data->set.max_recv_speed)))
        multistate(easy, CURLM_STATE_PERFORM);
      break;
    

    I believe you're calling Curl_pgrsUpdate() here in order to update progress.ulspeed and progress.dlspeed; additionally, Curl_pgrsUpdate() is called in other places in the code as well, unrelated to speed limiting.

    To solve this, the parts of Curl_pgrsUpdate() that update progress.ulspeed and progress.dlspeed should probably be factored out into a new function. This segment in multi.c could then call the new function, while the other call sites continue to call Curl_pgrsUpdate(), which in turn calls the new function.

    Additionally, I noticed that speed limiting on transferring large files isn't precisely accurate: I've capped an upload at 8 MB/s, yet external reporting tools are telling me the upload is approaching 9 MB/s. This probably means that a moving average somehow isn't being accounted for precisely...

     

    Last edit: Nach M. S. 2013-06-11
  • Daniel Stenberg

    Daniel Stenberg - 2013-06-11

    The problem is why that code is reached so frequently. If the transfer is deemed to be TOOFAST, it should stay away from that state for a while, but instead it seems to get back there very frequently.

    (and if you want to work on improving rate limiting accuracy, please be my guest but that's separate discussion)

     

    Last edit: Daniel Stenberg 2013-06-11
  • Dan Fandrich

    Dan Fandrich - 2013-06-11

    Don't forget about network overhead and MiB vs MB units when comparing external reporting tools measuring network traffic. Those can account for a 10% difference in measurements.
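
    For example (illustrative numbers, assuming the cap above was set as 8 MiB/s): 8 × 1,048,576 bytes/s is about 8.39 MB/s in SI units, and a few percent of TCP/IP and Ethernet framing overhead on the wire pushes an externally measured figure toward 8.6-8.7 MB/s, most of the way to the ~9 MB/s being reported.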

     
  • Daniel Stenberg

    Daniel Stenberg - 2013-06-13

    I've found the problem; I will post a patch shortly.

     
  • Daniel Stenberg

    Daniel Stenberg - 2013-06-13

    Here's my stab at fixing this. It was actually a rather awful bug that made the loop in curl_easy_perform() busy-loop under certain conditions.

     
  • Nach M. S.

    Nach M. S. - 2013-06-13

    I tested this patch against 7.30.0 as well as latest GIT, and in both cases, I'm still seeing my progress function being called hundreds of times per second.

     

    Last edit: Nach M. S. 2013-06-13
  • Nach M. S.

    Nach M. S. - 2013-06-13

    Okay, the above patch definitely does help.

    In a new test case I'm running:
    Before: 50225479 calls over 8 seconds.
    After: 1389 calls over 8 seconds.

    So it's a definite improvement, and the CPU is no longer being taxed like mad. However, the number of calls is still way above "about once per second".

    I also notice now that even when I don't limit the upload/download speed, the callback is still invoked several hundred times per second. So it seems this patch fixes the issue where rate limiting itself causes an insane number of callbacks, which is excellent, but we still need to fix the issue of too many calls per second in general.

     
  • Daniel Stenberg

    Daniel Stenberg - 2013-06-13

    That's a completely different matter and is in fact by design. We can discuss changing that in some ways, but that's how libcurl has worked for a long time. The problem this bug report is about, and what my patch tries to address, is the very high frequency of calls when in fact nothing has changed and it just busy-loops.

     
  • Daniel Stenberg

    Daniel Stenberg - 2013-06-13
    • status: open-confirmed --> closed-fixed
     
  • Daniel Stenberg

    Daniel Stenberg - 2013-06-13

    Thanks for the report, thanks for testing my patch!

    This fix has been pushed as commit 0feeab7802. I consider this bug fixed now and we can discuss possible callback frequency changes on the mailing list. Closing this.

     
  • Nach M. S.

    Nach M. S. - 2013-06-13

    Thanks for fixing this bug, it is greatly appreciated. I started a discussion about the other issue on the mailing list.

     
  • Miguel Angel Rasero

    I can confirm as well that this patch has fixed the CPU usage problem in my Python script. Thanks for fixing this bug :)