
Downloading a long sequence of URLs results in high CPU usage and slowness #429

Closed
greafhe opened this issue Sep 11, 2015 · 9 comments

@greafhe

greafhe commented Sep 11, 2015

Here is what I did:
In one terminal run: python -m SimpleHTTPServer 8888
In another terminal run: curl "localhost:8888/[1-1000000]" > /dev/null
In the beginning the speed was over 2000 URLs/second. By the time it reached the 50000th URL, the speed was under 300 URLs/second, and curl's CPU usage was about twice as high as it was at the start.
Restarting curl (without restarting the server) again results in high speed at the beginning, then a gradual drop. The same happens with a different range, e.g. [100000-1000000].

@greafhe
Author

greafhe commented Sep 11, 2015

curl --version
curl 7.35.0 (x86_64-pc-linux-gnu) libcurl/7.35.0 OpenSSL/1.0.1f zlib/1.2.8 libidn/1.28 librtmp/2.3
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp smtp smtps telnet tftp
Features: AsynchDNS GSS-Negotiate IDN IPv6 Largefile NTLM NTLM_WB SSL libz TLS-SRP

(from the Ubuntu 14.04 repository)

@iboukris
Contributor

Interesting test. I've run it and got similar results with curl 7.45 (Fedora 21).
There seems to be a noticeable decrease in speed and increase in CPU usage over time.

On the other hand, a similar test with libcurl doesn't show this behaviour.
The test below runs at a fairly constant speed (and about twice as fast).

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <curl/curl.h>

int main(int argc, char *argv[])
{
  CURLcode ret = CURLE_OK;
  CURL *hnd;
  int i;
  char base_url[] = "http://localhost:8888/";

  curl_global_init(CURL_GLOBAL_DEFAULT);

  hnd = curl_easy_init();
  if(!hnd)
    return 1;

  /* room for the base URL plus up to 6 digits and a terminating NUL */
  char *url = malloc(sizeof(base_url) + 7);
  strcpy(url, base_url);

  /* reuse the same easy handle for every request */
  for(i = 0; i < 1000000; i++) {
    sprintf(url + sizeof(base_url) - 1, "%i", i);
    curl_easy_setopt(hnd, CURLOPT_URL, url);
    ret = curl_easy_perform(hnd);
  }

  free(url);
  curl_easy_cleanup(hnd);
  curl_global_cleanup();

  return (int)ret;
}

I can't tell why the tool behaves that way, and I'm not sure I've considered all the aspects.

@jay
Member

jay commented Sep 12, 2015

 80.83%  curl  libcurl.so.4.3.0    [.] Curl_slist_append_nodup
 16.00%  curl  [kernel.kallsyms]   [k] 0xffffffff8114f406
  0.23%  curl  libcurl.so.4.3.0    [.] dprintf_formatf
  0.15%  curl  libc-2.19.so        [.] _IO_vfscanf
  0.10%  curl  libc-2.19.so        [.] malloc
  0.10%  curl  libcurl.so.4.3.0    [.] Curl_setopt
  0.09%  curl  curl                [.] operate_do

perf report output from Ubuntu 14 x64 running curl built from master fad9604 2015-09-11.

The Curl_slist_append_nodup calls trace back to the curl tool's --libcurl source-code generation, so this is an issue exclusive to the curl tool. Code generation support is enabled at build time by default, and when it is, for some reason the source is always generated at runtime, whether or not it was requested. So for every URL, the tool generates the source, and it does so line by line. The lines are stored in a linked list with no tail pointer, so for each new line it has to traverse the entire list to append at the end. This conclusion is based only on my reading of the code; I did not dig deeper to prove that's what's eating the CPU. It's the kind of thing that isn't noticeable with a URL or two, but with a million it certainly is.

If this is the case, I think the fix would be to not do code generation unless it's requested. Though I wonder if this is one of those things that's easier said than done.

Until this is resolved, you could build curl with the configure option --disable-libcurl-option and it should perform better.

@bagder
Member

bagder commented Sep 12, 2015

Nice catch @jay, and indeed a silly flaw. We should of course only waste that time and memory if the user actually asked for code to get generated...

@bagder
Member

bagder commented Sep 12, 2015

Oh, and we should probably also consider using a linked list to which we can append strings at no extra cost even when the list gets very long, so that this use case would still work fast and nicely even when enabled.

@iboukris
Contributor

Nice indeed (and thanks for the perf tip)!
I can confirm that after building with --disable-libcurl-option the speed is steady (and faster).

@bagder
Member

bagder commented Sep 16, 2015

Anyone volunteering to work on improving this piece of code?

@gnawhleinad
Contributor

Oh, and we should probably also consider using a linked list to which we can append strings at no extra cost even when the list gets very long, so that this use case would still work fast and nicely even when enabled.

We should still pursue this proposal! Should I create a new issue for this?

@bagder
Member

bagder commented Sep 20, 2015

We should still pursue this proposal! Should I create a new issue for this?

Yes, it would be good and yeah a separate issue/pull-request makes it more convenient.

jay added a commit that referenced this issue Sep 21, 2015
- Review of 4d95491.

The author changed it so easysrc only initializes when --libcurl but did
not do the same for the call to easysrc cleanup.

Ref: #429
gnawhleinad added a commit to gnawhleinad/curl that referenced this issue Sep 22, 2015
The easysrc generation is run only when --libcurl is initialized.

Ref: curl#429
bagder pushed a commit that referenced this issue Sep 22, 2015
The easysrc generation is run only when --libcurl is initialized.

Ref: #429

Closes #448
gnawhleinad added a commit to gnawhleinad/curl that referenced this issue Sep 25, 2015
Using a last cache linked-list improves the performance of easysrc generation.

Bug: curl#444
Ref: curl#429
bagder pushed a commit that referenced this issue Oct 17, 2015
Using a last cache linked-list improves the performance of easysrc
generation.

Bug: #444
Ref: #429

Closes #452
jgsogo pushed a commit to jgsogo/curl that referenced this issue Oct 19, 2015
Code should only be generated when --libcurl is used.

Bug: curl#429
Reported-by: @greafhe, Jay Satiro

Closes curl#429
Closes curl#442
jgsogo pushed a commit to jgsogo/curl that referenced this issue Oct 19, 2015
- Review of 4d95491.

The author changed it so easysrc only initializes when --libcurl but did
not do the same for the call to easysrc cleanup.

Ref: curl#429
jgsogo pushed a commit to jgsogo/curl that referenced this issue Oct 19, 2015
The easysrc generation is run only when --libcurl is initialized.

Ref: curl#429

Closes curl#448
@lock lock bot locked as resolved and limited conversation to collaborators May 7, 2018