
Re: Epoll performance issues.

From: James Read via curl-library <curl-library_at_cool.haxx.se>
Date: Tue, 24 Nov 2020 19:50:35 +0000

Hi,

On Tue, Nov 24, 2020 at 5:37 PM Tomalak Geret'kal via curl-library
<curl-library_at_cool.haxx.se> wrote:

> On 23/11/2020 20:16, James Read via curl-library wrote:
> > I have attempted to make two minimal codes that
> > demonstrate my problem.
> >
> > The first can be downloaded from
> > https://github.com/JamesRead5737/fast
> > It basically recursively downloads http://www.google.com,
> > http://www.yahoo.com and http://www.bing.com
> > I am able to achieve download speeds of up to 7Gbps with
> > this simple program
> >
> > The second can be downloaded
> > from https://github.com/JamesRead5737/slow
> > <https://github.com/JamesRead5737/slow>
> > The program extends the first program with an asynchronous
> > DNS component and instead of recursively downloading the
> > same URLs over and over again downloads from a list of
> > URLs provided in the http001 file. Full instructions are
> > in the README. What's troubling me is that this second
> > version of the program only achieves average download
> > speed of 16Mbps.
> >
> > I have no idea why this is happening. Shouldn't the second
> > program run just as fast as the first?
> >
> > Any ideas what I'm doing wrong?
>
> That's a lot of code you're asking us to debug.
>
>
Sorry, I've tried my best to produce a minimal reproducer. The code is
largely based on the example at https://curl.se/libcurl/c/ephiperfifo.html


> Have you profiled it?


The fast program produces the following flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds     calls  ms/call  ms/call  name
 26.72      0.04     0.04         1    40.08   125.25  crawler_init
 23.38      0.08     0.04  11051513     0.00     0.00  event_cb
 23.38      0.11     0.04  11072333     0.00     0.00  check_multi_info
  6.68      0.12     0.01  11083187     0.00     0.00  mcode_or_die
  6.68      0.13     0.01                              _curl_easy_getinfo_err_curl_off_t
  3.34      0.14     0.01     21722     0.00     0.00  timer_cb
  3.34      0.14     0.01                              multi_timer_cb
  3.34      0.15     0.01                              write_cb
  0.00      0.15     0.00     24830     0.00     0.00  print_progress
  0.00      0.15     0.00     22447     0.00     0.00  remsock
  0.00      0.15     0.00     10854     0.00     0.00  new_conn
  0.00      0.15     0.00     10854     0.00     0.00  transfers_dec
  0.00      0.15     0.00     10854     0.00     0.00  transfers_inc
  0.00      0.15     0.00      1561     0.00     0.00  concurrent_connections_dec
  0.00      0.15     0.00      1561     0.00     0.00  concurrent_connections_inc
  0.00      0.15     0.00      1561     0.00     0.00  setsock
  0.00      0.15     0.00      1224     0.00     0.00  addsock

The slow program produces the following flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 38.51      0.05     0.05        1    50.06   115.13  crawler_init
 15.40      0.07     0.02  6491517     0.00     0.00  check_multi_info
 15.40      0.09     0.02  6479151     0.00     0.00  event_cb
  7.70      0.10     0.01  6500971     0.00     0.00  mcode_or_die
  7.70      0.11     0.01    13729     0.00     0.00  timer_cb
  7.70      0.12     0.01                             multi_timer_cb
  3.85      0.13     0.01    11581     0.00     0.00  remsock
  3.85      0.13     0.01     6041     0.00     0.00  new_body_conn
  0.00      0.13     0.00    31665     0.00     0.00  starts_with
  0.00      0.13     0.00    29448     0.00     0.00  print_progress
  0.00      0.13     0.00     9454     0.00     0.00  get_host_from_url
  0.00      0.13     0.00     9454     0.00     0.00  transfers_dec
  0.00      0.13     0.00     9454     0.00     0.00  transfers_inc
  0.00      0.13     0.00     5270     0.00     0.00  concurrent_connections_dec
  0.00      0.13     0.00     5270     0.00     0.00  concurrent_connections_inc
  0.00      0.13     0.00     5270     0.00     0.00  setsock
  0.00      0.13     0.00     4633     0.00     0.00  addsock
  0.00      0.13     0.00     3413     0.00     0.00  new_head_conn
  0.00      0.13     0.00      416     0.00     0.00  parsed_sites_inc
> Have you tried narrowing down the
> problem to a smaller testcase?


I've done my best to cut this down; my full program is much larger. If I
were to cut anything else out, the epoll machinery wouldn't work and I
wouldn't be able to illustrate the performance problem I'm seeing.


> I find it hard to believe
> that these are minimal.
>
> Also, there is no recursion here.
>
>
My mistake. I meant repeatedly. The fast program repeatedly downloads the
same URLs, so I guess there is a slight speed-up from reusing connections,
but not one of the order of magnitude of the difference I am witnessing. I
suspect there may be a problem with the way libcurl handles many new
connections, but I am hoping the mistake is at my end.

James Read


> Cheers
>

-------------------------------------------------------------------
Unsubscribe: https://cool.haxx.se/list/listinfo/curl-library
Etiquette: https://curl.se/mail/etiquette.html
Received on 2020-11-24