Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

major performance regression in librepo after upgrade to libcurl-7.61.1 #2948

Closed
kdudka opened this issue Sep 6, 2018 · 7 comments
Closed

Comments

@kdudka
Copy link
Contributor

kdudka commented Sep 6, 2018

Commit caa4efb causes a severe performance regression in librepo code. The dnf test-suite took 1.8s to complete before the patch but now it takes 96s to complete. The reason is that curl_multi_timeout() now returns -1 if there are no transfers running whereas it returned 0 before the mentioned patch. Consequently, librepo calls select() to sleep for 1s whenever all transfers complete before the call to curl_multi_timeout(). The code in question is available here:

https://github.com/rpm-software-management/librepo/blob/7f147258b61b607ae2e311e6ffb57830bc6be27c/librepo/downloader.c#L1874

My original impression was that this is a bug of librepo as they should not call curl_multi_timeout() at all when curl_multi_perform() returns still_running == 0. However, then I realized that their code is heavily based on the following libcurl example, which does exactly the same thing:

/* we start some action by calling perform right away */

@bagder
Copy link
Member

bagder commented Sep 6, 2018

Tricky.

The commit you refer to fixed a real bug. Before that, easy handles could be left in the list splay tree for timeouts even though the transfer had indeed ended and should've been removed. But fixing that bug caused some subtle changes in what curl_multi_timeout() would return. We even had mistakes around the return code in our own test cases so I'm certainly not blaming anyone for getting things wrong and libcurl did after all return this "wrong" value like this for a very long time.

curl_multi_timeout() could still return -1 even before this fix, for example before any handles were added and similar. It was just more rare.

Timeout value 0 means run again straight away and that was clearly not the right thing for that situation.

However, I don't see the problem you mention in multi-app.c? This example doesn't do the 100ms sleep based on the timeout value, it does it based on the maxfd returned from curl_multi_fdset() and that didn't change in this commit did it? The example should probably rather do the sleep based on the curl_multi_timeout() return value to make it behave better...

@kdudka
Copy link
Contributor Author

kdudka commented Sep 7, 2018

Indeed. The code of examples/multi-app.c and librepo/downloader.c has diverged over the time and the example now suffers from the mentioned performance penalty independently of commit caa4efb. But this is just a coincidence -- the performance regression in that example has already been introduced by commit cb13fad (and on Windows already via a607f8a).

I changed the example to download file:///tmp/empty0 and file:///tmp/empty1 and it was wastefully sleeping 100ms after a successful download. Do you think it is appropriate?

This could be fixed in the example by using a while loop instead of do/while loop (perhaps together with adding a check for the return value of curl_multi_perform()):

--- a/docs/examples/multi-app.c
+++ b/docs/examples/multi-app.c
@@ -74,7 +74,7 @@ int main(void)
   /* we start some action by calling perform right away */
   curl_multi_perform(multi_handle, &still_running);

-  do {
+  while(still_running) {
     struct timeval timeout;
     int rc; /* select() return code */
     CURLMcode mc; /* curl_multi_fdset() return code */
@@ -142,7 +142,7 @@ int main(void)
       curl_multi_perform(multi_handle, &still_running);
       break;
     }
-  } while(still_running);
+  }

   /* See how the transfers went */
   while((msg = curl_multi_info_read(multi_handle, &msgs_left))) {

However, people often base their code on the examples distributed with libcurl. So the mentioned change might have broken more libcurl-based software than just librepo.

@kdudka
Copy link
Contributor Author

kdudka commented Sep 7, 2018

Actually, it was commit acefdd0 what landed in the master branch (I was referring to an obsolete PR commit by mistake).

@bagder
Copy link
Member

bagder commented Sep 7, 2018

This could be fixed in the example by using a while loop instead of do/while loop

We really should do that fix, yes. As a matter of fact I think it would make sense to take this as an opportunity and go through the examples with a comb and make sure that they actually are using the API the way we want them to and the way the API is documented to work... It is very unfortunate when our own examples mislead users.

@kdudka
Copy link
Contributor Author

kdudka commented Sep 7, 2018

I am not sure about http2-serverpush.c -- using a while loop instead of do/while loop would break the logic I am afraid. Should we just skip select/sleep in case the first call to curl_multi_perform() returns still_running == 0?

@kdudka kdudka closed this as completed in 1d173f3 Sep 10, 2018
falconindy pushed a commit to falconindy/curl that referenced this issue Sep 10, 2018
@cgwalters
Copy link

Would it be fair to say that this issue just affects libcurl users that are using libcurl as their "main loop", as opposed to integrating with an external mainloop (e.g. glib)?

librepo does the former, where as e.g. libostree does the latter.

@kdudka
Copy link
Contributor Author

kdudka commented Sep 11, 2018

This is a documentation issue. Some of the examples distributed with libcurl used to use its API in a wrong way. The code of librepo was based on those examples, which triggered a problem later on. The code you refer to does not seem to be based on the same examples, so it is unrelated to this issue.

Are you experiencing any performance regression in libostree after upgrade to libcurl-7.61.1?

clrpackages pushed a commit to clearlinux-pkgs/librepo that referenced this issue Oct 31, 2018
….9.2

commit 313a7644d03f9657ee0c7aa747d1db260f8e2fc0
Author: Jaroslav Mracek <jmracek@redhat.com>
Date:   Tue Sep 25 14:04:26 2018 +0200

    Release 1.9.2

commit 420e78d8b7bb69d3041630bc4b901917d101d9ba
Author: Kamil Dudka <kdudka@redhat.com>
Date:   Fri Sep 7 14:53:23 2018 +0200

    fix major performance regression with libcurl-7.61.1

    See curl/curl#2948 for details.
@lock lock bot locked as resolved and limited conversation to collaborators Dec 10, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Development

No branches or pull requests

3 participants