Multi interface efficiency woes
Date: Sat, 19 Feb 2011 11:24:10 -0800
The documentation for how to use the libcurl-multi interface states:
"Your application can acquire knowledge from libcurl when it would
like to get invoked to transfer data, so that you don't have to
busy-loop and call that curl_multi_perform(3) like crazy.
curl_multi_fdset(3) offers an interface using which you can extract
fd_sets from libcurl to use in select() or poll() calls in order to
get to know when the transfers in the multi stack might need
attention. This also makes it very easy for your program to wait for
input on your own private file descriptors at the same time or perhaps
timeout every now and then, should you want that."
In practice, however, it makes little difference: we end up
busy-looping anyway. Why?
Assume the simplest case, where there's only one pending request. The
first time we call curl_multi_perform() after adding the easy handle,
libcurl starts connecting to the server. Then, when we call
curl_multi_fdset(), only the write_fd_set contains any descriptors.
We select() on that, and when the fd becomes writable (after the
connect completes), we call curl_multi_perform() (because that's what
we do when select() says the socket is ready, as the documentation
recommends). At that point, libcurl sends the HTTP request and returns
control to us. Then we call curl_multi_fdset() again, and we get back
that same single fd in the write_fd_set.
Once the socket has connected, it is *always* writable while we wait
for the response, so select() returns immediately every time. Since we
have no knowledge of what's going on with the underlying HTTP
transaction, we blindly keep calling curl_multi_perform(). This is very
expensive, but we can't stop doing it until we get our response: the
CPU expense is proportional to the response latency.
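
The "always writable" part can be reproduced without libcurl or a
network. The sketch below is only an illustration: it uses a local
socketpair() as a stand-in for the connected HTTP socket, and both
function names are hypothetical. Every poll of the idle socket's write
set comes back ready, which is why a select() on the write_fd_set never
blocks.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if select() reports `fd` writable without waiting,
 * 0 if it does not, -1 on error. */
int writable_now(int fd)
{
    fd_set wr;
    struct timeval tv = {0, 0};     /* zero timeout: poll, do not block */

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&wr);
    FD_SET(fd, &wr);
    if (select(fd + 1, NULL, &wr, NULL, &tv) < 0)
        return -1;
    return FD_ISSET(fd, &wr) ? 1 : 0;
}

/* Polls one end of an idle, connected socket pair `rounds` times and
 * returns how many polls found it writable (-1 if the pair could not
 * be created). */
int count_writable_polls(int rounds)
{
    int sv[2];
    int hits = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    for (int i = 0; i < rounds; i++) {
        if (writable_now(sv[0]) == 1)
            hits++;
    }
    close(sv[0]);
    close(sv[1]);
    return hits;
}
```

Nothing is ever written to or read from the pair, yet
count_writable_polls(1000) finds the socket writable on every round,
just as the HTTP socket is while the server is still preparing its reply.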
What can be done to make this more efficient?
Received on 2011-02-19