curl-library

[PATCH RFC] hyper/multi-socket API optimization

From: Robert Iakobashvili <coroberti_at_gmail.com>
Date: Tue, 10 Apr 2007 13:24:20 +0300

CURL VERSION: snapshot 7.16.2-20070326
HOST MACHINE and OPERATING SYSTEM:
Linux, Debian with Debian kernel 2.6.18
COMPILER: gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)
CURL CONFIGURATION:
configure --prefix $(CURL_BUILD) \
        --disable-thread \
        --enable-ipv6 \
        --with-random=/dev/urandom \
        --with-ssl=/usr/include/openssl \
        --enable-shared=no \
        CFLAGS="-g"

AREA/CLASS/EXAMPLE AFFECTED:
Optimization of the function curl_multi_socket()

DOES THE PROBLEM AFFECT:
        COMPILATION? No
        LINKING? No
        EXECUTION? Yes
        OTHER (please specify)?

    MOTIVATION:

curl-loader has recently added a so-called hyper mode, based on the curl
example hipev.c and using the libevent library for demultiplexing via epoll().
/dev/epoll and epoll() event demultiplexing are known to scale better with
the number of descriptors than poll()/select().
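Part of the scaling difference is where the interest set lives. The sketch below is a minimal, Linux-only illustration (the function name wait_for_pipe_readable() is hypothetical, and a pipe stands in for a socket): the descriptor is registered once with epoll_ctl(), and epoll_wait() returns only the ready descriptors, each with an event bitmask. That per-descriptor bitmask is exactly the information the patch below wants to hand to libcurl.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Sketch (Linux only): register one descriptor with epoll once, then let
 * epoll_wait() report which events fired on it as a bitmask.
 * Returns the reported events mask, or -1 on any failure. */
static int wait_for_pipe_readable(void)
{
  int fds[2];
  int ep;
  int result = -1;

  if (pipe(fds) != 0)
    return -1;

  ep = epoll_create1(0);
  if (ep >= 0) {
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fds[0] };
    struct epoll_event out[4];

    /* The interest set is kept in the kernel: one epoll_ctl() per fd,
     * instead of handing the whole fd set over on every poll()/select(). */
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fds[0], &ev) == 0 &&
        write(fds[1], "x", 1) == 1) {
      /* Only ready descriptors come back, each with its event bitmask. */
      int n = epoll_wait(ep, out, 4, 1000);
      if (n == 1 && out[0].data.fd == fds[0])
        result = (int)out[0].events;
    }
    close(ep);
  }
  close(fds[0]);
  close(fds[1]);
  return result;
}
```

After one byte is written to the pipe, the returned mask has EPOLLIN set; an event loop already holds this readiness information before it calls into libcurl.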

When running curl-loader under various loads and strace-ing it, we discovered
that there are still lots of poll() syscalls involved.
The picture was similar for setups with 200, 1000, 2000, 5000 and 8000
simultaneously loading clients.

Below we present some observations for an experiment with 200 clients
running for 100 seconds, each fetching a 104K static file by GET, traced
with "strace -o tracefile".
The experiments were run against a lighttpd server on the same host,
itself configured to use epoll() and a maximum of 32K file descriptors.

The syscall counts, obtained by grepping the trace file, are:

Syscall         Count       Relative to send()
epoll_wait      1,305       -
epoll_ctl       53,441      2
"poll("         267,001     10
recv            240,780     9
send            27,220      1
gettimeofday    1,562,830   55
gettimeofday: 1,562,830 55

The number of send() calls reflects the number of requests sent.
The number of recv() calls correlates well with the number of data chunks
received for a file of this size (server on the local host), as seen in
wireshark/ethereal.

The number of epoll_ctl() calls is twice the number of requests, because
curl-loader's policy is to remove a handle and add it back to the multi
handle before each new run.

In the hipev.c curl example, the function eventcallback() is called by
libevent when an event is triggered on a socket fd.
eventcallback() itself calls curl_multi_socket() for the fd, and the call
flow inside is the following:

curl_multi_socket() ->
  multi_socket() ->
    multi_runsingle() ->
      for state CURL_MULTI_PERFORM: Curl_readwrite() ->
        Curl_select(), where poll() is called.
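The Curl_select() step at the end of this chain amounts to a zero-timeout readiness check. The sketch below is a rough, hypothetical stand-in (sketch_select() is not libcurl code; only the CSELECT_* values are taken from lib/select.h as shown in the patch): it issues one poll() per call over the read and write descriptors and translates revents into CSELECT_* bits.

```c
#include <poll.h>
#include <unistd.h>

/* Values as defined in lib/select.h (see the patch) */
#define CSELECT_IN  0x01
#define CSELECT_OUT 0x02
#define CSELECT_ERR 0x04

/* Hypothetical stand-in for a Curl_select()-style check: one poll() over up
 * to two descriptors, result translated into CSELECT_* bits.
 * A negative descriptor means "not interested in this direction". */
static int sketch_select(int readfd, int writefd, int timeout_ms)
{
  struct pollfd pfd[2];
  int n = 0, r_idx = -1, w_idx = -1, rc, bits = 0;

  if (readfd >= 0) {
    pfd[n].fd = readfd; pfd[n].events = POLLIN; pfd[n].revents = 0;
    r_idx = n++;
  }
  if (writefd >= 0) {
    if (writefd == readfd) {          /* same socket: one entry, both bits */
      pfd[r_idx].events |= POLLOUT;
      w_idx = r_idx;
    } else {
      pfd[n].fd = writefd; pfd[n].events = POLLOUT; pfd[n].revents = 0;
      w_idx = n++;
    }
  }
  if (n == 0)
    return 0;

  rc = poll(pfd, (nfds_t)n, timeout_ms);  /* the syscall strace counts */
  if (rc < 0)
    return CSELECT_ERR;
  if (rc == 0)
    return 0;                             /* nothing ready */

  if (r_idx >= 0) {
    if (pfd[r_idx].revents & (POLLIN | POLLHUP)) bits |= CSELECT_IN;
    if (pfd[r_idx].revents & (POLLERR | POLLNVAL)) bits |= CSELECT_ERR;
  }
  if (w_idx >= 0) {
    if (pfd[w_idx].revents & POLLOUT) bits |= CSELECT_OUT;
    if (pfd[w_idx].revents & (POLLERR | POLLNVAL)) bits |= CSELECT_ERR;
  }
  return bits;
}
```

Each call costs one poll() syscall even when the event loop has just reported the same readiness, which is the redundancy visible in the strace counts above.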

    DESCRIPTION:
When a descriptor (socket) is already known (e.g. from epoll) to be
event-triggered, we may wish to pass to a curl_multi_socket()-like API not
only the descriptor but also the event bitmask, to avoid an extra
demultiplexing call (an extra poll()).
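The idea can be sketched in isolation: trust a caller-supplied bitmask when it is non-zero, and fall back to a zero-timeout poll() only otherwise. This mirrors the `if (!select_res) select_res = Curl_select(...)` change in the transfer.c hunk of the patch, but readiness(), poll_readiness() and the poll_calls counter are hypothetical names, and the fallback is simplified to a single descriptor.

```c
#include <poll.h>
#include <unistd.h>   /* pipe()/write() in the usage example */

#define CSELECT_IN  0x01
#define CSELECT_OUT 0x02
#define CSELECT_ERR 0x04

/* Counts fallback poll() calls, i.e. what strace would show. */
static unsigned long poll_calls = 0;

/* Fallback: ask the kernel with one zero-timeout poll() on a single fd. */
static int poll_readiness(int fd, short want)
{
  struct pollfd p = { .fd = fd, .events = want, .revents = 0 };
  int rc, bits = 0;

  poll_calls++;
  rc = poll(&p, 1, 0);
  if (rc < 0)
    return CSELECT_ERR;
  if (p.revents & (POLLIN | POLLHUP))   bits |= CSELECT_IN;
  if (p.revents & POLLOUT)              bits |= CSELECT_OUT;
  if (p.revents & (POLLERR | POLLNVAL)) bits |= CSELECT_ERR;
  return bits;
}

/* Use the event loop's bitmask when it is known; poll only otherwise. */
static int readiness(int known_bits, int fd, short want)
{
  if (known_bits)
    return known_bits;   /* already demultiplexed by epoll: no syscall */
  return poll_readiness(fd, want);
}
```

With a known bitmask the check costs no syscall; passing 0 keeps the old behaviour, which is why existing callers of curl_multi_socket() are unaffected when the patch passes 0 for them.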

   PATCH - RFC:
------------------------------------------------
diff -Naru curl-7.16.2-20070326/include/curl/multi.h curl-7.16.2-20070326-corr/include/curl/multi.h
--- curl-7.16.2-20070326/include/curl/multi.h 2006-10-13 04:00:19.000000000 +0200
+++ curl-7.16.2-20070326-corr/include/curl/multi.h 2007-04-06 22:14:26.000000000 +0300
@@ -222,8 +222,16 @@
 #define CURL_POLL_INOUT 3
 #define CURL_POLL_REMOVE 4

+
+
 #define CURL_SOCKET_TIMEOUT CURL_SOCKET_BAD

+#define CSELECT_IN 0x01
+#define CSELECT_OUT 0x02
+#define CSELECT_ERR 0x04
+
+
+
 typedef int (*curl_socket_callback)(CURL *easy, /* easy handle */
                                     curl_socket_t s, /* socket */
                                     int what, /* see above */
@@ -249,6 +257,9 @@
 CURL_EXTERN CURLMcode curl_multi_socket(CURLM *multi_handle, curl_socket_t s,
                                         int *running_handles);

+CURL_EXTERN CURLMcode curl_multi_socket_noselect(CURLM *multi_handle, curl_socket_t s,
+                                                 int bitset, int *running_handles);
+
 CURL_EXTERN CURLMcode curl_multi_socket_all(CURLM *multi_handle,
                                             int *running_handles);

diff -Naru curl-7.16.2-20070326/lib/multi.c curl-7.16.2-20070326-corr/lib/multi.c
--- curl-7.16.2-20070326/lib/multi.c 2007-04-06 22:55:58.000000000 +0300
+++ curl-7.16.2-20070326-corr/lib/multi.c 2007-04-06 22:49:59.000000000 +0300
@@ -1628,6 +1628,7 @@
 static CURLMcode multi_socket(struct Curl_multi *multi,
                               bool checkall,
                               curl_socket_t s,
+                              int bitmask,
                               int *running_handles)
 {
   CURLMcode result = CURLM_OK;
@@ -1665,8 +1666,12 @@
      /* bad bad bad bad bad bad bad */
      return CURLM_INTERNAL_ERROR;

+   data->select_bits = bitmask;
+
    result = multi_runsingle(multi, data->set.one_easy);

+   data->select_bits = 0;
+
    if(result == CURLM_OK)
      /* get the socket(s) and check if the state has been changed since
         last */
@@ -1763,7 +1768,17 @@
                             int *running_handles)
 {
   CURLMcode result = multi_socket((struct Curl_multi *)multi_handle, FALSE, s,
-                                  running_handles);
+                                  0, running_handles);
+  if (CURLM_OK == result)
+    update_timer((struct Curl_multi *)multi_handle);
+  return result;
+}
+
+CURLMcode curl_multi_socket_noselect(CURLM *multi_handle, curl_socket_t s,
+                                     int bitmask, int *running_handles)
+{
+  CURLMcode result = multi_socket((struct Curl_multi *)multi_handle, FALSE, s,
+                                  bitmask, running_handles);
   if (CURLM_OK == result)
     update_timer((struct Curl_multi *)multi_handle);
   return result;
@@ -1773,7 +1788,7 @@

 {
   CURLMcode result = multi_socket((struct Curl_multi *)multi_handle,
-                                  TRUE, CURL_SOCKET_BAD, running_handles);
+                                  TRUE, CURL_SOCKET_BAD, 0, running_handles);
   if (CURLM_OK == result)
     update_timer((struct Curl_multi *)multi_handle);
   return result;
diff -Naru curl-7.16.2-20070326/lib/select.h curl-7.16.2-20070326-corr/lib/select.h
--- curl-7.16.2-20070326/lib/select.h 2007-03-19 05:00:02.000000000 +0200
+++ curl-7.16.2-20070326-corr/lib/select.h 2007-04-06 22:01:48.000000000 +0300
@@ -64,10 +64,6 @@

 #endif

-#define CSELECT_IN 0x01
-#define CSELECT_OUT 0x02
-#define CSELECT_ERR 0x04
-
 int Curl_select(curl_socket_t readfd, curl_socket_t writefd, int timeout_ms);

 int Curl_poll(struct pollfd ufds[], unsigned int nfds, int timeout_ms);
diff -Naru curl-7.16.2-20070326/lib/transfer.c curl-7.16.2-20070326-corr/lib/transfer.c
--- curl-7.16.2-20070326/lib/transfer.c 2007-03-12 05:00:09.000000000 +0200
+++ curl-7.16.2-20070326-corr/lib/transfer.c 2007-04-06 22:51:01.000000000 +0300
@@ -312,9 +312,10 @@

   curl_socket_t fd_read;
   curl_socket_t fd_write;
-  int select_res;
-
   curl_off_t contentlength;
+  int select_res = data->select_bits;
+
+  data->select_bits = 0;

   /* only use the proper socket if the *_HOLD bit is not set simultaneously as
      then we are in rate limiting state in that transfer direction */
@@ -329,7 +330,11 @@
   else
     fd_write = CURL_SOCKET_BAD;

-  select_res = Curl_select(fd_read, fd_write, 0);
+  if (!select_res) { /* Call select()/poll() only if the read/write/error
+                        status is not known. */
+    select_res = Curl_select(fd_read, fd_write, 0);
+  }
+
   if(select_res == CSELECT_ERR) {
     failf(data, "select/poll returned error");
     return CURLE_SEND_ERROR;
diff -Naru curl-7.16.2-20070326/lib/urldata.h curl-7.16.2-20070326-corr/lib/urldata.h
--- curl-7.16.2-20070326/lib/urldata.h 2007-04-06 22:55:58.000000000 +0300
+++ curl-7.16.2-20070326-corr/lib/urldata.h 2007-04-06 22:34:33.000000000 +0300
@@ -1360,6 +1360,7 @@
   iconv_t utf8_cd; /* for translating to UTF8 */
 #endif /* CURL_DOES_CONVERSIONS && HAVE_ICONV */
   unsigned int magic; /* set to a CURLEASY_MAGIC_NUMBER */
+  int select_bits; /* read, write or error bits */
 };

 #define LIBCURL_NAME "libcurl"
-------------------------------------------------------------------

------------------ Application part changes: ------------------
static void eventcallback (int fd, short kind, void *userp)
{
  batch_context *bctx = (batch_context *) userp;
  int st;
  CURLMcode rc;

  int bitset = 0;

  if (kind & EV_READ) {
      bitset |= CSELECT_IN;
  }
  if (kind & EV_WRITE) {
      bitset |= CSELECT_OUT;
  }

  /*
     Tell libcurl to deal with the transfer associated
     with this socket
  */
  do
    {
      rc = curl_multi_socket_noselect(bctx->multiple_handle, fd, bitset, &st);
      // rc = curl_multi_socket(bctx->multiple_handle, fd, &st);
    }
  while (rc == CURLM_CALL_MULTI_PERFORM);

  /* (the rest of the callback is omitted in this excerpt) */
}
--------------------------------------------------------------------------------------
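The callback above maps libevent's EV_READ/EV_WRITE flags. An application driving epoll directly could translate epoll_event.events in the same way before calling the proposed curl_multi_socket_noselect(). The helper epoll_to_cselect() below is hypothetical and assumes the CSELECT_* values exported by the patched multi.h; mapping EPOLLHUP to CSELECT_IN (so that the read path observes EOF) and EPOLLERR to CSELECT_ERR is a design choice of this sketch, not something the patch specifies.

```c
#include <stdint.h>
#include <sys/epoll.h>

/* Values as exported by the patched include/curl/multi.h */
#define CSELECT_IN  0x01
#define CSELECT_OUT 0x02
#define CSELECT_ERR 0x04

/* Hypothetical helper: translate epoll readiness into the CSELECT_*
 * bitmask that the proposed curl_multi_socket_noselect() expects. */
static int epoll_to_cselect(uint32_t events)
{
  int bits = 0;

  if (events & (EPOLLIN | EPOLLHUP))  /* HUP: let the read path see EOF */
    bits |= CSELECT_IN;
  if (events & EPOLLOUT)
    bits |= CSELECT_OUT;
  if (events & EPOLLERR)
    bits |= CSELECT_ERR;
  return bits;
}
```

Unlike the libevent callback, this mapping can also report errors, since epoll_wait() always returns EPOLLERR and EPOLLHUP when they occur.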

Results for exactly the same experiment (100 s) with the patch applied:

Syscall         Count       Relative to send()
epoll_wait      1,601       -
epoll_ctl       63,441      2
"poll("         67,740      2
recv            285,780     9
send            32,220      1
gettimeofday    1,567,157   49

Performance improved somewhat: the loader made 15% more requests in the
same time.
The number of poll() system calls per request decreased from about 10 to 2.
The remaining polls may be attributed to checking whether connections still
exist and to expired timers, whose handling has Curl_select() in its path.

    QUESTIONS:

If you agree that it makes sense to pass an event bitmask together with an
event-triggered fd, a few questions follow:

- Is it better to add the bitmask to the existing API curl_multi_socket(),
  or, for backward compatibility, to add another API such as
  curl_multi_socket_noselect()?
- Where should the bitmask be kept: in SessionHandle or in UserDefined?
- Does it make sense to have a flag so that multi_socket() does not scan all
  the splay timers every time, e.g. when curl_multi_socket_noselect() is
  called?

STANDALONE QUESTIONS:

1. libcurl calls a registered socket callback for a connected handle only
after the state DONE. Thus the hipev.c example registers the fd with
libevent only for data read/write actions. Could this be done earlier, just
after the non-blocking connect(), in order to make more use of epoll() and
less of poll()?

2. libcurl makes a huge number of gettimeofday() calls. What directions
could be taken to reduce this number?
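Regarding question 1: at the plain-socket level, nothing prevents registering a connecting socket with epoll immediately. The minimal sketch below (Linux only, plain sockets, no libcurl; connect_via_epoll() is a hypothetical helper) starts a non-blocking connect() to a local listener, registers the socket for EPOLLOUT right away, and checks SO_ERROR once epoll reports writability. Whether libcurl can hand the socket to the application that early is the open question above.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Sketch (Linux only): non-blocking connect() to a loopback listener,
 * completion detected via epoll instead of poll().
 * Returns 0 if the connect succeeded, -1 otherwise. */
static int connect_via_epoll(void)
{
  int lsn = -1, cli = -1, ep = -1, flags, soerr = 0, rc = -1;
  struct sockaddr_in addr;
  socklen_t len = sizeof addr, errlen = sizeof soerr;
  struct epoll_event ev, got;

  memset(&addr, 0, sizeof addr);
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = 0;                        /* kernel picks a free port */

  lsn = socket(AF_INET, SOCK_STREAM, 0);
  if (lsn < 0 || bind(lsn, (struct sockaddr *)&addr, sizeof addr) != 0 ||
      listen(lsn, 1) != 0 ||
      getsockname(lsn, (struct sockaddr *)&addr, &len) != 0)
    goto out;

  cli = socket(AF_INET, SOCK_STREAM, 0);
  if (cli < 0 || (flags = fcntl(cli, F_GETFL, 0)) < 0 ||
      fcntl(cli, F_SETFL, flags | O_NONBLOCK) < 0)
    goto out;

  /* A non-blocking connect() normally returns -1 with EINPROGRESS. */
  if (connect(cli, (struct sockaddr *)&addr, sizeof addr) != 0 &&
      errno != EINPROGRESS)
    goto out;

  /* Register right away: writability signals that the connect finished. */
  ep = epoll_create1(0);
  memset(&ev, 0, sizeof ev);
  ev.events = EPOLLOUT;
  ev.data.fd = cli;
  if (ep < 0 || epoll_ctl(ep, EPOLL_CTL_ADD, cli, &ev) != 0)
    goto out;

  if (epoll_wait(ep, &got, 1, 2000) != 1 || !(got.events & EPOLLOUT))
    goto out;

  /* Writability is also reported on failure, so check the socket error. */
  if (getsockopt(cli, SOL_SOCKET, SO_ERROR, &soerr, &errlen) == 0 &&
      soerr == 0)
    rc = 0;

out:
  if (ep >= 0) close(ep);
  if (cli >= 0) close(cli);
  if (lsn >= 0) close(lsn);
  return rc;
}
```

The connection completes in the kernel without an accept(), so a single epoll_wait() is enough to learn the outcome of the connect.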

Sincerely,
Robert Iakobashvili,
coroberti %x40 gmail %x2e com
...................................................................
Navigare necesse est, vivere non est necesse
...................................................................
http://curl-loader.sourceforge.net