cURL / Mailing Lists / curl-library / Single Mail

curl-library

Re: a curl_multi_fdset() replacement? (TODO-RELEASE #55)

From: Jamie Lokier <jamie_at_shareable.org>
Date: Mon, 31 Jan 2005 00:05:54 +0000

Daniel Stenberg wrote:
> In order to completely get rid of the FD_SETSIZE related problems from
> libcurl, we need to provide a function in the multi interface that exports
> a set of file descriptors (sockets) that an application should wait for
> actions on. It would then not force the app to use select(), but poll() or
> other available functions can be used instead.

On many platforms, it's possible to overcome the FD_SETSIZE limitation
while still using select(). You have to break the rules a bit, but on
nearly every platform fd_set is a bit vector using some word size and
the FD_SET et al. macros work with larger fd numbers, provided enough
memory is allocated and zeroed. You can't guarantee this is portable,
though, just that it works in practice on many platforms.

(It's the simplest solution to roll out if you have an immediate call for
overcoming FD_SETSIZE on Linux and BSD.)

> struct curl_sock {
> curl_socket_t *rsock; /* points to an array of sockets to check for
> reading */
> int rsock_num; /* number of sockets in the rsock array */
>
> curl_socket_t *wsock; /* points to an array of sockets to check for
> writing */
> int wsock_num; /* number of sockets in the wsock array */
>
> curl_socket_t *esock; /* points to an array of sockets to check for
> exception */
> int esock_num; /* number of sockets in the esock array */
>
> int max; /* the highest socket/file descriptor number used by libcurl (and
> thus included in one of these arrays) */
> };

You're limiting the API to something that's slow when there are lots
of descriptors and only a few are active at each poll.

This means that each call to select() or poll() or equivalent is going
to take O(n) in the number of descriptors being polled.

For applications with thousands of descriptors, the time spent in
select() or poll() can be considerable.

This becomes more and more of a problem when handling large numbers of
descriptors.

Ideally, you want an API which takes O(n) in the number of descriptors
which are _ready_ at the poll, so that you can take advantage of
faster readiness-monitoring system calls such as epoll (Linux), kqueue
(FreeBSD), or /dev/poll (Solaris/IRIX/HPUX).

That means an API where the caller can register an fd to be monitored,
is notified when the fd is ready, and a "poll" operation takes an
opaque data structure which remembers which fds are being monitored.

The important thing for polling algorithmic efficiency is that the "poll"
operation must not be required to scan the whole data structure, which
is size O(n) in the number of descriptors being monitored.

The exposed structure you propose does not seem to support this kind
of polling efficiency.

An opaque structure, where the only ways to access it are through
functions/macros which are the moral equivalent of FD_SET, FD_CLR,
FD_ISSET and select -- and with an additional "destroy_curl_fd_set"
destructor (so that the structure can use heap memory allocation) --
that's enough to support that kind of polling efficiency.

-- Jamie
Received on 2005-01-31