curl-library

RE: libcurl and async I/O

From: Andrew Barnert <abarnert_at_adobe.com>
Date: Mon, 18 Aug 2008 15:36:57 -0700

16 Aug 2008 13:13, Daniel Stenberg:
> On Fri, 15 Aug 2008, Andrew Barnert wrote:
> > I've been investigating incorporating HTTP tunneling into a
> > boost-asio-centered tool. Writing all the HTTP stuff myself is no
> > fun; I'd much rather use libcurl. But I need the same asio engine
> > to manage all of the sockets, concurrency, etc., whether they're
> > curl-tunneled or native. After looking over the libcurl APIs,
> > there appears to be no way to do what I want.
>
> I have no idea what you want. I don't know boost nor boost-asio.

No problem. I tried to explain it with lots of hand-waving, but let me
give more detail, as you requested.

Any select/poll-style sync I/O API looks like this:

  When you create/accept a new socket:
    add_to_poll_list(sock, POLL_READ);

  When the polling notification tells you the read is ready:
    char *buffer = malloc(4096);
    ssize_t n = recv(sock, buffer, 4096, 0);  /* n = bytes actually read */
    processData(buffer, n);
    free(buffer);
    add_to_poll_list(sock, POLL_READ);

This is in effect what curl_multi_socket_action does, and the
add_to_poll_list function is just the callback that the user
provides.

(I'm assuming that at the end of each read, you want to do
another read. And that you have no writes. And that errors are
impossible. And that you never want to stop.)
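
To make the readiness model above concrete, here is a minimal sketch in C,
assuming a POSIX system. The helper names (read_when_ready, readiness_demo)
are made up for illustration and are not part of libcurl; a local socket
pair stands in for a real connection so the example is self-contained.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: wait until `sock` is readable, then do the read
 * ourselves. Returns the number of bytes read, or -1 on error/timeout. */
static ssize_t read_when_ready(int sock, char *buffer, size_t size)
{
    struct pollfd pfd = { .fd = sock, .events = POLLIN, .revents = 0 };

    /* "Block until at least one fd is ready, and tell me which ones." */
    int ready = poll(&pfd, 1, 1000 /* ms */);
    if (ready <= 0 || !(pfd.revents & POLLIN))
        return -1;

    /* Only now, after the notification, do we pick a buffer and read. */
    return recv(sock, buffer, size, 0);
}

/* Demo: send "hello" over a local socket pair and read it back using the
 * readiness model. Returns 0 on success, -1 on failure. */
static int readiness_demo(void)
{
    int sv[2];
    char buffer[4096];
    ssize_t n = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    if (send(sv[0], "hello", 5, 0) == 5)
        n = read_when_ready(sv[1], buffer, sizeof buffer);
    assert(n <= (ssize_t)sizeof buffer);

    close(sv[0]);
    close(sv[1]);
    return (n == 5 && memcmp(buffer, "hello", 5) == 0) ? 0 : -1;
}
```

Calling readiness_demo() from a main function exercises the whole path. The
point to notice is the ordering: the notification comes first, and the
buffer and the recv call come afterwards, on the caller's side.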

Async I/O instead looks like this:

  When you create/accept a new socket:
    char *buffer = malloc(4096);
    recv_async(sock, buffer, 4096, 0);

  When the async notification tells you the read is complete
  (and that it transferred n bytes):
    processData(buffer, n);
    recv_async(sock, buffer, 4096, 0);

This is what I was proposing curl_multi_socket_async_action
would do, and recv_async would be the new callback that the
user would provide.

What about all that fancy threading? There's no difference. In
either case, someone (the app's main loop, a background thread,
a pool of threads, whatever) has to call some function that either
says "block until at least one fd is ready, and tell me which ones
are ready" or "block until at least one I/O operation completes,
and tell me which ones have completed." The former is just select,
poll, etc.; the latter is something like GetQueuedCompletionStatus
(Windows IOCP) or aio_suspend (POSIX) or your favorite alternative.

As you can see, the difference isn't that big. First, you (meaning
libcurl) have to prepare the buffers before scheduling the I/O
instead of after you're notified. Second, you don't have to--and
don't even have the option to--do the I/O yourself; that's the
application's job.
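
For comparison, here is a minimal sketch of the completion model using the
POSIX aio_* calls. The helper names (read_via_aio, completion_demo) are made
up for illustration, and a temporary file stands in for a socket purely to
keep the example self-contained. On older glibc versions you may need to
link with -lrt.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: schedule a read into `buffer`, then block until it
 * completes. Returns the byte count from aio_return, or -1 on error. */
static ssize_t read_via_aio(int fd, char *buffer, size_t size)
{
    struct aiocb cb;
    const struct aiocb *const list[1] = { &cb };

    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = buffer;     /* the buffer is handed over BEFORE the I/O */
    cb.aio_nbytes = size;
    cb.aio_offset = 0;

    if (aio_read(&cb) != 0)  /* returns immediately; the read is queued */
        return -1;

    /* "Block until at least one I/O operation completes." Retry if the
     * wait is interrupted while the request is still in progress. */
    while (aio_error(&cb) == EINPROGRESS)
        aio_suspend(list, 1, NULL);

    if (aio_error(&cb) != 0)
        return -1;
    return aio_return(&cb);  /* the data is already in `buffer` */
}

/* Demo: write "hello" to a temporary file and read it back using the
 * completion model. Returns 0 on success, -1 on failure. */
static int completion_demo(void)
{
    char buffer[4096];
    ssize_t n = -1;
    FILE *f = tmpfile();

    if (f == NULL)
        return -1;
    if (fwrite("hello", 1, 5, f) == 5 && fflush(f) == 0)
        n = read_via_aio(fileno(f), buffer, sizeof buffer);
    assert(n <= (ssize_t)sizeof buffer);

    fclose(f);
    return (n == 5 && memcmp(buffer, "hello", 5) == 0) ? 0 : -1;
}
```

Here the ordering is reversed: the buffer is chosen up front, and by the
time the caller is notified there is no read left to perform, only data to
process.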

> I must admit I still don't know what IOCP is or how it works.

Forget IOCP; it's got a ton of extra complexity you don't need
to worry about. If you need more detail than I gave above, look
at the POSIX aio_* calls instead; they're much simpler, and
it's the same basic idea, and it's explained in man pages.

> I don't quite understand this brief description. Can you add some more
> pseudo code for a client using this suggested API?

int callback(CURL *easy, int sock, int action, void *up, void *sp,
             void *buf, size_t len) {
  struct aiocb *cb = (struct aiocb *)up;
  cb->aio_buf = buf; cb->aio_nbytes = len;
  if (action & CURL_POLL_IN)
    aio_read(cb);
  else if (action & CURL_POLL_OUT)
    aio_write(cb);
  return 0;
}

int main(int argc, char *argv[]) {
  CURLM *cm;
  CURL *c;
  struct aiocb cb;
  int sock;
  char buf[4096];
  int rc;
  int handles;
  int bytes = 0;
  /* initialize everything--this includes things like pointing the
   * opaque-pointer things in the aiocb and the easy-handle at each
   * other, connecting the socket, calling curl_multi_add_handle,
   * setting up the callback, etc. */
  do {
    rc = curl_multi_socket_async_action(sock, IN, &handles, bytes);
  } while (rc == CURLM_CALL_MULTI_PERFORM &&
           /* aio_suspend takes an array of aiocb pointers */
           aio_suspend((const struct aiocb *const[]){ &cb }, 1, NULL) == 0 &&
           (bytes = aio_return(&cb)) != 0);
  return 0;
}

Of course somewhere in here you probably want to send some data, and
process the data you receive, and so on.

> In my view, asynchronous is mostly just another word for running the
> stuff in another thread until it has something, and then have a means
> of telling the first thread when it is done. And you can use libcurl
> fine already for doing exactly that.

Yeah, the terminology is not all that good, but this is the way that
POSIX, Microsoft, Sun, and Boost use it. "Async I/O" means that the
actual I/O calls are themselves asynchronous. It has nothing to do
with threads; it means that aio_read (ReadFileEx, etc.) returns
immediately, even though somewhere in the kernel the read has yet to
happen.

There doesn't have to be a background thread anywhere in the
application. And, as I explained above, sync I/O can use threads in
exactly the same way as async I/O.

(Yes, there is some kind of magic asynchronicity happening inside the
OS, but no more than with select. Somewhere, the OS is processing
network packets and deciding to trigger user threads based on what it
sees; just do the read or write before scheduling the user thread and
you have a primitive aio implementation.)

> > There's only one problem: SSL, SSH, and Kerberos.
>
> That sounds like three problems to me! ;-)

Or 5 or 6 problems, depending on how you count them (are scp and sftp
really the same problem? what about the 3 different SSL libraries?).
But it's all the same root problem.

> > These are all wrapped by using their send/recv replacements, and you
> > obviously can't just tell Windows or boost.asio to do an overlapped
> > OpenSSL SSL_send call.
>
> Now you lost me again. Are you saying that you need to base this
> functionality on some particular magic functions of the OS to make it
> work?

Sort of. The way libcurl is currently built, it tells the application
"let me know when socket 23 is ready to read," the app (thanks to OS
magic) tells it "socket 23 is ready to read now," and it then can do a
recv, SSL_recv (SSL_read, in OpenSSL's actual naming), or anything else
it wants (as long as it's ultimately just a single socket read call).

With an async design, libcurl tells the application "recv from socket
23 into buffer foo and let me know when you're done," the app (thanks
to OS magic) tells it, "I did the read for you, have fun with the
data," and... well, it's obviously too late to do an SSL_recv now.

There are two obvious (but definitely non-trivial) ways to fix this:

First, libcurl could tell the app "SSL_recv from socket 23 into
buffer foo and let me know when you're done," but this means the app
needs to know how SSL_recv works. I think OpenSSL provides a way to
make this work, but it pushes a lot of knowledge up to the app level,
and it may not work with other libraries.

Second, libcurl could wrap each of its APIs up. Somewhere inside the
SSL_recv function there's a recv call, and OpenSSL provides a pretty
way to hook this. So, libcurl could call down into OpenSSL, hook the
recv, and pass that to the caller's callback just as it would its own
raw recv. This is more work inside libcurl, and I'm not sure all 6 of
the wrapped libraries provide ways to do this.
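
The second approach can be sketched without any real crypto library. In the
C sketch below everything is hypothetical: toy_layer is a stand-in for a
wrapped protocol library (it is NOT OpenSSL, and XOR is not encryption), and
recv_hook is an invented name for the point where the library would normally
call recv. In OpenSSL the corresponding mechanism would presumably be its
BIO abstraction, but that is an assumption I have not checked against the
other wrapped libraries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hook: the wrapped library calls this instead of recv(). */
typedef long (*recv_hook)(void *ctx, void *buf, size_t len);

/* Toy stand-in for a protocol library that transforms bytes on the way in. */
struct toy_layer {
    recv_hook hook;     /* where raw bytes come from */
    void *ctx;          /* opaque pointer passed back to the hook */
    unsigned char key;  /* toy "cipher" key */
};

/* Library-side read: pull raw bytes through the hook, then decode them. */
static long toy_layer_read(struct toy_layer *l, void *out, size_t len)
{
    long n = l->hook(l->ctx, out, len);  /* the only raw-I/O entry point */
    for (long i = 0; i < n; i++)
        ((unsigned char *)out)[i] ^= l->key;
    return n;
}

/* Application side: bytes that an async read has already completed into. */
struct completed_read {
    const unsigned char *data;
    size_t len;
    size_t pos;
};

/* Hook implementation: serve the library from the completed buffer. */
static long feed_from_completed(void *ctx, void *buf, size_t len)
{
    struct completed_read *r = ctx;
    size_t avail = r->len - r->pos;
    if (len > avail)
        len = avail;
    memcpy(buf, r->data + r->pos, len);
    r->pos += len;
    return (long)len;
}

/* Demo: "encrypt" hello, pretend an async read delivered it, then let the
 * toy layer decode it through the hook. Returns 0 on success. */
static int hook_demo(void)
{
    const char *plain = "hello";
    unsigned char wire[5];
    char out[16];

    for (size_t i = 0; i < sizeof wire; i++)
        wire[i] = (unsigned char)plain[i] ^ 0x5A;

    struct completed_read done = { wire, sizeof wire, 0 };
    struct toy_layer layer = { feed_from_completed, &done, 0x5A };

    long n = toy_layer_read(&layer, out, sizeof out);
    assert(n <= (long)sizeof out);
    return (n == 5 && memcmp(out, plain, 5) == 0) ? 0 : -1;
}
```

The design point is that the protocol layer never touches the socket: it
only sees whatever the hook hands it, so the raw read can be performed by
the application's async engine before the library is called.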

> Or why can't these other protocols be made to work the same way?

Any protocol can be made to work this way. The question is, are the
libraries that implement these protocols already able to work this
way--and how much work does it take to use them this way.

> And do note that we support SSL with several other libs as well, not
> just OpenSSL...

Sure; I just used OpenSSL as an example because it's the one I've used
myself most recently.

> > Anyway, I think it may make sense for me to hack up libcurl myself
> > to do what I want, and just break the SSL, SSH, and Kerberos support.
> > But I doubt that has any use to the rest of the community, except
> > maybe as a proof of concept for how things could be done properly.
>
> Right, adding an API or mechanism to libcurl that doesn't work for
> SSL, SSH, and Kerberos seems like something I'd have a hard time
> accepting.

I'd certainly hope that's true!

Anyway, if I did provide such a hacked-up version, people could play
with it and we'd know if it makes sense to do all the work to make
the SSL/SSH/Kerb libraries play nice with async I/O.

Received on 2008-08-19