curl-library

curl-multi: Delays with FTP downloads due to blocking select() calls / libcurl architecture

From: Richard Atterer <richard_at_list04.atterer.net>
Date: Sun, 5 Dec 2004 16:52:25 +0100

Hi,

I'm still looking at FTP downloads with curl-multi and my own select() loop
(based on curl_multi_fdset()).

There is a serious (for me) problem which would take some work to fix.
However, ATM I have no time/inclination to fix this myself, so this is just
FYI.

My curl-multi application performs many concurrent requests, both HTTP and
FTP. With FTP, there are a couple of blocking select() calls in ftp.c.
Their effect on the application is not really acceptable: they tend to
block the whole application for several seconds while CWDing through
directories. Since my app retrieves lots and lots of fairly small files in
parallel from several servers, this has a significant performance impact,
to the point that FTP becomes unusable. :-(

Apart from the directory-change code in ftp.c, there is a select() with a
60-second timeout (after issuing a PORT to the server) which also looks
quite frightening to me. ;-/

[I'm also experiencing a problem in the following case: I call
curl_easy_cleanup() to interrupt a download, libcurl sends out the QUIT
command, attempts to wait for the server's response, but gets stuck forever
in the select() in Curl_GetFTPResponse(). Not sure what's happening here.]

All in all, it would be nice if there were no select() calls at all in the
code - everything should go via my own select() to avoid wait delays. Of
course, turning all the code into a state machine is a lot of work. :-(
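
For illustration, a minimal sketch of what the directory-change step could
look like as a state machine. This is not libcurl code: the names
cwd_machine, cwd_init(), cwd_next_command() and cwd_on_reply() are
hypothetical. The assumption is that the application's own select() loop
calls cwd_next_command() when the control socket is writable and
cwd_on_reply() once a complete reply has been read, so nothing in here ever
waits.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-transfer state; none of these names exist in libcurl. */
enum cwd_state { CWD_SEND, CWD_WAIT_REPLY, CWD_DONE, CWD_FAILED };

struct cwd_machine {
    enum cwd_state state;
    const char **dirs;   /* path components to CWD through */
    size_t ndirs;
    size_t next;         /* index of the next component to send */
};

void cwd_init(struct cwd_machine *m, const char **dirs, size_t ndirs)
{
    m->state = ndirs ? CWD_SEND : CWD_DONE;
    m->dirs = dirs;
    m->ndirs = ndirs;
    m->next = 0;
}

/* Call when the control socket is writable. Formats the next command into
 * buf and returns its length, or 0 if there is nothing to send right now. */
int cwd_next_command(struct cwd_machine *m, char *buf, size_t buflen)
{
    int n;

    if (m->state != CWD_SEND)
        return 0;
    n = snprintf(buf, buflen, "CWD %s\r\n", m->dirs[m->next]);
    if (n < 0 || (size_t)n >= buflen) {
        m->state = CWD_FAILED;   /* command did not fit into buf */
        return 0;
    }
    m->state = CWD_WAIT_REPLY;
    return n;
}

/* Call when a complete server reply with the given 3-digit code was read. */
void cwd_on_reply(struct cwd_machine *m, int code)
{
    if (m->state != CWD_WAIT_REPLY)
        return;
    if (code / 100 != 2) {       /* anything but 2xx counts as failure */
        m->state = CWD_FAILED;
        return;
    }
    m->next++;
    m->state = (m->next < m->ndirs) ? CWD_SEND : CWD_DONE;
}
```

Every function returns immediately, so a slow server only stalls its own
transfer. The price is that the simple "send, wait, check" loop is now
spread over several functions and an explicit state variable.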

I used to hack on libwww, which features "state machine"-style code, and I
have to say such code tends to be very difficult to maintain. If there are
only a few paths through the equivalent non-state-machine code, then the
state machine code is still OK (look at zlib for a nice example), but with
implementations for protocols like FTP and HTTP, this is hopeless IMHO.

For a while, I have played with the thought of writing a
code-transformation program which takes specially marked-up code (a bit
like with bison/yacc) and transforms it into a state machine. This would be
a lot better than coding the state machine manually. A simple version of
such a transformation tool would be easy to implement...
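
As a rough illustration of what "marked-up linear code turned into a state
machine" can look like, here is a sketch using the well-known switch-based
coroutine trick in C instead of an external tool. This is a swapped-in
technique, not the transformation program described above, and the macros
TASK_BEGIN, TASK_WAIT and TASK_END as well as cwd_task and cwd_all() are
hypothetical names. The caller is assumed to send the contents of out, set
t->reply once a reply code has arrived, and call the function again.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical markup macros: the switch on the stored line number plays
 * the role of the generated state machine. */
#define TASK_BEGIN(t)      switch ((t)->line) { case 0:
#define TASK_WAIT(t, cond) do { (t)->line = __LINE__; case __LINE__: \
                                if (!(cond)) return 0; } while (0)
#define TASK_END(t)        } (t)->line = 0; return 1

struct cwd_task {
    int line;    /* resume point, maintained by the macros */
    size_t i;    /* loop counter: must live here, not on the stack */
    int reply;   /* latest reply code set by the caller, 0 if none yet */
    int ok;      /* result: 1 if every CWD succeeded */
};

/* Looks like a plain loop, but returns 0 whenever it would have to wait
 * and resumes at the same place on the next call. Returns 1 when done. */
int cwd_all(struct cwd_task *t, const char **dirs, size_t ndirs,
            char *out, size_t outlen)
{
    TASK_BEGIN(t);
    t->ok = 1;
    for (t->i = 0; t->ok && t->i < ndirs; t->i++) {
        snprintf(out, outlen, "CWD %s\r\n", dirs[t->i]); /* caller sends */
        t->reply = 0;
        TASK_WAIT(t, t->reply != 0);   /* leave until a reply is there */
        if (t->reply / 100 != 2)
            t->ok = 0;                 /* non-2xx reply: stop the loop */
    }
    TASK_END(t);
}
```

The control flow stays readable, but any variable that must survive a wait
has to be moved into the struct by hand, which is the kind of bookkeeping a
real transformation tool would be expected to take over.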

The alternative would be simply to multi-thread. From a quick look, this
appears to be what the Mozilla code does. Hmm, a fixed dependency on threads is
not ideal for a library like libcurl. Additionally, multi-threaded
programming brings with it its own share of "fun"...

Meanwhile, there's also a workaround: Allow the curl-multi user to specify
a function pointer of their own and have libcurl call that function instead
of select(). But this would not solve my FTP performance problems. :-/
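
A minimal sketch of how such a hook might look, assuming POSIX select().
Nothing here is an existing libcurl interface: wait_fn, set_wait_function(),
lib_wait() and app_wait() are hypothetical names. The library would call
lib_wait() wherever it calls select() directly today.

```c
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical hook type with the same signature as select(). */
typedef int (*wait_fn)(int nfds, fd_set *r, fd_set *w, fd_set *e,
                       struct timeval *tv);

static wait_fn user_wait = NULL;   /* NULL means: use plain select() */

/* Hypothetical setter the application would call once at startup. */
void set_wait_function(wait_fn fn)
{
    user_wait = fn;
}

/* What the library would call instead of select(). */
int lib_wait(int nfds, fd_set *r, fd_set *w, fd_set *e, struct timeval *tv)
{
    if (user_wait)
        return user_wait(nfds, r, w, e, tv);
    return select(nfds, r, w, e, tv);
}

/* Example application hook: it only counts calls before delegating, but
 * it could also watch the application's own descriptors while waiting. */
static int wait_calls = 0;

int app_wait(int nfds, fd_set *r, fd_set *w, fd_set *e, struct timeval *tv)
{
    wait_calls++;
    return select(nfds, r, w, e, tv);
}
```

With such a hook the application gets control during the wait, but the FTP
code that called lib_wait() still cannot proceed until the hook returns.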

Cheers,

  Richard

-- 
  __   _
  |_) /|  Richard Atterer     |  GnuPG key:
  | \/¯|  http://atterer.net  |  0x888354F7
  ¯ '` ¯
Received on 2004-12-05