
curl-library Archives

RE: cleanup half closed sockets in connection cache

From: Daniel Stenberg <daniel_at_haxx.se>
Date: Mon, 16 Nov 2009 13:31:09 +0100 (CET)

On Fri, 13 Nov 2009, Frank Meier wrote:

>> What about running the thing every time a transfer completes? Or perhaps
>> we could even do it at completion time and when a handle is removed from
>> the multi handle.
>
> that would not be of much help, because (at least in our case) the servers
> we are talking to have keepalive enabled, so the socket gets into
> CLOSE_WAIT after several seconds, when the process is already doing
> something else.

I see. But I was thinking more of a case where you use N connections and they
all complete at Y-second intervals. Then it would make sense to check the
connection cache when each transfer ends, as the previous transfer's
connection might then be possible to close correctly. This is something we
don't do today since the current logic is more limited.

> So the problem exists when a lot of backend connections are established, and
> later (a few seconds) they are being closed by the peer. At that point in
> time I should be able to detect this and close the connection from my side.

Right, when N transfers complete almost at once, we end up with all of them
going dead if you don't do anything else within Z seconds...
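(For illustration, one common way to detect such a half-closed connection at
the socket level is a zero return from a non-blocking recv() with MSG_PEEK.
This is just a sketch of the general technique, not libcurl code, and it
assumes a POSIX-ish platform with MSG_DONTWAIT:)

  #include <sys/socket.h>
  #include <errno.h>

  /* Returns 1 if the peer has closed its end (the FIN that puts us in
     CLOSE_WAIT), 0 if the connection still looks alive. */
  static int peer_has_closed(int sockfd)
  {
    char byte;
    ssize_t n = recv(sockfd, &byte, 1, MSG_PEEK | MSG_DONTWAIT);
    if(n == 0)
      return 1;   /* orderly shutdown by the peer */
    if(n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
      return 0;   /* still open, just nothing to read right now */
    return n < 0; /* data pending means alive; other errors count as dead */
  }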

> Generally, in an application where I had to handle a lot of sockets, the
> (IMHO) right approach would be to put all these sockets (an fd_set) in a
> select(), and depending on which one something is happening on (close,
> accept, data to read) you can handle it appropriately.

Yes, that is the right way to deal with many sockets, but in libcurl's case
it knows of many more sockets than the ones you are currently dealing with.
Adding those to the select() as well would, for most use cases, just be an
extra set of descriptors that only makes select() slower.
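(This is visible in the standard multi-interface loop: curl_multi_fdset()
only hands the application the sockets of transfers in progress, so the idle
cached connections never show up in the fd sets. A minimal sketch:)

  #include <curl/curl.h>
  #include <sys/select.h>

  /* Wait for activity on the active transfers' sockets, then let
     libcurl make progress. Cached idle connections are not included. */
  static void wait_and_run(CURLM *multi, int *still_running)
  {
    fd_set rd, wr, ex;
    int maxfd = -1;
    struct timeval tv = { 1, 0 }; /* 1 second fallback timeout */

    FD_ZERO(&rd);
    FD_ZERO(&wr);
    FD_ZERO(&ex);
    curl_multi_fdset(multi, &rd, &wr, &ex, &maxfd);
    if(maxfd >= 0)
      select(maxfd + 1, &rd, &wr, &ex, &tv);
    curl_multi_perform(multi, still_running);
  }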

> In the meantime the application can suspend in the select, or do something
> else. So it's possible to react "~immediately" to actions on sockets, and
> otherwise the application only runs when necessary. With libcurl as it is
> now, I don't see this kind of possibility (without extending the API, I'm
> afraid).

Right, but then most applications also don't care about the
sockets/connections that aren't currently in use. And I think rightfully so.
The connection cache can potentially be a lot of connections.

> Even if it were possible to get an fd_set of all the sockets in the
> connection cache, maybe via curl_easy_getinfo(), I'd need a function like
> curl_handle_socket() to handle the socket action (close, read, etc.); just
> closing a socket that is under curl's control wouldn't be nice, I think.

Right, if you'd close them libcurl would not consider that nice.

> Another approach I see is to clean up the connection cache with each call
> to curl_multi_perform(). Since the connection cache is attached to the
> multi handle, and curl_multi_perform() does not block,

Right, but it's a potentially rather expensive operation so we shouldn't do
it on _every_ call. We'd need a cleverer approach.

Perhaps we could make it do that check every N seconds, where N would be a
conservatively high value (at least 60 seconds, possibly even 300 or so) to
start with, and we could add a new curl_multi/easy_setopt() option to allow
it to be set to something smaller.
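Roughly like this (just a sketch; the struct layout and the option name
CURLMOPT_CACHE_SCAN_INTERVAL are made up for illustration, nothing that
exists in libcurl today):

  #include <time.h>

  struct conncache {
    time_t last_scan;   /* when we last looked for dead connections */
    long scan_interval; /* seconds; conservatively high by default,
                           lowered via the proposed setopt option */
    /* ... the cached connections ... */
  };

  static void maybe_prune(struct conncache *cache)
  {
    time_t now = time(NULL);
    if(now - cache->last_scan < cache->scan_interval)
      return; /* too soon, skip the potentially expensive walk */
    cache->last_scan = now;
    /* walk the cache here, close connections the peer has shut down */
  }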

> it could be used as a cleanup function that has to be called from time to
> time (not so nice).

Well, in your use-case, when you use N connections for a while and then stop
using them all at once, there's really no other way than to somehow call
libcurl again at a later point only for the purpose of closing down "dead"
connections. The question is then basically only what function to call. I
can't even think of a way to help the app know for how long to wait or how
often to do the maintenance call. We just don't know those things.
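(If curl_multi_perform() gained the throttled pruning step sketched above,
the application side could be as dumb as waking up now and then and calling
it, even with no transfers running. That assumes the proposal; today
curl_multi_perform() does no such cache maintenance, and the 60-second
interval below is just a guess:)

  #include <curl/curl.h>
  #include <unistd.h>

  /* Periodic maintenance call, giving libcurl a chance to notice and
     close dead cached connections. */
  static void idle_maintenance(CURLM *multi, volatile int *keep_going)
  {
    int running = 0;
    while(*keep_going) {
      sleep(60); /* arbitrary maintenance interval */
      curl_multi_perform(multi, &running);
    }
  }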

> An improvement could be to only clean up the cache if no running easy
> handles are attached to the multi handle.

Perhaps, but I think having it based on time is slightly better since it'll
then also cover the case when you do 10 transfers and 9 of them stop at once
and the last one takes another hour to complete.

> In the easy interface something similar could be done: if the URL is set to
> NULL, curl_easy_perform() would only clean up the cache (like it does now
> if a URL is given).

Well, that would make it a really odd exception to how curl_easy_perform() is
used today, so I'm not sure I'm that comfortable with such a solution. After
all, once you've done your curl_easy_perform() you can force a closure of
everything by simply closing the easy handle.
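That is, with the API as it stands:

  #include <curl/curl.h>

  int main(void)
  {
    CURL *curl = curl_easy_init();
    curl_easy_setopt(curl, CURLOPT_URL, "http://example.com/");
    curl_easy_perform(curl); /* connection may stay cached in the handle */
    curl_easy_cleanup(curl); /* closes the handle and any cached connections */
    return 0;
  }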

> As I see it, extending the API is not an option for you

It _is_ an option. It's just an option I go very far to avoid, so I hope
throwing ideas around and discussing them might make us come up with a way we
all like!

-- 
  / daniel.haxx.se
-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html
Received on 2009-11-16
