curl-library

RE: Many CLOSE_WAIT when handling lots of URLs

From: Daniel Stenberg <daniel_at_haxx.se>
Date: Thu, 20 Feb 2014 11:39:28 +0100 (CET)

On Thu, 20 Feb 2014, Shao, Shuchao wrote:

>>> In my setup, using libcurl-7.24.0, I can have about 20K ESTABLISHED
>>> connections and no CLOSE_WAIT connections. Using 7.35.0 (after changing if()
>>> to while()), the numbers are 9K for ESTABLISHED and 11K for
>>> CLOSE_WAIT.
>>
>> Are you adding 20K easy handles to the multi handle?
>
> My application has 8 threads, and each thread adds 625 easy handles. The
> MAXCONNECTS will default to 625*4=2500.

Isn't that very excessive? It of course depends on the nature of the URLs you
get, but having a total cache size of 20000 open connections for re-use
sounds like a lot to me.

> And the system has a limit of 20K on the maximum number of open sockets.
> These handles request 330K URLs one by one, repeatedly.

Shouldn't you then aim at having the cache at no larger than (20000 - 625*8) =
15000 in total, which is 1875 per multi handle?

I would even consider just having a few hundred more than you add handles
(unless you have a good reason to believe a very large cache actually helps
you), so let's say 900 per multi handle. (As the size includes the active
connections too.)
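
As a rough sketch of the arithmetic above (the helper names are made up for
illustration and are not libcurl API; the numbers are the ones from this
thread):

```c
#include <assert.h>

/* Default cache size per multi handle as described in this thread:
   4 times the number of added easy handles (625 * 4 = 2500). */
static long default_cache_size(long easy_per_multi)
{
  return easy_per_multi * 4;
}

/* Hypothetical helper: the per-multi-handle cap suggested above, that is
   (socket limit - all added easy handles) split evenly over the multi
   handles. */
static long cache_cap_per_multi(long socket_limit, long multi_handles,
                                long easy_per_multi)
{
  assert(multi_handles > 0);
  assert(socket_limit >= multi_handles * easy_per_multi);
  return (socket_limit - multi_handles * easy_per_multi) / multi_handles;
}
```

With 8 multi handles and 625 easy handles each, the default adds up to
8 * 2500 = 20000 cached connections, while the cap comes out at 1875. The
chosen value (1875, or something like 900) would then be set on each multi
handle with curl_multi_setopt() and CURLMOPT_MAXCONNECTS.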

> I think the CLOSE_WAIT state connections are waiting there for re-use in the
> connection cache pool until a new socket has to be opened when the cache is full,

Connections in the cache are typically in ESTABLISHED and then re-used and
everything is fine. If however they are still in the cache when the server
closes the connection (mainly due to idleness), they end up in CLOSE_WAIT.
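
To illustrate what "the server closed it while it sat idle" looks like from
the application side, here is a minimal sketch of a non-blocking probe. This
is not how libcurl itself checks its connections; it assumes a POSIX-like
system where MSG_DONTWAIT is available, and the function name is made up.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Returns 1 if the peer has closed its end (or the socket is broken),
   0 if the connection still looks usable. Nothing is consumed from the
   socket because of MSG_PEEK. */
static int peer_has_closed(int fd)
{
  char byte;
  ssize_t n;

  assert(fd >= 0);
  n = recv(fd, &byte, 1, MSG_PEEK | MSG_DONTWAIT);
  if(n == 0)
    return 1; /* orderly close by the peer; a TCP socket is now in CLOSE_WAIT */
  if(n < 0 && errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
    return 1; /* hard error, treat the connection as dead */
  return 0;   /* idle with no data pending, or unread data is waiting */
}
```

A socket for which this returns 1 stays in CLOSE_WAIT until the local side
calls close() on it, which is why idle cache entries can pile up in that
state.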

> perhaps we need to check and kill the idle connections somewhere else when
> the cache is full, just guessing...

Hm. Actually, in the past we had the connection cache implemented as a simple
list and ConnectionExists() would then always close connections it found to be
dead while traversing the list to find a good fit.

Later on, we changed the cache code and now connections are kept in
"bundles" that are associated with host names so when we look for a connection
to re-use, it will only scan through connections (and "kill" idle ones) that
are for the same host name... This is much faster.

This of course leads us to detecting far fewer "dead" ones while searching
through the cache.

An experiment to do here is to add a loop through the entire cache for each
lookup, and close those connections that appear "dead". It can probably bring
back a behavior more similar to what we had before.
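
A minimal sketch of what such a full-cache loop could look like. The structs
below are simplified stand-ins invented for this example, not libcurl's
internal types; the peer_closed flag stands in for a real "is this socket
dead?" check and the closed flag stands in for actually closing the socket.

```c
#include <assert.h>
#include <stddef.h>

struct conn {            /* hypothetical cached connection */
  int in_use;            /* currently driving a transfer */
  int peer_closed;       /* stand-in for a real dead-socket probe */
  int closed;            /* stand-in for having called close() */
  struct conn *next;
};

struct bundle {          /* hypothetical per-host-name bundle */
  const char *host;
  struct conn *conns;
  struct bundle *next;
};

/* Walk every bundle, not just the one for the host being looked up,
   and unlink and close each idle connection that appears dead.
   Returns the number of connections closed. */
static size_t prune_dead(struct bundle *cache)
{
  size_t closed = 0;
  struct bundle *b;

  for(b = cache; b; b = b->next) {
    struct conn **link = &b->conns;
    while(*link) {
      struct conn *c = *link;
      if(!c->in_use && c->peer_closed) {
        *link = c->next; /* unlink from the bundle */
        c->next = NULL;
        c->closed = 1;
        closed++;
      }
      else
        link = &c->next;
    }
  }
  return closed;
}
```

The cost is that every lookup becomes linear in the total number of cached
connections again, so this gives up the speed gained by the per-host
bundles.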

-- 
  / daniel.haxx.se
-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html
Received on 2014-02-20