RFC: libcURL and persistent connection management

From: Ryan Beasley <rbeasley_at_vmware.com>
Date: Thu, 19 Jan 2012 20:10:15 -0800 (PST)

Hi, libcURL folks,

I work on an application using libcURL as the basis for its web services client. (The late 90s are back. Aw yeah.)

Problems I'm trying to solve:
    1. By default, without throttling requests passed to libcURL, our application may cause libcURL to create >50 persistent connections to a remote server. These connections persist until the app server's 30-minute timeout fires. (By using CURLMOPT_MAXCONNECTS, we'll still burst up to 50 connections, though the persistent connection count will die down; see the rough sketch after this list.)

    2. Related to problem 1, even after cleaning up easy handles, persistent connections linger until the proxy/origin server kills 'em (practically indefinitely). The web services bits are only a part of our application, and said services are session-based. That is, when the user is done with a remote host, they terminate their session, and there's no need for the client to keep the connections alive.
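
    For reference, here's roughly how we lean on CURLMOPT_MAXCONNECTS today
    (a trimmed-down sketch; the URL and request count are made up). As far as
    I can tell it only caps the cache of idle persistent connections; it does
    nothing to stop the initial burst from opening one connection per
    concurrent request:

        CURLM* multi = curl_multi_init();
        CURL* easy;
        int i;

        /* Caps only the *cache* of idle persistent connections. */
        curl_multi_setopt(multi, CURLMOPT_MAXCONNECTS, 10L);

        /* Hand over 50 requests at once and libcURL will still open up to 50
         * connections; the cache is only trimmed back toward 10 as the
         * transfers complete. */
        for (i = 0; i < 50; i++) {
            easy = curl_easy_init();
            curl_easy_setopt(easy, CURLOPT_URL, "http://appserver.example/api");
            curl_multi_add_handle(multi, easy);
        }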

I started by looking into fixing up our app to rein in outbound connections, but now I'm thinking this stuff would better fit in libcURL itself. I'd like to run the ideas past Daniel et al., and then—with approval from my management—whip up the related code and send out some patches. Please let me know if the following seem useful. If so, I'd appreciate any feedback before getting to work.

Handful of ideas:
    1. Request + connection tagging
    2. Per host/proxy persistent connection limits
    3. Queueing requests in libcURL

Request + connection tagging:
    Addresses problem 2.

    The idea here is to provide a mechanism whereby a client application can signal to libcURL that a selection of persistent connections need not be kept around, because the application won't need to issue any new requests on said connections anytime soon. The relationship between connections and tags is m:n (sets).

    Example usage:
        CURLM* multi;
        CURLCT* connTag;
        CURL* easy;

        multi = curl_multi_init();
        connTag = curl_conntag_init(); /* get a new tag for each "session" */

        /* Happens once per request. */
        {
           easy = curl_easy_init();
           /* set request options */
           curl_easy_setopt(easy, CURLOPT_CONNTAG, connTag);
           curl_multi_add_handle(multi, easy);
        }

        /*
         * App runs, event loop pumped, requests processed, etc. Easy
         * handles are removed from multi handle and cleaned up.
         *
         * At some time t_n, user explicitly disconnects from appserver. We
         * don't need to talk to this server any more.
         */
        curl_conntag_cleanup(connTag); /* Tags removed from connections. When a
                                        * connection has no more outstanding
                                        * tags, libcURL will close it. */
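
    For contrast, the closest knob I'm aware of today is CURLOPT_FORBID_REUSE,
    and it only covers part of this: it closes the one connection used by the
    transfer it's set on, but does nothing about the other idle connections the
    session already left in the cache (sketch; request setup elided):

        /* Today's partial workaround: force the *last* request of a session
         * to close its own connection once the transfer finishes. */
        CURL* last = curl_easy_init();
        /* set request options */
        curl_easy_setopt(last, CURLOPT_FORBID_REUSE, 1L);
        curl_multi_add_handle(multi, last);

        /* Connections cached from earlier requests in the session still stay
         * open until the proxy/origin server times them out. */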

Per host/proxy persistent connection limits:
    Addresses problem 1.

    It's one thing to limit the total number of persistent connections per multi handle, but this says nothing about fairness between individual endpoints. Like, if I don't want more than 4 persistent connections to a single host, I have to drop the CURLMOPT_MAXCONNECTS count for that multi handle, but that limit applies to _all_ hosts. It'd be great if, even though I may send a burst of 16 simultaneous requests to each of 3 servers, libcURL would eventually close extraneous connections down to 4 per host.

    Tangent: libcURL seems to ignore CURLMOPT_MAXCONNECTS values below the default of 10 connections.

    Open question: Should limits for origin servers & proxies differ? Web browsers tend to make this distinction.

    Example usage:
       CURLM* multi = curl_multi_init();
       curl_multi_setopt(multi, CURLMOPT_MAXCONNECTSPERHOST, 4);
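
    The only way I can see to approximate this today is to keep one multi
    handle per destination host, each with its own CURLMOPT_MAXCONNECTS
    (modulo the tangent above about values below 10), which pushes the
    bookkeeping and extra event-loop plumbing onto the application. A
    hypothetical sketch:

       CURLM* multis[3]; /* one multi handle per host */
       int h;

       for (h = 0; h < 3; h++) {
           multis[h] = curl_multi_init();
           curl_multi_setopt(multis[h], CURLMOPT_MAXCONNECTS, 4L);
       }

       /* The app then has to route each request to the multi handle that
        * matches its target host, and pump/poll all three separately. */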

Limiting outstanding requests, implementing request queues in libcURL:
    Addresses problem 1.

    So, cURL's connection limits only affect cached persistent connections; clients still have to throttle the requests they pass to cURL in order to keep cURL from opening N connections in the first place. That is, if I set the connection limit for a multi handle to the (current effective) minimum of 10, but then pass a burst of 32 POST requests to libcURL, at any point in time I may actually see 32 separate open connections. Only after the servers respond & the requests wind down will libcURL (eventually) close 32 - 10 = 22 of them.

    Since cURL's multi interface is asynchronous by nature, how about having libcURL manage things like this, keeping the requests in an internal queue? (Open question: should this be total or also per host/proxy?)

    Example usage:
        CURLM* multi = curl_multi_init();
        curl_multi_setopt(multi, CURLMOPT_MAXOUTSTANDING, 4); // Enables queueing on this multi handle.

        for (i = 0; i < 10; i++) {
            CURL* easy = curl_easy_init();
            /* set options */
            curl_multi_add_handle(multi, easy);
        }

        /*
         * When pumped from the application's event loop, libcURL will initiate at
         * most 4 requests at a time, pulling each one from the request queue.
         *
         * The end result is that the client app will never have more than 4
         * open connections at once.
         */
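
    For comparison, here's roughly what the application has to do today to
    get the same effect, re-implementing the queue itself. Sketch only:
    pending_empty()/pending_pop() stand in for a hypothetical app-side FIFO
    of fully-configured easy handles, and "outstanding" is the app's running
    count of in-flight requests.

        CURLMsg* msg;
        int msgs_left;

        /* Reap finished transfers... */
        while ((msg = curl_multi_info_read(multi, &msgs_left))) {
            if (msg->msg == CURLMSG_DONE) {
                curl_multi_remove_handle(multi, msg->easy_handle);
                curl_easy_cleanup(msg->easy_handle);
                outstanding--;
            }
        }

        /* ...then top the window back up to at most 4 requests in flight. */
        while (outstanding < 4 && !pending_empty()) {
            curl_multi_add_handle(multi, pending_pop());
            outstanding++;
        }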

     PS: I guess I should rephrase some stuff. Perhaps this should only throttle requests which would require opening a new connection. I.e. it wouldn't limit multiple pipelined GET requests sent over a single connection.
