cURL / Mailing Lists / curl-library / Single Mail

curl-library

Re: [PATCH 2/7] pipelining: Fix connection handling under timeouts.

From: m brandenberg <mcbinc_at_panix.com>
Date: Sat, 8 Nov 2014 20:42:10 -0500 (EST)

On Thu, 6 Nov 2014, Carlo Wood wrote:

> Data corruption (1420):
>
> This fixes a four year old bug in multi_runsingle(); when a request
> times out and then setting CURLM_STATE_COMPLETED, followed by a break
> causes the code to continue at the end of the function where the
> message is sent to the user with multi_addmsg() after which the state
> is set to CURLM_STATE_MSGSENT.
>
> An easy handle in that state never gets out of that state anymore:
> the state does nothing. Cleaning up the easy handle from the user
> application (after seeing the request is CURLMSG_DONE) doesn't
> change that fact. As a result, the easy handle gets stuck, owning
> a connection set to 'close' that is never closed or cleaned up.
>
> This commit sets the state to CURLM_STATE_DONE and does NOT break
> out of the loop. As a result the easy handle is cleaned up as a
> handle with an error, as it should. It's connection is closed.
>
> This commit fixes https://sourceforge.net/p/curl/bugs/1420/
> by stopping new requests from being added to such connections
> (the changes in ConnectionExists, but actually probably sufficient
> that now we close and cleanup the connection) causing their replies
> to arrive out of sync.
>
> --
> Connection flooding (comments from me in 1420):
>
> While pipelining it is possible that a request never reached the DO
> state but still timed out because a request in the pipeline before
> it took a long time.
>
> The logic of curl is such that first a new connection is created
> and only then the handles on it are checked for timeouts, hence
> it is possible that a handle times out without ever having sent
> a GET request over their (current) connection. In that case we don't
> want to close that connection: it is a healthy connection that probably
> was JUST created as replacement for the broken pipe at which the
> timeout really occurred.

I have also seen this behavior: connection made and broken
without anything going out on the wire. Not certain that
this was the cause but it certainly does happen.

> Note that the 'bits.close' bit is set by default upon construction of
> a connectdata struct only to be reset (when the user specifically
> requests so) AFTER the connection is established. We can therefore
> not use that bit to test if a connection timed out (which also sets
> that bit) because the close bit is also set BEFORE it even connects
> and when there is a delay between the call to connect() and the
> connection being established many requests can be readded before
> the first establishes a connection.
>
> This commit fixes both problems by using a new bit (timedout) that is
> FALSE upon creation and is only set when the connection times out and
> then use this bit to decide if connection should be closed (or can be
> reused).
>
> As a result of this commit (using my testsuite available at
> https://github.com/CarloWood/curl_http_pipeline_tester) only a single
> active connection is created and used when a full pipe breaks due to
> a timeout.
>
> That is, although this commit fixes the burst of connect()/close()
> calls, the problem remains that the server bundle was refreshed,
> so curl forgot about this server being able to do pipelining...
> This is fixed in a next commit.
> ---
> lib/multi.c | 17 +++++++++++++----
> lib/url.c | 11 +++++++++++
> lib/urldata.h | 3 +++
> 3 files changed, 27 insertions(+), 4 deletions(-)
>
> diff --git a/lib/multi.c b/lib/multi.c
> index 6ee17c6..add396d 100644
> --- a/lib/multi.c
> +++ b/lib/multi.c
> @@ -1009,13 +1009,17 @@ static CURLMcode multi_runsingle(struct Curl_multi *multi,
> }
> }
>
> - /* Force the connection closed because the server could continue to
> + /* Force the connection closed when the server could continue to
> send us stuff at any time. (The disconnect_conn logic used below
> doesn't work at this point). */
> - connclose(data->easy_conn, "Disconnected with pending data");
> + if(data->multi_do_connection_id == data->easy_conn->connection_id) {

See comments at end on 'multi_do_connection_id'

> + connclose(data->easy_conn, "Disconnected with pending data");
> + /* Make sure this connection is no longer considered for
> + pipelining. */
> + data->easy_conn->bits.timedout = TRUE;

Request has timed out, but the connection hasn't. Connection
has been poisoned by the timedout request. Other scenarios
might poison it for other reasons. (Trying to capture the
core idea.)

> + }
> data->result = CURLE_OPERATION_TIMEDOUT;
> - multistate(data, CURLM_STATE_COMPLETED);
> - break;
> + multistate(data, CURLM_STATE_DONE);

This state change worries me and those more familiar with
multi.c should look at this very carefully. This appears
to open several possibilities:

   * Calling Curl_multi_process_pending_handles() which
      will activate queued requests on the now-poisoned
      connection.
   * Cascading down the state machine rather than skipping
      it. This would seem to invite early return on this
      request allowing other (now invalid) requests to operate
      on this connection.

A timed out request can be handled several ways. In order
of decreasing conservatism:

   * Fail all requests on the associated connection and
      fail the connection.
   * Fail the request and all subsequent requests, poison
      the connection against new requests, half-close the
      socket, continue processing remaining requests.
   * If and only if the request has had a side-effect
      on a connection, do previous. Fail request itself.

> }
> }
>
> @@ -1280,6 +1284,9 @@ static CURLMcode multi_runsingle(struct Curl_multi *multi,
> else {
> /* Perform the protocol's DO action */
> data->result = Curl_do(&data->easy_conn, &dophase_done);
> + /* Remember which connection data was (possibly) sent over */
> + if(data->easy_conn) /* Might have been reset on send error */
> + data->multi_do_connection_id = data->easy_conn->connection_id;

If the meaning of 'multi_do_connection_id' is really
that this request has had a side-effect on the connection,
then this may be too late. Would the PROXY case need to
be accounted for earlier (http or socks5)?

>
> /* When Curl_do() returns failure, data->easy_conn might be NULL! */
>
> @@ -2519,6 +2526,8 @@ void Curl_multi_set_easy_connection(struct SessionHandle *handle,
> struct connectdata *conn)
> {
> handle->easy_conn = conn;
> + handle->multi_do_connection_id = -1; /* Nothing was sent over this connection
> + by this easy handle yet. */
> }

Hmm, re-used handles have stale connection IDs
dangling in them?

Also retried requests?

> static bool isHandleAtHead(struct SessionHandle *handle,
> diff --git a/lib/url.c b/lib/url.c
> index 3a877fc..81a27ad 100644
> --- a/lib/url.c
> +++ b/lib/url.c
> @@ -3239,6 +3239,11 @@ ConnectionExists(struct SessionHandle *data,
> the optimal connection to use, i.e the shortest pipe that is not
> blacklisted. */
>
> + /* We can't use the connection if the connection should be closed
> + due to a timeout. */
> + if(check->bits.timedout)
> + continue;
> +
> if(pipeLen == 0) {
> /* We have the optimal connection. Let's stop looking. */
> chosen = check;
> @@ -3652,6 +3657,9 @@ static struct connectdata *allocate_conn(struct SessionHandle *data)
> this need to set this to FALSE in their "curl_do" functions. */
> connclose(conn, "Default to force-close");
>
> + /* This is a fresh connection */
> + conn->bits.timedout = FALSE;
> +
> /* Store creation time to help future close decision making */
> conn->created = Curl_tvnow();
>
> @@ -5231,6 +5239,9 @@ static CURLcode create_conn(struct SessionHandle *data,
> any failure */
> *in_connect = conn;
>
> + /* Nothing was sent over this connection yet, by this handle */
> + data->multi_do_connection_id = -1;
> +
> /* This initing continues below, see the comment "Continue connectdata
> * initialization here" */
>
> diff --git a/lib/urldata.h b/lib/urldata.h
> index 62a2b80..78dd1a8 100644
> --- a/lib/urldata.h
> +++ b/lib/urldata.h
> @@ -486,6 +486,7 @@ struct negotiatedata {
> struct ConnectBits {
> /* always modify bits.close with the connclose() and connkeep() macros! */
> bool close; /* if set, we close the connection after this request */
> + bool timedout; /* if set, the transfer timedout */

I don't know if this is a curl-library@ standard but I generally
leave existing struct offsets alone when possible (adding new
fields at the end).

> bool reuse; /* if set, this is a re-used connection */
> bool proxy; /* if set, this transfer is done through a proxy - any type */
> bool httpproxy; /* if set, this transfer is done through a http proxy */
> @@ -1659,6 +1660,8 @@ struct SessionHandle {
> struct SessionHandle *prev;
>
> struct connectdata *easy_conn; /* the "unit's" connection */
> + long multi_do_connection_id; /* the value of easy_conn->connection_id
> + at the moment Curl_do was called */

Comment/naming/purpose question. This appears to be
duplicated information which would usually be a 'bad thing'
to do. But that is not the actual intent here, is it?
The idea that is trying to be expressed is: "This request
has had a side-effect on an associated connection."
Correct? While the connection_id might be useful for
debug, perhaps the name and comment should reflect the
actual semantics.

>
> CURLMstate mstate; /* the handle's state */
> CURLcode result; /* previous result */

--
Monty Brandenberg, Software Engineer                               MCB, Inc.
mcbinc_at_panix.com                                             P.O. Box 425292
mcbinc_at_pobox.com                                   Cambridge, MA  02142-0006
617.864.6907
-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html
Received on 2014-11-09