cURL / Mailing Lists / curl-library / Single Mail

curl-library

Re: HTTP pipelining question

From: Dmitry Kurochkin <dmitry.kurochkin_at_gmail.com>
Date: Thu, 27 Dec 2007 04:16:04 +0300

Hi Daniel.

Please find my answers below. I hope it makes sense. English is not my
native language, some thoughts are difficult to explain :)

2007/12/27, Daniel Stenberg <daniel_at_haxx.se>:
> On Tue, 25 Dec 2007, Dmitry Kurochkin wrote:
>
> > I have started working on it. The attached patch is my first attempt
> > to reuse connection before it is fully established.
>
> Thanks a lot for your work on this!
>
> First nit is that it doesn't apply on current CVS. (Quite easy to fix manually
> though.)
I will switch to CVS version.

>
> > The patch is really small. I removed relevant check from ConnectionExists()
> > and added condition for reuse case in CreateConnection(). So the second
> > connection goes straight to DOWAIT state.
>
> Your changes in there don't make much sense to me:
>
> First, you removed the echeck for a non-complete name resolve. Why? When the
> name isn't resolved, there won't be any check->sock[FIRSTSOCKET] filled in to
> use anyway!
The idea is that we always want to reuse existing HTTP connection.
Even if name resolution is not complete. We should wait until the
resolution and connection is done and then use the connection for all
requests to that host.
Actually, now I understand this change is not correct. But the idea is
still true.

>
> Then you removed the check for the "check->bits.close" which is TRUE when the
> downloaded file is about to be closed after this retrieval. Such connections
> are not suitable for trying a pipeline with anyway so I don't agree with the
> removal.
In my sample app connection was not reused because of bits.close
check. Actually I do not know why this it was set even before the
connection is established... I should figure it out.
Anyway, curl should not make second connection if one exists already.
We should wait until it is closed and signalPipeClose() is called.
Then another connection is made.

>
> Your commenting out parts of the logic in Curl_connect() is also a bit strange
> to me since the comments explain the current reasoning and your new code
> disagrees with the comments?! Can you elaborate on the new functionality by
> making sure the comments are correctly describing the code?
In Curl_connect() I removed SetupConnetion() call for the case when we
reuse connection.
As I far as I understand SetupConnection() is not needed in case of
connection reuse since the connection phase is done in another
handler. Also protocol_done is set for reused connection for the same
reason.

Let me explain my thoughts on what should be done in more detail...

Multi handler with pipelining enabled should make at most one
connection to the server. Even if the server does not support
pipelining and closes connection after first request, curl should not
start loading everything in parallel, flooding server with
connections. We should load everything sequentially.
So if I add 4000 easy handlers to multi with pipelining enabled and
start multi_perform loop, I am sure that even if server does not
accepts pipelining, all files will be downloaded sequentially.
Ideally curl can have max connections per host option. But as I
understand it would require more work.

To implement this I want to add another pend_pipe list to struct
connectdata. When we process an easy handler and find that there
already exists a connection to the same host we:
1. If connection is not complete yet, or pipeline is full at the
moment, or protocol is HTTP 1.0, or server is going to close
connection after this response, or server did not sent first reply yet
(we do not know if it is HTTP 1.1 and is not going to close
connection) -
     put request into pend_pipe, set state to CURLM_STATE_WAITDO
2. Otherwise put it to send_pipe, set state to CURLM_STATE_WAITDO

Handler from pend_pipe in CURLM_STATE_WAITDO state will be moved to
send_pipe when we can really pipeline it.

If server does not really supports pipelining and connection is
closed, signalPipeClose() would be called for pend_pipe. So all
requests will processed "from the beginning": new connection is made
for one requests, all other requests are accumulated in pend_pipe
again.

I think this should work. But I am not sure about efficiency. This
looks to me like the most simple way, but may be I am wrong here.

The most difficult for me is not to break anything. All described
above applies only to handlers with pipelining enabled. So if
pipelining flag is not set, curl should behave as before. As I
understand pipelining option is not used much in its present state, so
I hope this makes such changes not so dangerous.

I would like to see your comments on this. If you agree with this
design I can go on with more "technical" questions.

>
> What kind of tests/builds have you done with your patch?
I build on Debian. make tests reports all tests passed.
For pipeline is tested on one sample up. It created one multi handle
and bunch of easy handlers with the same URL, than goes to "multi
perform loop". I did not added anything to curl tests.

Regards,
  Dmitry
>
> > BTW At line 2432 I changed check->bits.proxy to httpproxy. I am behind a
> > proxy at work and set http_proxy env var. detect_proxy() does not set proxy
> > bit but sets httpproxy. So this check fails. I think this should be fixed in
> > detect_proxy() to set proxy bit as well. But I was not sure so put this
> > little hack :)
>
> Yes it should, as this fix is incorrect and breaks re-use of connections using
> other proxies than http. I committed a fix just now based on your findings,
> thanks!
>
> --
> Commercial curl and libcurl Technical Support: http://haxx.se/curl.html
>
Received on 2007-12-27