

Re: HTTP pipelining question

From: Daniel Stenberg <>
Date: Thu, 27 Dec 2007 12:51:27 +0100 (CET)

On Thu, 27 Dec 2007, Dmitry Kurochkin wrote:

>> First, you removed the check for a non-complete name resolve. Why? When
>> the name isn't resolved, there won't be any check->sock[FIRSTSOCKET] filled
>> in to use anyway!

> The idea is that we always want to reuse an existing HTTP connection, even
> if name resolution is not complete. We should wait until the resolution and
> connection are done and then use the connection for all requests to that
> host.

Yes, I agree with that.

> Actually, now I understand this change is not correct. But the idea is
> still true.

Indeed. I was just pointing out how the specific change didn't make sense to
me! ;-)

>> Then you removed the check for the "check->bits.close" which is TRUE when
>> the downloaded file is about to be closed after this retrieval. Such
>> connections are not suitable for trying a pipeline with anyway so I don't
>> agree with the removal.

> In my sample app the connection was not reused because of the bits.close check.

That might be true, but simply ignoring it will not be the right thing to do.

bits.close is TRUE there because a connection stays marked to get closed until
the protocol-specific CONNECT function has been called (Curl_http_connect() in
this case).

The problem is thus that we don't want to pipeline on existing connections that
have bits.close set TRUE (since they won't allow it anyway), but for
connections that aren't yet done we should ignore that value.

> Actually I do not know why it was set even before the connection is
> established... I should figure it out.

I think it is basically because internally libcurl defaults to not re-use
connections and sets that variable to true as soon as it knows it can re-use
the connection. It might be possible to change this, or perhaps the better
thing is to modify the check to something like I described above.

> Anyway, curl should not make second connection if one exists already.

Yes, it should make a second connection if the currently existing one isn't
deemed "fine" to use.

If the existing one is fine, then a second connection should not be done.

>> Your commenting out parts of the logic in Curl_connect() is also a bit
>> strange to me since the comments explain the current reasoning and your new
>> code disagrees with the comments?! Can you elaborate on the new
>> functionality by making sure the comments are correctly describing the
>> code?

> In Curl_connect() I removed the SetupConnection() call for the case when we
> reuse a connection. As far as I understand, SetupConnection() is not needed
> in case of connection reuse since the connection phase is done in another
> handler. Also protocol_done is set for a reused connection for the same
> reason.

But why the change? Are you saying that this code caused problems for your
pipelining without this change? SetupConnection() is perfectly fine to call
even for re-used connections (obviously since libcurl works fine with re-used
connections) and it also does some further inits than "just" the protocol_done
thing which your patch now skips. The conn->now init being perhaps the most
important one.

You may have found that skipping it is fine, but then I want
that explained in the comments (as I can't really see that it is a valid
assumption). In fact, I can't accept a patch that simply modifies the code and
leaves the comments around to no longer explain the current code accurately!

> A multi handle with pipelining enabled should make at most one connection to
> the server. Even if the server does not support pipelining and closes the
> connection after the first request, curl should not start loading everything
> in parallel, flooding the server with connections. We should load everything
> sequentially.

I agree (and it could actually also be subject for some kind of option in the
future). But only if pipelining has been enabled as otherwise you're seriously
hampering and altering how libcurl already works and is supposed to work.

> So if I add 4000 easy handles to a multi with pipelining enabled and start a
> multi_perform loop, I am sure that even if the server does not accept
> pipelining, all files will be downloaded sequentially.

Yeah... but 4000 is an extreme amount that I bet not that many apps use, and
those who get 8 files may possibly often prefer to get them as fast as
possible rather than to just use one connection?

I'll admit this is where I'm starting to get vague since all of this is
theories and concepts to me only, I don't really author any apps like this
myself. I suppose the potential future option I mentioned could be a "max
number of simultaneous connections to the same host" that would make the
4000-files app use no more than that amount in case pipelining fails (for
whatever reason).
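Such an option would boil down to a counter check at connection-setup time. The names below are invented for illustration; no such option exists in the code being discussed:

```c
#include <stdbool.h>

/* Hypothetical per-host bookkeeping (invented names). */
struct host_state {
    int open_connections;     /* live connections to this host */
    int max_host_connections; /* the proposed option's value */
};

/* If pipelining fails, extra connections may be opened as a fallback,
   but never beyond the configured cap; requests over the cap would
   simply wait and get served sequentially on existing connections. */
bool may_open_new_connection(const struct host_state *host)
{
    return host->open_connections < host->max_host_connections;
}
```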

> Ideally curl could have a max connections per host option. But as I
> understand it, that would require more work.

Uh, like that yes ;-)

> I think this should work. But I am not sure about efficiency. This looks to
> me like the simplest way, but maybe I am wrong here.

It looks sensible to me and I think it'll be efficient since there's nothing
inherently slow or bad in that approach.

What's gonna slow down the operation is the case when you know the existing
connection is going to get closed after this download and you could start
doing a new connection to do pipelining on right now, but instead you sit
waiting for the existing one to finish (and it could be a huge slow download
or whatever) before you proceed with the new connection.

> The most difficult part for me is not to break anything. All described above
> applies only to handles with pipelining enabled. So if the pipelining flag is
> not set, curl should behave as before. As I understand it, the pipelining
> option is not used much in its present state, so I hope that makes such
> changes not so dangerous.

Yes, and as you've pointed out the pipelining is currently somewhat broken so
your work on this should really only improve things even for the rare apps
that use it!

>> What kind of tests/builds have you done with your patch?

> I build on Debian. make test reports all tests passed. Pipelining was tested
> with one sample app. It creates one multi handle and a bunch of easy handles
> with the same URL, then goes into a "multi perform loop". I did not add
> anything to the curl tests.

Are you building libcurl c-ares enabled? For most multi interface uses that
will make your application perform a lot better, and since that is
asynchronous name resolving it also takes a few different paths in the libcurl
code.

Of course, if you end up with a decent test app for your pipelining work I
hope you'll share it with us so that others can try out with their libcurl
builds/tests to see how it performs!

Received on 2007-12-27