
Re: Pipelining is a pain, can we ditch it now?

From: Radu Hociung <radu.curl_at_ohmi.org>
Date: Mon, 6 Aug 2018 13:55:16 -0500

On 30/06/2018 5:16 AM, Daniel Stenberg wrote:
> We got a fresh new HTTP pipelining bug submitted today [1]. We already
> have other pipelining bugs listed in the KNOWN_BUGS document [2].

I am the submitter of that bug, and now that I understand it better, I
have revised the issue report. It is NOT a pipelining bug, but a
plumbing bug. Internally, libcurl uses both a splay tree and a pending
list to decide which request should be started when one ends. When new
jobs are added to the splay tree with EXPIRE_NOW, they are considered
before jobs already waiting on the pending list, which means they are
executed out of order. The updated issue report has more detail about
the circumstances of this bug.
(https://github.com/curl/curl/issues/2701#issuecomment-408717419)
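
From the application side, the failure mode looks roughly like this
minimal sketch (hypothetical URL, arbitrary limits, error handling
omitted): with the pipeline full, the surplus handles sit on the
pending list, and a handle added mid-transfer can be started ahead of
them:

    #include <curl/curl.h>

    int main(void)
    {
      CURLM *multi = curl_multi_init();
      CURL *easy[5];
      int i, running = 0;

      /* one connection, at most two pipelined requests on it */
      curl_multi_setopt(multi, CURLMOPT_PIPELINING, CURLPIPE_HTTP1);
      curl_multi_setopt(multi, CURLMOPT_MAX_HOST_CONNECTIONS, 1L);
      curl_multi_setopt(multi, CURLMOPT_MAX_PIPELINE_LENGTH, 2L);

      /* requests 0-3: two fit in the pipeline, the other two are
         queued on the internal pending list in the order added */
      for(i = 0; i < 4; i++) {
        easy[i] = curl_easy_init();
        curl_easy_setopt(easy[i], CURLOPT_URL, "http://example.com/job");
        curl_multi_add_handle(multi, easy[i]);
      }
      curl_multi_perform(multi, &running);

      /* request 4, added mid-transfer, gets an EXPIRE_NOW entry in
         the splay tree and can start ahead of still-pending 2 and 3 */
      easy[4] = curl_easy_init();
      curl_easy_setopt(easy[4], CURLOPT_URL, "http://example.com/job");
      curl_multi_add_handle(multi, easy[4]);

      /* ... drive the transfers with curl_multi_perform() and
         curl_multi_wait(), then clean up, as usual ... */
      return 0;
    }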

> HTTP Pipelining is badly supported by curl in the sense that we have
> bugs and it is a fragile feature without enough tests.

Having looked at the code, and having reviewed the previous HTTP/1.1
pipelining work documented on the mailing list (Carlo Wood and Monty
Brandenberg in 2014, Linus Nielsen Feltzing and Joe Mason in 2013,
Dmitry Kurochkin in 2008, and likely others I overlooked, apologies),
it looks to me like HTTP/1.1 pipelining is very well supported and
tested, especially relative to other "features" that have crept into
libcurl and see far less use and abuse, like ssh, smtp, gopher, etc.

> How many users do we have that use Pipelining?
>
> In the annual curl survey we ask users and a shockingly large amount of
> users claim they use pipelining. Over 20% in the latest poll. I've never
> trusted this number though, since we would have way more bug reports and
> work done on the code if it truly was used that much.

Also, scanning through the mailing list and the issue tracker, I found
several users of HTTP pipelining, and the sophistication of their
use-cases exceeds that of the cut-and-paste users of libcurl. It looks
to me like 20% is a very plausible number:

* Sunil Sayyaparaju (Nov 9, 2017) wants to enable pipelining for
idempotent POST requests, and modified his own copy (forked), just like
ru-17_at_yandex.ru did 5 years prior; see below.

* Evgen Bodunov (Nov 21, 2017), using pipelining to download map tiles
in an iOS app, wants to cancel already-issued requests.

* Alexandre Bouin (May 3, 2016) is using pipelining to maximize
throughput, has a good understanding of bandwidth, latency and network
utilization concepts.

* Cristian Morales Vega (Jun 18, 2015), using 7.43.0 with pipelining
and the curl_multi_socket_action API, ran into the same out-of-order
problem that I have just reported in #2701 against 7.58.0, and
submitted a test case. He also had a conversation with badger about the
order-of-execution guarantee much like the one in issue #2701, and the
cause was even narrowed down to the splay tree, just as I found.

* Xavi Artigas (Mar 31, 2014), using libcurl pipelining in a downloader
library to maximize throughput. Encountered corrupted downloads with
libcurl 7.32 and 7.36.

* Török Edwin (Jan 24, 2013), described a bug in 7.28.1 about pipelining
and CONNECTTIMEOUT, and submitted a test case.

* Yamin Zhou (Feb 19, 2014), wanted to use pipelining with
libcurl-7.22.0, found it buggy, then upgraded to 7.35.0, and his app
worked as expected.

* David Strauss (Apr 10, 2013), wants to implement AAAA records and
balancing/fail-over across multiple hosts in combination with
connection management and pipelining. Offered to sponsor the work. The
application is a FuseDav filesystem client.

* Glen Milner (Jul 18, 2012), implementing pipelining with digest
authentication, found a bug; it looks like it was fixed by Joe Mason.

* ru-17 <ru-17_at_yandex.ru> (Feb 27, 2012), modified his own libcurl
source to enable pipelining for POST.

* Mario Castelan Castro (July 14, 2011), asked a question about the
callback order.

* Anton Serdyuk and Marcin Adamski (Oct 7, 2011), found an infinite
loop in Curl_removeHandleFromPipeline (it was already fixed in the repo
HEAD at the time). The application is a web crawler.

* Stefan Krause (Nov 16, 2010), had questions about pipelining and local
ports.

* Justin Skirts (July 20, 2009), wanted to implement flow control in a
pipelined fetch.

It's worth noting that these people generated a fair bit of mailing
list activity at the time they were working on their respective
projects, but are not regular contributors. I think it is typical for a
developer to join the mailing list only when he has some kind of issue,
and not to stick around after he finds the solution. After all, libcurl
is not their main interest, but just another library to get their
project moving.

In my experience, pipelining works very well in libcurl, aside from
some configuration constraints. I expect that there are many more users
of pipelining for whom basic pipelining works well enough that they
never needed to ask questions on the mailing list. Many of them can
likely tell whether their application is indeed getting the expected
performance from pipelining (i.e., I assume that other developers are
reasonably intelligent).
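
By configuration constraints I mean the handful of pipelining-specific
knobs the multi interface exposes. A rough sketch of what a
pipelining-aware application typically sets (the numeric values and the
blacklist entry are arbitrary examples, not recommendations):

    CURLM *m = curl_multi_init();

    /* pipeline requests on re-used HTTP/1.1 connections */
    curl_multi_setopt(m, CURLMOPT_PIPELINING, CURLPIPE_HTTP1);

    /* cap how many requests ride on one connection at a time */
    curl_multi_setopt(m, CURLMOPT_MAX_PIPELINE_LENGTH, 5L);

    /* responses larger than this get their own connection rather
       than holding up the requests queued behind them */
    curl_multi_setopt(m, CURLMOPT_CONTENT_LENGTH_PENALTY_SIZE, 262144L);
    curl_multi_setopt(m, CURLMOPT_CHUNK_LENGTH_PENALTY_SIZE, 262144L);

    /* never pipeline to server software known to mishandle it */
    char *server_bl[] = { "Microsoft-IIS/4.0", NULL };
    curl_multi_setopt(m, CURLMOPT_PIPELINING_SERVER_BL, server_bl);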

I also note that none of these people appear to use libcurl in browser
projects, but rather in one of several project types:
- web crawlers
- other backend communications (event logging and the like)
- as a transport for another application (DAV)
- one instance of a video streaming app (I can't find the reference at
the moment)
- process control and automation (no examples on the ML, but I have
encountered many in my professional life), i.e. embedded HTTP servers
used to control industrial equipment, test and measurement
instrumentation, etc.

It looks like the users who are interested in HTTP 1.1 pipelining are
keen on performance and invariably quote throughput (**not latency**)
as the reason for their need to pipeline. Furthermore, some of the
applications I mentioned require a specified order of request
execution, which HTTP/2 explicitly does not provide. To use HTTP/2,
these applications would need workarounds that issue only one request
at a time to guarantee ordering, thus defeating the intent of HTTP/2
and achieving lower performance than they could with HTTP 1.1 and
pipelining. Such apps have no reason to consider HTTP/2.

HTTP 1.1 will, in my opinion, continue to live alongside HTTP/2 for the
foreseeable future.

While HTTP/2 has some benefits in browsing applications, it is its lack
of pipelining that makes it unsuitable for a host of other
applications, like process control (request ordering required), games
(many non-idempotent requests), or streaming (presentation-layer
buffering issues). It may be a replacement for HTTP 1.1 in browsers,
but it is an undesirable protocol for other applications.

Multiplexing, which HTTP/2 offers, is **not pipelining**.
Multiplexing is a way of performing tasks in parallel; pipelining is a
way of performing tasks serially.
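
In libcurl terms the difference shows up as two distinct bits accepted
by the same multi option (a sketch, assuming m is a CURLM * as above;
CURLPIPE_MULTIPLEX was added in 7.43.0):

    /* serial: requests queued on one connection, responses arrive
       in request order */
    curl_multi_setopt(m, CURLMOPT_PIPELINING, CURLPIPE_HTTP1);

    /* parallel: concurrent HTTP/2 streams on one connection,
       responses interleave with no ordering guarantee */
    curl_multi_setopt(m, CURLMOPT_PIPELINING, CURLPIPE_MULTIPLEX);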

Game clients often rely on communicating game state with the game
server; the messages are often non-idempotent requests, and they need
to be delivered to the server in a specific order. Pipelining
guarantees this order. HTTP/2, by design and by intent, does not
guarantee order of execution:

        "Note: Stream dependencies and weights express a transport
        preference, not a requirement, and as such do not guarantee a
        particular processing or transmission order. That is, the client
        cannot force the server to process the stream in a particular
        order using stream prioritization. While this may seem
        counterintuitive, it is in fact the desired behavior. We do not
        want to block the server from making progress on a lower
        priority resource if a higher priority resource is blocked." [1]

Media streaming applications, especially network-adaptive ones,
typically need chunks of content delivered in the correct order.
Delivering multiple chunks simultaneously, like HTTP/2 does, is the
opposite of what such apps need. More than that, it is inefficient for
any part of a later chunk to be delivered before the earlier chunk is
completely delivered, as it slows down the delivery of the earlier
chunk and requires additional memory to buffer the later chunk that
arrives early. It is precisely the pipelining feature of HTTP 1.1 that
helps streaming apps both maximize throughput and minimize local
buffering requirements: chunks arrive in the order requested.
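
A back-of-the-envelope timeline makes the point. Assume two equal-sized
chunks fetched over one connection at a fixed bandwidth, where each
chunk alone takes time T to transfer:

    pipelined (HTTP 1.1):  [chunk 1........][chunk 2........]
                           chunk 1 usable at time T; at most one
                           chunk buffered at a time

    multiplexed (HTTP/2):  [c1|c2|c1|c2|c1|c2|c1|c2|c1|c2|c1|c2]
                           chunk 1 usable only at time 2T, with
                           roughly a full chunk of c2 buffered in
                           the meantime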

Take an app like Netflix. It starts streaming a low-resolution chunk,
so the presentation can start as soon as possible. While the first
chunks are delivered, the app can estimate the available network
bandwidth, so the 2nd chunk requested can be of a resolution matched to
the network's capabilities. But still, none of the 2nd chunk is needed
until the 1st is completely delivered. This class of app heavily
prefers HTTP 1.1 pipelining over HTTP/2 magic. It can still work with
HTTP/2, but that requires more app development and more buffering space
to deliver a smooth stream. And where the available buffering space is
limited, such as in embedded devices, the buffering capacity wasted on
HTTP/2's simultaneous, overlapping chunk delivery means these devices
would be forced to display a lower resolution than they could with
better buffer control.

Case in point: Viren Baraiya, then Engineering Manager at Netflix,
responded in a github issue on Nov 14, 2017 that Netflix was not
working on HTTP/2 support. [2]

Pipelining is here to stay: in the HTTP 1.1 specification, in server
implementations, and in applications other than browsers. Whether
libcurl continues to support it or not will only affect whether libcurl
remains relevant for the non-browser HTTP world. Remember, HTTP is not
HTML. The HTTP protocol is used for purposes far more varied than
delivering HTML and its associated stylesheets and images.

I think what is likely to happen is that someone will fork libcurl,
bring it back to its core functionality, bring it into RFC compliance,
and clean up the cruft, similar to what happened with OpenSSL/LibreSSL,
while libcurl is free to continue addressing the needs of a
browser-developer audience, even though it is not used in any browsers.

Certainly, if libcurl still supports a long-defunct protocol like
gopher, it should still support pipelining, which was revised only 4
years ago (the HTTP 1.1 RFCs were published June 6, 2014). In any case,
what happens with HTTP 1.1 should not depend on the fate of HTTP/2.
They are complementary protocols, not evolutionary ones.

[1] https://developers.google.com/web/fundamentals/performance/http2/
[2] https://github.com/Netflix/conductor/issues/363#issuecomment-344415181