
Re: Feature request: data stream upload after server response

From: fungs via curl-users <curl-users_at_lists.haxx.se>
Date: Mon, 10 Jun 2024 14:03:40 +0000

Thanks for jumping into the discussion!

>> - I proposed a feature to make curl file upload work with existing HTTP
>> endpoints (not mine) on existing infrastructure (cloud providers with
>> managed reverse proxies and edge routers).
> Sorry, but does it not already work?

Not in the described case with request size limits for authentication and
delegation, unfortunately. I didn't mean to say that it doesn't work in general,
but in the example I gave it doesn't, because curl cannot operate on the
required layer. I don't want to discuss here whether it could be made to work if
the cloud provider implemented things differently, because that is beyond my
control and that of other users. I'm just observing the limitation and proposing
a possible client side solution.

>> - Nowadays, it is common that requests are delegated, for instance to an
>> authentication proxy, before data is uploaded to an endpoint. This is just
>> how modern cloud and microservices work.
> It is not really interesting nor relevant to curl exactly how many are
> involved in receiving a request or answering to a response as long as protocol
> is adhered to. "Delegating" in this sense does not change anything over the
> wire.

That is true. Somehow, you are assuming that the mode of operation would not
comply with the HTTP protocol, right? As far as I know, the protocol leaves some
room when it comes to the question of when the data is sent, in particular if we
are talking about streamed (aka chunked) data. Prove me wrong, but I think it
adheres to the HTTP protocol to open a PUT connection and wait for 10 minutes
before sending a chunk of data, which curl wouldn't do out of the box. It is
still a legal usage pattern, I believe. I'm not a typical web developer, so I'm
open to input and discussion on this aspect.
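
To make the point about chunked data concrete, here is a minimal Python sketch
of how a chunked request body is framed. The helper names (encode_chunk,
encode_body, LAST_CHUNK) are made up for this illustration and are not part of
curl. Each chunk carries only its size and payload, so nothing in the framing
itself says when a chunk has to be sent. Whether a given server or proxy
tolerates a long pause between chunks is a separate matter of its timeouts.

```python
LAST_CHUNK = b"0\r\n\r\n"  # zero-size chunk without trailers ends the body


def encode_chunk(data: bytes) -> bytes:
    """Frame one non-empty chunk: hex size, CRLF, payload, CRLF."""
    if not data:
        # A zero-length chunk would be read as the end of the body.
        raise ValueError("empty chunk; send LAST_CHUNK to finish the body")
    return b"%x\r\n" % len(data) + data + b"\r\n"


def encode_body(parts):
    """Yield the framed chunks for each part, then the terminating chunk."""
    for part in parts:
        yield encode_chunk(part)
    yield LAST_CHUNK
```

A sender can emit the result of encode_chunk() whenever data becomes
available, and LAST_CHUNK once it is done.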

Now, what works and what doesn't is not only a matter of the protocol, but also
of the logic and limitations of the remote systems, which is basically all of
the involved infrastructure parts including the actual endpoint on the remote
side, some of which is usually provided as managed components by cloud
providers. E.g. uploading 1 TiB of data at once might not be accepted due to
size restrictions. The remote systems at whatever level would respond with a
corresponding error response. This is exactly what happens in the described
case, when the request cannot be processed in time, due to its size. All legal
HTTP, just not what works.

>> - With those setups, it does not make sense to start a data upload until an
>> end to end connection has been authenticated and established, because the
>> data to be passed among systems and needs to be cached in the process of
>> routing. Such intermediate systems need to handle many requests and have
>> storage and time limits.
> I don't understand this claim. Are you suggesting that the client should
> sometimes ignore the protocol mechanics because there might be someone on the
> path to the end point that is acting up?
>
> In which circumstances can the client guess that it should take shortcuts? And
> exactly what shortcuts are we talking about?

I'm not asking curl to violate the HTTP protocol, just to use the protocol
differently in those cases. The circumstances under which a specific mode of
operation needs to be selected are totally up to the application, just like REST
interfaces define how to use HTTP with a specific endpoint. The client (curl)
needs to adjust its mode to the remote application. No need to decide in an
automatic fashion. But the application is the sum of all involved remote parts,
including reverse proxies etc.

>> - The solution I see and which seems to work for me is to wait for the
>> response before starting to upload or stream data, which ensure that all the
>> infrastructure negotiation has succeeded.
> I don't understand how this differs from normal behavior.

Open a local TCP port using `netcat -l localhost 8080` and upload some data
using curl (`-T .` in chunked mode). The data is uploaded right away, although
netcat does not talk HTTP and does not respond to the request in any way other
than consuming TCP. This behavior is of course intended and valid HTTP, but it
can create a very large request to be handed around for authentication. The
request needs to be cached and then played against the data consuming endpoint.
When the client first waits for a status 200 response before sending the actual
payload, things work faster and more predictably in those cases. At least for
chunked data, this approach looks perfectly suitable to me from a practical
point of view. Even if used as a replacement in the standard upload flow, it
would only add a tiny bit of latency.
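
The proposed flow can be sketched with a raw socket in Python. This is only an
illustration under assumptions, not how curl works internally: the function
name put_after_response and its parameters are hypothetical, and it assumes
the remote side answers after seeing the request headers alone. It sends the
PUT headers, waits for the first response, and streams the chunked payload
only if the status code is 1xx or 2xx.

```python
import socket


def put_after_response(host, port, path, chunks, wait_timeout=30.0):
    """Send PUT headers, wait for a response, then stream the chunks.

    Returns the status line of the first response received. The payload is
    only sent if that status code starts with 1 or 2.
    """
    head = (
        f"PUT {path} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Transfer-Encoding: chunked\r\n"
        "\r\n"
    ).encode("ascii")
    with socket.create_connection((host, port)) as sock:
        sock.sendall(head)
        sock.settimeout(wait_timeout)
        # Blocks until the peer answers; raises socket.timeout if it stays
        # silent for wait_timeout seconds (no payload has been sent yet).
        first = sock.recv(4096)
        if not first:
            raise ConnectionError("peer closed the connection before responding")
        status_line = first.split(b"\r\n", 1)[0].decode("ascii", "replace")
        parts = status_line.split()
        code = parts[1] if len(parts) > 1 else ""
        if not code.startswith(("1", "2")):
            return status_line  # rejected early: skip the upload entirely
        for chunk in chunks:
            if chunk:  # an empty chunk would terminate the body prematurely
                sock.sendall(b"%x\r\n" % len(chunk) + chunk + b"\r\n")
        sock.sendall(b"0\r\n\r\n")  # terminating chunk
        return status_line
```

Pointed at the silent netcat listener from above, this sketch would time out
without having sent any payload, whereas the regular upload starts sending
data right away.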

The two questions to answer from my point of view would be:

1. Does the proposed workaround violate the protocol in any way (regular upload
vs. chunked data; in particular reading the response before streaming the data;
applications in which it would not work)?

2. Does it make sense for curl to implement this functionality in some way (what
are the formal criteria for inclusion)?


Received on 2024-06-10