no access to HTTP CONNECT proxy error response body content #9513

Closed
aathan opened this issue Sep 15, 2022 · 18 comments

Comments

@aathan

aathan commented Sep 15, 2022

I did this

See #9508, the first part of which I quote here for convenience:

I have proxy that returns a 503 with application/json content:

$ curl -o - -I -x 'http://x:y@localhost:1234' https://ipinfo.io/ip
HTTP/1.1 503 Service Unavailable
Content-Type: application/problem+json
Date: Thu, 15 Sep 2022 00:22:02 GMT
Content-Length: 214

However, curl does not show the content returned with the error. Is that a bug or some aspect of the HTTP spec that states such content should be ignored?

$ tcpdump -Ai lo port 1234
...
HTTP/1.1 503 Service Unavailable
Content-Type: application/problem+json
Date: Thu, 15 Sep 2022 00:22:02 GMT
Content-Length: 214

{"type":"https://xxx.com/problems/xxx-no-clients","title":"xxx","status":503,"detail":"xxx"}

... the content is definitely returned to curl...

Curl reports:

error code 56, "Received HTTP code 500 from proxy after CONNECT"

I expected the following

So, this "error code 56" is apparently not an HTTP error code because (1) HTTP error codes are defined as 3 digits and (2) I cannot find any reference to HTTP error code 56 (including at https://en.wikipedia.org/wiki/List_of_HTTP_status_codes or https://datatracker.ietf.org/doc/html/rfc7231#section-6).

In /usr/include/asm-generic/errno.h we have:

#define EBADRQC         56      /* Invalid request code */

I do see that Chrome takes a similar approach to reporting a proxy error.

Thus, while I do understand the point made by @dfandrich, by that logic the 503 is not "what the transfer is about" either, yet it is in fact reported by curl (even if encapsulated by the message into an error code 56). The 503 error and its associated headers are what's actually returned by proxies in the HTTP envelope, as proven by tcpdump and (importantly) emitted by -I output. I.e., -I shows the headers of the proxy connection, not the content connection.

It seems the standard leaves open the possibility for proxies to return content together with an error response; and curl does pass through the 503 headers with -I. Therefore, the issue may be as much about what facilities curl wants to provide as what the standard says. I suspect it may be silent on this point, but I have not chased that down.

Even though the 503 is "not what the transfer is about", it's relevant to the transfer that was requested and I would argue curl should provide a mechanism to pass that content through. Ultimately, the fact is that proxies "can" return data with a proxy sourced error ("can" as in it's provably technically feasible and does occur in the wild, per google searches). That being said, I did not chase down the full spec on HTTP CONNECT to see if there may be a "MAY NOT" statement about body content (but I doubt it).

That being said, if I'm not mistaken, the spec at https://datatracker.ietf.org/doc/html/rfc7231 clearly shows that content may be returned in any HTTP response bearing any status code, including 503.

Therefore, I would argue that it is appropriate to pass through the body even for errors generated by a proxy or, failing that, that there should be a mechanism for retrieving that content (an additional flag? inclusion in -w %{json}? etc) and/or reporting of that content (or at the very least, that the content exists) in the same message that says "error code 56"

The current behavior can be rather frustrating, when you see that the content-length !=0, yet there's no apparent content, and the only way to see it is via tcpdump.
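The situation described above can be reproduced without tcpdump. What follows is a minimal Python sketch, not curl's implementation: a stand-in proxy on localhost refuses a CONNECT with a 503 and a small JSON body (shortened from the capture above), and a raw-socket client reads the status line, the headers, and the body. The names `serve_once`, `connect_via_proxy`, and `demo` are made up for illustration.

```python
import socket
import threading

# Canned reply modelled on the tcpdump capture in this report (body shortened).
BODY = b'{"status":503,"detail":"xxx"}'
REPLY = (
    b"HTTP/1.1 503 Service Unavailable\r\n"
    b"Content-Type: application/problem+json\r\n"
    b"Content-Length: " + str(len(BODY)).encode() + b"\r\n"
    b"\r\n" + BODY
)


def serve_once(listener):
    """Stand-in proxy: accept one client, read its CONNECT request, refuse it."""
    conn, _ = listener.accept()
    with conn:
        request = b""
        while b"\r\n\r\n" not in request:
            chunk = conn.recv(4096)
            if not chunk:
                return
            request += chunk
        conn.sendall(REPLY)


def connect_via_proxy(proxy_addr, target):
    """Send CONNECT and return (status, headers, body) of the proxy's reply."""
    with socket.create_connection(proxy_addr, timeout=5) as sock:
        sock.sendall(f"CONNECT {target} HTTP/1.1\r\nHost: {target}\r\n\r\n".encode())
        with sock.makefile("rb") as reader:
            status = int(reader.readline().split()[1])
            headers = {}
            while True:
                line = reader.readline()
                if line in (b"\r\n", b"\n", b""):
                    break
                name, _, value = line.decode("latin-1").partition(":")
                headers[name.strip().lower()] = value.strip()
            # A 2xx reply starts the tunnel; any other reply may carry a body.
            body = b""
            if not 200 <= status < 300:
                body = reader.read(int(headers.get("content-length", "0")))
            return status, headers, body


def demo():
    """Run the stand-in proxy on a free local port and query it once."""
    listener = socket.create_server(("127.0.0.1", 0))
    thread = threading.Thread(target=serve_once, args=(listener,), daemon=True)
    thread.start()
    try:
        return connect_via_proxy(listener.getsockname(), "ipinfo.io:443")
    finally:
        thread.join(timeout=5)
        listener.close()
```

Calling `demo()` returns the 503 status, the parsed headers, and the JSON body that the proxy sent along with its refusal. Nothing is sent to ipinfo.io; the target only appears in the CONNECT request line.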

curl/libcurl version

curl 7.81.0 (x86_64-pc-linux-gnu) libcurl/7.81.0 OpenSSL/3.0.2 zlib/1.2.11 brotli/1.0.9 zstd/1.4.8 libidn2/2.3.2 libpsl/0.21.0 (+libidn2/2.3.2) libssh/0.9.6/openssl/zlib nghttp2/1.43.0 librtmp/2.3 OpenLDAP/2.5.12
Release-Date: 2022-01-05
Protocols: dict file ftp ftps gopher gophers http https imap imaps ldap ldaps mqtt pop3 pop3s rtmp rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: alt-svc AsynchDNS brotli GSS-API HSTS HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz NTLM NTLM_WB PSL SPNEGO SSL TLS-SRP UnixSockets zstd

operating system

all

@bagder
Member

bagder commented Sep 15, 2022

curl does transfers to/from a given URL. The proxy is a middle man here that is used to get the transfer going. The content a proxy may include in a body is at best an error message in curl's point of view.

How do you propose curl delivers that to make sure that users don't mistake it for actual content?

in the same message that says "error code 56"

... but that could be excessively large. What if there are hundreds of kilo or megabytes of data?

@bagder
Member

bagder commented Sep 15, 2022

BTW, curl's error 56 is clearly documented in curl's man page:

Failure in receiving network data.
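To keep the two number spaces apart, here is a small Python sketch. It assumes only what is quoted in this thread: 56 is curl's own exit code (with the man page text above), while 503 is the HTTP status the proxy sent. The function name `describe_failure` and the lookup table are illustrative, not part of curl.

```python
from http import HTTPStatus

# curl exit codes quoted in this thread; the full list is in curl's man page.
CURL_EXIT_CODES = {56: "Failure in receiving network data"}


def describe_failure(curl_exit_code, proxy_status):
    """Report curl's exit code and the proxy's HTTP status as separate facts."""
    curl_text = CURL_EXIT_CODES.get(curl_exit_code, "see the curl man page")
    try:
        http_text = HTTPStatus(proxy_status).phrase
    except ValueError:
        http_text = "unregistered status"
    return (
        f"curl exit code {curl_exit_code} ({curl_text}); "
        f"proxy replied HTTP {proxy_status} ({http_text})"
    )


print(describe_failure(56, 503))
# curl exit code 56 (Failure in receiving network data); proxy replied HTTP 503 (Service Unavailable)
```

On the command line, the `http_connect` variable of curl's `-w` option reports the numerical status of the last proxy CONNECT response, which gives a script the second number without parsing the error message.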

@jay
Member

jay commented Sep 16, 2022

I don't feel strongly about this either way but the way I interpret RFC 7231 section 4.3.6 is it only forbids CONNECT replies from having a body if the connect was successful (2xx).
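That reading of section 4.3.6 can be written down as a small Python sketch. It is an interpretation, not a quote from the RFC or from curl's source: a 2xx reply to CONNECT starts the tunnel and any Content-Length on it is ignored, while a reply with a status of 300 or above is treated as an ordinary response whose body length follows Content-Length. The function name and the header dictionary shape are assumptions.

```python
def connect_reply_body_length(status, headers):
    """Return how many body bytes to expect after a CONNECT reply's headers.

    `headers` maps lower-cased header names to their values.
    """
    if 200 <= status < 300:
        # Successful CONNECT: tunnel mode starts right after the header
        # section, so a Content-Length header on this reply is ignored.
        return 0
    if status < 200:
        # Informational replies never carry a body.
        return 0
    # Non-2xx final reply: an ordinary HTTP response that may have content.
    return int(headers.get("content-length", "0"))


print(connect_reply_body_length(503, {"content-length": "214"}))  # 214
print(connect_reply_body_length(200, {"content-length": "214"}))  # 0
```

Chunked replies are left out to keep the sketch short; a fuller version would also look at Transfer-Encoding.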

@bagder
Member

bagder commented Sep 16, 2022

Yeah, I don't see how the RFC is relevant at all here. This is a question of how/if/where curl would provide that proxy response body for the user. Currently, libcurl makes no effort to provide that.

I'm not sure it can without us adding a new callback or some other mechanism that would allow libcurl to deliver what is potentially a large amount of data that isn't the response body and isn't really "headers" either.

@jay
Member

jay commented Sep 16, 2022

Yeah, I don't see how the RFC is relevant at all here.

I'm putting that on a t-shirt.

@aathan
Author

aathan commented Sep 17, 2022

  • The RFC states that proxies may return error content (just as non-proxy error responses may contain content)
  • curl returns non-proxy error content presumably without differentiating it from non-error content
  • curl returns proxy error headers (via -I)
  • proxy error headers refer to the proxy error content and are in the same transport channel as that content
  • curl squelches the proxy error content, even while processing the proxy error headers
  • curl is frequently used BOTH by end-users fetching resources AND developers testing HTTP infrastructure

I don't see how the RFC is relevant

I ... uh, disagree. :)

curl does transfers to/from a given URL. The proxy is a middle man

There's a lot to unpack there. I know you're speaking loosely but: URLs identify resources and the means by which to access those. The transfers are not to/from a given URL but rather to/from the myriad of services involved in providing the resource in accordance with the specified protocol.

Specifically for HTTP(S) there are some inherent ambiguities in the protocol because there is exactly one response channel (the tcp connection on which the request is made), but the returned data may or may not be returned by the referenced resource due to transports (and errors) reaching it. The content returned may or may not be the URL's content per the spec (@jay's comment.)

Due to this, no one SHOULD assume the content returned by HTTP during proxying MUST be the destination's content.

How do you propose curl delivers that to make sure that users don't mistake it for actual content?

I added the emphasis because clearly this is about "usability" concerns governing curl's behavior/implementation relative to the protocols. A related question would be "how does curl make sure that users don't mistake a non-proxy error response for actual content?" Per @dfandrich's comment it sounds like it makes no such effort.

Therefore, I would argue that proxy error content should not be demoted; particularly since curl must be configured to proxy, via command lines and/or env variables. Presumably then, the user is keenly aware that there is an additional error actor in the pipeline.

Given the fact that non-proxy error content is returned by curl in its current implementation, perhaps the better question is: How do you propose curl delivers all errors (not just those generated by the proxy) to ensure that users know the response is an error vs the "actual content"?

If there is a mechanism to do that, we can then ask the secondary question: "How do we differentiate an error response that is returned by the proxy vs one returned by the destination?"

So let me instead focus on the first part of the original question:

How do you propose curl delivers [the proxy error content]

Due to the foregoing, IMHO it would be neither unreasonable nor incorrect for curl to pass that content back exactly as the spec allows, inline in the channel, as "the response" which is an error response. This is also consistent with curl returning the body of error responses when the destination returns an error not via a proxy.

It seems that part of the concern here may be related to the difference between transparent proxies and non-transparent proxies. A goal of transparent proxying is to ensure the proxy is an unknown/unseen actor. I don't think that's the goal when configuring a proxy for curl. The proxy is explicitly known, and its operation is part of the pipeline. This observation may point to implementation choices/compromises to deal with all this. E.g., a "transparent proxy" option would cause curl to do the kinds of things being discussed here, that hide the existence of a proxy (such as eating proxy errors).

I'll again note that curl's current behavior is to report proxy errors via the -I option consistent with the error response being "the response" (vs differentiating those headers as "proxy headers"). Despite this "-I" behavior curl is then inconsistent in that it squelches the content to which those headers refer e.g. in the Content-Length etc.

Given the RFC, I would argue that curl (and libcurl) should probably deliver the content the same way it delivers non-proxy error content, because that's what the RFC says is allowed by the protocol. If possible, I would have curl mark error content (whether proxy or non-proxy) in some manner consistent with its current practices, possibly with the addition of some kind of signaling to differentiate destination vs proxy errors. If it's a libcurl issue, perhaps additional return values, bits in those values, struct pointers to be filled in with flags, etc can be used for that purpose (I'm not conversant with the library APIs).

data that isn't the response body and isn't really "headers" either

That this data "isn't the [response from the ultimate destination] body and isn't really 'headers' either" does not seem terribly relevant here. It is in fact the HTTP response content, and curl should return the HTTP response content, just as it returns the HTTP error response content when not proxied.

By the way, what does curl currently do about proxy errors for non-CONNECT proxies? My guess is those (and the associated content body) are returned "normally."

Arguably, it's the responsibility of the user of curl to differentiate between errors and actual content. Arguably, the user is in an excellent position to do so, given that they configured the proxy (e.g., by supplying -x). If curl can provide additional signaling (an inserted header? stderr output? highlighting? dunno) to indicate the error originated while attempting the CONNECT or pursuing the PROXY, that's great!

Finally, if curl disagrees with my opinions or feels that it needs to protect users from "proxy generated content", then I'd ask you to consider not completely squelching that content always. An additional command-line option could be provided to let it through or to write it to a specified file (similar to -w).

If the core issue is some aspect of libcurl's API I can't speak to that, but on general principles it would seem very possible to provide access to that content via the same mechanism that returns the destination content. It could acceptably do so only when signaling the special content type via API parameters, bitfields in the return values, etc., or when the facility is requested via API parameters.

Thanks for listening. I'll leave the conclusion in your capable hands.

@aathan
Author

aathan commented Sep 17, 2022

PS: I do consider this a bug, not an enhancement. The fact is that (potentially crucial) data returned to curl is being squelched, and the behavior is contra to the RFC.

@bagder
Member

bagder commented Sep 17, 2022

The RFC describes how to communicate, and curl follows the RFC.

The RFC does not dictate nor decide how our command line tool or library interact with the user or the application using them. I claim this issue is about how curl/libcurl provides protocol data/details. The RFC does not say how curl should deliver data. How could it?

curl has supported HTTP proxies for over 24 years and you might say that the current behavior is somewhat established. Changing this behavior now in such a drastic way so that it would show proxy content when a CONNECT request returns a non-200 response would most certainly be wrongly managed by countless users, scripts and applications. It simply cannot be done like that, unless requested by the user somehow. Hence the enhancement. You ask for an improved behavior. The existing/previous behavior is done on purpose, by design. It was not an accident. It is not a bug.

I am sympathetic to what you want curl to show/provide in this described scenario, and I am willing to cooperate and work on improving the situation, but it has to be done with consideration to old users and old behaviors.

@aathan
Author

aathan commented Sep 17, 2022

I appreciate/recognize the regression issues, and like I said, I don't want to get into a philosophical argument, if there is even an argument at all. I'm just pointing out certain facts that may be inconsistent with statements made here and/or with each other. We should probably also observe that this scenario is a bit of an edge case, and that the number of affected users is therefore probably small in practice (vs affecting, by the implication of "countless," a large % of users).

An important such fact which seems quite relevant but hasn't been directly addressed in your responses is the consistency of curl's behaviors. Non-proxy errors are currently returned by curl in-line, and I suspect non CONNECT errors arising at the proxy are returned inline too. Only CONNECT proxy errors are squelched, and that's the line where I would argue there is an arbitrary (and incorrect) choice made in the implementation thus far.

While I understand that the RFC does not explicitly state how the user agent (curl) should deal with the transaction contents, there are things we can infer from it. I completely agree with you that this choice is ultimately a "usability" one, made by curl based on its opinions about what's appropriate. I've given my arguments as to why, given the contents of the RFC, which specifically allow for proxy-error content to be returned as the response body together with the error response headers, I feel it is inappropriate for curl to return those error response headers while simultaneously squelching the error response body, merely because it was generated by the proxy vs the destination. This seems exactly counter to the intent of the RFC.

As I mentioned, if curl's opinion ultimately differs from mine, given that curl is frequently used as a utility within CI pipelines / dev workflows (not merely as an end user "fetch" tool by non-technical users), it would seem desirable that it never "eats" data in this way. If you feel changing the behavior to return the error response body inline is too big a regression, then, as we've exhaustively discussed, I think there are means to provide access to that data while preserving the current (IMHO erroneous) behavior.

Again, thanks for listening, and for a great tool!

@aathan changed the title from "no access to proxy error response body content" to "no access to HTTP CONNECT proxy error response body content" on Sep 17, 2022

@bagder
Member

bagder commented Sep 19, 2022

I'm more curious in ideas on how to actually solve this and provide the response in a way that does not trick or trip existing apps and users.

@bagder
Member

bagder commented Sep 19, 2022

The best solution I can think of is to provide a new callback to libcurl for delivering proxy content and then make the tool set and use that with a new dedicated option for this purpose.
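A rough Python sketch of the shape such an opt-in callback could take. Everything here is hypothetical: `Transfer`, `on_body`, `on_proxy_body`, and `deliver` are invented for illustration and do not exist in libcurl. The point is only that proxy content travels on its own channel and stays dropped unless the caller asks for it, so existing callers see no change.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Transfer:
    """Hypothetical transfer handle; none of these names are libcurl API."""

    on_body: Callable[[bytes], None]
    # Opt-in: proxy content is dropped unless the caller sets this callback.
    on_proxy_body: Optional[Callable[[bytes], None]] = None

    def deliver(self, chunk: bytes, from_proxy: bool) -> None:
        """Route a chunk to the callback that matches where it came from."""
        if not from_proxy:
            self.on_body(chunk)
        elif self.on_proxy_body is not None:
            self.on_proxy_body(chunk)
        # Otherwise the chunk is discarded, as in the behavior discussed here.


body, proxy_body = [], []

# A caller that never heard of the new callback sees no proxy content.
legacy = Transfer(on_body=body.append)
legacy.deliver(b'{"status":503}', from_proxy=True)

# A caller that opts in receives it separately from the response body.
opted_in = Transfer(on_body=body.append, on_proxy_body=proxy_body.append)
opted_in.deliver(b'{"status":503}', from_proxy=True)
```

After both deliveries `body` is still empty and `proxy_body` holds the one chunk, so proxy content cannot be mistaken for the server's response.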

@jay
Member

jay commented Sep 19, 2022

Changing this behavior now in such a drastic way so that it would show proxy content when a CONNECT request returns a non-200 response would most certainly be wrongly managed by countless users, scripts and applications.

Yes

The best solution I can think of is to provide a new callback to libcurl for delivering proxy content and then make the tool set and use that with a new dedicated option for this purpose.

IMO proxy content does not seem to me to be special content, if it's not 2xx it's treated as regular HTTP with content (or at least that's how I read it).

I agree this is not a bug though, it's an enhancement or feature request, for example --proxytunnel-no-error or something that could maybe pass it to the http handler when it's not 2xx.

@bagder
Member

bagder commented Sep 19, 2022

That might work, as long as the user can spot that the content comes from the proxy and not from the server.

@aathan
Author

aathan commented Sep 20, 2022

The more I read this thread the stronger my opinion that this issue is suffering from a cognitive black hole.

Would you mind directly addressing the apparent non-symmetry of treating CONNECT proxy error content differently than PROXY proxy error content?

I believe the potential negative effects of normalizing the behavior across all proxy error content are being vastly overblown, and that the negative effects of treating this as some special case are not creating enough concern. When the CONNECT proxy has been reached, and it has returned an HTTP error and associated error content, then that sequence of events is clearly not an error 56 relative to the HTTP protocol, and the proxy's error and error content should not be squelched, reformatted, re-interpreted, or otherwise mangled.

I would propose that the project label the existing behavior as "old" and adopt a more standards compliant approach, and that instead of having a flag that causes this proposed new behavior, you can instead include a flag that causes the erroneous OLD behavior for those people or projects that end up affected.

I think those will be very few and far between, since either way, curl will be returning an error ... just, the correct one, instead of this dubious "56"

well. maybe now I have nothing more to say. lol. thanks for listening.

@bagder
Member

bagder commented Sep 20, 2022

this issue is suffering from a cognitive black hole.

I would never presume anything else. I think we all see things skewed and differently based on our own biases and previous experiences. Including this issue.

Would you mind directly addressing the apparent non-symmetry of treating CONNECT proxy error content differently than PROXY proxy error content?

The difference without CONNECT is that curl doesn't see any difference. Whatever it gets, is what the proxy gives curl on the behalf of the remote server. In the CONNECT case however, there's an explicit communication with the proxy and thus curl knows very well where the stuff comes from.
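The two cases differ already in the first line the client sends to the proxy, which a short Python sketch can show. It assumes the common default where an https:// URL is tunneled with CONNECT and an http:// URL is sent to the proxy in absolute form; the function name `proxy_request_line` is made up for illustration.

```python
from urllib.parse import urlsplit


def proxy_request_line(url):
    """Return the first request line a client sends to an HTTP proxy for `url`."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        # Tunnel first: this request is addressed to the proxy itself, so the
        # client knows that the reply to it comes from the proxy.
        port = parts.port or 443
        return f"CONNECT {parts.hostname}:{port} HTTP/1.1"
    # Plain HTTP: the proxy forwards the request, and whatever comes back is
    # delivered as the response, whether the proxy or the server produced it.
    return f"GET {url} HTTP/1.1"


print(proxy_request_line("https://ipinfo.io/ip"))  # CONNECT ipinfo.io:443 HTTP/1.1
print(proxy_request_line("http://example.com/"))   # GET http://example.com/ HTTP/1.1
```

In the first case there is a separate exchange with the proxy before any request for the URL is made; in the second there is only one exchange, so a proxy-generated error looks like any other response.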

But primarily, this is behavior established by curl a long time ago so even if there might be logical flaws in the reasoning, that's the way curl works and users can safely assume that curl doesn't break behavior. Thus the need for an option or something to ask for it.

that sequence of events is clearly not an error 56 relative to the HTTP protocol

I don't even understand what this means. curl does not say it is "error 56 relative to the HTTP protocol". That seems like a gross misunderstanding.

adopt a more standards compliant approach

I will continue to insist that this is not a standards compliant question.

If you want change, I propose you actually work on that instead of just repeating your complaints in this issue.

@aathan
Author

aathan commented Sep 21, 2022

I always aim to contribute code when I can. If I were to dive into that, I'd want the change to be well motivated and the PR likely to be accepted. Thus, this discussion.

Whatever [curl] gets, is what the proxy gives curl on the behalf of the remote server

I think you misunderstood my question. There are cases when the proxy to which curl connects itself generates errors. Here I'm not speaking of a transparent proxy. In that case (when that proxy itself generates an error) then how does curl see and/or report that error? I think it reports such errors without differentiating those errors from an error generated at the ultimate destination. In other words, the protocol anticipates that users "should" understand that the origin of HTTP errors, in the presence of a proxy, may not be the ultimate destination.

That's the sense in which I claim this issue is a conformance issue.

curl doesn't see any difference

I think what you are saying here amounts to this: CONNECT proxying is a two phase process, whereas a pure PROXY is single phase. In a CONNECT proxy there is an initial HTTP request/response and then there is a secondary HTTP request/response containing the "actual" request. Intrinsic to this two phased connection then, curl is in a position to detect that an error has occurred during the first phase and therefore "must" be a proxy generated error.

However, curl proceeds by re-casting the HTTP error as error 56 EBADRQC. It "envelopes" the HTTP error in that report, and eliminates any access (except via -I) to the more expressive nature of the HTTP error reporting protocol (which allows for complex rich data to be returned via HTTP headers and body).

That this hasn't been raised before speaks to the corner-case nature of this issue, and is probably indicative of low (not high) impact from changing the behavior. I am challenging the assumption that this is a high risk change. Do we have empirical evidence that "countless" users would be negatively impacted? Is there a way to A/B test such changes in the wild, perhaps through curl's release process? Whether that's worth the effort, that's another question :)

that's the way curl works and users can safely assume that curl doesn't break behavior

I'm sure there have been well motivated breaking changes to curl in the past. At the limit we can call resistance to such changes a kind of orthodoxy. I'm not saying the issue we're discussing reaches that level, but I'm sure you understand what I mean. Thank god we don't have to still support Windows 95.

[don't merely complain, be the change you want]

Maintainers are in a unique position to make changes to a project without wasting activation energy, and can hopefully accept challenges and bug reports without labeling them as complaints. The issue of whether this change corrects a bug or implements a new feature is ultimately merely a choice about how to best serve the community of users, and best conform to the protocols, and should have no ego involved.

I appreciate your time and attention, and I will try to contribute code...as time allows, I'll pull the repo, learn the internals, and submit a PR for this, as I've done across many open source projects since 1990.

A.

PS: I accept that CONNECT proxying is by its very nature a different beast than pure PROXY proxying, and that therefore passing through some signaling as to the phase in which an error originated is consistent with that fact. If curl would at least not swallow the information, that would be a big step forward.

@bagder
Member

bagder commented Sep 21, 2022

Also remember that curl can speak numerous non-HTTP protocols over a proxy-CONNECT setup.

@dfandrich
Contributor

dfandrich commented Oct 11, 2022 via email


4 participants