Inconsistent/Incompatible handling of filename escaping in multipart/form-data compared to RFC 7578 and browsers #7789
Comments
I agree. @monnerat, do you want to grab this?
This is a picky area: we have something that seems to work for forms, and adding mime support did not change this escaping, which was already present in the original formdata code. In addition, there is contradictory information in the mentioned documents: In https://html.spec.whatwg.org/multipage/form-control-infrastructure.html#multipart-form-data:
In https://datatracker.ietf.org/doc/html/rfc6266#section-4.3:
So: %-escape or not, and what about the backslash? I fear this change may break a lot of things: in particular, tests 39 and 1158. As a consequence, I can do it, but things have to be clarified first.
This is rather tricky. It seems the three specs have very different ideas on how to encode some basic bytes:
Or did I get any of them wrong?
Those tests verify curl's behavior. If we change behavior, we of course also need to update those tests accordingly! When specs are hard to follow (like here) I think we need to check what other tools and libraries use and what widely deployed server-side applications accept/support. I think unifying with browser behavior on this seems reasonable as they are probably the major producers of formdata bodies.
So this isn’t quite the problem with URLs, but similar. In order of precedence, and recognizing that cURL may ignore the WHATWG work due to the Living Standard nature:
I say “ish”, because virtually no one is using the (
Yes sorry, I realized my mistake and edited the table. WHATWG and RFC 7578 agree on 0x0a, 0x0d and 0x22. I think I'm still wrong with the backslash there, as 7578 doesn't say it should be encoded, I think...
@bagder I think your table is correct now, in that Section 2 of RFC 7578 says you can percent encode the However, I don’t think
Seems not. But you know http better than I do.
Sure! But this shows we have no bug, but perhaps a misinterpretation. That said, if tests break, applications that have been running perfectly for more than 13 years may fail too.
I think it's the most important. Servers themselves may parse it too (Apache modules?). As we only generate these and do not parse them, it would be good to know how widely deployed CGI-like apps and servers (e.g. PHP) understand them.
Curl does not support charset encoding in http/mime header fields. This would be rather cumbersome to implement.
AFAICS, you're as puzzled as I am :-) As the various documents do not agree with each other, this means a huge spelunking job for us in third-party products. Even those may not all stick to the same behavior. I'm not sure I'm ready for that :-(
Indeed! I mentioned separately to @bagder that this began as an “Oh, that’s odd” when someone noticed a webpage behaving differently if browsers sent the request vs cURL, and the deeper I went, the more boxes of spiders I found myself opening. I thought it useful to write this down and try to capture the references. cURL’s ubiquity suggests this would be the main place people run into it. From a browser evolution space, there’s been a wide variety of behaviours before converging on the current approach. For example, one was just to replace the Although ultimately this is only intended as a hint for servers (for the reasons and risks noted in the various specs), and to that end, many behaviors are valid, I figured I would at least file this so it’s on your radar, even if the risk/complexity/ineffability is too much :)
Before talking semantics, we should care about lexical and syntactic matters. If we do not encode a backslash, would it be correct or would it be interpreted as a
Test form:
Firefox 88 on Linux for the file name
Chrome 94 on the same box same name:
It's even worse. I edited my html for the form to be:
(the newline there is intended) The results from the jury: Firefox 88:
Chrome 94:
I tried with Firefox 92.0 and it uses percent encoding without backslash escaping.
Maybe some M$ tester could check with IE and Edge?
It still means that receivers of multipart formposts today have to deal with both the \-escaped and percent-encoded fields for now and the near-term future. Not sure that helps us with this issue, I'm just stating the obvious.
Escaping-style detection (if any) is a server-side problem, so no change is required in curl for that. And sooner or later, if major clients move to %-escaping, receivers will have to adapt :-( I checked PHP 7.4.24: it uses raw name values as To be versatile on our side, I can only suggest we have an additional
Until now, form field and file names were escaped using the backslash-escaping algorithm defined for multipart mails. This commit replaces this with the percent-escaping method for URLs. As this may introduce incompatibilities with server-side applications, a libcurl option CURLOPT_FORM_ESCAPE_AS_MIME is introduced to revert to legacy use of backslash-escaping. This is controlled by new cli tool option --mime-escape. New tests and documentation are provided for this feature. Reported by: Ryan Sleevi Fixes curl#7789
Until now, form field and file names were escaped using the backslash-escaping algorithm defined for multipart mails. This commit replaces this with the percent-escaping method for URLs. As this may introduce incompatibilities with server-side applications, a new libcurl option CURLOPT_MIME_OPTIONS with bitmask CURLMIMEOPT_FORMESCAPE is introduced to revert to legacy use of backslash-escaping. This is controlled by new cli tool option --form-escape. New tests and documentation are provided for this feature. Reported by: Ryan Sleevi Fixes curl#7789
I did this
With a file named "foo.jpg (note the leading quote):
curl -i -F encoded_image=@\"foo.jpg https://www.google.com/searchbyimage/upload
I expected the following
Within the multipart/form-data that is generated, the filename parameter of the request will be in the following form: filename="\"foo.jpg", using the quoted-string production of RFC 822.
However, I would expect the filename to be of the form filename="%22foo.jpg", using the Percent-Encoding Option of RFC 7578.
curl/libcurl version
ToT
operating system
Further Details
The context here is that multipart/form-data encoding is handled by lib/formdata.c. The form-data support is implemented using the generic MIME encoder, in the family of curl_mime_* functions. For example, in the above example of attaching a file, this is handled at curl/lib/formdata.c, line 882 in e081048 (curl_mime_filedata).
curl_mime_filedata passes the filename onwards using curl_mime_filename, passing only the basename of the filename.
Later, when compiling the multipart message, the field name and filename for the Content-Disposition header are escaped using the escape_string function, as shown at curl/lib/mime.c, lines 1867 to 1886 in 52fab72.
This is a complex area due to issues of character sets and filenames. Within this space, RFC 6266, Section 4.3 is relevant, and further expanded upon in RFC 6266, Appendix C.2. RFC 7578, Section 4.2 attempts to provide guidance here with respect to handling the filename inter-operably within a Content-Disposition: form-data part.
The HTML Living Standard Multipart form data specification places a normative dependency on RFC 7578, providing further guidance with respect to the encoding and escaping of field names and filenames for file fields, the two areas escaped via escape_string: encode to the appropriate encoding and then replace { 0x0A, 0x0D, 0x22 } with { "%0A", "%0D", "%22" }, respectively. This is compatible with the guidance of RFC 7578, and reflects widespread use within browsers (e.g. Chromium/Chrome or WebKit/Safari).
This is an area where MIME attachments differ in practice from form data. It seems that it may be useful to at least align the form encoder to match the approach mentioned within RFC 7578, for greater harmonization.
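The replacement rule described above can be sketched in a few lines of C. This is an illustration only, not curl's escape_string: form_percent_escape is a hypothetical name, and the sketch assumes the input has already been converted to the target character encoding, so it only performs the three byte substitutions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of curl: replaces LF (0x0A), CR (0x0D)
 * and the double quote (0x22) with "%0A", "%0D" and "%22". Every other
 * byte, including the backslash, is copied unchanged. Returns the number
 * of bytes written (excluding the terminating NUL), or (size_t)-1 if dst
 * is too small. */
static size_t form_percent_escape(const char *src, char *dst, size_t dstlen)
{
  size_t n = 0;

  assert(src != NULL && dst != NULL);
  if(dstlen == 0)
    return (size_t)-1;

  for(; *src; src++) {
    const char *rep = NULL;

    switch(*src) {
    case '\n': rep = "%0A"; break;
    case '\r': rep = "%0D"; break;
    case '"':  rep = "%22"; break;
    default: break;
    }

    if(rep) {
      /* need room for three escape bytes plus the final NUL */
      if(n + 3 >= dstlen)
        return (size_t)-1;
      memcpy(dst + n, rep, 3);
      n += 3;
    }
    else {
      if(n + 1 >= dstlen)
        return (size_t)-1;
      dst[n++] = *src;
    }
  }
  dst[n] = '\0';
  return n;
}
```

With this rule, the file name "foo.jpg from the report becomes %22foo.jpg, while a backslash in a name is passed through untouched, which is the open question raised in the comments.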