curl does not parse multipart/byterange responses #6124

Closed
jjatria opened this issue Oct 25, 2020 · 6 comments
Labels: cmdline tool, HTTP, not-a-bug (This is not a bug in curl)

Comments

@jjatria
Contributor

jjatria commented Oct 25, 2020

I did this

I requested multiple non-consecutive byte ranges from a server that supports this and answers with a multipart/byteranges response. The test request was for two 4-byte ranges, for a total of 8 bytes.

$ curl -v --range '0-3,100-103' --output curl.out --silent https://pinguinorodriguez.cl
* Rebuilt URL to: https://pinguinorodriguez.cl/
*   Trying 35.185.44.232...
* TCP_NODELAY set
* Connected to pinguinorodriguez.cl (35.185.44.232) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
* TLSv1.3 (IN), TLS handshake, Server hello (2):
{ [122 bytes data]
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
{ [1 bytes data]
* TLSv1.3 (IN), TLS handshake, Unknown (8):
{ [15 bytes data]
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
{ [1 bytes data]
* TLSv1.3 (IN), TLS handshake, Certificate (11):
{ [2828 bytes data]
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
{ [1 bytes data]
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
{ [520 bytes data]
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
{ [1 bytes data]
* TLSv1.3 (IN), TLS handshake, Finished (20):
{ [36 bytes data]
* TLSv1.3 (OUT), TLS change cipher, Client hello (1):
} [1 bytes data]
* TLSv1.3 (OUT), TLS Unknown, Certificate Status (22):
} [1 bytes data]
* TLSv1.3 (OUT), TLS handshake, Finished (20):
} [36 bytes data]
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=pinguinorodriguez.cl
*  start date: Sep 13 14:41:18 2020 GMT
*  expire date: Dec 12 14:41:18 2020 GMT
*  subjectAltName: host "pinguinorodriguez.cl" matched cert's "pinguinorodriguez.cl"
*  issuer: C=US; O=Let's Encrypt; CN=Let's Encrypt Authority X3
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
} [5 bytes data]
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
} [1 bytes data]
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
} [1 bytes data]
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
} [1 bytes data]
* Using Stream ID: 1 (easy handle 0x55d829880580)
} [5 bytes data]
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
} [1 bytes data]
> GET / HTTP/2
> Host: pinguinorodriguez.cl
> Range: bytes=0-3,100-103
> User-Agent: curl/7.58.0
> Accept: */*
> 
{ [5 bytes data]
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
{ [1 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [130 bytes data]
* TLSv1.3 (IN), TLS Unknown, Unknown (23):
{ [1 bytes data]
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
} [5 bytes data]
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
} [1 bytes data]
* TLSv1.3 (IN), TLS Unknown, Unknown (23):
{ [1 bytes data]
* TLSv1.3 (IN), TLS Unknown, Unknown (23):
{ [1 bytes data]
< HTTP/2 206 
< accept-ranges: bytes
< cache-control: max-age=600
< content-type: multipart/byteranges; boundary=e2eec456071a70903cb801ab7cbc8ccad158e8fb1cb5bb22b1212fd7096c
< expires: Sun, 25 Oct 2020 14:33:12 UTC
< last-modified: Sun, 25 Oct 2020 06:42:22 GMT
< vary: Origin
< content-length: 358
< date: Sun, 25 Oct 2020 14:23:12 GMT
< 
{ [5 bytes data]
* TLSv1.3 (IN), TLS Unknown, Unknown (23):
{ [1 bytes data]
* Connection #0 to host pinguinorodriguez.cl left intact

I expected the following

I expected the content of curl.out to contain the 8 bytes resulting from the two 4-byte parts.

$ hexdump curl.out
0000000 213c 4f44 7461 6269
0000008
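
As a cross-check of the expected dump above, here is a minimal Python sketch that reproduces the four words from the two 4-byte payloads `<!DO` and `atib`. It assumes the default `hexdump` layout on a little-endian machine (16-bit words in hex); `hexdump_words` is a hypothetical helper, not part of curl or hexdump.

```python
import struct


def hexdump_words(data: bytes) -> str:
    """Format bytes as 16-bit little-endian words in hex, four digits each."""
    if len(data) % 2:
        data += b"\x00"  # pad an odd trailing byte so it still forms a word
    words = struct.unpack("<%dH" % (len(data) // 2), data)
    return " ".join("%04x" % word for word in words)


# The two 4-byte ranges requested in this report, concatenated.
expected = b"<!DO" + b"atib"
print(hexdump_words(expected))  # 213c 4f44 7461 6269
```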

I got this instead

Instead, the output file was generated with the raw multipart response, including the part boundaries, the part headers, and the final closing boundary.

$ less curl.out
--e2eec456071a70903cb801ab7cbc8ccad158e8fb1cb5bb22b1212fd7096c
Content-Range: bytes 0-3/26092
Content-Type: text/html; charset=utf-8

<!DO
--e2eec456071a70903cb801ab7cbc8ccad158e8fb1cb5bb22b1212fd7096c
Content-Range: bytes 100-103/26092
Content-Type: text/html; charset=utf-8

atib
--e2eec456071a70903cb801ab7cbc8ccad158e8fb1cb5bb22b1212fd7096c--

I believe this to be a bug because RFC 7233 § 4.1 states that

A client that cannot process a multipart/byteranges response MUST NOT generate a request that asks for multiple ranges.
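
Since curl writes the multipart body as received, a caller who wants only the payload bytes has to post-process the output. The following Python sketch shows one way to do that for a body shaped like the one above. It is an illustration under stated assumptions, not curl functionality: `extract_boundary` and `join_byteranges` are hypothetical helpers, the body is assumed to use CRLF line endings on the wire, and the parts are simply concatenated in the order received, so any gaps between ranges are not represented.

```python
import re


def extract_boundary(content_type: str) -> str:
    """Pull the boundary parameter out of a Content-Type header value."""
    match = re.search(r'boundary="?([^";]+)"?', content_type)
    if match is None:
        raise ValueError("no boundary parameter in Content-Type")
    return match.group(1)


def join_byteranges(body: bytes, boundary: str) -> bytes:
    """Concatenate the payloads of a multipart/byteranges body (CRLF assumed)."""
    delimiter = b"\r\n--" + boundary.encode("ascii")
    # Prepend CRLF so the first delimiter matches the same pattern as the rest.
    chunks = (b"\r\n" + body).split(delimiter)
    payloads = []
    for chunk in chunks[1:]:  # chunks[0] is the (normally empty) preamble
        if chunk.startswith(b"--"):
            break  # closing delimiter reached
        _headers, separator, payload = chunk.partition(b"\r\n\r\n")
        if not separator:
            raise ValueError("part has no header/body separator")
        payloads.append(payload)
    return b"".join(payloads)


# Rebuild the response from this report, using its Content-Type header.
content_type = (
    "multipart/byteranges; "
    "boundary=e2eec456071a70903cb801ab7cbc8ccad158e8fb1cb5bb22b1212fd7096c"
)
boundary = extract_boundary(content_type)
marker = b"--" + boundary.encode("ascii")
body = b"\r\n".join([
    marker,
    b"Content-Range: bytes 0-3/26092",
    b"Content-Type: text/html; charset=utf-8",
    b"",
    b"<!DO",
    marker,
    b"Content-Range: bytes 100-103/26092",
    b"Content-Type: text/html; charset=utf-8",
    b"",
    b"atib",
    marker + b"--",
    b"",
])
print(join_byteranges(body, boundary))  # b'<!DOatib'
```

A real client would probably also read each part's `Content-Range` header to know where the bytes belong in the full document; the sketch discards the part headers for brevity.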

curl/libcurl version

$ curl -V
curl 7.58.0 (x86_64-pc-linux-gnu) libcurl/7.58.0 OpenSSL/1.1.1 zlib/1.2.11 libidn2/2.0.4 libpsl/0.19.1 (+libidn2/2.0.4) nghttp2/1.30.0 librtmp/2.3
Release-Date: 2018-01-24
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp smb smbs smtp smtps telnet tftp 
Features: AsynchDNS IDN IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz TLS-SRP HTTP2 UnixSockets HTTPS-proxy PSL 

operating system

$ uname -a
Linux xxx 5.4.0-51-generic #56~18.04.1-Ubuntu SMP Tue Oct 6 09:47:18 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
@bagder
Member

bagder commented Oct 25, 2020

This behavior is documented in the man page for the --range option. If you can't handle the returned content then you shouldn't ask for it. This is not a curl bug.

@bagder added the not-a-bug label Oct 25, 2020
@jjatria
Contributor Author

jjatria commented Oct 26, 2020

This is what my version of curl has to say about --range:

-r, --range <range>
       (HTTP  FTP SFTP FILE) Retrieve a byte range (i.e a partial document) from a HTTP/1.1, FTP or SFTP server or a local
       FILE. Ranges can be specified in a number of ways.

       0-499     specifies the first 500 bytes

       500-999   specifies the second 500 bytes

       -500      specifies the last 500 bytes

       9500-     specifies the bytes from offset 9500 and forward

       0-0,-1    specifies the first and last byte only(*)(HTTP)

       100-199,500-599
                 specifies two separate 100-byte ranges(*) (HTTP)

       (*) = NOTE that this will cause the server to reply with a multipart response!

       Only digit characters (0-9) are valid in the 'start' and 'stop' fields of the 'start-stop' range syntax. If a  non-
       digit character is given in the range, the server's response will be unspecified, depending on the server's config‐
       uration.

       You should also be aware that many HTTP/1.1 servers do not have this feature enabled, so that when you  attempt  to
       get a range, you'll instead get the whole document.

       FTP  and SFTP range downloads only support the simple 'start-stop' syntax (optionally with one of the numbers omit‐
       ted). FTP use depends on the extended FTP command SIZE.

       If this option is used several times, the last one will be used.

I can see that it contains a notice that certain range specifications will make the server respond with a multipart response, but nowhere does it say that curl will do nothing to parse this response. Or are you thinking of a different part of the man page?

I'd argue that, at least as quoted above (and this seems to be the latest version of the documentation), this behaviour is not documented: the documentation says nothing about what curl will or won't do with that response.
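
To make the range forms in the man page excerpt concrete, here is a small Python sketch that resolves each spec to inclusive byte offsets for a document of known size. It follows the HTTP byte-range semantics as I understand them and is not how curl handles the option (curl passes the range string to the server). `resolve_range` and `resolve_ranges` are hypothetical names, and validation of unsatisfiable ranges is left out.

```python
from typing import List, Tuple


def resolve_range(spec: str, size: int) -> Tuple[int, int]:
    """Resolve one 'start-stop' spec to inclusive (first, last) byte offsets."""
    start, dash, stop = spec.partition("-")
    if not dash or not (start or stop):
        raise ValueError("not a start-stop range: %r" % spec)
    if not start:
        # Suffix form such as "-500": the last N bytes of the document.
        return (max(size - int(stop), 0), size - 1)
    first = int(start)
    # Open-ended form such as "9500-" runs to the end of the document.
    last = int(stop) if stop else size - 1
    return (first, min(last, size - 1))


def resolve_ranges(spec: str, size: int) -> List[Tuple[int, int]]:
    """Resolve a comma-separated list of specs, as accepted for HTTP."""
    return [resolve_range(part.strip(), size) for part in spec.split(",")]


size = 26092  # document length reported in the Content-Range headers above
print(resolve_ranges("0-3,100-103", size))  # [(0, 3), (100, 103)]
print(resolve_ranges("-500", size))         # [(25592, 26091)]
print(resolve_ranges("0-0,-1", size))       # [(0, 0), (26091, 26091)]
```

Any spec containing a comma resolves to more than one range, which is the case the man page flags as producing a multipart response.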

@bagder
Member

bagder commented Oct 26, 2020

I think it does, but feel free to suggest improvements in the phrasing.

@jzakrzewski
Contributor

@bagder maybe we should add in some general section that curl does not interpret or otherwise parse/transform the response unless explicitly documented? This is not the first time someone kinda expects that curl would do some magic with the response which is out of scope of the project.

@bagder
Member

bagder commented Oct 26, 2020

Yeah, we should try to make that clearer. The challenge is probably then just where we would put such an explanation...

@jjatria
Contributor Author

jjatria commented Oct 26, 2020

I think it makes sense for this project principle ("curl does not transform responses") to be documented somewhere central, but I think it should also probably be mentioned wherever it will have a relevant impact, even if it points to that central place where the principle is explained in more detail.

If we don't object to having this mentioned in more than one place, then I'll happily push a proposed amendment to the --range documentation, but I think I should leave any other changes in the documentation to people more familiar with curl.

@bagder closed this as completed in 15ae039 on Oct 26, 2020
bagder added a commit that referenced this issue Oct 26, 2020
Explain the basic concepts behind curl output.

Inspired by #6124
bagder added a commit that referenced this issue Oct 27, 2020
Explain the basic concepts behind curl output.

Inspired by #6124
bagder added a commit that referenced this issue Oct 29, 2020
Explain the basic concepts behind curl output.

Inspired by #6124

Closes #6134