
Absolute domain names get trailing dot stripped from host request header #8290

Closed
ccazabon opened this issue Jan 17, 2022 · 13 comments

@ccazabon

ccazabon commented Jan 17, 2022

curl doesn't honour the domain name part of the redirected URL if it is an absolute name; curl strips the trailing dot. This causes problems loading some pages where the servers are configured to redirect non-absolute requests to the absolute ones:

$ curl -v -IL --max-redirs 1 http://pyropus.ca./
> HEAD / HTTP/1.1
> Host: pyropus.ca
> User-Agent: curl/7.74.0
> Accept: */*
> 
< HTTP/1.1 301 Moved Permanently
< Location: https://pyropus.ca./
< 
* Connection #0 to host pyropus.ca left intact
* Issue another request to this URL: 'https://pyropus.ca./'
*   Trying 96.126.125.117:443...
*  SSL certificate verify ok.
> HEAD / HTTP/1.1
> Host: pyropus.ca
> User-Agent: curl/7.74.0
> Accept: */*
> 
< HTTP/1.1 301 Moved Permanently
< Location: https://pyropus.ca./

... so that loops until max-redirects is hit.
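Here is a minimal sketch of why this loops. It assumes a simplified server that, as pyropus.ca. appears to, answers 301 with Location: https://pyropus.ca./ to any request whose Host: value is not exactly "pyropus.ca.". The names server and fetch are hypothetical, and the RuntimeError only stands in for curl's error 47.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Hypothetical server modelled on the setup described above: only the
# absolute name is served, every other Host: value is redirected to it.
CANONICAL_HOST = "pyropus.ca."
CANONICAL_URL = "https://pyropus.ca./"


def server(host_header: str) -> Tuple[int, Optional[str]]:
    """Return (status, Location) for a request carrying this Host: value."""
    if host_header == CANONICAL_HOST:
        return 200, None
    return 301, CANONICAL_URL


def fetch(url: str, strip_dot: bool, max_redirs: int) -> Tuple[int, int]:
    """Follow redirects and return (final status, redirects followed).

    strip_dot=True mimics the reported behaviour (dot removed from Host:).
    Raises RuntimeError once more than max_redirs redirects would be needed,
    standing in for curl's "(47) Maximum (N) redirects followed".
    """
    redirects = 0
    while True:
        host = urlsplit(url).hostname or ""
        if strip_dot and host.endswith("."):
            host = host[:-1]  # drop the single trailing dot of an absolute name
        status, location = server(host)
        if status != 301 or location is None:
            return status, redirects
        if redirects >= max_redirs:
            raise RuntimeError(f"Maximum ({max_redirs}) redirects followed")
        redirects += 1
        url = location


# Dot preserved: one redirect to the absolute name, then 200.
print(fetch("http://pyropus.ca/", strip_dot=False, max_redirs=1))
```

With strip_dot=True, the same call raises RuntimeError: every follow-up request carries the same stripped Host: value and receives the same 301. This matches the four 301 responses shown in the 7.43.0 transcript further down.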

The RFCs seem to say that the Location value should be used as presented by the server. curl should strip the dot only from the SNI field, not from the Host: request header.
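A minimal sketch of the split being requested follows. It uses Python's urllib.parse to read the URL. The Host: value keeps exactly what the URL contained, while the TLS server_name drops one trailing dot, because RFC 6066 (section 3) specifies the SNI host name without a trailing dot. The function names are hypothetical, and the sketch does not reflect how libcurl is structured internally.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def host_header_value(url: str) -> str:
    """Host: value taken from the URL as given, trailing dot kept (browser-like).

    IPv6 literals (which need [brackets] in Host:) are ignored for brevity.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    port = parts.port
    if port is not None and port != DEFAULT_PORTS.get(parts.scheme):
        return f"{host}:{port}"
    return host


def sni_server_name(url: str) -> str:
    """TLS server_name: sent without a trailing dot, per RFC 6066."""
    host = urlsplit(url).hostname or ""
    return host[:-1] if host.endswith(".") else host


for u in ("https://pyropus.ca./", "https://pyropus.ca/"):
    print(u, "Host:", host_header_value(u), "SNI:", sni_server_name(u))
```

Both URLs produce the same SNI but different Host: values. A server therefore has to make any distinction at the HTTP layer, not during the TLS handshake.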

This has come up a couple of times before:
#716
#3022

... but those seem to be about requests that (accidentally?) include the trailing dot when sent to servers that are not expecting it. In those cases the server itself isn't issuing a redirect to the absolute version of its domain name. So accidental breakage in the cases listed shouldn't be a major issue, I think, whereas the current behaviour does break sites that are using this feature deliberately.

As for other user agents:

  • all GUI browsers, to my knowledge, preserve the absolute domain name in the Host: header; I've tested Firefox, Chromium, Vivaldi, Konqueror, and Safari
  • wget also preserves the absolute domain name on redirect URLs
  • lynx, links, and elinks handle it like curl, erroring out on a redirect loop

I didn't find any other open/closed issues related to this, and the known bugs page doesn't have anything relevant.

I did this

curl -v -IL --max-redirs 1 https://pyropus.ca/

I expected the following

curl to resend the request with the Host: header value set to pyropus.ca. (including the trailing dot).

curl/libcurl version

curl 7.74.0 (x86_64-pc-linux-gnu) libcurl/7.74.0 OpenSSL/1.1.1k zlib/1.2.11 brotli/1.0.9 libidn2/2.3.0 libpsl/0.21.0 (+libidn2/2.3.0) libssh2/1.9.0 nghttp2/1.43.0 librtmp/2.3
Release-Date: 2020-12-09
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps mqtt pop3 pop3s rtmp rtsp scp sftp smb smbs smtp smtps telnet tftp 
Features: alt-svc AsynchDNS brotli GSS-API HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz NTLM NTLM_WB PSL SPNEGO SSL TLS-SRP UnixSockets

operating system

Linux 5.14.16 #1 SMP Thu Nov 4 11:18:07 CST 2021 x86_64 GNU/Linux

Debian 11 / Bullseye

@bagder
Member

bagder commented Jan 17, 2022

I brought this subject to the httpbis group once, didn't help much. It is messy.

@ccazabon
Author

While there hasn't been a lot of encouragement in that thread, there's also very little in the way of reports of real damage from handling absolute domains in URLs/Host request header fields. Given the massive number of requests made by the major browsers, if merely supporting absolute URLs caused problems, those problems would probably have been fixed by now, or the browser makers would have changed the behaviour of their software. I've certainly never seen anyone complaining about wget handling it that way either.

As I see it, there are really four cases here:

  • webservers that use the absolute version of their domain name as the canonical URL, and redirect requests with the non-absolute version of their domain name to the absolute version, like pyropus.ca./ above. I've seen enough other examples of this over the years, though it's pretty much impossible to craft a search query to find them. Some sites like this because it speeds up the initial request handling: the client's resolver skips all the search-domain variants (the name with each search domain appended) that it would otherwise try before getting around to querying the name as-is against the root.
  • webservers that handle either the absolute or non-absolute version of their domain name, and serve either without redirecting (e.g. https://cbc.ca./ , https://gizmodo.com./ , https://jalopnik.com./ , even https://devblogs.microsoft.com./ ). Some will generate links using whatever domain name the original request used; others will always generate either non-absolute or absolute ones.
  • webservers that use the non-absolute version of their domain name as the canonical URL, redirecting requests which specify the absolute version in the Host: header.
  • webservers that only consider the non-absolute version of their domain name, and break in various ways (from 404 to certificate errors or other problems) if you send the absolute domain name in the Host: header.
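The resolver point in the first case can be illustrated with a simplified model of how a glibc-style stub resolver orders its queries. It assumes the usual resolv.conf "search" list and "options ndots:N". The search domains below are hypothetical, and resolver flags and error-specific early exits are ignored. Under this model, with the default ndots:1 a two-label name like pyropus.ca is already tried as-is first, so the saving mostly shows up on hosts configured with a larger ndots.

```python
from typing import List


def candidate_queries(name: str, search: List[str], ndots: int = 1) -> List[str]:
    """Names a glibc-style resolver would try, in order (simplified model).

    Ignores resolver option flags, early exits on specific errors, and
    HOSTALIASES handling.
    """
    if name.endswith("."):
        # Absolute name: asked exactly once; the search list is never used.
        return [name]
    as_is = name + "."
    searched = [f"{name}.{domain.rstrip('.')}." for domain in search]
    if name.count(".") >= ndots:
        # Enough dots: try the name as-is first, then the search variants.
        return [as_is] + searched
    # Too few dots: walk the search list before trying the name as-is.
    return searched + [as_is]


search = ["corp.example", "example.net"]  # hypothetical resolv.conf search list
print(candidate_queries("pyropus.ca", search, ndots=5))
print(candidate_queries("pyropus.ca.", search, ndots=5))
```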

The first three all work just fine if the user agent preserves the trailing dot in the Host header (whether the dot comes from a redirect, as with pyropus.ca., from a user typing it in themselves, or from a link elsewhere on the net). Only the fourth case breaks, and those servers never generate a redirect (Location: header) or a link using the absolute version. So the only people who would get bitten in this case are people who manually add a trailing dot to a link or domain name -- and pretty much nobody, to within rounding error, does that.
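The difference between the second and fourth cases can come down to how a server matches the Host: value against its virtual hosts. The sketch below is hypothetical: it uses a made-up virtual-host table keyed on names configured without a dot. A strict exact-match lookup rejects "example.org." (case four). A lookup that lowercases the name and drops one trailing dot accepts both spellings (case two). The table, paths, and function names are invented for illustration, and real servers' matching rules vary.

```python
from typing import Dict, Optional

# Hypothetical virtual-host table; names are configured without a trailing dot.
VHOSTS: Dict[str, str] = {
    "example.org": "/srv/example",
    "www.example.org": "/srv/www",
}


def lookup_strict(host_header: str) -> Optional[str]:
    """Case four style: exact (case-insensitive) match, so 'name.' is unknown."""
    return VHOSTS.get(host_header.lower())


def lookup_tolerant(host_header: str) -> Optional[str]:
    """Case two style: treat 'name.' and 'name' as the same host.

    Host: values with a :port suffix are not handled, for brevity.
    """
    host = host_header.lower()
    if host.endswith("."):
        host = host[:-1]
    return VHOSTS.get(host)


print(lookup_strict("example.org."), lookup_tolerant("example.org."))
```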

So given that no real problems are prevented by not supporting/preserving absolute hostnames in Host: headers, and that supporting them does actually fix a problem (as well as bringing the library's behaviour closer to that of the most common user agents), it seems to me it would be worth making the change.

@ryandesign
Contributor

It looks like this used to work in curl and changed at some point.

$ /usr/bin/curl --version
curl 7.30.0 (x86_64-apple-darwin13.0) libcurl/7.30.0 SecureTransport zlib/1.2.5
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp smtp smtps telnet tftp
Features: AsynchDNS GSS-Negotiate IPv6 Largefile NTLM NTLM_WB SSL libz
$ /usr/bin/curl -IL http://pyropus.ca/ --max-redirs 3
HTTP/1.1 301 Moved Permanently
Date: Tue, 18 Jan 2022 02:34:37 GMT
Server: Apache/2.4.52 (Debian)
Location: https://pyropus.ca./
Content-Type: text/html; charset=iso-8859-1

HTTP/1.1 200 OK
Date: Tue, 18 Jan 2022 02:34:37 GMT
Server: Apache/2.4.52 (Debian)
Accept-Ranges: bytes
Vary: Accept-Encoding
X-Frame-Options: sameorigin
Content-Type: text/html
$
$ /usr/bin/curl --version
curl 7.43.0 (x86_64-apple-darwin14.0) libcurl/7.43.0 SecureTransport zlib/1.2.5
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp smb smbs smtp smtps telnet tftp
Features: AsynchDNS IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz UnixSockets
$ /usr/bin/curl -IL http://pyropus.ca/ --max-redirs 3
HTTP/1.1 301 Moved Permanently
Date: Tue, 18 Jan 2022 02:35:10 GMT
Server: Apache/2.4.52 (Debian)
Location: https://pyropus.ca./
Content-Type: text/html; charset=iso-8859-1

HTTP/1.1 301 Moved Permanently
Date: Tue, 18 Jan 2022 02:35:11 GMT
Server: Apache/2.4.52 (Debian)
Location: https://pyropus.ca./
Content-Type: text/html; charset=iso-8859-1

HTTP/1.1 301 Moved Permanently
Date: Tue, 18 Jan 2022 02:35:11 GMT
Server: Apache/2.4.52 (Debian)
Location: https://pyropus.ca./
Content-Type: text/html; charset=iso-8859-1

HTTP/1.1 301 Moved Permanently
Date: Tue, 18 Jan 2022 02:35:11 GMT
Server: Apache/2.4.52 (Debian)
Location: https://pyropus.ca./
Content-Type: text/html; charset=iso-8859-1

curl: (47) Maximum (3) redirects followed
$

@bagder
Member

bagder commented Jan 18, 2022

there are really four cases here:

I think you're forgetting cases though by looking at this as HTTP only.

For example, SNI says there should be no trailing dot in the name, so you'd need to send different names in the two fields (to comply) or not, and there will be servers assuming one way or the other.

But yes, I presume we should make curl work closer to what the browsers do with the trailing dots, since they are the main clients that people are generally adapting to. How do they populate the SNI field with trailing dots?

@ccazabon
Author

The browsers leave the dot in the Host: header and strip it from the SNI field as that spec requires. This is what seems to work well for them. It seems like a reasonable approach, and as Ryan mentions above, curl may even have been doing this previously?

@ccazabon
Author

Oh, and as far as other protocols go - this two-values-are-different issue can't come up unless the agent has to send the SNI field and also send the hostname-with-a-dot in a request header field or similar. I don't think that would apply to ftp or gopher or some of the other protocols curl supports - haven't thought about it long enough to eliminate all of them.

@bagder
Member

bagder commented Jan 18, 2022

curl may even have been doing this previously?

... and if we did, we changed it because it caused a problem in the past...

@bagder
Member

bagder commented Jan 18, 2022

Since we need to drop the dot from the SNI, an HTTPS server cannot tell from the SNI whether the host name had a trailing dot or not. That makes you question how HTTP and HTTPS can be made that different...

Update: I think we've even had issues with HTTPS servers not liking the trailing dot when the SNI (virtual host) is set up without a dot.

@ccazabon
Author

I'm not an expert, but I believe SNI has to have the dot stripped - but the various domains listed above seem to work fine with and without it in the Host: header.

The only problems I've been able to find have been by me appending a dot to the domain name manually when typing in a URL with some webservers that can't handle it correctly (no domain configured errors etc). But those domains do not generate links or redirect URLs containing the trailing dot, so you can't run into that problem "normally" - i.e. you have to stick your hand in the saw blade to get the negative result.

Maybe some HTTPS servers don't like absolute host names in the Host: request header - but again, they won't be generating such URLs so you can only run into problems by adding it yourself. Plenty of other sites handle it just fine.

@ryandesign
Contributor

changed at some point

5de8d84

@ccazabon
Author

Okay, that commit message seems to be saying they needed to remove the trailing dot from the SNI value (which is true) but they also did the same for the Host: HTTP request header without including a reason for that. I think they just didn't consider the breakage doing that would cause.

The SNI change is correct, the Host: header field one is not. Can that part be reverted so curl users can execute requests with absolute domain names again?

@bagder
Member

bagder commented Jan 22, 2022

Can that part be reverted

it is just source code, everything is possible. I doubt the commit in question will agree to "just be reverted" though after all this time and some later follow-up tweaks, so it needs some proper massaging and most likely also a test case or two added to verify.

bagder added a commit that referenced this issue Jan 22, 2022
Reverts 5de8d84 (May 2014, shipped in 7.37.0) and the
follow-up changes done afterward.

Keep the dot in names for everything except the SNI to make curl behave
more similar to current browsers. This means 'name' and 'name.' send the
same SNI for different 'Host:' headers.

Updated test 1322 accordingly

Fixes #8290
Reported-by: Charles Cazabon
Closes #
bagder added a commit that referenced this issue Jan 23, 2022
Reverts 5de8d84 (May 2014, shipped in 7.37.0) and the
follow-up changes done afterward.

Keep the dot in names for everything except the SNI to make curl behave
more similar to current browsers. This means 'name' and 'name.' send the
same SNI for different 'Host:' headers.

Updated test 1322 accordingly

Fixes #8290
Reported-by: Charles Cazabon
Closes #8320
@ccazabon
Author

Thank you!

bagder added a commit that referenced this issue Jan 24, 2022
@bagder bagder closed this as completed in b27ad8e Jan 27, 2022