DataURI (data URI scheme) redirect (Location data:[…]) is followed as origin-relative path #9503

Closed
myfonj opened this issue Sep 13, 2022 · 2 comments

myfonj commented Sep 13, 2022

I did this

λ curl --location --head --verbose http://tinyurl.com/selfcontained-editable-datauri

and got this output (abbreviated to the significant lines, with truncation marked by […]):

> HEAD http://tinyurl.com/selfcontained-editable-datauri HTTP/1.1
< HTTP/1.1 308 Permanent Redirect
< Location: data:text/html;charset=utf-8;base64,PCFET0NUWVBFIEhUTU[…]
* Issue another request to this URL: 'http://tinyurl.com/data:text/html;charset=utf-8;base64,PCFET0NUWVBFIEhU[…]'
> HEAD http://tinyurl.com/data:text/html;charset=utf-8;base64,PCFET0NUWVBFIEhUTU[…]
< HTTP/1.1 404 Not Found

I expected the following

I am not completely sure what exactly should happen, but certainly not a HEAD request to http://tinyurl.com/data:text/[...], since data: is a URI scheme (denoting a pseudo-protocol), not a path segment.

See: Data URI Scheme.
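
To illustrate the expectation: under RFC 3986 reference resolution, a Location value that starts with a scheme is already an absolute URI and is not merged with the base URL. A minimal sketch using Python's standard urllib.parse (this is not curl's code, and the base64 payload is a stand-in, since the real one is truncated in the output above):

```python
from urllib.parse import urljoin, urlsplit

# The URL that answered with the 308 redirect.
base = "http://tinyurl.com/selfcontained-editable-datauri"

# Stand-in Location value ("test" in base64); the real payload is truncated.
location = "data:text/plain;base64,dGVzdA=="

# The part before the first colon is parsed as the scheme.
scheme = urlsplit(location).scheme

# A reference with its own scheme is returned unchanged, not appended
# to the origin of the base URL.
resolved = urljoin(base, location)

print(scheme)
print(resolved == location)
```

Here the resolved target stays a data: URI instead of becoming a path under http://tinyurl.com/.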

It would probably make sense to ignore all data URI scheme redirects, since an attempt to load a data URI directly fails:

λ curl "data:text/plain,test"
curl: (3) URL using bad/illegal format or missing URL

But if it were supported, then for a regular invocation (without --location --head) I'd expect it to decode the base64 payload (in the first case) and then save the resulting text content into a file. (The question is what name that file should have.)
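
As a rough sketch of what "decode the payload" could mean, here is a minimal data URI decoder in Python. It is not something curl provides; the helper name decode_data_uri is hypothetical, and the inputs are stand-ins rather than the truncated payload from the report:

```python
import base64
from typing import Tuple
from urllib.parse import unquote_to_bytes


def decode_data_uri(uri: str) -> Tuple[str, bytes]:
    """Split a data: URI into (media type, decoded bytes). Hypothetical helper."""
    scheme, colon, rest = uri.partition(":")
    if not colon or scheme.lower() != "data":
        raise ValueError("not a data: URI")

    # Everything before the first comma describes the payload.
    header, comma, payload = rest.partition(",")
    if not comma:
        raise ValueError("data: URI has no comma before the payload")

    is_base64 = header.lower().endswith(";base64")
    if is_base64:
        header = header[: -len(";base64")]

    # RFC 2397 default when the media type is omitted.
    media_type = header or "text/plain;charset=US-ASCII"

    # Undo percent-encoding first, then base64 if it was declared.
    raw = unquote_to_bytes(payload)
    data = base64.b64decode(raw) if is_base64 else raw
    return media_type, data


print(decode_data_uri("data:text/plain,test"))
print(decode_data_uri("data:text/plain;base64,dGVzdA=="))
```

The sketch returns the media type alongside the bytes, but it does not answer the file name question: a data URI carries no name at all.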

curl/libcurl version

curl 7.84.0
curl 7.84.0 (x86_64-w64-mingw32) libcurl/7.84.0 OpenSSL/1.1.1q (Schannel) zlib/1.2.12 brotli/1.0.9 zstd/1.5.2 libidn2/2.3.2 libssh2/1.10.0 nghttp2/1.48.0
Release-Date: 2022-06-27
Protocols: dict file ftp ftps gopher gophers http https imap imaps ldap ldaps mqtt pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: alt-svc AsynchDNS brotli HSTS HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz MultiSSL NTLM SPNEGO SSL SSPI threadsafe TLS-SRP zstd

operating system

Windows 10 v21h2 b19044


Notes

This is probably a niche issue. This kind of redirect (as well as any kind of indirect navigation to DataURI documents) is nowadays blocked in web browsers altogether (or at least in top-level frames) for security reasons; sadly, DataURIs were reportedly heavily abused for scams. In past browser versions, however, this particular HTML doodle "stored" in the TinyURL redirect worked: it loaded an HTML "quine" document with a text editor that could save itself back into a DataURI. As of today it kinda works only in Firefox, and only when loaded into a sub-frame, like:

data:text/html,<iframe src="http://tinyurl.com/selfcontained-editable-datauri" style="width:80vw;height:99vh"></iframe>

bagder commented Sep 14, 2022

Ugh, this is clearly tricking curl because it does not consider this an absolute URL...

bagder self-assigned this Sep 14, 2022
bagder added a commit that referenced this issue Sep 14, 2022
When the parser is not allowed to guess scheme, it should consider the
word ending at the first colon to be the scheme, independently of number
of slashes.

The parser now checks that the scheme is known before it counts slashes,
to improve the error message for URLs with unknown schemes and maybe no
slashes.

When following redirects, no scheme guessing is allowed and therefore
this change effectively prevents redirects to unknown schemes such as
"data".

Fixes #9503
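
The logic described in this commit message can be modeled roughly as follows. This is an illustrative Python sketch, not libcurl's actual C code: the function name, the exception type, and the abbreviated scheme set are assumptions made for the example.

```python
import re
from typing import Optional

# Abbreviated, assumed set; a real curl build lists its own under "Protocols:".
KNOWN_SCHEMES = {"dict", "file", "ftp", "ftps", "http", "https"}

# RFC 3986 scheme syntax: a letter, then letters, digits, "+", "-" or ".".
SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):")


class UnsupportedScheme(ValueError):
    """Raised when a redirect target names a scheme we do not handle."""


def redirect_scheme(location: str) -> Optional[str]:
    """Return the scheme of a redirect target, or None if it is relative.

    No guessing: the word ending at the first colon is the scheme, however
    many slashes follow it, and it is checked before any slash counting.
    """
    match = SCHEME_RE.match(location)
    if match is None:
        return None  # relative reference; resolve against the current URL
    scheme = match.group(1).lower()
    if scheme not in KNOWN_SCHEMES:
        raise UnsupportedScheme("Unsupported URL scheme: " + scheme)
    return scheme


print(redirect_scheme("https://example.com/"))
print(redirect_scheme("/relative/path"))
try:
    redirect_scheme("data:text/plain,test")
except UnsupportedScheme as exc:
    print(exc)
```

One consequence of not guessing is that a target such as "localhost:8080/path" is also rejected in this model, because "localhost" is read as an unknown scheme.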

bagder commented Sep 14, 2022

With my fix in #9504, this is the new behavior:

$ curl -LI http://tinyurl.com/selfcontained-editable-datauri
...
curl: (1) The redirect target URL could not be parsed: Unsupported URL scheme

bagder closed this as completed in 8466785 on Sep 15, 2022