Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restore old behavior for file:////foo/bar URLs #2438

Closed
wants to merge 1 commit into from

Conversation

nuxi
Copy link
Contributor

@nuxi nuxi commented Mar 30, 2018

This is a fairly common thing to end up in scripts via constructions that do
things like: file:///${VAR}

curl allowed this prior to 7.57.0 but it was broken by adding "support"
for UNC paths as defined in Appendix E.3.2 of RFC 8089. This appendix is
non-normative though and curl does not actually support it beyond rejecting
them. (In fact curl does not support any flavor of remote file: URLs.)

Given the fact that curl does not actually implement support for UNC paths I think it would be better to restore the old behavior.

@nuxi
Copy link
Contributor Author

nuxi commented Mar 30, 2018

This may be a case where the correct behavior of curl on Unix and the correct behavior of curl on Windows are not actually the same thing. Firefox and Chrome handle this case differently on different operating systems. Both of them treat it as a local filesystem path on Unix and a UNC path on Windows.

Although its not clear to me if this behavioral difference in Firefox and Chrome is something they explicitly implemented or if it is really just a difference in how Linux and Windows treat paths like //foo/bar. If it is the latter then perhaps curl should also just let the OS figure it out.

@jay jay added the URL label Mar 30, 2018
@jay
Copy link
Member

jay commented Mar 30, 2018

This is going to make file:////foo/bar -> /foo/bar which in Windows I don't think is what you want, and I'm not sure that it's right for Unix either. How are you supposed to access shares via file: in Unix?

@nuxi
Copy link
Contributor Author

nuxi commented Mar 30, 2018

This is going to make file:////foo/bar -> /foo/bar which in Windows I don't think is what you want

You're right. The old windows behavior appears to be passing it as //foo/bar (presumably getting converted to \\foo\bar somewhere along the way) to the Windows API which results in windows handling the SMB access for us. (I'm not a windows person, I confirmed this by using wireshark to watch for netbios queries)

So actually there is a Windows regression here too right now. Before version 7.57.0, curl already handled UNC paths like file:////server/volume/path because it would just pass \\server\volume\path to the OS which then handled the SMB access on behalf of curl. The check here is actually preventing that from happening.

And that also means my change to test2072 will fail on windows. Is there a way to make a test OS specific?

and I'm not sure that it's right for Unix either.

You're right, just like above it should just be passed to the OS as //foo/bar. I just noticed that later on there is a code block which handles stripping the duplicate / but it can't be reached because of this SMB code. This isn't actually necessary on Unix because the OS will handle that for you and in order to restore the old Windows behavior it needs to be removed.

How are you supposed to access shares via file: in Unix?

Windows file shares? smb://server/path

I think Appendix E.2.3 of RFC 8089 is really just explaining how Windows path resolution combines with file:// URLs. Its not so much a part of the specification, as much as it is a summary of how the specification interacts with the Windows API. Unix has its own path resolution behavior which is different.

@nuxi nuxi changed the title Allow file:////path/to/file URLs again [WIP] Restore old behavior for file:////foo/bar URLs Mar 30, 2018
@nuxi nuxi changed the title [WIP] Restore old behavior for file:////foo/bar URLs Restore old behavior for file:////foo/bar URLs Mar 30, 2018
@nuxi
Copy link
Contributor Author

nuxi commented Mar 30, 2018

I didn't see a way to flag tests as OS specific so I threw 2072 in the disabled list with a note saying it was OS specific behavior and that the test was for the Unix behavior.

I don't have a Windows box set up to be able to build stuff like this. With this patch it should behave like it did before the changes in 7.57.0 and trigger SMB access on Windows again.

@jay
Copy link
Member

jay commented Mar 30, 2018

@phluid61 could use your input

@phluid61
Copy link
Contributor

I think Appendix E.2.3 of RFC 8089 is really just explaining how Windows path resolution combines with file:// URLs. Its not so much a part of the specification, as much as it is a summary of how the specification interacts with the Windows API. Unix has its own path resolution behavior which is different.

I wanted it to be part of the specification. However updating such an ancient and underspecified scheme was already contentious enough, without making any outright changes -- so we're left with this gap. The informational appendix was my attempt at describing what seemed like the best chance for interoperability without actually writing any normative language; but that lack of normativity also meant it received less scrutiny (so more grains of salt are required along with it.)

If/when IETF regains the energy and appetite for it, I'm confident a future version of the file: spec will include normative requirements for handling 4+ slashes, but it's important to remember that implementation drives specification -- if the majority of implementations share common behaviour, that behaviour is what will be specified. It's also worth pointing out that there is another Skywalker, which changes a lot more rapidly than the IETF RFC publication cycle, so it could be worth trying to prize apart that spec to see if it shows a way forward.


In terms of this particular PR, is this a correct summary of what gets passed to the OS to open()?

input current proposed
file:////foo error //foo
file://///foo error ///foo
(etc. with more slashes)
file://localhost//foo /foo //foo
file://localhost///foo //foo ///foo

If so, it's fine with me. (I derived those bottom rows by reading the code. If that's how it actually works, then... 🤔)

@dfandrich
Copy link
Contributor

dfandrich commented Mar 30, 2018 via email

@nuxi
Copy link
Contributor Author

nuxi commented Mar 30, 2018

I wanted it to be part of the specification.

Sorry, it was getting late and I didn't actually phrase that well. I meant to say that it was probably only intended to be implemented on platforms that already use Windows/UNC path resolution behavior.

I don't know if that actually matches your intentions, but its what I intended to say.

So curl on Windows should follow Appendix E.2.3 because that is how Windos path resolution works. Whereas curl on Unix should just follow Unix path resolution behavior instead. This OS dependent behavior is how Mozilla and Chrome appear to behave.

In terms of this particular PR, is this a correct summary of what gets passed to the OS to open()?

Yes. In fact I hadn't thought to try the ones with the explicit localhost, but that is how they behave.

@nuxi
Copy link
Contributor Author

nuxi commented Mar 30, 2018

You can use an appropriate . There aren't any existing ones for OS detection, but perl is available so I'm sure there's a one-liner that would do it.

Thanks, I added a precheck based on the data in http://perldoc.perl.org/perlport.html#PLATFORMS

lib/url.c Outdated
* path, would have included a second slash. So if there are two
* slashes, swallow one.
* path, would have included a second slash. POSIX requires the OS to
* handle leading double slashes (or triple or quadruple) so there is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curl builds on numerous not-quite-POSIX platforms as well so I don't think the phrasing should this "absolute" here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is changing it from "there is no need" to "there should be no need" enough?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could just remove this entire giant comment block since there is no longer any actual code here implementing something. I guess the only reason I left it here was to include the comment about how stripping the extra slash would actually break Windows UNC paths.

lib/url.c Outdated
* no need to do anything special here for this case.
*
* Swallowing the extra "/" will actually break the UNC path resolution
* on Windows which is described in RFC 8089, Appendix E, Section E.2.3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a typo. It should be section 3.2, right? The same mistake exists in the commit message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, my mistake.

curl 7.57.0 and up interpret this according to Appendix E.3.2 of RFC
8089 but then returns an error saying this is unimplemented. This is
actually a regression in behavior on both Windows and Unix.

Before curl 7.57.0 this URL was treated as a path of "//foo/bar" and
then passed to the relevant OS API. This means that the behavior of this
case is actually OS dependent.

The Unix path resolution rules say that the OS must handle swallowing
the extra "/" and so this path is the same as "/foo/bar"

The Windows path resolution rules say that this is a UNC path and
automatically handles the SMB access for the program. So curl on Windows
was already doing Appendix E.3.2 without any special code in curl.
@bagder
Copy link
Member

bagder commented Apr 6, 2018

Thanks!

@bagder bagder closed this in 695e96b Apr 6, 2018
@lock lock bot locked as resolved and limited conversation to collaborators Jul 5, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Development

Successfully merging this pull request may close these issues.

None yet

5 participants