

Re: patch for file:// encoding on Windows

From: <>
Date: Mon, 29 Sep 2014 07:31:40 -0600 (MDT)

----- Original Message -----
> On Sat, 27 Sep 2014, wrote:
> >> It feels like something with a much larger scope than just file:// URLs
> >> that I feel very scared of even considering. Please provide a proper
> >> motivation for why we want this! URLs are not UTF-8, they're a sequence of
> >> bytes/octets.
> >
> > The raw "sequence of bytes" idea doesn't work on Windows.
> Sure it does. See below.
> > From the current code page
> That's not a very workable approach. What if you copy the URL from somewhere?
> Assuming a "current code page" is asking for non-deterministic behaviors in
> how the input is treated.
> > Not all files are accessible this way when you have an NTFS file system
> > that supports file names that can't be represented with the default 8 bit
> > encoding.
> It is a mistake to think that you should be able to feed in the "raw 8 bit
> encoding" in the URL to start with. Also, a URL should work the same no
> matter which OS you run where you enter it so treating it differently if
> you feed it on windows than on non-windows is asking for trouble.
> > This problem has been brought up before:
> ... and never properly dealt with in any of those situations.
> "This problem" is at least two separate ones: 1 - what the URL should look
> like to allow a unicode file name to get opened and 2 - have the actual file:
> code understand and work with a file name provided according to (1).
> So a question that would help me at least form my opinion on this better:
> given a unicode file name example like "ŕéüñíöñ", what does a file: URL that
> works with IE, Firefox and Chrome look like? I don't mean what it looks like
> in the URL bar, but if you copy it and paste it somewhere, what does that
> look like?
> In both Firefox and Chrome on Linux, such a file name in my home directory
> uses this URL:
> file:///home/daniel/%C5%95%C3%A9%C3%BC%C3%B1%C3%AD%C3%B6%C3%B1
> Percent-encoded UTF-8 it looks like to me.
> No "current code page" necessary. A single defined way how to decode it.

Thanks for the discussion. It seems like you only have concerns about the second patch.
If percent-encoded UTF-8 is used, then only the first patch is needed to fix the problem.
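
To make the round trip concrete, here is a small Python sketch (not curl code, and not taken from either patch). It assumes the URL carries percent-encoded UTF-8 as in the Firefox/Chrome example quoted above; the helper names encode_file_name and decode_url_path are hypothetical and exist only for this illustration.

```python
from urllib.parse import quote, unquote_to_bytes

def encode_file_name(name: str) -> str:
    """Percent-encode a file name from its UTF-8 bytes (no code page involved)."""
    return quote(name.encode("utf-8"), safe="")

def decode_url_path(segment: str) -> str:
    """Undo the percent-encoding, then interpret the octets as UTF-8."""
    raw = unquote_to_bytes(segment)   # URL text -> raw octets
    return raw.decode("utf-8")        # raises UnicodeDecodeError if not valid UTF-8

# The example file name from the thread, written with escapes so the
# source encoding of this snippet cannot change the result.
name = "\u0155\u00e9\u00fc\u00f1\u00ed\u00f6\u00f1"

encoded = encode_file_name(name)
print(encoded)  # %C5%95%C3%A9%C3%BC%C3%B1%C3%AD%C3%B6%C3%B1

decoded = decode_url_path(encoded)
print(decoded == name)  # True
```

The encoded form matches the last path segment of the file:///home/daniel/... URL above, and decoding it needs no knowledge of the current code page. On Windows a caller would presumably hand the decoded name to a wide-character file API afterwards; that step is my assumption and is not shown here.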

Are you OK with that solution?


Received on 2014-09-29