curl-users

RE: file:// command line parsing

From: Roth, Kevin P. <KPRoth_at_MarathonOil.com>
Date: Wed, 6 Mar 2002 17:05:04 -0500

Warning - much rambling went on below!!! Read it if you dare.

Here's the short version - after discovering that gdb wasn't
so hard to use after all, I found that the function
urlglob.c:glob_word() was causing the grief I noticed
(wherein I have to use an extra set of backslashes,
even from the Windows cmd.exe shell and the Win32 native
version of curl.exe). Here's a patch that corrects the
problem, if you'd care to accept it:

$ diff -uN1 urlglob.c.orig urlglob.c
--- urlglob.c.orig Thu Nov 29 07:47:42 2001
+++ urlglob.c Wed Mar 6 16:50:10 2002
@@ -45,3 +45,3 @@
  *
- * Input a full globbed string, set the forth argument to the amount of
+ * Input a full globbed string, set the fourth argument to the amount of
  * strings we get out of this. Return GlobCode.
@@ -228,2 +228,6 @@
     if (*pattern == '\\') { /* escape character, skip '\' */
+#if defined(WIN32)||defined(__CYGWIN__)
+ if (*(pattern+1) == '{' || *(pattern+1) == '[' ||
+ *(pattern+1) == '}' || *(pattern+1) == ']') {
+#endif
       ++pattern;
@@ -233,2 +237,5 @@
       }
+#if defined(WIN32)||defined(__CYGWIN__)
+ }
+#endif
     }

What it does is allow single backslashes to reside
in the URL (for Win32 and Cygwin platforms only),
while still honoring the fact that '\{' is no longer
a glob character, etc... And it also seems to still
work properly even if the backslashes were doubled. So,
the following four ALL work correctly now (this is
from a bash prompt):

 $ curl file://c:\\Temp\\test1.txt
 $ curl 'file://c:\Temp\test1.txt'

 $ curl file://c:\\\\Temp\\\\test1.txt
 $ curl 'file://c:\\Temp\\test1.txt'

while this fails, as expected:

 $ curl file://c:\Temp\test1.txt
 curl: (37) Couldn't open file c:Temptest1.txt

--Kevin

p.s. - since I bothered typing all of the following
before I found this fix, and since I don't have time
right now to edit it, but it does contain a couple of
other interesting points, I'm leaving it here for
posterity's sake. You may want to look through it,
or perhaps not...

> -----Original Message-----
> From: Daniel Stenberg [mailto:daniel_at_haxx.se]
> Sent: Wednesday, March 06, 2002 8:25 AM
>
> On Tue, 5 Mar 2002, Roth, Kevin P. wrote:
> > from cygwin's bash prompt:
> > $ curl file:///c/temp/test.txt works ok (posix path)
> >
> > $ curl file://c:\\temp\\test.txt NOT ok (escaped dos path)
> > curl: (37) Couldn't open file c:temptest.txt
>
> The URL is wrongly formatted. The syntax should be
> file://[host]/[path]
>
> Where [host] is always ignored by curl and widely never used,
> so thus most commonly just empty:
> file:///[path]
>
> Thus the above should've been
> $ curl file:///c:\\temp\\test.txt
>
> curl contains some weird code that attempts to work-around
> this syntax error though.

Indeed. If I try as you suggest, I get:

 $ curl file:///c:\\\\temp\\\\test.txt
 curl: (37) Couldn't open file /c:\temp\test.txt

same with $ curl file://localhost/c:\\\\temp\\\\test.txt.

On this topic, I'm curious - if I tried something like
file://other_machine_name//home/kevin/test.txt, what
network protocol would be used to access that other
machine? To me (although I realize this topic has been
hashed out previously on this list), it should be:
 file://[path-spec]
where "file://" is the URI prefix, similar to http://,
ftp://, etc..., and "[path-spec]" would be the path to
the requested file, in whatever syntax is valid for the
current machine, which should be handled by the local machine.
So, for unix, path-spec would be "/tmp/test.txt", yielding
a "URL" of "file:///tmp/test.txt". On Windows, path-spec
would be "c:\temp\test.txt", yielding a URL of
"file://c:\temp\test.txt". Also on Windows, a "UNC" style
of file path, which already includes a remote machine name,
might be "\\machine_name\temp\test.txt", yielding a URL
of "file://\\machine_name\temp\test.txt". This should be
handed straight to the file-open function to handle.
MS Internet Explorer actually accepts
file://machine_name/temp/test.txt and must internally
convert this to the equivalent UNC-style of path...

OK, that was a tangent. Back to the question at hand...

> > It appears that somewhere inside its command line parsing,
> > curl.exe is losing a set of back-slashes. [...]

> Also, when I try this out on unix, I get outputs like this:
>
> $ curl "moo\\\\\\"
> curl: (6) Couldn't resolve host 'moo\\\'
>
> Which shows that after the shell has converted each pair of
> backslashes into
> single ones, curl doesn't seem to muck with them. The same goes for
> file:// of course:
>
> $ curl "file:///moo\\\\\\"
> curl: (37) Couldn't open file /moo\\\

OK, but please explain why the following two are different:

 $ curl file:///moo\\foo
 curl: (37) Couldn't open file /moofoo

 $ curl file:///moo\\foo\\
 curl: (37) Couldn't open file /moo\foo\

The only difference seems to be the trailing backslash...
And if you add a second trailing backslash, we're back
to the initial behavior:

 $ curl file:///moo\\foo\\\\
 curl: (37) Couldn't open file /moofoo\

> Fire up gdb. Set a break-point in main(). type 'run [url]'
>
> Does the argv[1] contain the properly looking string? If so,
> then the problem
> is within curl.

Yep. Thanks for forcing me to try using gdb. It wasn't nearly
as painful as I'd imagined...

After getting partway in, I observed the following:
 (gdb) p config->url_list->url
 $55 = 0x10088ff0 "file://c:\\\\\\\\Temp\\\\\\\\test.txt"

 (gdb) p config->url_list->url[9]
 $56 = 92 '\\'

So this version of gdb doubles up embedded back-slashes
when displaying them. Not the nicest, but I can live with it.

But after I watched it step through the de-globbing code
inside urlglob.c:glob_word(), it turns out the loop
in there that passes by escaped instances of '{', '[',
etc. is also turning '\\' into '\'... After the deglobbing,
we're down to:

 $57 = .... "file://c:\\\\Temp\\\\test.txt"

which has half as many back-slashes (the string actually
is "file://c:\\Temp\\test.txt" once you account for gdb's
funny way of displaying things...).

I also realized that if I specify -g on the command line
(e.g. $ curl -g file://c:\\Temp\\test.txt, or
$ curl -g 'file://c:\Temp\test.txt'), it no longer
removes a set of back-slashes, and I get the proper
file, albeit with a few extra lines of output, like:

 $ curl -g file://c:\\temp\\test.txt

 [1/2147347448]: file://c:\temp\test.txt --> <stdout>
 --_curl_--file://c:\temp\test.txt
 hello, world.

Are these extra lines intentional? I'd almost have
expected them when I *WAS* globbing, rather than
when I'm not...

Hmm. Well, I'm now aware that I can use gdb, and
I have a somewhat better understanding of how curl
parses file:// URLs, but something still doesn't quite
seem right. This issue affects the native Win32 version
as well as the Cygwin version, which makes it more of
a problem. If it were just the Cygwin version, I
wouldn't mind so much.

For the Win32 version, I ought to be able to make
file://c:\Temp\test.txt work correctly from the
cmd.exe shell (which does not do any globbing
of parameters, so it doesn't need doubled-up
back-slashes)... I can, but only by specifying
--globoff, which seems a bit unintuitive to me.

It seems like we should be able to recognize when
a backslash isn't being used to actually quote a
glob special character, and be able to keep it around,
at least when running a win32 version.

--Kevin
Received on 2002-03-06