curl-library

Re: ftp enhancement - FTP wildcard download

From: Daniel Stenberg <daniel_at_haxx.se>
Date: Fri, 1 Jan 2010 16:18:41 +0100 (CET)

On Thu, 31 Dec 2009, Pavel Raiskup wrote:

>> I would prefer that metadata callback to get called before any data is
>> delivered, so that an app could get to know the file name etc before any
>> data is downloaded.
>
> Yes, calling this callback means starting of new file transfer and means end
> of previous file transfer (gives info about download result, bytes
> transferred, etc.)

Another approach would be to require a new curl_easy_perform() for each
subsequent transfer, and just have an unmodified URL mean that it will
continue the wildcard transfer. It would solve how to deal with return
codes, the progress meter etc. I think I'd prefer that.

>> Also, if done client-side it might be easier to provide the same matching
>> concept not only between different FTP servers but also in the future
>> between FTP and SFTP downloads etc.
>
> I think we should generally expect different type of LIST response too :-(
> ..

Hm. Are you considering parsing LIST output? In my book that is really nasty
stuff that will take time and lots of work to get working even half decently.
I was kind of hoping we could get away with doing NLST...
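
To illustrate why NLST is the easier route: its response is just file names,
one per line, so there is no per-server format to decode. Below is a minimal
sketch of walking such a response. The function name nlst_next_name() and
its signature are made up for this example and are not part of libcurl.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a libcurl function.
   Copies the next file name from an NLST-style buffer (names separated by
   CR and/or LF) into 'out'. '*pos' is the read offset and is advanced past
   the name and its line terminator. Returns 1 if a name was produced and
   0 when the buffer is exhausted. Names longer than the output buffer
   are truncated. */
int nlst_next_name(const char *buf, size_t len, size_t *pos,
                   char *out, size_t outsize)
{
  if(outsize == 0)
    return 0;

  while(*pos < len) {
    size_t start = *pos;
    size_t end = start;
    size_t n;

    /* find the end of the current line */
    while(end < len && buf[end] != '\r' && buf[end] != '\n')
      end++;

    /* step over the line terminator(s) */
    *pos = end;
    while(*pos < len && (buf[*pos] == '\r' || buf[*pos] == '\n'))
      (*pos)++;

    n = end - start;
    if(n == 0)
      continue; /* blank line, nothing to report */

    if(n >= outsize)
      n = outsize - 1;
    memcpy(out, buf + start, n);
    out[n] = '\0';
    return 1;
  }
  return 0;
}
```

A caller would start with an offset of 0 and call this in a loop until it
returns 0, handing each name to whatever matching function is in use.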

> I thought about it .. I propose, that there could be new CURLOPT_RECURSIVE
> or something like this and default should be "false" probably?

I propose we simply ignore recursiveness to start with. Recursive is not a
true/false option anyway since a client most likely would like to control
depth or even what dirs to traverse etc.

> As Kamil said you on IRC, I'd like to implement matching with fnmatch(3),
> but there could be (maybe) small problem. Function fnmatch is POSIX-like,
> anyway there is problem with linking on windows -> and probably could be on
> other platforms too.

Yes, libcurl builds and runs on numerous non-POSIX platforms, so we will face
the problem of how to provide a fnmatch alternative for all those that lack
one. It's not just Windows, but a large number of RTOSes, VMS, Amiga, OS/2 etc.

> On windows this function is implemented in libiberty.a - it is fine, but
> there is no header file "fnmatch.h".

Is that really "windows" or is it cygwin/mingw that provides it? Anything
named .a sounds very non-original-windows to me.

> The main reason why this header is missing is that there could occur an
> error with "windows-like" backslashes in path (I think it now, not sure).

We deal with URLs, we don't have backslashes!

> It does not mean problem for libcurl+FTP and many other protocols (having no
> backslashes in url), but for "file://" under windows it could make troubles.
> (it is true only when user wants to get "file://c:\something\*" .. not if
> user asked "fnmatch friendly" "file://c:/something/*" of course)

We should just make sure our parser is strict enough: backslashes are not path
separators in URLs.

> My priority is "no new dependencies" and "no breaking backward
> compatibility", anyway do fnmatch "effect" by myself could be relatively
> complex .. :-/

Right, but that's also a reason to consider a more moderate approach from the
beginning as I would really dislike adding a feature that would depend on a
POSIX function - with the knowledge that a lot of existing targets don't have
it. Writing a basic fnmatch() alternative that features a fair amount of
wildcard matching abilities is not hard. The downside is of course that it
won't be an exact fnmatch() clone...
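
As a rough idea of how small such an alternative can be, here is a minimal
sketch that supports only '*' and '?'. It is an illustration, not a proposed
implementation: the name simple_wildcard_match() is made up, and it
deliberately leaves out bracket expressions, escaping and every other
fnmatch() feature.

```c
#include <stddef.h>

/* Hypothetical helper, not part of libcurl and not an fnmatch() clone.
   Returns 1 if 'name' matches 'pattern', otherwise 0.
   '*' matches any run of characters (including none),
   '?' matches exactly one character. */
int simple_wildcard_match(const char *pattern, const char *name)
{
  const char *star = NULL;  /* most recent '*' seen in the pattern */
  const char *retry = NULL; /* where in 'name' that '*' started matching */

  while(*name) {
    if(*pattern == '*') {
      star = pattern++;     /* remember it; first try matching nothing */
      retry = name;
    }
    else if(*pattern == '?' || *pattern == *name) {
      pattern++;
      name++;
    }
    else if(star) {
      pattern = star + 1;   /* backtrack: the '*' eats one more character */
      name = ++retry;
    }
    else
      return 0;
  }

  while(*pattern == '*')    /* trailing stars match the empty remainder */
    pattern++;
  return *pattern == '\0';
}
```

With this, "*.c" matches "main.c" but not "main.h", and "file?.txt" matches
"file1.txt" but not "file10.txt". It uses nothing beyond plain C, so it would
build on the non-POSIX targets as well.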

> In my e-mail attachment is conceptual pseudocode.c and diagram.png of about
> enhancement libcurl's easy_API, multi will be adequate, but non-blocking
> version. It is not necessary study "in detail" this simple concept, but if
> somebody here could see this and check it (if I'm not completely wrong), I'd
> be very happy. This should only describe you how I want do this.

Reading the pseudo-code example, it struck me that you should rather start
with only doing wildcard matching on the file part of the URL and not on the
directory part: support "ftp://site/dir/to/*.c" but not
"ftp://site/*/*/*.c".

This is primarily to keep the code simpler and smaller. Also, if we just feed
the wildcard callback enough info it will be easy enough for an application to
do more fancy wildcard and recursive fetches with libcurl providing the
support for the core file entry matching.
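
A minimal sketch of that restriction, assuming '*' is the only wildcard
character that needs checking and that the file part is whatever follows the
last '/'. The function name wildcard_file_part() is made up for this
example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a libcurl function.
   Returns a pointer to the file part of 'url' (the text after the last
   '/'), or NULL if the URL has no '/' or if a '*' appears anywhere before
   the file part. Only '/' is treated as a separator. */
const char *wildcard_file_part(const char *url)
{
  const char *slash = strrchr(url, '/');
  const char *p;

  if(!slash)
    return NULL;

  /* reject wildcards in the scheme, host or directory part */
  for(p = url; p < slash; p++) {
    if(*p == '*')
      return NULL;
  }
  return slash + 1;
}
```

For "ftp://site/dir/to/*.c" this returns a pointer to "*.c", while
"ftp://site/*/*/*.c" is rejected with NULL.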

> Note that I haven't solved progress bar! I think there are 3 cases:
> 1) Probably I should update progress bar for each file separately... xor I
> think worse case:

I think it should be done for each single file. The wildcard callback could
possibly include details like how many files there are, how many of them
matched and what number this particular file is among the matching ones.
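
To make that concrete, here is a sketch of what such per-file info could look
like. Everything in it is hypothetical: the struct, the callback type and the
function names are invented for this example, and a simple suffix check
stands in for the real wildcard matching.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-file info; none of these names exist in libcurl. */
struct wildcard_info {
  const char *filename;  /* name of the file about to be transferred */
  size_t total_entries;  /* number of entries in the directory listing */
  size_t total_matches;  /* number of entries that matched */
  size_t match_index;    /* 1-based position among the matching entries */
};

typedef void (*wildcard_cb)(const struct wildcard_info *info, void *userdata);

/* Stand-in for a real wildcard matcher: true if 'name' ends with 'suffix'. */
int ends_with(const char *name, const char *suffix)
{
  size_t nlen = strlen(name);
  size_t slen = strlen(suffix);
  return nlen >= slen && strcmp(name + nlen - slen, suffix) == 0;
}

/* Calls 'cb' once per matching name, in listing order, and returns the
   number of matches. The first pass counts the matches so that every
   callback invocation can report the total. */
size_t report_matches(const char **names, size_t count, const char *suffix,
                      wildcard_cb cb, void *userdata)
{
  struct wildcard_info info;
  size_t matches = 0;
  size_t seen = 0;
  size_t i;

  for(i = 0; i < count; i++) {
    if(ends_with(names[i], suffix))
      matches++;
  }

  for(i = 0; i < count; i++) {
    if(!ends_with(names[i], suffix))
      continue;
    info.filename = names[i];
    info.total_entries = count;
    info.total_matches = matches;
    info.match_index = ++seen;
    cb(&info, userdata); /* a real transfer of this file would start here */
  }
  return matches;
}

/* Sample callback: copies the info into the struct passed as userdata. */
void remember_last(const struct wildcard_info *info, void *userdata)
{
  *(struct wildcard_info *)userdata = *info;
}
```

An application could use the index and the match count to drive a per-file
progress display, along the lines of "file 2 of 5".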

-- 
  / daniel.haxx.se
Received on 2010-01-01