curl-library

Re: ftp enhancement - FTP wildcard download

From: Pavel Raiskup <xraisk00_at_gmail.com>
Date: Sat, 02 Jan 2010 00:06:05 +0100

>>> ... CLIENT SIDE MATCH
>>> Also, if done client-side it might be easier to provide the same
>>> matching concept not only between different FTP servers but also in
>>> the future between FTP and SFTP downloads etc.
>>
>> I think we should generally expect different types of LIST response
>> too :-( ..
>
> Hm. Are you considering doing a LIST parsing? In my book that is really
> nasty stuff that will take time and lots of work to get working even
> half decent. I was kind of hoping we could get away with doing NLST...

I proposed it because I'd like to know whether each filename (from NLST or
LIST) is a directory or a regular file, and NLST is quite limiting for that.
I can imagine LIST being relatively difficult to parse, but it would bring
other advantages, such as a known file size (which could save one FTP
command for each matched file!) and the last-change time (e.g. for
selective download, maybe).

Even if we use NLST, we should tell the app whether a filename means a
directory or a regular file. That is almost necessary for recursive
download, even if it is not implemented in libcurl and the recursion is
done in the user's application (using the wildcard as the base). If I use
NLST only, this information (dir/file) will depend only on the FTP status
code: I'll try to download the matched filename as a regular file, and if
that fails (550) it is a directory .. :-(

I'm not sure whether there is a 100% reliable way to get the necessary
information from the LIST response (where to find the filename and the
permission bits in it), or whether servers are so incompatible that the job
is impossible. But I'm sure I could make the parsing method more than
"usable", and if an error occurs with LIST (the response cannot be
analysed), we simply return an error. I think there is an achievable way.
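
To make the idea concrete, here is a minimal sketch of parsing one LIST line.
It assumes the common Unix "ls -l" layout only (permissions, link count,
owner, group, size, month, day, time or year, name); other server formats
are exactly the incompatibility discussed above, and for those it just
returns an error. The names list_entry and parse_unix_list_line are
hypothetical and not part of libcurl.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical result type for illustration; not part of libcurl. */
struct list_entry {
  char type;        /* 'd' directory, '-' regular file, 'l' symlink */
  long size;        /* size in bytes as reported by the server */
  char name[256];   /* file name, may contain spaces; for a symlink it
                       still includes the " -> target" part */
};

/* Parse one line of a Unix "ls -l" style LIST response.
   Returns 0 on success, -1 if the line does not have the assumed layout. */
int parse_unix_list_line(const char *line, struct list_entry *out)
{
  char perms[11], month[4], day[3], when[6];
  long size;
  int consumed = 0;
  size_t len;

  /* permissions, link count, owner, group, size, month, day, time-or-year */
  if(sscanf(line, "%10s %*d %*s %*s %ld %3s %2s %5s %n",
            perms, &size, month, day, when, &consumed) != 5)
    return -1;
  if(consumed <= 0 || !strchr("d-l", perms[0]))
    return -1;

  /* the rest of the line is the name; drop the trailing CR/LF */
  len = strcspn(line + consumed, "\r\n");
  if(len == 0 || len >= sizeof(out->name))
    return -1;
  memcpy(out->name, line + consumed, len);
  out->name[len] = '\0';
  out->type = perms[0];
  out->size = size;
  return 0;
}
```

With this, the dir/file decision and the file size come from the listing
itself, so no extra command per matched file is needed; a line such as
"total 12" is rejected with -1 rather than guessed at.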

>> I thought about it .. I propose, that there could be new
>> CURLOPT_RECURSIVE or something like this and default should be "false"
>> probably?
>
> I propose we simply ignore recursiveness to start with. Recursive is not
> a true/false option anyway since a client most likely would like to
> control depth or even what dirs to traverse etc.

Yep that's true..

>> As Kamil told you on IRC, I'd like to implement matching with
>> fnmatch(3), but there could be (maybe) a small problem. The fnmatch
>> function is POSIX, and there is a problem with linking on Windows ->
>> and probably on other platforms too.
>
> Yes, libcurl builds and runs on numerous non-POSIX platforms, we will
> face the problem with how to provide a fnmatch alternative for all that
> don't. It's not just windows, but a large amount of RTOSes, VMS, amiga,
> OS/2 etc.
>
>> On windows this function is implemented in libiberty.a - it is fine,
>> but there is no header file "fnmatch.h".
>
> Is that really "windows" or is it cygwin/mingw that provides it?
> Anything named .a sounds very non-original-windows to me.

Sorry, I didn't mean that Windows itself implements it. I tried to say:
"Under Windows this function can be used from libiberty.a", which is, in my
opinion, part of MinGW.

I compiled it the standard way: shortcut win+r => "cmd", then the command
"gcc main.c -o program -pedantic -Wall -std=c99 -liberty"

>> The main reason why this header is missing is that an error could occur
>> with "windows-like" backslashes in the path (that's what I think now,
>> not sure).
>
> We deal with URLs, we don't have backslashes!

Sorry again, I was confused. I was trying curl (the client) on the Windows
command line, where "file://c:\somewhere\someting.xxx" can be used too.
That compatibility is probably implemented in the client, and libcurl
receives the transformed URL..

>> My priority is "no new dependencies" and "no breaking backward
>> compatibility", but achieving the fnmatch "effect" by myself could be
>> relatively complex .. :-/
>
> Right, but that's also a reason to consider a more moderate approach
> from the beginning as I would really dislike adding a feature that would
> depend on a POSIX function - with the knowledge that a lot of existing
> targets don't have it. Writing a basic fnmatch() alternative that
> features a fair amount of wildcard matching abilities is not hard. The
> downside is of course that it won't be an exact fnmatch() clone...

We can use an fnmatch "wrapper" which has the same interface as fnmatch,
with #ifdef FNMATCH_EXISTS inside it for full pattern-matching support and,
under #else, a my_easier_fnmatch() written by myself.

Another possibility is that #ifndef FNMATCH_EXISTS disables all wildcard
support .. (only a minimum of platforms do not support fnmatch)

or combine the two..

Or make only our_fnmatch, platform independent and simpler than fnmatch
(maybe only '*' and '?' at the start).

OK, that is not the worst case, because there is one special feature that
could be considered. As I saw in the curl client, it is possible to download
matched files this way:
curl http://{site,host}.host[1-5].com -o "#1_#2"
(taken from the man page)
And this "case" would open the door to extending the pattern syntax in the
future.
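
A rough sketch of the wrapper idea, combining the first and the last
variant: use the system fnmatch(3) when FNMATCH_EXISTS is defined, otherwise
fall back to a small matcher that understands only '*' and '?'. The macro
FNMATCH_EXISTS is assumed to be set by the build system, and the names
simple_match and our_fnmatch are only illustrative. The fallback ignores the
flags argument and is not an exact fnmatch() clone.

```c
#include <stddef.h>
#ifdef FNMATCH_EXISTS
#include <fnmatch.h>
#endif

/* Fallback matcher: '*' matches any run of characters (possibly empty),
   '?' matches exactly one character. Returns 1 on match, 0 otherwise. */
int simple_match(const char *pattern, const char *string)
{
  const char *star = NULL;   /* position of the last '*' seen in pattern */
  const char *retry = NULL;  /* where in string that '*' currently ends */

  while(*string) {
    if(*pattern == '*') {
      star = pattern++;      /* remember it, first try matching nothing */
      retry = string;
    }
    else if(*pattern == '?' || *pattern == *string) {
      pattern++;
      string++;
    }
    else if(star) {
      pattern = star + 1;    /* backtrack: let the '*' eat one more char */
      string = ++retry;
    }
    else
      return 0;
  }
  while(*pattern == '*')     /* trailing stars match the empty rest */
    pattern++;
  return *pattern == '\0';
}

/* Same calling convention as fnmatch(3): 0 means "matched". */
int our_fnmatch(const char *pattern, const char *string, int flags)
{
#ifdef FNMATCH_EXISTS
  return fnmatch(pattern, string, flags);
#else
  (void)flags;               /* the fallback supports no flags */
  return simple_match(pattern, string) ? 0 : 1;
#endif
}
```

For example, our_fnmatch("*.c", "main.c", 0) returns 0 in both builds,
while bracket expressions such as "[a-c]*" would only work where the real
fnmatch is available.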

>> In my e-mail attachment is conceptual pseudocode.c ...
>
> Reading the pseudo-code example, it struck me that I think you should
> rather start with only doing wildcard-matching on the file part of the
> URL and not for the directory. Like "ftp://site/dir/to/*.c" and not
> support "ftp://site/*/*/*.c".
>
> This is primarily to keep the code simpler and smaller. Also, if we just
> feed the wildcard callback enough info it will be easy enough for an
> application to do more fancy wildcard and recursive fetches with libcurl
> providing the support for the core file entry matching.

This is true! Even "ftp://site/*/*/*.c" could be downloaded this way (in
the app):
ftp://site/*
N*ftp://site/*/*
N*N*ftp://site/*/*/*.c
and MAYBE
N*N*N*ftp://site/*/*/*.c/*

... if the callback allows skipping the download of files that are not
needed.

>> Note that I haven't solved progress bar! ...
>
> I think it should be done for each single file. The wildcard callback
> could possibly include details like how many files there are, how many
> that matched and what number this particular file is among the matching
> ones.

... and that could be used e.g. for the second "progressbar" in app. Great!
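
Just to illustrate what such a callback could look like from the app's
side, here is a sketch. Nothing in it is existing libcurl API: wc_fileinfo,
wc_action, wc_callback and wc_walk are made-up names, and wc_walk only
stands in for the library loop that would call the app once per matched
entry.

```c
#include <stddef.h>

enum wc_action { WC_FETCH, WC_SKIP };

/* Hypothetical per-entry information handed to the application. */
struct wc_fileinfo {
  const char *name;
  int is_dir;       /* non-zero if the entry is a directory */
  size_t index;     /* 1-based position among the matched entries */
  size_t matched;   /* how many entries matched the pattern */
  size_t total;     /* how many entries the listing contained */
};

typedef enum wc_action (*wc_callback)(const struct wc_fileinfo *info,
                                      void *userdata);

struct wc_entry { const char *name; int is_dir; };

/* Stand-in for the library side: asks the callback about every matched
   entry and returns how many of them the app wants to fetch. */
size_t wc_walk(const struct wc_entry *entries, size_t matched, size_t total,
               wc_callback cb, void *userdata)
{
  size_t i, fetched = 0;
  for(i = 0; i < matched; i++) {
    struct wc_fileinfo info;
    info.name = entries[i].name;
    info.is_dir = entries[i].is_dir;
    info.index = i + 1;
    info.matched = matched;
    info.total = total;
    if(cb(&info, userdata) == WC_FETCH)
      fetched++;             /* a real transfer would happen here */
  }
  return fetched;
}

/* App side: state for a second, "file N of M" progress bar. */
struct progress { size_t current; size_t of; };

/* App side: update the progress state and skip directories. */
enum wc_action skip_dirs(const struct wc_fileinfo *info, void *userdata)
{
  struct progress *p = userdata;
  p->current = info->index;
  p->of = info->matched;
  return info->is_dir ? WC_SKIP : WC_FETCH;
}
```

An app doing its own recursion would return WC_SKIP for a directory and
then start a new wildcard transfer inside it, which is the level-by-level
scheme described above.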

Pavel
Received on 2010-01-02