curl-library

Re: Handling broken ftp REST over 2 GB

From: Dave Meyer <meyer_at_paracel.com>
Date: Wed, 10 Dec 2003 17:38:27 -0800 (PST)

> > The ftp server that I'm communicating with supports the REST command,
> > although it has slightly broken support (in my opinion), in that when curl
> > asks for REST with a file offset larger than 2 GB, the server responds
> > saying that's OK, but that it's going to resume the download starting at
> > 2^31 - 1 bytes into the file, regardless of what offset larger than 2 GB I
> > use.
>
> Eeek. That surely is a nasty behavior that isn't mandated by anything in
> RFC959. REST is supposed to set the restart position, and the server sends 350
> to acknowledge the request.
>
> Out of curiosity, what server is this?

The sample URL is ftp://ftp.ncbi.nih.gov/blast/db/FASTA/nt.gz. I've also
contacted them about this problem, so I suppose it's possible that the
NCBI folks might decide to fix their ftp server. But I wouldn't count on
it...

> > From my browsing of the code, it doesn't appear that curl pays attention to
> > the actual response information from the ftp server for the REST command --
> > it only watches the response code.
>
> Right, since I've been totally unaware of any server doing this craziness. And
> does it really start at the position it claims it starts at or is it just an
> output flaw?

I tried resuming at an offset 1000 bytes before the end of the file, and
stopped the transfer after ~1 MB of data had been received. I also
downloaded ~1 MB of data starting 2 GB into the file. cmp reports that the
contents of the two files are the same (aside from the minor difference in
size, since I halted the downloads manually). I wouldn't expect a gzipped
file to contain a 1 MB repeat; further, I shouldn't be able to download
over 1 MB of data starting 1000 bytes from the end of the file. So I
conclude that for all offsets >= 2 GB, the ftp server really does start
sending at 2 GB.

For kicks, I've tried asking for various REST values on a bunch of other
ftp servers (although I have *not* downloaded any data from any of these).
Several servers simply don't support REST (code 502). One supported REST
(code 350) until I asked for an offset beyond 2 GB, at which point it
complained that I didn't give it a numeric value (code 501). Several
servers claimed that the REST was OK (code 350), but said in their
messages that they were starting at 2 GB for any offset beyond that.
Several did the same thing (code 350), but showed a negative offset for
their starting point in their return messages. Who knows what would
happen there. :)

Finally, I did find a couple of servers whose responses were OK (code 350)
and whose messages actually claimed they were going to start where I
asked, even for offsets above 4 GB. On one of those (ftp.fsn.hu) I also
asked for an offset above 2^63 - 1, at which point the server again
replied OK (code 350), but said it was starting at 2^63 - 1. I didn't try
this on the other one that was ok above 4 GB.
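
The three behaviours above (capping at 2 GB, reporting a negative offset,
capping at 2^63 - 1) look like what you'd get from different integer
handling on the server side. That is only a guess on my part; I haven't
seen any of these servers' source. The helper names below are made up for
illustration, as a minimal sketch of how each result could arise:

```c
#include <stdlib.h>

/* Hypothetical: a server that parses the REST argument with strtoll() and
 * ignores errno. On overflow strtoll() returns LLONG_MAX, so any larger
 * request saturates at 2^63 - 1 (assuming a 64-bit long long). */
long long parse_saturating(const char *arg)
{
  return strtoll(arg, NULL, 10);
}

/* Hypothetical: a server that stores the offset in a signed 32-bit
 * variable and clamps values that do not fit. */
long long clamp_to_int32(long long offset)
{
  if(offset > 2147483647LL)
    return 2147483647LL;
  if(offset < -2147483647LL - 1)
    return -2147483647LL - 1;
  return offset;
}

/* Hypothetical: a server that keeps only the low 32 bits of the offset and
 * then interprets them as a signed two's complement value. */
long long wrap_to_int32(long long offset)
{
  long long low = (long long)((unsigned long long)offset & 0xFFFFFFFFULL);
  return (low >= 0x80000000LL) ? low - 0x100000000LL : low;
}
```

With a request of 2500000000, clamp_to_int32() gives 2147483647 and
wrap_to_int32() gives -1794967296, which are the two odd values that show
up in the 350 messages.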

> Then, I figure we can attempt to find the restart position by parsing the
> response string for an exact match of the text you get. It should be pretty
> harmless, as if the full text matches, the position should be correct, even if
> another server would output the text in the exact same way.

The exact match might not work all the time. Here are a few of the 350
responses that I've gotten from various servers:

350 Restarting at 2147483647. Send STORE or RETRIEVE to initiate transfer.
350 Will attempt to restart at position 2500000000.
350 Restarting at -1794967296. Send STORE or RETRIEVE to initiate transfer.
350 Restarting at 2500000000. Send STORE or RETRIEVE to initiate transfer.
350 Restarting at 50102

(The last one was from the server that returned 501 for offsets above 2
GB, by the way, hence the lower offset request.)

It seems like we could just skip the 350 and read the next number in the
response. Then it would be simple to check whether it matches our request.
I have no clue how servers that report negative offsets actually behave,
but it may not matter -- if the reported offset isn't the same as our
request, we could just reset the request to 2 GB and ask again...
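
Roughly what I have in mind, as a standalone sketch rather than a patch:
parse_rest_reply() and rest_offset_confirmed() are names I made up, not
existing curl functions, and the sketch assumes the reply is available as
a NUL-terminated string and that long long is wide enough for the offsets
involved.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper, not part of libcurl: extract the offset a server
 * reports in its 350 reply to REST. Returns 0 and stores the value in
 * *offset on success, or -1 if the reply is not a 350 or no usable number
 * follows the code. */
int parse_rest_reply(const char *reply, long long *offset)
{
  const char *p = reply;
  char *end;
  long long value;

  /* The reply must start with the three-digit code 350. */
  if(p[0] != '3' || p[1] != '5' || p[2] != '0')
    return -1;
  p += 3;

  /* Skip to the first digit, or to a '-' directly followed by a digit, so
   * that negative offsets are picked up as well. */
  while(*p && !isdigit((unsigned char)*p) &&
        !(*p == '-' && isdigit((unsigned char)p[1])))
    p++;
  if(!*p)
    return -1;

  errno = 0;
  value = strtoll(p, &end, 10);
  if(end == p || errno == ERANGE)
    return -1;

  *offset = value;
  return 0;
}

/* Returns 1 if the server reported exactly the offset we asked for, and 0
 * if it reported something else or the reply could not be parsed. */
int rest_offset_confirmed(const char *reply, long long requested)
{
  long long reported;

  if(parse_rest_reply(reply, &reported) != 0)
    return 0;
  return reported == requested;
}
```

Fed the sample replies above with a request of 2500000000, this accepts
the two lines that say 2500000000 and rejects the 2147483647 and
-1794967296 ones, without caring about the wording around the number.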

Does that make sense? Also, any parsing like that would probably be best
done by using the data in conn->data->state.buffer after the call to
Curl_GetFTPResponse, correct? I see that done in a couple of other
places...

Thanks,

Dave

Received on 2003-12-11