Bugs item #1779054, was opened at 2007-08-22 00:38
Message generated for change (Comment added) made by bagder
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=100976&aid=1779054&group_id=976
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: ftp
Group: bad behaviour
Status: Open
Resolution: None
Priority: 6
Private: No
Submitted By: Kris (tinker105)
Assigned to: Daniel Stenberg (bagder)
Summary: very long ftp replies confuse curl
Initial Comment:
On an upload using either the curl command line or a client using the "easy" API, if the FTP reply to the STOR command exceeds about 16K (e.g. due to a custom server including debug info), curl will stop reading it and will subsequently falsely report that the connection is dead. The rest of the long reply will be emitted (if verbose set) after subsequent commands and curl will create a new connection (to perform a 2nd transfer).
I tried to fix the problem by doubling the size of UrlState.buffer and the (BUFSIZE) limit checks in ftp_readresp and Curl_GetFTPResponse but encountered a different "reckoned dead" problem, later, after the 2nd STOR, this one emitted from Curl_retry_request.
> curl --version
curl 7.16.4 (sparc-sun-solaris2.6) libcurl/7.16.4 OpenSSL/0.9.8a zlib/1.1.3
Protocols: tftp ftp telnet dict ldap http file https ftps
Features: Largefile NTLM SSL libz
----------------------------------------------------------------------
>Comment By: Daniel Stenberg (bagder)
Date: 2007-09-05 23:45
Message:
Logged In: YES
user_id=1110
Originator: NO
Yet another fix has now been committed...
----------------------------------------------------------------------
Comment By: Kris (tinker105)
Date: 2007-09-05 22:19
Message:
Logged In: YES
user_id=1437892
Originator: YES
OK, I've solved my (2nd) problem but don't know if it'll help you. My
client program transfers the same file twice and I forgot to "rewind"
(fseek) the fd between the 2 uploads. :-{
BTW, I think my debug symbols problem was that I was using CC for my main
program and gcc for the curl build. :-[
----------------------------------------------------------------------
Comment By: Daniel Stenberg (bagder)
Date: 2007-09-05 22:14
Message:
Logged In: YES
user_id=1110
Originator: NO
I've found a bug in the FTP response function and on a system that showed
the bug a quick fix cured the problem. I'm now working on a nice fix for it
and I hope to commit it within a few hours. I'll get back.
----------------------------------------------------------------------
Comment By: Dan Fandrich (dfandrich)
Date: 2007-09-05 19:13
Message:
Logged In: YES
user_id=236775
Originator: NO
Try building and using the static version of the library only, i.e.
configure with --disable-shared --enable-debug
----------------------------------------------------------------------
Comment By: Kris (tinker105)
Date: 2007-09-05 17:26
Message:
Logged In: YES
user_id=1437892
Originator: YES
I haven't gotten very far trying to figure this bug out. I looked at the
terms in Curl_retry_request that cause curl to reckon the connection as
dead but don't know enough about how the code works to know if the
(differing)values make sense or not (e.g. bytecount == 0, reuse == 1).
I also couldn't figure out how to create a libcurl.so with debug info (on
Solaris w/ gcc). For example, I tried manually adding -g to the link line
but that didn't help. A 'file' says the library isn't stripped but both
dbx and gdb tell me there are no debugging symbols. Any ideas?
----------------------------------------------------------------------
Comment By: Daniel Stenberg (bagder)
Date: 2007-09-03 13:40
Message:
Logged In: YES
user_id=1110
Originator: NO
Yeah, it seems Curl_GetFTPResponse() sometimes fail to get the end of the
response and then it times out and marks the connection as "dead"...
Can you single-step through it in one of these situations and see if you
get any bright ideas about what's going on? The test cases work for me all
the time and I've tried a couple of things on "live" servers as well
without being able to repeat this problem.
You're not alone with this problem though, as it seems my attempts to fix
your bug made this problem appear on more places. See
http://curl.haxx.se/mail/lib-2007-08/0460.html
----------------------------------------------------------------------
Comment By: Kris (tinker105)
Date: 2007-08-31 16:41
Message:
Logged In: YES
user_id=1437892
Originator: YES
Well, I've got good news and bad news. The good news is that the initial
problem is solved. The bad news is that curl now reckons the connection
dead during the 2nd put. This looks like the same problem I described in
my initial post (after hacking the code). Here's some verbose output:
[...]
< 226- %WORLD-S-XWOK, A request was successfully completed.
< 226 ***** Xfer finished ***** code=<0>
* Connection #0 to host sthdev2 left intact
* Re-using existing connection! (#0) with host sthdev2
* Connected to sthdev2 (10.1.4.7) port 1985 (#0)
[...]
> EPSV
* Connect data stream passively
< 229 Entering Extended Passive Mode (|||63051|)
* Trying 10.1.4.7... * connected
* Connecting to 10.1.4.7 (10.1.4.7) port 63051
> STOR confirm,dbn=kk33,loadid=kk9
< 150 Opening BINARY mode data connection.
* Connection died, retrying a fresh connect
< 226- %L-I-XLDINFO, \Load op: started Fri Aug 31 10:29:49 2007\
< 226- %WORLD-S-XWOK, A request was successfully completed.
< 226 ***** Xfer finished ***** code=<0>
> QUIT
< 221 Goodbye.
* Closing connection #0
* Issue another request to this URL:
'ftp://sysadm:sysadm@sthdev2:1985/confirm,dbn=kk33,loadid=kk9'
* About to connect() to sthdev2 port 1985 (#0)
* Trying 10.1.4.7... * connected
* Connected to sthdev2 (10.1.4.7) port 1985 (#0)
< 220 sthdev2 FTP server (Version 3.3) ready.
> USER xxx
[... 2nd put is re-issued (and succeeds) even though it succeeded above]
----------------------------------------------------------------------
Comment By: Daniel Stenberg (bagder)
Date: 2007-08-30 16:07
Message:
Logged In: YES
user_id=1110
Originator: NO
ok, I now committed my take at unifying the code so that there's now only
one place that do the actual response reading logic so this should
hopefully finally address this problem...
I didn't create a new test case but since they all now use the same
function for reading the responses I think the existing ones should be good
enough. The test suite runs fine for me and I've tested the binary on a few
servers in the wild!
----------------------------------------------------------------------
Comment By: Daniel Stenberg (bagder)
Date: 2007-08-30 14:32
Message:
Logged In: YES
user_id=1110
Originator: NO
Eeek, yes! :-O
I'll work up a test case and see if I can unify these functions better.
This latter one being a left-over from libcurl's blocking days while the
other is for non-blocking operations but I should be able to do this in a
nicer way...
----------------------------------------------------------------------
Comment By: Kris (tinker105)
Date: 2007-08-28 21:22
Message:
Logged In: YES
user_id=1437892
Originator: YES
OK, I stuck some debug code in and see part of the problem - the previous
fixes are in a different routine. It looks like Curl_GetFTPResponse is
used to handle a STOR.
----------------------------------------------------------------------
Comment By: Kris (tinker105)
Date: 2007-08-28 20:43
Message:
Logged In: YES
user_id=1437892
Originator: YES
To answer badger's question - Yes, the server might be sending the lines
"slowly" (and thus perline might usually be zero, preventing the
"clipamount" logic around ftp.c line 457 from triggering).
----------------------------------------------------------------------
Comment By: Dan Fandrich (dfandrich)
Date: 2007-08-28 19:53
Message:
Logged In: YES
user_id=236775
Originator: NO
Increasing the size of that data causes it to hit a limit in the test
suite--not libcurl itself. The libcurl solution should be general.
----------------------------------------------------------------------
Comment By: Kris (tinker105)
Date: 2007-08-28 19:36
Message:
Logged In: YES
user_id=1437892
Originator: YES
FWIW, you may want to make it clear in tests/README that 'make test' does
something other than just run all the tests.
I don't know if I've recreated _my_ failure, but by making the long
response in test1005 just a few lines longer (to about 18K), I did get an
error. Here's the last half of stderr1005:
[...]
13:12:34.567975 > PWD
13:12:34.570934 < 257 "/nowhere/anywhere" is current directory
13:12:34.571474 * Entry path is '/nowhere/anywhere'
13:12:34.572265 > CWD path
13:12:34.575249 * FTP response reading failed
% Total % Received % Xferd Average Speed Time Time Time
Current
Dload Upload Total Spent Left
Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:--
013:12:34.580183 * Closing connection #0
curl: (56) FTP response reading failed
----------------------------------------------------------------------
Comment By: Dan Fandrich (dfandrich)
Date: 2007-08-28 18:55
Message:
Logged In: YES
user_id=236775
Originator: NO
You need to run 'make test' to build the test support code before you can
execute runtests.pl manually.
----------------------------------------------------------------------
Comment By: Kris (tinker105)
Date: 2007-08-28 18:42
Message:
Logged In: YES
user_id=1437892
Originator: YES
I tried to run test 1006 yesterday but got an error and couldn't figure
out how to fix it:
> ./runtests.pl -v 1006
********* System characteristics ********
* curl 7.17.0-20070827 (sparc-sun-solaris2.6)
* libcurl/7.17.0-20070827 OpenSSL/0.9.8a zlib/1.1.3
* Features: Largefile NTLM SSL libz
* Host: rmdev1
* System: SunOS rmdev1 5.6 Generic_105181-16 sun4u sparc
SUNW,Ultra-Enterprise
* Server SSL: OFF
* libcurl SSL: ON
* libcurl debug: OFF
* valgrind: OFF
* HTTP IPv6 OFF
* FTP IPv6 OFF
* HTTP port: 8990
* FTP port: 8992
* FTP port 2: 8995
* TFTP port: 8997
* SCP/SFTP port: 8999
* SOCKS port: 9000
* SSL library: OpenSSL
* Libtool lib: ON
*****************************************
startnew: perl -I. ./ftpserver.pl --pidfile .ftp.pid -s "." --port 8992
open2: exec of ./server/sockfilt --port 8992 --logfile log/sockctrl.log
--pidfile .sockfilt.pid failed at ./ftpserver.pl line 161
Failed to start sockfilt! at ./ftpserver.pl line 173.
startnew: Warning: child process has died
RUN: failed to start the FTP server
test 1006 SKIPPED: failed starting FTP server
TESTFAIL: No tests were performed
TESTDONE: 1 tests were considered during 4 seconds.
TESTINFO: 1 tests were skipped due to these restraints:
TESTINFO: "failed starting FTP server" 1 times (1006)
----------------------------------------------------------------------
Comment By: Daniel Stenberg (bagder)
Date: 2007-08-28 09:50
Message:
Logged In: YES
user_id=1110
Originator: NO
ok, any chance you can help us by trying to craft up a test case that
repeats your problem? I can't see how your case is different than the ones
we've fixed recently...
See test case 1003, 1005 and 1006 and use as basis for an attempt. They
are all doing different tests with very long ftp server responses.
Your server isn't responding this in a very slow manner, is it?
The "left intact" message means that libcurl aborted the transfer and left
the connection "alive". And yes, leaving such a connection alive assuming
it is okay will mess up the subsequent reconnection quite a bit so that is
a plain wrong action.
----------------------------------------------------------------------
Comment By: Kris (tinker105)
Date: 2007-08-27 16:59
Message:
Logged In: YES
user_id=1437892
Originator: YES
Still no joy here (using the 8/24 snapshot). I'm including some snippets
of the verbose output to show what's happening. To elaborate, after about
16K of reply, curl stops reading the reply, emits the "left intact" msg,
there's a short delay, then it emits the "dead" msg, the "QUIT", and (most
of) the remainder of the reply. It then establishes a new connection to
handle the 2nd put in the script.
[...]
> EPSV
* Connect data stream passively
< 229 Entering Extended Passive Mode (|||57707|)
* Trying 10.1.4.7... * connected
* Connecting to 10.1.4.7 (10.1.4.7) port 57707
> TYPE I
< 200 Type set to I.
> STOR myfile
< 150 Opening BINARY mode data connection.
< 226- %L-I-XLDINFO, \Xfer started Mon Aug 27 10:16:45 2007\
[... about 16K bytes of reply, then]
< 226- %L-I-XLDINFO, \ C000604 (*:*+19) INTEXT(20) TERM(:)\
< 226- %L-I-XLDINFO, \ C000605 (*:*+19) INTEXT(20) TERM(:)\
* Connection #0 to host sthdev2 left intact
* Connection #0 seems to be dead!
> QUIT
< FO, \ C000606 (*:*+19) INTEXT(20) TERM(:)\
< 226- %L-I-XLDINFO, \ C000607 (*:*+19) INTEXT(20) TERM(:)\
< 226- %L-I-XLDINFO, \ C000608 (*:*+19) INTEXT(20) TERM(:)\
[... about 100 more lines]
< 226- %WORLD-S-XWOK, A request was successfully completed.
< 226 ***** Xfer finished ***** code=<0>
* Closing connection #0
* About to connect() to sthdev2 port 1985 (#0)
* Trying 10.1.4.7... * connected
* Connected to sthdev2 (10.1.4.7) port 1985 (#0)
< 220 sthdev2 FTP server (Version 3.3) ready.
> USER xxxx
[... 2nd PUT in script succeeds OK]
----------------------------------------------------------------------
Comment By: Daniel Stenberg (bagder)
Date: 2007-08-24 16:01
Message:
Logged In: YES
user_id=1110
Originator: NO
Committed now. tinker, can you please try again?
----------------------------------------------------------------------
Comment By: Daniel Stenberg (bagder)
Date: 2007-08-24 15:38
Message:
Logged In: YES
user_id=1110
Originator: NO
Great Dan! I've used this now to nail the problem and I have a working
version now, I'll just polish it up a bit and commit soon...
----------------------------------------------------------------------
Comment By: Dan Fandrich (dfandrich)
Date: 2007-08-24 02:01
Message:
Logged In: YES
user_id=236775
Originator: NO
File Added: test1006
----------------------------------------------------------------------
Comment By: Dan Fandrich (dfandrich)
Date: 2007-08-24 01:45
Message:
Logged In: YES
user_id=236775
Originator: NO
I've added test case 1005 that tests many short lines from the server, and
it passes, too. However, if you shorten the reply in test case 1005 by 583
characters, then the test hangs. curl hits a boundary condition when it
truncates the long line, and ends up parsing a line with a code of
"250250", which is an unexpected code.
----------------------------------------------------------------------
Comment By: Daniel Stenberg (bagder)
Date: 2007-08-23 22:33
Message:
Logged In: YES
user_id=1110
Originator: NO
Oh well, I found it obvious how it was a problem with a single line that
long, with multiple lines I don't exactly see how it is a problem in the
code...
Any chance you can offer me access to such a server setup to debug
against?
----------------------------------------------------------------------
Comment By: Kris (tinker105)
Date: 2007-08-23 18:22
Message:
Logged In: YES
user_id=1437892
Originator: YES
No, it isn't all one line - it's a standard "long reply" with a large
number of normal-length lines. I downloaded and tried the latest CVS
(7/23) and the problem is still there.
----------------------------------------------------------------------
Comment By: Daniel Stenberg (bagder)
Date: 2007-08-22 16:18
Message:
Logged In: YES
user_id=1110
Originator: NO
Thanks, this is now fixed in CVS and test case 1003 was added to verify
it.
----------------------------------------------------------------------
Comment By: Daniel Stenberg (bagder)
Date: 2007-08-22 09:37
Message:
Logged In: YES
user_id=1110
Originator: NO
Is that >16K as in one single line with only CRLF at the end of the huge
chunk?
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=100976&aid=1779054&group_id=976
Received on 2007-09-05