Bugs item #3048197, was opened at 2010-08-19 00:59
Message generated for change (Comment added) made by bagder
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=100976&aid=3048197&group_id=976
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: ftp
Group: bad behaviour
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: catalin (catalinr)
Assigned to: Daniel Stenberg (bagder)
Summary: Incorrect data uploaded in case of CURLE_SEND_ERROR
Initial Comment:
These are my [hopefully right] conclusions:
- When doing an FTP upload, in the progressCallback function the uploadedTillNow argument (last one) holds the correct size for the uploaded data (well, rather "sent data");
- the uploaded data is sent in chunks of at most CURL_MAX_WRITE_SIZE bytes (btw, CURL_MAX_WRITE_SIZE is not documented anywhere and it'd be useful to know that that is the maximum size attempted to be uploaded at one time);
- in case the link is broken/disconnected, the remote destination file gets appended with a chunk of the size reported by progressCallback, but the last few KB of it are NUL (zero) bytes rather than the real data. So a subsequent APPE[-nd] to that file will produce a file of the same size as the original one, but somewhere in the middle it will contain the wrong zero bytes.
The workaround for this was to add the size reported by progressCallback to a totalTillNow variable and, after any network error that requires a resume, subtract CURL_MAX_WRITE_SIZE*8 from totalTillNow and set up a REST upload with the calculated starting point. (See #3048174 for issues with CURLOPT_RESUME_FROM.)
I'm not sure if my workaround is the best solution, but it works. OTOH, the behavior when uploading and getting disconnected leads to corrupted files at the destination, and that is IMO very wrong...
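A minimal C sketch of the workaround described above, under stated assumptions: `upload_state`, `progress_cb` and `next_resume_offset` are hypothetical names; the chunk size of 16384 is assumed to match the default CURL_MAX_WRITE_SIZE in curl.h (check the header you build against); the callback has the shape of libcurl's CURLOPT_PROGRESSFUNCTION callback (ulnow as a double, as in 7.21.x); and ulnow is assumed to count from zero within each attempt, which should be verified when resuming. The 8-chunk margin is the reporter's heuristic, not a guarantee.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: equals the default CURL_MAX_WRITE_SIZE (16384) in curl.h. */
#define ASSUMED_CHUNK_SIZE 16384

/* Hypothetical per-transfer state, passed as CURLOPT_PROGRESSDATA. */
struct upload_state {
    int64_t base_offset;  /* remote offset where the current attempt began */
    int64_t reported_now; /* last ulnow seen during the current attempt */
};

/* Same shape as libcurl's curl_progress_callback. */
static int progress_cb(void *clientp, double dltotal, double dlnow,
                       double ultotal, double ulnow)
{
    struct upload_state *st = clientp;
    (void)dltotal;
    (void)dlnow;
    (void)ultotal;
    st->reported_now = (int64_t)ulnow;
    return 0; /* returning non-zero would abort the transfer */
}

/* Offset for the next REST upload: everything reported so far minus a
   safety margin of 8 chunks (the reporter's heuristic), clamped at 0. */
static int64_t next_resume_offset(const struct upload_state *st)
{
    int64_t total = st->base_offset + st->reported_now;
    int64_t margin = (int64_t)ASSUMED_CHUNK_SIZE * 8;
    assert(total >= 0);
    return total > margin ? total - margin : 0;
}
```

In a real program the callback would be registered with CURLOPT_PROGRESSFUNCTION and CURLOPT_PROGRESSDATA (with CURLOPT_NOPROGRESS set to 0), and the computed offset passed to CURLOPT_RESUME_FROM_LARGE before retrying; after a retry, base_offset becomes that offset and reported_now starts again at 0.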
libcurl 7.21.1
msw Vista
mingw-gcc 3.4.5
----------------------------------------------------------------------
>Comment By: Daniel Stenberg (bagder)
Date: 2010-09-12 23:29
Message:
If you want to check how send() works, then I figure you should study a
socket API book or possibly the source code of a TCP/IP stack. Given the
nature of TCP, I think one could make an educated guess.
No, there's no additional way for libcurl to know which data has or hasn't
been sent (short of the remote side sending back some sort of
application-layer ack).
We're going in circles in this report without any added info. I will close
this soon if nothing else pops up. If you feel like debating source code,
how send() works, what we can do to check what send() returns or what not,
I recommend you read up on the underlying stuff and take the discussions
and questions to the curl-library mailing list. This is the wrong place for
that.
----------------------------------------------------------------------
Comment By: catalin (catalinr)
Date: 2010-09-12 22:15
Message:
How exactly is that "successful completion" determined? Can this be checked
in the code?
Why would you need to rule out a network failure?
Let the network get all the blame for not transmitting the data. What is
important is why it is all counted as correct by the sender.
----------------------------------------------------------------------
Comment By: Daniel Stenberg (bagder)
Date: 2010-09-12 19:51
Message:
libcurl only has one way to know whether the data should be counted or
not. It counts all data that send() reports as sent as having been sent
correctly. send()'s return value is documented like this:
"Upon successful completion, send() shall return the number of bytes
sent"
In this particular case, it has not been shown that libcurl thinks more
data has been sent than was actually sent, as we still haven't been able
to rule out A) a badly behaving server end or B) a bad "intermediate". It
is all still just speculation.
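To illustrate the point above that libcurl can only count what send() returns, here is a minimal POSIX sketch (a hypothetical helper `send_counting`, not libcurl's code) that accumulates send()'s return values in the same spirit. In the test, the peer end of a local socketpair never reads anything, yet send() still reports all 1024 bytes as sent, because "sent" only means the local stack accepted them. Over TCP the same holds before the remote side has ACKed, let alone stored, the data.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Sends len bytes and returns how many bytes send() accepted before an
   error (stored in *err, 0 on success). "Accepted" means copied into the
   local kernel buffer; it says nothing about what the peer has received. */
static size_t send_counting(int fd, const char *buf, size_t len, int *err)
{
    size_t counted = 0;
    assert(err != NULL);
    *err = 0;
    while (counted < len) {
        ssize_t n = send(fd, buf + counted, len - counted, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted before anything was sent; retry */
            *err = errno;
            break;
        }
        counted += (size_t)n; /* counted as "sent", exactly as reported */
    }
    return counted;
}
```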
----------------------------------------------------------------------
Comment By: catalin (catalinr)
Date: 2010-09-10 01:20
Message:
My current thought is that curl sends the correct data, but it also counts
it as correctly received too early.
IOW, if a network error occurs, it will not be accounted for by the
progress callback. Can this be verified? Whether the sent data is also
completely received at the destination?
If not, I only suggest that the data reported as "already transmitted"
should not include the current [potentially partial] buffer.
I can't say more as I really don't know where these things happen in the
code. If you could point out some places in the sources that seem relevant
to this issue, it may help in understanding it.
----------------------------------------------------------------------
Comment By: Daniel Stenberg (bagder)
Date: 2010-09-09 15:31
Message:
Thanks for reporting this issue and helping us improve curl and libcurl.
We're awaiting feedback in this issue. Due to this, I have set the state
of this issue to pending and it will automatically get closed later on
unless we get further info.
Please consider answering the outstanding questions or providing the
missing info so that we can proceed to resolve this issue!
----------------------------------------------------------------------
Comment By: Daniel Stenberg (bagder)
Date: 2010-09-02 13:12
Message:
Thanks for the details. Unfortunately the log doesn't reveal much more than
what you've already explained. We can see the sending of the data fail,
but we can't see how much curl thinks it sent, how much was actually ACKed
at TCP level (which would be the amount of data it thinks it has sent), or
how much the server thinks it has received and stored.
As for your rant: I've only requested that you provide proper evidence or
logs that there really is a curl problem here - I tried to repeat this
problem but failed. If you think I'm wrong in doing so, then so be it. You
don't have to agree with me. I still haven't been able to repeat this
problem, so I've not been able to detect or track down any problems.
If you can repeat this problem on demand, I'd like more logs. Most
probably stracing of libcurl's function calls and a wireshark snoop of the
data transfer TCP stream on the client side would be enough to see who's to
blame for the quirks you experience.
----------------------------------------------------------------------
Comment By: catalin (catalinr)
Date: 2010-09-02 03:25
Message:
File libcurl_3048197.txt is a log from a failing transfer.
Relevant lines are 93-201.
I have implemented a resume so when a failure occurs the connection is
reestablished and the transfer is resumed from a calculated position
(rather than the one reported by libcurl). Some names and the IP have been
replaced using find/replace.
The rest is addressed to the maintainer(s) and actually off topic:
Sorry, is that your answer? "When I don't have time or I don't feel like
investigating a bug I just set it to 'Invalid' because if I don't intend to
do anything about it it can't possibly be a bug" ? And then name it "common
sense"? It's just not worth commenting that one...
I don't really understand your request for me to test what actually could
reproduce this on your end...
Well, doh [childish but really deserved], I have, thus the bug report...
"Let me remind you" that whether it is a libcurl issue or network issue,
libcurl does report that everything was ok at a point where not everything
was ok.
I have been running it for a while, found that it was not working (see
description in previous posts), got a workaround, simplified the process so
I can get a more concise description of what is going on and what seems to
be the cause, tried to find _a way for you to reproduce it_, and reported
it here. I don't remember asking for anything, did I? Even if you want nothing
to do with this wouldn't it be common sense just to be helpful and
constructive? To give some directions where a willing soul could chase
this? I believe I have mentioned that I don't know this code, the lack of
comments is not helping, and when I tried I couldn't find a starting point
for this. On the other hand you seem to be exploiting the comments to just
bail out of this one...
I'll have to apologize if anything I said made you feel offended regarding
your spare time; I believe it was not the case, though. I wouldn't presume
to make any such comments about other people's time. It just took me several
days to understand this [wrong] functionality, find a way around it, clean
it up etc. And that was out of my spare time, but being my choice I don't
feel like complaining about _that_, other people must be doing better at
it. Not to mention that (who knows..) you may not be the only one(s)
contributing to open source projects in their spare time... only maybe...
Now, to put an end to this, whether you consider it is worth any of your
spare time or not, just have it your way. I don't see what else I could do
here by myself.
----------------------------------------------------------------------
Comment By: Daniel Stenberg (bagder)
Date: 2010-08-31 10:59
Message:
Your bug report here is based on an assumption from your end. You want me
to put in a lot of work to test that assumption and rule it out. I'd
rather reverse the argument and instead assume that things are good until
you can provide me with something that indicates that curl indeed does
behave wrongly.
If it truly is that easy to set up two systems with a router between them
and then unplug the router to trigger this problem, then please proceed to
do exactly that and tell us what happens, preferably with some logs.
Allow me to remind you that curl is an open source project with very
little company backing. You're asking me to spend my spare time looking
for a problem you assume exists in curl. (And yes, perhaps it does.)
I look for and fix dozens of curl bugs every month already. I say we
scale much better if you do the larger piece of the work here and I'll help
analyse the results of your tests and experiments.
I don't think this is bad practice. I think it is common sense. You're of
course free to disagree.
----------------------------------------------------------------------
Comment By: catalin (catalinr)
Date: 2010-08-31 00:34
Message:
I'm sorry, I can't see how you could get any "more info and points" without
testing this yourself... I really don't think it is that difficult nowadays
to set up a source and a destination separated by a router and just unplug
the router in the middle of a transfer.
This [lib]curl is a great piece of work, but dismissing such issues on a
pure _assumption_ is bad practice by any standard.
I believe this is due to network problems, but then curl_progress_callback
should not report the size of data that is not certain to have been
correctly transferred. So what would be wrong with reporting only the size
of the data that is confirmed to have been correctly transmitted? IOW, to
report only the total data transferred so far, excluding the chunk being
transmitted at that point in time. It'll be a couple of KB less exact, but
then it can never guarantee at any point that the reported numbers are the
final ones anyway, can it?
I don't have the necessary knowledge to debug this, nor any urgent need for
it since the workaround described in my initial message is doing OK, nor
the knowledge (..did I already say that?). So I'll at least be happy with
the fact that Google indexes this and it will be easily found by anyone
else having the same issue.
----------------------------------------------------------------------
Comment By: Daniel Stenberg (bagder)
Date: 2010-08-29 00:24
Message:
Well, I'm sorry but unless we get more info and points that indicate that
curl in fact does something wrong for this case I will consider this a
proxy/intermediate/server problem/artifact.
----------------------------------------------------------------------
Comment By: catalin (catalinr)
Date: 2010-08-23 01:39
Message:
I believe the way it happens in my case is because of an intermediate
network element. I can't be 100% sure, but maybe if a router is used
between the source and the destination and the connection is broken by
disconnecting the router, then the same thing would happen. Maybe you
already tried it this way...
I don't really know where to start investigating this in the curl code -
i.e. where the numbers in the callback come from...
----------------------------------------------------------------------
Comment By: Daniel Stenberg (bagder)
Date: 2010-08-23 00:24
Message:
I've tried, and I've not seen any zeroes in my broken uploads when using
vsFTPd.
The progress callback is given the amount that the system calls have
reported as successfully sent. Unless of course there's a bug somewhere,
but I've not been able to find any such. Can you?
----------------------------------------------------------------------
Comment By: catalin (catalinr)
Date: 2010-08-22 04:13
Message:
I am trying to make myself understood as best as I can already. Sorry if I
should do more but just fail at it... OTOH I'm not sure I can describe this
any better than I already have, my English is not that good.
"When the connection breaks, libcurl CANNOT [...] check anything else on
the remote site"
What I'm saying is not for libcurl to do more, but for you (a person) to
check the uploaded data after a CURLE_SEND_ERROR occurs. IOW I've asked you
to check for the problem I'm signaling, not to implement something.
Again, consider this a test case: an upload is in progress,
CURLE_SEND_ERROR occurs, transfer is aborted; desired outcome: the uploaded
data is the same as the source (partial, but identical so far); actual
outcome (at my end at least): the last part of the data is incorrect (zero
bytes).
I feel the need to express this yet again: this is not a request for
automatically doing anything, but for reproducing what I'm experiencing.
"libcurl cannot guarantee what the server does, nor can it assume
anything"
Ok, so reading the last part I can only think that on the contrary, the
progressCallback reports the uploaded size _assuming_ it all was correctly
received at destination. But it should probably report only the data that
is confirmed to have been sent correctly so far.
If the upload consists of, let's say, 5 chunks of CURL_MAX_WRITE_SIZE
bytes, when a call to progressCallback is triggered, e.g. while uploading
chunk 4, it should only report the bytes sent in the first 3 chunks and
not add the [unconfirmed so far] bytes of the 4th chunk, which seems to be
what happens now.
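The proposal above can be sketched as a pure function that reports only the bytes of the chunks before the one in flight. `conservative_progress` is a hypothetical name, and the chunk size of 16384 is assumed to match the default CURL_MAX_WRITE_SIZE. As pointed out elsewhere in this thread, send() only tells libcurl what the local stack accepted, so lagging by one chunk narrows the gap but cannot guarantee the server stored those bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: equals the default CURL_MAX_WRITE_SIZE (16384) in curl.h. */
#define ASSUMED_CHUNK_SIZE 16384u

/* Bytes in all chunks before the one currently in flight. A chunk that
   send() has only just finished accepting still counts as in flight, so
   exactly 4 chunks accepted reports 3 chunks. */
static uint64_t conservative_progress(uint64_t accepted, uint32_t chunk)
{
    assert(chunk > 0);
    if (accepted == 0)
        return 0;
    return ((accepted - 1) / chunk) * chunk;
}
```

For example, with 3 full chunks plus 100 bytes of the 4th accepted (49252 bytes), this reports 49152 bytes, i.e. the first 3 chunks only.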
"1. I can't see any error in libcurl's side"
I'd say the partialUpload reported by progressCallback is not always
correct, see above.
"2. It sounds like bad behavior on the server side "
It may very well be like that, but is the "good behavior" defined in an RFC
or just as a cURL concept?
_If_ cURL does assume that everything sent is also correctly received,
then this is a rather arbitrary call.
If this fact cannot be changed, it should be better described in the
docs.
"3. you have not presented any way to repeat this problem"
I believe I have, even if not by using a piece of code. If it was not
understood from my previous post maybe this time will be luckier. If still
not, I'll probably give up...
----------------------------------------------------------------------
Comment By: Daniel Stenberg (bagder)
Date: 2010-08-21 21:20
Message:
When the connection breaks, libcurl CANNOT send anything further as the
connection is no more, nor can it check anything else on the remote site as
the connection... broke! Having libcurl try to reconnect just to check the
end of the file in case it got disconnected just previously is completely
out of the question.
Alas, the problem you see at disconnect depends on what the server does on
a disconnect. libcurl cannot guarantee what the server does, nor can it
assume anything. Some servers are likely to act differently than others on
disconnect. Appending zero-bytes to the file does sound like a case of bad
behaviour ON THE SERVER END.
1. I can't see any error in libcurl's side
2. It sounds like bad behavior on the server side
3. you have not presented any way to repeat this problem
I can't see what libcurl can do about this.
Anyone who decides to append data to an existing file because it got
aborted in a previous upload attempt may of course want to check the end
of the file to see that it looks OK before blindly appending more data to
it. libcurl will not do that automatically, but it does provide the means
to get the data etc.
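A minimal sketch of the check suggested above, under stated assumptions: the last tail_len bytes of the remote file have already been fetched back (for example with CURLOPT_RANGE, which libcurl supports for FTP) and are compared with the same byte range of the local source. `verified_resume_offset` is a hypothetical helper, and it assumes the remote side returned the full tail_len bytes. The returned offset is where a REST-based re-upload could start; a plain APPE cannot drop a bad tail, since it only appends.

```c
#include <assert.h>
#include <stddef.h>

/* local_tail and remote_tail both hold tail_len bytes starting at absolute
   file offset tail_start. Returns the absolute offset up to which the
   remote copy matches the local source byte for byte. */
static long long verified_resume_offset(const unsigned char *local_tail,
                                        const unsigned char *remote_tail,
                                        size_t tail_len, long long tail_start)
{
    size_t i = 0;
    assert(tail_start >= 0);
    while (i < tail_len && local_tail[i] == remote_tail[i])
        i++;
    return tail_start + (long long)i;
}
```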
----------------------------------------------------------------------
Comment By: catalin (catalinr)
Date: 2010-08-21 07:02
Message:
I'm sure I fail to see a lot more than you do, but I'm just signaling what
looks like bad behavior to me. Maybe you can find a way to try and
reproduce this, as I don't think I can make a sample program that will make
a server disconnect (or an ISP interrupt the connection, etc.), can I?
Of course that FTP server (which comes with a BusyBox Linux on a NAS
device) may be broken, but I have some doubts about that and IMO it's worth
investigating...
The error received at my end was CURLE_SEND_ERROR and IIRC once I also got
CURLE_RECV_ERROR (although only uploads were being done, but maybe it was
about receiving some response from the server).
A long-shot interpretation would be that curl sends the size of the packet
being uploaded, but only part of the actual data gets to the destination.
Again, my guess may be way off...
I don't think those zeroes are exactly random either... Comparing the
source and destination files, the difference is made by the zeroes in the
destination file, not by some _random_ bits.
Maybe a shorter way would be to get an ftp upload to break with
CURLE_SEND_ERROR and then check the end of the uploaded file part?
HTH
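A minimal sketch of the quick check suggested above: after a CURLE_SEND_ERROR, download the end of the partial remote file and count its trailing zero bytes. `trailing_zero_run` is a hypothetical helper; a non-zero result is only a hint, since legitimate data can also end in zeros.

```c
#include <assert.h>
#include <stddef.h>

/* Number of consecutive zero bytes at the end of buf. */
static size_t trailing_zero_run(const unsigned char *buf, size_t len)
{
    size_t n = 0;
    assert(buf != NULL || len == 0);
    while (n < len && buf[len - 1 - n] == 0)
        n++;
    return n;
}
```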
----------------------------------------------------------------------
Comment By: Daniel Stenberg (bagder)
Date: 2010-08-19 15:15
Message:
I don't see how libcurl sends any data as zeroes. Also, I fail to see how
it could send that block of zeroes if the connection disconnected?
To me it sounds like your server behaves oddly and adds random data to the
file being written at the time of the disconnect. I don't think libcurl can
do anything about it.
If this is not the case, can you please clarify your point for us?
----------------------------------------------------------------------
Received on 2010-09-12