curl-users
Re: Réf. Re: UTF-8 encoding during transfer
Date: Fri, 24 Sep 2010 11:58:32 -0600
At 11:13 +0200 9/24/10, vincent.preau_at_bnpparibas.com wrote:
Thanks for your answer and sorry for the late reply.
Actually, I don't really understand what you mean when you ask whether the files are the same length at each end...
---------- Forwarded message ----------
From: "Doug McNutt" <douglist_at_macnauchtan.com>
Date: 2010-09-16 00:17
Are the files the same length at each end? UTF-8 converted to one of the Windoze extended ASCII encodings would get shorter.
Also, FTP in ASCII or TEXT mode can attempt to change line ends from 0A to 0D0A for Windows and the rest of the internet. That also changes the length of files. I doubt that curl would use FTP TEXT mode unless you worked at it.
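To make the line-end point concrete, here is a rough Python sketch of what a TEXT mode transfer to a CRLF system may do to the bytes. The helper name to_crlf and the sample data are made up for illustration; a real FTP client or server may behave differently.

```python
# Sketch of an FTP TEXT (ASCII) mode line-end conversion.
# to_crlf is a hypothetical helper, not part of curl or any FTP library.

def to_crlf(data: bytes) -> bytes:
    """Turn every line end into CRLF (hex 0D0A)."""
    # Normalise existing CRLF pairs first so they are not doubled.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")

unix_text = b"line one\nline two\nline three\n"
windows_text = to_crlf(unix_text)

# The file grows by one byte per line end.
print(len(unix_text), len(windows_text))

# The same conversion damages binary data that happens to contain 0A.
# These eight bytes are the standard PNG file signature.
png_signature = b"\x89PNG\r\n\x1a\n"
print(to_crlf(png_signature) == png_signature)
```

The second print shows why TEXT mode is risky for binary files: a lone 0A byte that was never a line end gets a 0D inserted in front of it.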
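UTF-8 encoding needs between 1 and 4 bytes to store a single character: plain ASCII takes one byte, accented Latin letters such as é take two, and characters such as the euro sign take three. curl will simply send off the bytes without changing anything.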
FileZilla, it has been suggested, tries to be intelligent about transferring between diverse operating systems. It might convert line ends in a way that turns a single-byte line end into two bytes, hex 0D0A. It might also convert well-known multi-byte UTF-8 characters into Windows or Apple equivalents that are only a single byte in the "high ASCII" range between 128 and 255 (decimal); such bytes are not valid on their own in UTF-8. The FTP protocol in TEXT mode is also well known for corrupting binary files when it converts byte values that only accidentally look like line ends.
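For a feel of the byte counts involved, the following Python sketch encodes a few characters both ways. It assumes cp1252 as the Windows "high ASCII" encoding, which is only one possibility; the sample characters are chosen for illustration.

```python
# Compare how many bytes a character needs in UTF-8 and in cp1252.
# cp1252 is an assumption here; other single-byte code pages exist.
samples = ["e", "é", "€"]

for ch in samples:
    utf8_bytes = ch.encode("utf-8")
    cp1252_bytes = ch.encode("cp1252")
    print(repr(ch), len(utf8_bytes), len(cp1252_bytes))

# A single high byte from cp1252 is not a complete UTF-8 sequence.
high_byte = "é".encode("cp1252")
try:
    high_byte.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(high_byte, valid_utf8)
```

So a file that was transcoded on the way will both shrink and stop being readable as UTF-8 at the other end.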
Comparing the lengths of the various files with a directory listing, ls -l, would help to identify the source of the problem. When one byte is sent to represent two or three, the file gets shorter.
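If ls -l shows different lengths, a small script can help guess which kind of change happened. This is only a sketch: the function name explain_size_change is invented, cp1252 is assumed as the target encoding, and it checks just the two causes discussed in this thread.

```python
# Guess why a received file differs from the original.
# explain_size_change is a hypothetical helper for illustration only.

def explain_size_change(original: bytes, received: bytes) -> str:
    if received == original:
        return "identical"

    # Check for line-end conversion in either direction.
    lf_only = original.replace(b"\r\n", b"\n")
    if received == lf_only or received == lf_only.replace(b"\n", b"\r\n"):
        return "line ends converted"

    # Check for UTF-8 being transcoded to a single-byte encoding
    # (cp1252 is an assumption, not the only candidate).
    try:
        text = original.decode("utf-8")
        if received == text.encode("cp1252"):
            return "UTF-8 transcoded to cp1252"
    except (UnicodeDecodeError, UnicodeEncodeError):
        pass

    return "unknown change"

original = "café\n".encode("utf-8")
print(len(original), explain_size_change(original, b"caf\xc3\xa9\r\n"))
print(len(original), explain_size_change(original, b"caf\xe9\n"))
```

With real files, the two byte strings would come from reading each copy in binary mode, for example open(path, "rb").read(), where path is whatever file you transferred.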
--
--> If it's not on fire it's a software problem. <--
-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-users
FAQ: http://curl.haxx.se/docs/faq.html
Etiquette: http://curl.haxx.se/mail/etiquette.html
Received on 2010-09-24