
curl-library

Re: Poor HTTP POST upload performance

From: Bryan Christ <bryan.christ_at_gmail.com>
Date: Mon, 22 Jun 2015 10:12:56 -0500

On Mon, Jun 22, 2015 at 2:31 AM, Ray Satiro <raysatiro_at_yahoo.com> wrote:

> On 5/20/2015 11:35 AM, Bryan Christ wrote:
>
> Ray,
>
> Here is a sample program that illustrates the problem. I tested the
> performance with an 11MB file. This sample program consistently takes
> 11-17 seconds to complete. If I upload the same file through Firefox, it
> takes about 4.5 seconds.
>
> http://www.mediafire.com/view/9rb4aac4hnhma47/curl_filedrop_upload.c
>
>
> On Tue, Apr 14, 2015 at 9:14 PM, Bryan Christ <bryan.christ_at_gmail.com>
> wrote:
>
>> Ray,
>>
>> Thanks for the reply. It would be quite difficult to create an
>> isolated test case due to the inherent cost of setting up a RESTful
>> POST to the server.
>>
>> The problem is very much reproducible. Several users have reported
>> this issue. It's not hard to see the problem if you download and build
>> the MediaFire FUSE client over at GitHub.
>>
>> I have seen the notes about the TODO item, and I have seen the posts
>> that seem to regard this as an SFTP-only problem. I suspect that if I
>> build libcurl from source and change the define, the performance will
>> improve. If that turns out to be the case, would you accept a patch
>> for curl_easy_setopt() to allow this to be configured at run-time?
>>
>> As for the server, it doesn't support compression or the user-agent
>> header. Also, I have a direct connection to our data center. Those are
>> definitely not issues.
>>
>>
>> On 4/13/2015 10:01 PM, Bryan Christ wrote:
>> > I've been trying to figure out why HTTP POST uploads are so slow with
>> > libcurl. Upload speeds continually perform at about 1/10th of the
>> > expected performance (or less). Many users have reported this behavior
>> > on our forum. I suspect it has a lot to do with CURL_MAX_WRITE_SIZE
>> > being set to 16k. Uploads to these same servers through other means
>> > (JavaScript for example) reach their expected throughput. The code in
>> > question can be seen here:
>> >
>> > https://github.com/MediaFire/mediafire-fuse/blob/master/utils/http.c
>> > (at approx line 314)
>> >
>> > Assuming the issue is the 16K buffer limit, are there any other
>> > options? Asking users to recompile a custom libcurl with a larger
>> > buffer size is not very palatable.
>>
>> If you want help on the list, your best bet is a self-contained
>> example that can be used to reproduce the problem, plus the details
>> at [1]. The buffer issue is in the TODO [2], but from what I see
>> there and elsewhere the significance is SFTP-related.
>>
>> Continually or continuously? Is it 100% reproducible? A few ideas
>> (see the sketch after this list):
>> - Maybe your uploads are compressed when they go through the browser
>>   but not when uploaded through libcurl. There is no compression
>>   built into libcurl uploads (as far as I know); you would have to do
>>   it manually and attach the header for content encoding gzip.
>> - A different user agent (or lack of one -- the default) when you use
>>   libcurl causes different treatment by the server. This applies to
>>   any header, really.
>> - The IP address returned to the browser is different from the IP
>>   address returned to curl via DNS, because the DNS request was made
>>   differently. Or you just got a different IP address because they
>>   rotate.
>> - I/O in your program, e.g. posting a FILE but the I/O is backed up.
>> - The proxy setting is different in the browser than in libcurl.
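>>
>> A minimal sketch for the first two ideas (assumes you gzip the body
>> yourself beforehand, e.g. with the gzip tool, and the user agent
>> string is only an example):
>>
>>   struct curl_slist *hdrs = NULL;
>>   hdrs = curl_slist_append(hdrs, "Content-Encoding: gzip");
>>   curl_easy_setopt(curl, CURLOPT_HTTPHEADER, hdrs);
>>   curl_easy_setopt(curl, CURLOPT_USERAGENT,
>>                    "Mozilla/5.0 (X11; Linux x86_64)");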
>>
>> Once you have a way to reproduce, try using the curl tool and see if
>> you get the same result. Also try the latest version.
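>>
>> For example, a rough equivalent of what your program posts (the URL
>> and key here are placeholders -- substitute your own):
>>
>>   curl -v --data-binary @block0.rng \
>>        "https://example.com/upload?filedrop_key=YOUR_KEY"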
>>
>>
>> [1]: http://curl.haxx.se/docs/bugs.html#What_to_report
>> [2]: http://curl.haxx.se/docs/todo.html#Modified_buffer_size_approach
>>
>>
>> On Tue, Apr 14, 2015 at 3:52 AM, Aleksandar Lazic <al-curllibrary_at_none.at
>> > wrote:
>>
>>> Dear Bryan
>>>
>>> On 14-04-2015 04:01, Bryan Christ wrote:
>>>
>>>> I've been trying to figure out why HTTP POST uploads are so slow with
>>>> libcurl. Upload speeds continually perform at about 1/10th of the expected
>>>> performance (or less). Many users have reported this behavior on our forum.
>>>> I suspect it has a lot to do with CURL_MAX_WRITE_SIZE being set to 16k.
>>>> Uploads to these same servers through other means (JavaScript for example)
>>>> reach their expected throughput. The code in question can be seen here:
>>>>
>>>> https://github.com/MediaFire/mediafire-fuse/blob/master/utils/http.c
>>>> [1] (at approx line 314)
>>>>
>>>> Assuming the issue is the 16K buffer limit, are there any other
>>>> options? Asking users to recompile a custom libcurl with a larger buffer
>>>> size is not very palatable.
>>>>
>>>
>>> Which version of libcurl are you using?
>>>
>>> http://curl.haxx.se/libcurl/c/curl_version.html
>>>
>>> BR Aleks
>>>
>>
>
>
> Usually I limit what I quote, but it's been a while, so I've left most
> of it to give everyone enough context. It would be helpful in the
> future if you would not top post!
>
> Bryan, thanks for the example. I had some SSL problem with mediafire a
> while ago (I may have pinged you about that) and I put this on hold.
> With the SSL issue now solved and the release out, I have tested the
> example. Unfortunately the TL;DR here is that I cannot reproduce what
> you describe in any of several versions of curl and OpenSSL on Ubuntu
> 14.04 LTS x64.
>
> I tried a fully updated system and also one from 6 months back:
> Linux ubuntu1404-x64-vm 3.13.0-45-generic #74-Ubuntu SMP Tue Jan 13
> 19:36:28 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
> Linux ubuntu1404-x64-vm 3.13.0-55-generic #94-Ubuntu SMP Thu Jun 18
> 00:27:10 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
>
> For the upload file I used 10MB of random data [1].
>
> Tested 5 times each, always shared libs:
> - packaged libcurl 7.35 and OpenSSL 1.0.1f (libcurl4-openssl-dev)
> - libcurl 7.41.0 and OpenSSL 1.0.1f
> - libcurl 7.44.0-DEV (master f44b803 2015-06-21) and OpenSSL 1.0.2c
> The average was 19 seconds, with a maximum deviation of half a second.
>
> Tested 5 times each (upload was slightly different, more on that later):
> - Firefox 31 ESR
> - Firefox 35.0.1
> - Firefox 38.0
> The average was 33 seconds (from the time the upload started to the
> time it went to 100% 'Waiting for notification'), with a maximum
> deviation of ~2 seconds. I also tried Windows 7 x64 for comparison,
> with 31 ESR and Nightly, and the average was the same.
>
> So as you can see, Firefox was slower by a significant amount for me.
> I used the debugging proxy Fiddler [2] to investigate why Firefox was
> taking longer, and it looks like it's because when uploading through
> the browser, mediafire's JS breaks the file into 1MB chunks: upload
> 1MB (about 2 seconds), wait for confirmation, and repeat.
>
> X-Filename: block0.rng
> X-Filesize: 10485767
> X-Filehash: 117ff99fec590418e5880512895940ed86f05ac2b2cc25d1dbd888083f632ab0
> X-Unit-Id: 0
> X-Unit-Hash: 1fd1ad3c9c83baa7f4005edef10834dd60087588398b6844f4cb2b4e283430ba
> X-Unit-Size: 1048576
>
> Also, there is one thing I had to do differently when uploading via
> Firefox. Because mediafire uses deduplication, I could not upload the
> random data file (block0.rng) as is. When I tried to do that it
> wouldn't work, because mediafire's javascript sends the hash of the
> file to the server, and since the hash already exists there the data
> isn't actually sent, even if I delete all references to the file.
> Instead, I appended a few random bytes to the end of the file before
> each upload so that I would have a unique hash each time:
> dd if=/dev/urandom count=10 bs=1 >> block0.rng
>
> I wonder if what you are seeing has something to do with
> deduplication? I would use a debugging proxy and Wireshark or
> something to see if all the data is actually uploaded to the server,
> and to get a better idea of what is happening. Also, I'd try the
> random data file at [1] for both curl and Firefox and see if your
> results are different. If you still can't figure out the problem,
> then next I'd try what you were going to do and change the size of
> the curl buffer from 16384. Also play with SO_SNDBUF.
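>
> As far as I know the buffer is compile-time only right now, i.e.
> something like this before building libcurl (the value is just
> something to test with):
>
>   /* include/curl/curl.h -- default is 16384 */
>   #define CURL_MAX_WRITE_SIZE 262144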
>
> About your example: please note that if you post more than 2GB, you
> should use CURLOPT_POSTFIELDSIZE_LARGE. It takes a curl_off_t, which
> is 64 bits if your platform supports it and large file support is
> enabled. Right now you are using CURLOPT_POSTFIELDSIZE, which is
> documented to take a long [3], not a uint64_t, and the size of a long
> varies by platform. Another thing is that your read callback returns
> size * ret, which is wrong even though it happens to work (libcurl
> passes size as 1, but I believe that's undocumented, so it could
> change). What I would return is ret if it's valid, or in this case
> the abort code if not. See CURLOPT_READFUNCTION [4] for more.
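>
> A minimal sketch of what I mean (the callback name and FILE-based
> userdata are just illustrative):
>
>   static size_t read_cb(char *buffer, size_t size, size_t nitems,
>                         void *userdata)
>   {
>     FILE *fp = (FILE *)userdata;
>     /* with an item size of 1, fread returns a byte count, which is
>        exactly what libcurl expects back */
>     size_t nread = fread(buffer, 1, size * nitems, fp);
>     if(nread == 0 && ferror(fp))
>       return CURL_READFUNC_ABORT; /* abort the transfer on read error */
>     return nread;
>   }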
>
> For anyone else who read through all this and is curious enough to try
> Bryan's example [5]: you have to create a (free) mediafire account and
> get a filedrop key. It took me a while to figure out what filedrop_key
> was; I searched the API docs but never found where to get one. What
> you do is:
> 1. Click the Create Folder button to the right of the upload button
>    and choose 'Make this folder a FileDrop'.
> 2. A popup appears: 'This folder is not enabled as a FileDrop. Would
>    you like to enable the folder as a FileDrop?' Click OK.
> 3. It will then show 'Create Customized FileDrop'. Scroll down to
>    'Deploy Your FileDrop'; the filedrop_key is in 'Hosted FileDrop'
>    after drop=.
> 4. Replace the MF_FILEDROP key in the example with that one, then
>    click Save and Close on the filedrop.
>
>
> [1]: http://www.rngresearch.com/download/block0.rng
> [2]: http://www.telerik.com/fiddler
> [3]: http://curl.haxx.se/libcurl/c/CURLOPT_POSTFIELDSIZE.html
> [4]: http://curl.haxx.se/libcurl/c/CURLOPT_READFUNCTION.html
> [5]: http://www.mediafire.com/view/9rb4aac4hnhma47/curl_filedrop_upload.c
>
>
Ray,

First, let me say thanks for taking the time to look into this. Sorry
about the top posting; that's standard GMail behavior. I'm appending
this reply manually, and hopefully it will format correctly on your
end. Here are some comments, in no particular order:

You are correct about the JS uploader chopping the file into chunks. For
the purposes of my test, I disabled that behavior. You can go to Account
Settings -> Upload Options and turn off the Flash and HTML uploaders. This
is the only way to get comparable results. I think you will then find
Firefox faster.

Every test does in fact require new, unique data. When the hash is
sent, we always try to perform an "instant" upload for the user. Even
when a user moves a file into the trash, we are still aware of its
existence until they purge the "trash can".

I still plan to modify the curl buffer from 16384 to see if performance
improves. I have already tried adjusting SO_SNDBUF, but that didn't
seem to matter much. Perhaps I'll revisit that test code and make sure
I coded it properly; roughly what I tried is sketched below.
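
From memory, it looked something like this (the 1MB value was
arbitrary):

    #include <sys/socket.h>

    static int sockopt_cb(void *clientp, curl_socket_t fd,
                          curlsocktype purpose)
    {
      int sndbuf = 1024 * 1024; /* enlarge the kernel send buffer */
      setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof(sndbuf));
      return CURL_SOCKOPT_OK;
    }

    /* ... */
    curl_easy_setopt(curl, CURLOPT_SOCKOPTFUNCTION, sockopt_cb);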

I am aware of the differences between CURLOPT_POSTFIELDSIZE_LARGE and
CURLOPT_POSTFIELDSIZE. I initially wrote the code as sample code for
this email thread, but then added it to the project as an example for
anyone else who would like to code up an upload. It's probably a good
idea to switch it in the project, or at least add a comment, so that no
one is led astray on large files; the fix is the one-liner below.
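
Something like this (sketch only; "fsize" stands in for however the
file size is obtained):

    /* curl_off_t is 64 bits with large file support, so >2GB is safe */
    curl_easy_setopt(curl, CURLOPT_POSTFIELDSIZE_LARGE,
                     (curl_off_t)fsize);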

For those of you who might care, version 1.5 of the MF API will provide
a means for account owners to programmatically configure their
file-drops.

-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette: http://curl.haxx.se/mail/etiquette.html
Received on 2015-06-22