curl-library
Re: CURLOPT_READFUNCTION performance issue
Date: Tue, 23 Jul 2013 12:50:08 -0700
Hi Daniel,
Thanks for your reply and looking into the issue! I apologize if this
doesn't continue the email thread properly on the list. I accidentally
signed up for the digest mail, but I've updated my subscription now :)
> So this version pair on Linux has problems while not on Mac? And if you run
> another version set on your Mac, you get the perceived problems?
I haven't tried using different versions of curl or PHP on my Mac. But
I can give it a shot.
> Which versions on Mac are fine?
PHP 5.3.15 and cURL 7.21.4
> Doing transfer performance measurements on 10 bytes is going to be very shaky
> and unreliable. Send 10 million bytes or something and you can start getting
> something to measure!
I think that would be true if I were only sending a couple of requests,
but I'm sending a bunch of requests and then using the average upload
and download speeds (I've taken the average over 100 requests and the
average over 1000 requests, and both are poor).
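
For reference, the per-request time implied by the reported upload speed can be worked out with a few lines of C. This is only a sketch: it assumes CURLINFO_SPEED_UPLOAD is an average in bytes per second over the whole transfer, and the helper name implied_seconds is made up for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: given a body size and an average speed in bytes
 * per second, return how long the transfer must have taken. */
double implied_seconds(size_t body_bytes, double bytes_per_second)
{
    if (bytes_per_second <= 0.0)
        return 0.0; /* no meaningful speed reported */
    return (double)body_bytes / bytes_per_second;
}
```

implied_seconds(10, 833.0) comes out at about 0.012, so under that assumption each 10-byte upload took roughly 12 ms from start to finish.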
> Also, I'm not convinced both ways will count the numbers exactly the same
> internally since the postfields approach will send the body as part of the
> initial request send.
Agreed. In my test script, I also start a timer before running the
test to see how long it takes. My tests show that using
CURLOPT_POSTFIELDS is about three times as fast as using
CURLOPT_READFUNCTION (e.g. 0.7 secs/50 reqs vs 2 secs/50 reqs).
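
Timing a whole run from the outside, independent of what libcurl reports, could look roughly like the sketch below. It assumes a POSIX system with clock_gettime(); the names time_runs, requests_per_second and count_call are invented for this example, and count_call is only a stand-in for "perform one request".

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Run fn(arg) `runs` times and return the elapsed wall-clock seconds,
 * or -1.0 if the clock could not be read. */
double time_runs(void (*fn)(void *), void *arg, int runs)
{
    struct timespec start, end;

    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1.0;
    for (int i = 0; i < runs; i++)
        fn(arg);
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1.0;

    return (double)(end.tv_sec - start.tv_sec) +
           (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Requests per second for a run; 0.0 if the elapsed time is not positive. */
double requests_per_second(int requests, double seconds)
{
    return seconds > 0.0 ? (double)requests / seconds : 0.0;
}

/* Stand-in for "perform one request"; it only counts calls. */
void count_call(void *arg)
{
    ++*(int *)arg;
}
```

With the figures above, requests_per_second(50, 0.7) is about 71 and requests_per_second(50, 2.0) is 25.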
> It isn't exactly following best practices when it comes to using libcurl's API
> but I doubt it matters a lot in your case. (It is written to use older
> libcurl, and it has no timeout in the curl_multi_select use and if
> curl_multi_exec returns something else than OK it'll busy-loop etc.)
This was meant to quickly illustrate the problem. (Just FYI, it does
use a default timeout of 1 second for the select call.) I maintain a
very popular PHP HTTP client library that runs on cURL. I'd be very
curious if you could tell me what else is wrong with this loop so that
I can make sure I'm doing the right thing.
> I converted your test case to plain C and used the plain easy API instead
Sweet! I should mention that I tested using multi and easy handles
through PHP and I saw the same performance issues, so this is fine.
> Can you modify this test code to show the differences you saw?
I'll give it a shot and try running it against the node.js server.
Thanks,
Michael
----
Message: 4
Date: Tue, 23 Jul 2013 20:18:28 +0200 (CEST)
From: Daniel Stenberg <daniel_at_haxx.se>
To: libcurl development <curl-library_at_cool.haxx.se>
Subject: Re: CURLOPT_READFUNCTION performance issue
Message-ID: <alpine.DEB.2.00.1307231345210.21425_at_tvnag.unkk.fr>
Content-Type: text/plain; charset="us-ascii"; Format="flowed"

On Mon, 22 Jul 2013, Michael Dowling wrote:

> I've noticed that there appears to be a significant performance hit when
> using CURLOPT_READFUNCTION. This issue seems to be platform dependent as
> I've only been able to get poor performance on Linux (Amazon Linux m1.large
> 64-bit ami-0358ce33) across multiple versions of cURL and PHP. I've not seen
> any performance issues on my Mac running PHP 5.3.15 and cURL 7.21.4.

Hi Michael,

Thanks for your email and detailed report. I have some troubles to sort it all
out, and the many levels of different software with unknown behaviours doesn't
really make things easier. Let me start out with a bunch of questions...

So this version pair on Linux has problems while not on Mac? And if you run
another version set on your Mac, you get the perceived problems?

Which versions on Mac are fine?

> When sending PUT requests containing a 10 byte body (testing123) to a node.js
> server (others have reported issues with Jetty as well) using
> CURLOPT_READFUNCTION, the read and write times returned from
> CURLINFO_SPEED_UPLOAD and CURLINFO_SPEED_DOWNLOAD are very poor: ~833 upload and
> 1333 download.

Doing transfer performance measurements on 10 bytes is going to be very shaky
and unreliable. Send 10 million bytes or something and you can start getting
something to measure!

Also, I'm not convinced both ways will count the numbers exactly the same
internally since the postfields approach will send the body as part of the
initial request send.

I suggest you use an external measuring method!

> I wrote a very simple test script that demonstrates the performance issue:
> https://gist.github.com/mtdowling/6059009. You'll need to have a node.js
> server running to handle the requests. I've written up a simple bash script
> that will install PHP, node.js, start the test server, and run the
> performance test: https://gist.github.com/anonymous/6059035.

I would really prefer to have a test case that requires nothing at all other
than a libcurl-using application on the client side. I don't want PHP in
there, it makes my life far too complicated and things are much harder to
follow. For the server side, we can just send it to whatever can just eat
what we send to it.

> Thinking that this might be an issue with a specific version of cURL or PHP,
> I manually compiled different versions of PHP and cURL and ran the
> performance tests. There was no improvement using the version combination I
> had success with on my mac or using the latest version of cURL (7.31) and
> PHP (5.5.1). This does not appear to be version dependent. Here are the
> results of that testing:
> https://github.com/guzzle/guzzle/issues/349#issuecomment-21284834

For the plain HTTP (without SSL) POST case, there's basically no difference
between the Mac and the Linux version. They run the same code. But if you saw
a machine specific difference, then surely you'd see the same differences even
when you run other versions.

> I ran strace on the PHP script and found that using CURLOPT_POSTFIELDS
> appears to send the headers and the entire payload before receiving anything
> from the server, while CURLOPT_READFUNCTION appears to send the request
> headers, receive the response headers, then sends the body afterwards.

Yes, and that seems quite natural to me. If you send a small POST with
POSTFIELDS, you will then get away with fewer system calls and less checking
on the socket as everything is sent off in a single go.
When using the callback approach, we don't have the data around so it has to
be split up in multiple writes.

> The loop used to execute the curl_multi handles is very simple and can be
> found in the test script at
> https://gist.github.com/mtdowling/6059009#file-readfuction_perf-php-L5.

It isn't exactly following best practices when it comes to using libcurl's API
but I doubt it matters a lot in your case. (It is written to use older
libcurl, and it has no timeout in the curl_multi_select use and if
curl_multi_exec returns something else than OK it'll busy-loop etc.)

I converted your test case to plain C and used the plain easy API instead [1],
and then I had it send the POST 10000 times and measured how long it took on
my old laptop, sending the data to the curl test suite's HTTP server (which
certainly isn't in any way a fast server implementation). The response to the
request is very small, just a bunch of headers and a couple of bytes of body.

My results contradict your results quite significantly:

  $ time ./debugit

  real    0m9.412s
  user    0m1.752s
  sys     0m1.732s

  $ time ./debugit 1
  runs fixed string version

  real    0m9.457s
  user    0m1.528s
  sys     0m1.712s

Roughly 1000 requests per second with both solutions.

This test ran on a dual-core 1.83GHz thing, Linux kernel 3.9.8 in 32bit mode.

curl -V:

  curl 7.32.0-DEV (i686-pc-linux-gnu) libcurl/7.32.0-DEV OpenSSL/1.0.1e
  zlib/1.2.8 c-ares/1.9.2-DEV libidn/1.25 libssh2/1.4.3_DEV librtmp/2.3
  Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3
  pop3s rtmp rtsp scp sftp smtp smtps telnet tftp
  Features: AsynchDNS Debug TrackMemory IDN IPv6 Largefile NTLM NTLM_WB SSL
  libz TLS-SRP

Can you modify this test code to show the differences you saw?

[1] = I chose the easy interface just out of laziness since it meant I had to
write less code. We can of course make it use the multi API instead to make it
mimic your code even more closely, but I seriously doubt it will make any
performance difference.

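To illustrate why the callback route cannot hand everything over in one go, here is a minimal sketch of a read callback in C. It has the same signature libcurl expects for CURLOPT_READFUNCTION, but it deliberately avoids the libcurl headers so it compiles on its own. struct upload_state and its fields are invented for this example; in a real program the callback and the state would be registered with CURLOPT_READFUNCTION and CURLOPT_READDATA.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bookkeeping for a body that is handed out piece by piece. */
struct upload_state {
    const char *data; /* body to send */
    size_t len;       /* total body length in bytes */
    size_t pos;       /* bytes already handed out */
};

/* Same shape as a libcurl read callback: fill buffer with at most
 * size * nitems bytes and return the number written; 0 means end of body. */
size_t read_callback(char *buffer, size_t size, size_t nitems, void *userdata)
{
    struct upload_state *st = userdata;
    size_t room = size * nitems;
    size_t left = st->len - st->pos;
    size_t n = left < room ? left : room;

    if (n > 0) {
        memcpy(buffer, st->data + st->pos, n);
        st->pos += n;
    }
    return n;
}
```

The library only learns what the body contains when it calls the function, so the body cannot simply be sent together with the request headers the way a CURLOPT_POSTFIELDS buffer can.
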
-- 
 / daniel.haxx.se
-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html

Received on 2013-07-23