curl-library
Re: CURLOPT_READFUNCTION performance issue
Date: Thu, 25 Jul 2013 10:25:20 -0700
Hi Daniel,
Thanks again for the test script. I was able to reproduce the
performance issue by simply pointing the test script you provided to a
simple web server that runs on node.js (change the URL in the test
script to "http://localhost:8124"). I created a gist that contains the
source of the test server so that you can easily reproduce the issue:
https://gist.github.com/mtdowling/6081827
Here is the output of running the test script against a node.js server
with 1000 requests each:
$ time ./debugit
real 0m39.970s
user 0m0.088s
sys 0m0.048s
$ time ./debugit 1
runs fixed string version
real 0m0.287s
user 0m0.024s
sys 0m0.080s
$ curl --version
curl 7.27.0 (x86_64-redhat-linux-gnu) libcurl/7.27.0 NSS/3.14.0.0
zlib/1.2.5 libidn/1.18 libssh2/1.4.2
Protocols: dict file ftp ftps gopher http https imap imaps ldap
ldaps pop3 pop3s rtsp scp sftp smtp smtps telnet tftp
Features: AsynchDNS GSS-Negotiate IDN IPv6 Largefile NTLM NTLM_WB SSL libz
$ node --version
v0.10.13
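
To put those timings in per-request terms, here is a rough sketch of the arithmetic. The variable and function names are only for illustration, and it assumes the elapsed time is spread evenly over the 1000 requests, which ignores process startup.

```python
# Figures copied from the two `time` runs above (seconds).
REQUESTS = 1000
readfunction_real = 39.970   # ./debugit   (CURLOPT_READFUNCTION)
fixed_string_real = 0.287    # ./debugit 1 (fixed string version)
readfunction_cpu = 0.088 + 0.048  # user + sys for the slow run


def per_request_ms(total_seconds, requests=REQUESTS):
    """Average wall-clock time per request, in milliseconds."""
    return total_seconds / requests * 1000.0


read_ms = per_request_ms(readfunction_real)
fixed_ms = per_request_ms(fixed_string_real)
slowdown = readfunction_real / fixed_string_real
cpu_share = readfunction_cpu / readfunction_real

print(f"read callback: {read_ms:.2f} ms per request")
print(f"fixed string:  {fixed_ms:.3f} ms per request")
print(f"slowdown:      {slowdown:.0f}x")
print(f"CPU share of slow run: {cpu_share:.1%}")
```

That works out to roughly 40 ms per request with the read callback against about 0.3 ms with the fixed string. Since user plus sys time is well under one percent of the slow run, nearly all of that time is spent waiting rather than computing.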
I was able to quickly install node.js and run the test server on an
EC2 instance with the following:
cd /tmp
wget https://gist.github.com/mtdowling/6081827/raw/94697cce9ec75338d712c29c9f108e040e85b4e7/simple.js
wget http://nodejs.org/dist/v0.10.13/node-v0.10.13-linux-x64.tar.gz
tar xvf node-v0.10.13-linux-x64.tar.gz
/tmp/node-v0.10.13-linux-x64/bin/node simple.js &
Any ideas on why cURL is performing so poorly when contacting a
node.js server with CURLOPT_READFUNCTION?
Thanks,
Michael
On Mon, Jul 22, 2013 at 8:07 PM, Michael Dowling <mtdowling_at_gmail.com> wrote:
> I've noticed that there appears to be a significant performance hit when using
> CURLOPT_READFUNCTION. This issue seems to be platform dependent as I've only
> been able to get poor performance on Linux (Amazon Linux m1.large 64-bit
> ami-0358ce33) across multiple versions of cURL and PHP. I've not seen any
> performance issues on my Mac running PHP 5.3.15 and cURL 7.21.4.
>
> When sending PUT requests containing a 10 byte body (testing123) to a node.js
> server (others have reported issues with Jetty as well) using
> CURLOPT_READFUNCTION, the upload and download speeds returned by
> CURLINFO_SPEED_UPLOAD and CURLINFO_SPEED_DOWNLOAD are very poor: ~833 bytes/sec
> upload and ~1333 bytes/sec download.
>
> If you send the same request using CURLOPT_CUSTOMREQUEST => PUT and send the
> body using CURLOPT_POSTFIELDS then the transfer speeds are significantly
> improved: ~34,000 bytes/sec upload and ~55,000 bytes/sec download.
>
> Note: In both tests, I disabled the Expect: 100-Continue header by setting an
> "Expect:" header in CURLOPT_HTTPHEADER. I am also utilizing persistent HTTP
> connections by using the same multi handle and computing the average upload and
> download times across many different requests.
>
> I wrote a very simple test script that demonstrates the performance issue:
> https://gist.github.com/mtdowling/6059009. You'll need to have a node.js server
> running to handle the requests. I've written up a simple bash script that will
> install PHP, node.js, start the test server, and run the performance test:
> https://gist.github.com/anonymous/6059035.
>
> Thinking that this might be an issue with a specific version of cURL or PHP,
> I manually compiled different versions of PHP and cURL and ran the performance
> tests. There was no improvement using the version combination I had success with
> on my Mac or using the latest version of cURL (7.31) and PHP (5.5.1). This does
> not appear to be version dependent. Here are the results of that testing:
> https://github.com/guzzle/guzzle/issues/349#issuecomment-21284834
>
> I ran strace on the PHP script and found that using CURLOPT_POSTFIELDS appears
> to send the headers and the entire payload before receiving anything from the
> server, while CURLOPT_READFUNCTION appears to send the request headers, receive
> the response headers, and then send the body afterwards.
>
> I've provided the strace output below. For brevity and easier comprehension, I
> removed the various calls to
> "clock_gettime(CLOCK_MONOTONIC, {17579, 343661534}) = 0".
>
> CURLOPT_READFUNCTION strace:
>
> sendto(3, "PUT /guzzle-server/perf HTTP/1.1"..., 78, MSG_NOSIGNAL, NULL, 0) = 78
> poll([{fd=3, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=3, events=POLLOUT|POLLWRNORM}], 2, 0) = 1 ([{fd=3, revents=POLLOUT|POLLWRNORM}])
> sendto(3, "testing123", 10, MSG_NOSIGNAL, NULL, 0) = 10
> select(4, [3], [], [], {1, 0}) = 1 (in [3], left {0, 964661})
> poll([{fd=3, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 1 ([{fd=3, revents=POLLIN|POLLRDNORM}])
> recvfrom(3, "HTTP/1.1 200 OK\r\nContent-Length:"..., 16384, 0, NULL, NULL) = 116
> poll([{fd=3, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)
>
> CURLOPT_POSTFIELDS strace:
>
> sendto(3, "PUT /guzzle-server/perf HTTP/1.1"..., 137, MSG_NOSIGNAL, NULL, 0) = 137
> poll([{fd=3, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 1 ([{fd=3, revents=POLLIN|POLLRDNORM}])
> recvfrom(3, "HTTP/1.1 200 OK\r\nContent-Length:"..., 16384, 0, NULL, NULL) = 116
> poll([{fd=3, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)
>
> The loop used to execute the curl_multi handles is very simple and can be found
> in the test script at
> https://gist.github.com/mtdowling/6059009#file-readfuction_perf-php-L5.
>
> Does anyone have any insight on why I'm seeing such a performance hit? Is there
> some way I can get better performance, perhaps by rearranging my CURLOPT_*
> options or changing my loop that executes the cURL handles? Based on the strace
> output, I would assume that this is a cURL issue and not a PHP issue.
>
> Please let me know if I can supply any additional information to help
> troubleshoot.
>
> Thanks,
> Michael
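
As a cross-check on the two strace excerpts quoted above, the sketch below tallies the sendto calls issued before the first recvfrom and works out how long the select call blocked. The tuples are transcribed by hand from the excerpts (syscall name and return value only), the names are hypothetical, and the wait calculation assumes Linux's behaviour of rewriting the select timeout with the time remaining.

```python
# Syscalls in the order they appear in the quoted excerpts:
# (name, return value). Arguments are omitted.
READFUNCTION_TRACE = [
    ("sendto", 78), ("poll", 1), ("sendto", 10), ("select", 1),
    ("poll", 1), ("recvfrom", 116), ("poll", 0),
]
POSTFIELDS_TRACE = [
    ("sendto", 137), ("poll", 1), ("recvfrom", 116), ("poll", 0),
]


def sends_before_first_recv(trace):
    """Byte counts of each sendto issued before the first recvfrom."""
    sizes = []
    for name, ret in trace:
        if name == "recvfrom":
            break
        if name == "sendto":
            sizes.append(ret)
    return sizes


# select was called with a {1, 0} timeout and returned "left {0, 964661}",
# so the time spent blocked is the difference (assumes Linux semantics).
select_timeout_s = 1.0
select_left_s = 0.964661
select_wait_ms = (select_timeout_s - select_left_s) * 1000.0

print("READFUNCTION sends:", sends_before_first_recv(READFUNCTION_TRACE))
print("POSTFIELDS sends:  ", sends_before_first_recv(POSTFIELDS_TRACE))
print(f"select blocked for about {select_wait_ms:.1f} ms")
```

On the quoted excerpts this gives two writes (78 and 10 bytes) for the read-callback run against a single 137-byte write for CURLOPT_POSTFIELDS, and a wait of roughly 35 ms in select before the response arrives. That wait is in the same range as the 39.97 seconds for 1000 requests reported at the top of this message.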
Received on 2013-07-25