curl-library
Re: libcurl and Perl's WWW::Curl::Easy slower than LWP on small HTTP POSTs
Date: Wed, 14 Sep 2011 15:31:50 +0100
On 14/09/11 15:15, Martin J. Evans wrote:
> On 14/09/11 12:27, Daniel Stenberg wrote:
>> On Wed, 14 Sep 2011, Martin J. Evans wrote:
>>
>>> I am using libcurl under Perl's WWW::Curl::Easy module because it
>>> works out a lot faster than the mostly pure Perl LWP module. For
>>> HTTP GETs Curl works out a lot faster but with small POSTs Curl
>>> works out a lot slower (in real time although faster in CPU time).
>>
>> Thanks a lot for these interesting results. Let's try to figure out
>> why they look the way they do!
>>
>> I think you need to do some wiresharking of the TCP traffic during
>> the tests to properly understand what's going on.
>
> ok, I'll look into that but see below.
>
>>> Through experimentation I have discovered setting either
>>> CURLOPT_TCP_NODELAY or CURLOPT_FORBID_REUSE speeds up the POSTs so
>>> Curl is faster than LWP.
>>
>> Have you tried with _only_ CURLOPT_TCP_NODELAY ? And in the other
>> end, do you know if LWP disables the Nagle algorithm and does it
>> re-use the same connection or not?
>
> I did try CURLOPT_TCP_NODELAY and CURLOPT_FORBID_REUSE separately but did not include the numbers. Obviously these numbers are not exactly the same as before since I've re-run the tests. Here they are:
>
> No special settings:
> Benchmark: timing 1000 iterations of curl, lwp...
> curl: 56 wallclock secs ( 1.73 usr + 0.46 sys = 2.19 CPU) @ 456.62/s (n=1000)
> lwp: 22 wallclock secs ( 4.15 usr + 0.59 sys = 4.74 CPU) @ 210.97/s (n=1000)
>
> CURLOPT_FORBID_REUSE:
> Benchmark: timing 1000 iterations of curl, lwp...
> curl: 19 wallclock secs ( 0.35 usr + 0.31 sys = 0.66 CPU) @ 1515.15/s (n=1000)
> lwp: 22 wallclock secs ( 4.29 usr + 0.62 sys = 4.91 CPU) @ 203.67/s (n=1000)
>
> CURLOPT_TCP_NODELAY:
> Benchmark: timing 1000 iterations of curl, lwp...
> curl: 19 wallclock secs ( 0.43 usr + 0.09 sys = 0.52 CPU) @ 1923.08/s (n=1000)
> lwp: 22 wallclock secs ( 4.11 usr + 0.63 sys = 4.74 CPU) @ 210.97/s (n=1000)
>
> LWP does not reuse the connection by default (and in my case) although it can be made to do so. It involves creating a shared object and adding the LWP objects to it. I didn't really go that far down that path because I saw curl was MUCH faster so LWP is only a fallback in my case for when we don't have curl available.
>
> I don't think LWP sets TCP_NODELAY - at least I cannot see it does from the strace.
>
> socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
> ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbfc77968) = -1 EINVAL (Invalid argument)
> _llseek(3, 0, 0xbfc779b0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
> ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbfc77968) = -1 EINVAL (Invalid argument)
> _llseek(3, 0, 0xbfc779b0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
> fcntl64(3, F_SETFD, FD_CLOEXEC) = 0
> fcntl64(3, F_GETFL) = 0x2 (flags O_RDWR)
> fcntl64(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
> connect(3, {sa_family=AF_INET, sin_port=htons(82), sin_addr=inet_addr("172.20.0.102")}, 16) = -1 EINPROGRESS (Operation now in progress)
>
> select(8, NULL, [3], NULL, {180, 0}) = 1 (out [3], left {179, 999993})
> connect(3, {sa_family=AF_INET, sin_port=htons(82), sin_addr=inet_addr("172.20.0.102")}, 16) = 0
> fcntl64(3, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
> fcntl64(3, F_SETFL, O_RDWR) = 0
> fcntl64(3, F_GETFL) = 0x2 (flags O_RDWR)
> fcntl64(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
>
> write(3, "POST /xx/xxxx HTTP/1.1\r\nTE: deflate,gzip;q=0.3\r\nConnection: TE, close\r\nHost: xxxxxxxx.xxxxxxxx.xxxxx:82\r\nUser-Agent: libwww-perl/6.02\r\nContent-Length: 46\r\nContent-Type: application/x-www-form-urlencoded;charset=utf-8\r\nCookie: XXXXXXX=ACE6843B40A358E0E040007F01007083%3A634c10dc53fe309a299e4989e544fcea6d6e6f85155ba563c7d8c735f5a6938b\r\nCookie2: $Version=\"1\"\r\n\r\npassword=xxxxxxxx&method=login&username=xxxxxx", 406) = 406
>
> I note post data written in one go and www-form-urlencoded so I agree that is a difference between the Curl test case and the LWP one - which I will rectify.
>
>>> I was a little surprised setting CURLOPT_TCP_NOLDELAY made a
>>> difference but as it does it sort of suggested to me that maybe the
>>> POST headers are sent first then followed up by the form data
>>> (i.e., at least 2 writes to the socket).
>>
>> Without NODELAY I would rather suspect that you'll see a short wait
>> when a single request has been sent and the Nagle waits a short while
>> to see if you send more data to merge into the same packet. When it
>> doesn't come, it sends off the request anyway.
>
> yeah, I know which is why I highlighted the double write.
>
>>> Also, if CURLOPT_FORBID_REUSE is set I suppose it is possible
>>> libcurl does a shutdown(write) on the socket after the last write
>>> which would also expedite any unset data.
>>
>> Did you try with _only_ CURLOPT_FORBID_REUSE set and not NODELAY
>> then? I can't recall any shutdown(write) done by libcurl...
>
> I tried with CURLOPT_FORBID_REUSE and that was the strace from the original email.
>
> The strace for CURPLOPT_TCP_NODELAY only is:
>
> socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
>
> # NOTE TCP_NODELAY set:
>
> setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0
> fcntl64(3, F_GETFL) = 0x2 (flags O_RDWR)
> fcntl64(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
> connect(3, {sa_family=AF_INET, sin_port=htons(82), sin_addr=inet_addr("172.20.0.102")}, 16) = -1 EINPROGRESS (Operation now in progress)
> poll([{fd=3, events=POLLOUT}], 1, 1000) = 1 ([{fd=3, revents=POLLOUT}])
> getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
> getpeername(3, {sa_family=AF_INET, sin_port=htons(82), sin_addr=inet_addr("172.20.0.102")}, [16]) = 0
> getsockname(3, {sa_family=AF_INET, sin_port=htons(49737), sin_addr=inet_addr("172.20.0.102")}, [16]) = 0
> send(3, "POST /v1/user HTTP/1.1\r\nHost: xxxxxxxx.xxxxxxxx.xxxxx:82\r\nAccept: */*\r\nAccept-Encoding: gzip\r\nCookie: XXXXXXX=ACE6AD3069DBBA12E040007F01007149%3A6d95293bd5e789f88e87a5f000ff9250a1a3cec81524612f8686e1b5618bbeda\r\nContent-Length: 354\r\nContent-Type: multipart/form-data; boundary=----------------------------30bbaa93aace\r\n\r\n", 320, MSG_NOSIGNAL) = 320
> poll([{fd=3, events=POLLIN|POLLPRI}, {fd=3, events=POLLOUT}], 2, 0) = 1 ([{fd=3, revents=POLLOUT}])
> poll([{fd=3, events=POLLIN|POLLPRI}, {fd=3, events=POLLOUT}], 2, 0) = 1 ([{fd=3, revents=POLLOUT}])
> poll([{fd=3, events=POLLIN|POLLPRI}, {fd=3, events=POLLOUT}], 2, 1000) = 1 ([{fd=3, revents=POLLOUT}])
> poll([{fd=3, events=POLLIN|POLLPRI}, {fd=3, events=POLLOUT}], 2, 0) = 1 ([{fd=3, revents=POLLOUT}])
> recv(3, "HTTP/1.1 200 OK\r\nServer: nginx/1.0.4\r\nDate: Wed, 14 Sep 2011 13:55:57 GMT\r\nContent-Type: application/json; charset=UTF-8\r\nConnection: keep-alive\r\nSet-Cookie: XXXXXXX=ACE690F10782A262E040007F010070C2%3Ad02e9863b75438bd469c60ffaa5bca0eb2d02020e56721e0b34ac77f467a81aa; path=/; expires=Thu, 15-Sep-2011 13:55:57 GMT\r\nContent-Length: 68\r\nAccess-Control-Allow-Headers: *\r\nAccess-Control-Allow-Methods: GET, POST, OPTIONS\r\nAccess-Control-Max-Age: 1728000\r\nAccess-Control-Allow-Origin: *\r\n\r\n[\"[XXX-XXXXXXX-XXXX]\",0,0,\"\",1,[\"ACE690F10782A262E040007F010070C2\"]]", 16384, 0) = 552
>
> and for the second and subsequent posts all to the same url:
>
> send(3, "POST /v1/user HTTP/1.1\r\nHost: xxxxxxxx.xxxxxxxx.xxxxx:82\r\nAccept: */*\r\nAccept-Encoding: gzip\r\nCookie: XXXXXXX=ACE690F10783A262E040007F010070C2%3A1f43327f8fe791f6ecdc056ce9ef34782f400ddd6077d5a46f5afe1beb44598e\r\nContent-Length: 156\r\nContent-Type: multipart/form-data; boundary=----------------------------cbb999035fb2\r\n\r\n", 320, MSG_NOSIGNAL) = 320
> send(3, "------------------------------cbcf74d4560b\r\nContent-Disposition: form-data; name=\"method\"\r\n\r\nsend_keep_alive\r\n------------------------------cbcf74d4560b--\r\n", 156, MSG_NOSIGNAL) = 156
> poll([{fd=3, events=POLLIN|POLLPRI}, {fd=3, events=POLLOUT}], 2, 0) = 1 ([{fd=3, revents=POLLOUT}])
> poll([{fd=3, events=POLLIN|POLLPRI}, {fd=3, events=POLLOUT}], 2, 0) = 1 ([{fd=3, revents=POLLOUT}])
> poll([{fd=3, events=POLLIN|POLLPRI}, {fd=3, events=POLLOUT}], 2, 1000) = 1 ([{fd=3, revents=POLLOUT}])
> poll([{fd=3, events=POLLIN|POLLPRI}, {fd=3, events=POLLOUT}], 2, 0) = 1 ([{fd=3, revents=POLLOUT}])
> poll([{fd=3, events=POLLIN|POLLPRI}], 1, 1000) = 1 ([{fd=3, revents=POLLIN}])
> poll([{fd=3, events=POLLIN|POLLPRI}], 1, 0) = 1 ([{fd=3, revents=POLLIN}])
> recv(3, "HTTP/1.1 200 OK\r\nServer: nginx/1.0.4\r\nDate: Wed, 14 Sep 2011 13:59:22 GMT\r\nContent-Type: application/json; charset=UTF-8\r\nConnection: keep-alive\r\nSet-Cookie: XXXXXXX=ACE690F10783A262E040007F010070C2%3A1f43327f8fe791f6ecdc056ce9ef34782f400ddd6077d5a46f5afe1beb44598e; path=/; expires=Thu, 15-Sep-2011 13:59:22 GMT\r\nContent-Length: 81\r\nAccess-Control-Allow-Headers: *\r\nAccess-Control-Allow-Methods: GET, POST, OPTIONS\r\nAccess-Control-Max-Age: 1728000\r\nAccess-Control-Allow-Origin: *\r\n\r\n[\"[XXX-XXXXXXX-XXXX]\",0,0,\"\",1,[\"1316012362\",\"ACE690F10783A262E040007F010070C2\"]]", 16384, 0) = 565
>
>>> When I strace the code I do in fact see 2 separate writes to the
>>> socket for the header and the form data.
>>
>> Oh right, because you use multipart formpost. If you'd change your
>> test to instead use a "normal" small POST you'll see that libcurl
>> won't do that. It might make sense to change the test to use that. It
>> should make the speed very similar to a plain GET test.
>
> You are of course right and I did not spot that.
>
>> The reason multipart formpost causes two (or more) writes instead of
>> a single one, even when the post is very small, is mostly due to the
>> layout of the internal APIs and convenience/laziness. It could very
>> well be optimized to do better...
>>
>>> Ideally I'd like to reuse the socket but I am wary of setting
>>> CURLOPT_TCP_NOLDELAY.
>>
>> For this kind of use case, I'd say using CURLOPT_TCP_NODELAY makes
>> sense. Of course, in a real life application you might not send so
>> small posts and you might not fire them off that rapidly immediately
>> next after each other I guess...
>
> Unfortunately, our "real" life is not that far removed from this test case - a lot of posts are being fired off to the same url with relatively small amounts of data.
>
>>> If libcurl breaks the post header and the form data into 2 writes
>>> why?
>>
>> You'll have to see the lib/formdata.c and how the data is presented
>> to the libcurl uploader for that to be clear. I wouldn't mind having
>> libcurl optimized to prefer a single write() for small multipart
>> formposts, but I'm not sure I'm willing to waste a lot of time or
>> code readability for it. It's really a corner case.
>
> Point taken - I think my first steps are to make the 2 cases the same - the same form encoding that is.
>
> Thanks for your help.
>
> I'll get back with results.
>
> Martin
I'm not sure I can send the form in the same way from WWW::Curl::Easy. It only has:
use WWW::Curl::Form;
my $curlf = WWW::Curl::Form->new;
$curlf->formaddfile($filename, 'attachment', "multipart/form-data");
$curlf->formadd("FIELDNAME", "VALUE");
$curl->setopt(CURLOPT_HTTPPOST, $curlf);
See - http://search.cpan.org/~szbalint/WWW-Curl-4.15/lib/WWW/Curl.pm
I don't see a way to control how the form data is added - I realise this bit is probably Perl specific for the Perl mappings to libcurl. There is no code in WWW::Curl for formadd, I think they just map into the C library via a small amount of Perl XS. If there is a way to change the way forms are added in libcurl I might have to change the Perl XS code. hmm, getting more complicated now.
Any ideas?
Martin
-- Martin J. Evans Easysoft Limited http://www.easysoft.com ------------------------------------------------------------------- List admin: http://cool.haxx.se/list/listinfo/curl-library Etiquette: http://curl.haxx.se/mail/etiquette.htmlReceived on 2011-09-14