curl-library
Re: maximizing performance when issuing a large number of GET/PUT requests to a single server
Date: Thu, 11 Aug 2011 12:02:00 -0700
From what I can tell, the problem lies in not reusing connections, so
the system runs out of available TCP ports. A simple modification to
reuse the same handle has helped somewhat.
Thanks
Alex
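
The handle-reuse change described above might look roughly like this — a
hedged sketch, not Alex's actual code; the URLs and the error handling are
placeholders:

```c
#include <stdio.h>
#include <curl/curl.h>

int main(void)
{
  curl_global_init(CURL_GLOBAL_DEFAULT);

  /* Create the easy handle ONCE, outside the transfer loop. */
  CURL *curl = curl_easy_init();
  if(curl) {
    const char *urls[] = {               /* placeholder URLs */
      "http://example.com/file1",
      "http://example.com/file2"
    };
    int i;
    for(i = 0; i < 2; i++) {
      curl_easy_setopt(curl, CURLOPT_URL, urls[i]);
      /* Reusing the handle lets libcurl keep the connection alive, so each
         request can avoid a fresh TCP connect and the TIME_WAIT socket it
         leaves behind. */
      CURLcode res = curl_easy_perform(curl);
      if(res != CURLE_OK)
        fprintf(stderr, "transfer %d failed: %s\n", i, curl_easy_strerror(res));
    }
    curl_easy_cleanup(curl);             /* cleanup once, at the end */
  }
  curl_global_cleanup();
  return 0;
}
```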
On Thu, Aug 11, 2011 at 11:40 AM, Alan Wolfe <alan.wolfe_at_gmail.com> wrote:
> also, hey, you might want to profile things to see where your bottlenecks
> are before attempting a solution.
>
> for instance, if your hard drive is the slow part and it's 100%
> utilized, adding more threads isn't going to make the read head move any
> faster.
>
> just being able to see whether it's CPU, storage access, or network that is
> the bottleneck, and the characteristics of the bottleneck (i.e. spikes vs. a
> sustained bottleneck), should help you formulate a better solution if you
> are really looking to maximize your performance.
>
> On Thu, Aug 11, 2011 at 11:29 AM, Alex Loukissas <alex_at_maginatics.com>
> wrote:
>>
>> On Thu, Aug 11, 2011 at 10:18 AM, Daniel Stenberg <daniel_at_haxx.se> wrote:
>> > On Thu, 11 Aug 2011, Alex Loukissas wrote:
>> >
>> >>> In many situations you won't gain any performance by doing parallel
>> >>> uploads to a single server, and you'll get a simpler implementation
>> >>> by doing serial uploads, so I'd recommend that. At least for a first
>> >>> shot.
>> >>>
>> >>> Just make sure you re-use the same CURL handle _without_ doing
>> >>> cleanup/init between each individual transfer.
>> >>
>> >> Seems like a good plan of attack to use the multi interface as a first
>> >> shot.
>> >
>> > Well, I meant to use the easy interface single-threaded as a first
>> > attempt,
>> > but by all means use the multi interface!
>> >
>>
>> I did try reusing the same handle, which (as expected) had a
>> tremendously positive effect on the number of TIME_WAIT sockets, and
>> initial performance tests show some improvement (although nothing
>> drastic). Is there a benefit over what I'm doing now (i.e. looping
>> through the URIs and issuing a curl_easy_perform) versus having a
>> number of handles in a curl_multi_handle? From what I understand, the
>> multi interface is doing the same thing (i.e. serially doing a
>> curl_easy_perform for each handle).
>>
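
The alternative being asked about — several easy handles driven through one
multi handle — might be sketched like this. The URLs are placeholders, error
checking and per-handle cleanup are trimmed for brevity, and
curl_multi_wait() is assumed available (it appeared in later libcurl
releases; older code would use curl_multi_fdset() with select() instead):

```c
#include <curl/curl.h>

int main(void)
{
  const char *urls[] = {                 /* placeholder URLs */
    "http://example.com/a",
    "http://example.com/b"
  };
  int i, still_running;

  curl_global_init(CURL_GLOBAL_DEFAULT);
  CURLM *multi = curl_multi_init();

  for(i = 0; i < 2; i++) {
    CURL *easy = curl_easy_init();
    curl_easy_setopt(easy, CURLOPT_URL, urls[i]);
    curl_multi_add_handle(multi, easy);
  }

  /* The multi interface interleaves all added transfers on one thread,
     rather than running them back to back like a curl_easy_perform loop. */
  do {
    curl_multi_perform(multi, &still_running);
    if(still_running)
      curl_multi_wait(multi, NULL, 0, 1000, NULL);  /* wait for activity */
  } while(still_running);

  /* In real code: curl_multi_remove_handle() and curl_easy_cleanup()
     for each easy handle before tearing down the multi handle. */
  curl_multi_cleanup(multi);
  curl_global_cleanup();
  return 0;
}
```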
>> >> Another option would be HTTP pipelining, but from what I see it's not
>> >> at its most mature stage in libcurl. The reason I'm considering
>> >> pipelining is that the files I'm up/downloading are small (a few to a
>> >> couple hundred KB), and I'm guessing that by squeezing many HTTP
>> >> requests into a single round trip I'll gain some performance. Any
>> >> thoughts on that?
>> >
>> > It is not as mature, no, but perhaps more importantly it is only
>> > enabled for downloading. If you use the multi interface, you can
>> > easily just enable pipelining and see how it works/performs.
>>
>> Great, thanks. Reads are more important than writes, so that's OK.
>> I'll give it a shot.
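
Enabling pipelining as suggested is a one-line option on the multi handle.
A minimal sketch (the easy handles and transfer loop are elided):

```c
#include <curl/curl.h>

int main(void)
{
  curl_global_init(CURL_GLOBAL_DEFAULT);
  CURLM *multi = curl_multi_init();

  /* Ask libcurl to pipeline HTTP requests where it can; easy handles
     added to this multi handle may then share one connection instead of
     each opening its own. */
  curl_multi_setopt(multi, CURLMOPT_PIPELINING, 1L);

  /* ... add easy handles and drive them with curl_multi_perform() ... */

  curl_multi_cleanup(multi);
  curl_global_cleanup();
  return 0;
}
```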
>>
>> >
>> > --
>> >
>> > / daniel.haxx.se
>> > -------------------------------------------------------------------
>> > List admin: http://cool.haxx.se/list/listinfo/curl-library
>> > Etiquette: http://curl.haxx.se/mail/etiquette.html
>> >
>>
>
>
>
Received on 2011-08-11