curl-library

Re: perf with libcurl over time

From: Nick Gerner <nick_at_seomoz.org>
Date: Thu, 14 Jan 2010 14:46:26 -0800

>
>
> > I'm seeing some perf issues out of libcurl (7.19.3)
>
> I think it would be interesting if you could upgrade to the latest and see
> how things run. To me it makes little sense to chase problems that have a
> chance of not being present in the current version.
>

I'll give that a go, or try the latest CVS (to get your DNS cache fix).

Here's the same oprofile data for the beginning of the run:
samples  %        linenr info                image name          app name            symbol name
720474   28.7626  hostip.c:0                 libcurl.so.4.1.1    libcurl.so.4.1.1    hostcache_timestamp_remove
383402   15.3061  (no location information)  libcurl.so.4.1.1    libcurl.so.4.1.1    Curl_hash_clean_with_criterium
172446    6.8843  (no location information)  libcurl.so.4.1.1    libcurl.so.4.1.1    Curl_hash_pick
163536    6.5286  (no location information)  libstdc++.so.6.0.9  libstdc++.so.6.0.9  (no symbols)
133681    5.3368  (no location information)  libcares.so.2.0.0   libcares.so.2.0.0   (no symbols)
120374    4.8055  charset.h:204              retrieve            retrieve            CleanNulls(char*, char*, unsigned long, unsigned long, StatisticSet*)

These are the numbers for the beginning of the run; the time spent in
hostcache_timestamp_remove and Curl_hash_clean_with_criterium goes up a lot
toward the end of the run.

> > We've got a caching DNS server running locally, so we don't need any more
> > DNS cache (our perf is much worse with any of the above reversed).
>
> I'm sorry but that doesn't make sense at all. If it is indeed true, it would
> rather indicate bugs in your testing or in libcurl more than anything else.

I'll try it again with the latest code, and see if I can get better profile
data.

Keep in mind that we might hit a million (or tens of millions of) hosts. Is
the DNS cache a hash table or a tree? What about the connection cache?

> We have, however, recently tracked down and fixed a bug in the DNS cache
> (present only in CVS and thus in the next release, if I'm not mistaken) that
> caused entries in the cache to be kept too long while connections to the
> hosts were still in use. That shouldn't affect lookup speed though; it
> should only make DNS entries stay in the cache longer than specified.
>

I'll grab the latest CVS if I get a chance.

>
> > * Am I right that hostip.c, hostcache_timestamp_remove,
> > Curl_hash_clean_with_criterium and Curl_hash_pick are all related to the
> > DNS cache?
>
> No, not only that, but for the curl_multi_perform() case I think that's the
> only use. The same set of functions is otherwise used only for the socket
> hash, and that is only used by the multi_socket API.
>

We are using the multi_socket API; sorry if I got that confused. Here's the
output of curl-config --features:

SSL
IPv6
libz
AsynchDNS
NTLM

Here's a sketch of the code:

CURLM *m = curl_multi_init();
fetch_GlobalInfo *g = new fetch_GlobalInfo();  // our per-multi state
curl_multi_setopt(m, CURLMOPT_PIPELINING, 0L);
curl_multi_setopt(m, CURLMOPT_SOCKETFUNCTION, our_socket_callback);
curl_multi_setopt(m, CURLMOPT_SOCKETDATA, g);
curl_multi_setopt(m, CURLMOPT_TIMERFUNCTION, our_timer_callback);
curl_multi_setopt(m, CURLMOPT_TIMERDATA, g);

while(more_work_to_do())
{
  populate_easy_handles_and_add_to_multi_handle();
  // sockets libcurl asked us to watch via our_socket_callback
  nfds = get_all_the_waiting_sockets(&fds);
  ret = poll(fds, nfds, timeout_from_multi_timeout);
  for(int fdIndex = 0; fdIndex < nfds; fdIndex++)
  {
    // calls curl_multi_socket_action() for this socket
    while(handle_socket_via_curl_multi_socket_action(fds[fdIndex]) ==
          CURLM_CALL_MULTI_PERFORM)
      ;
  }
  handle_all_completed_handles();
}

Thanks.

--Nick

Received on 2010-01-14