
DNS cache performance

From: Sergey Polovko via curl-library <curl-library_at_cool.haxx.se>
Date: Thu, 25 Mar 2021 18:28:34 +0300

Hi,

I use libcurl 7.74.0 in my program (written in C++, running on Linux), which downloads metrics data from many different hosts. A single instance of the program requests information from 40K to 60K hosts every 15 seconds, performing around 3K requests per second. I use the libcurl multi interface plus a custom epoll poller to make these requests asynchronously. My code is based on this example https://curl.se/libcurl/c/multi-uv.html, but instead of using libuv, I wrote a custom wrapper around epoll to manage socket events.

Everything works great, but I see pretty high CPU usage inside libcurl for managing the DNS cache.

Here is the perf report for a single thread handling socket events. You can see that the thread spends around 56% of its overall time in the Curl_hostcache_prune function.
https://gist.github.com/jamel/caf6007805fb1e8148d2620c10c4748b

I see that the current implementation of the DNS cache uses a pretty small hash table (with only seven slots) which never expands (at least I didn't find any code that does this):
https://github.com/curl/curl/blob/master/lib/hostip.c#L850-L851

Also, invalidation of cache entries is implemented as a linear traversal through the whole table:
https://github.com/curl/curl/blob/master/lib/hash.c#L241-L254
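
To illustrate the cost described above, here is a minimal sketch of a chained hash table with a fixed number of slots and a prune that walks every entry. It is not libcurl's code: the names Entry and FixedSlotCache and the use of std::hash are assumptions made for the example, and only the slot count of seven is taken from the description above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <ctime>
#include <functional>
#include <list>
#include <string>
#include <vector>

// Hypothetical stand-in for a cached DNS entry; not libcurl's struct.
struct Entry {
    std::string host;
    std::time_t added;  // time the entry was stored
};

// Chained hash table whose slot count is fixed at construction and never grows.
class FixedSlotCache {
public:
    explicit FixedSlotCache(std::size_t slots) : buckets_(slots) {
        assert(slots > 0);
    }

    // Appends an entry to the chain of its slot (no de-duplication in this sketch).
    void add(const std::string& host, std::time_t now) {
        std::size_t slot = std::hash<std::string>{}(host) % buckets_.size();
        buckets_[slot].push_back({host, now});
    }

    // Removes entries that are at least max_age old.
    // Returns how many entries had to be inspected to do so.
    std::size_t prune(std::time_t now, std::time_t max_age) {
        std::size_t visited = 0;
        for (auto& chain : buckets_) {
            for (auto it = chain.begin(); it != chain.end();) {
                ++visited;
                if (now - it->added >= max_age)
                    it = chain.erase(it);
                else
                    ++it;
            }
        }
        return visited;
    }

    std::size_t size() const {
        std::size_t total = 0;
        for (const auto& chain : buckets_) total += chain.size();
        return total;
    }

    std::size_t longest_chain() const {
        std::size_t longest = 0;
        for (const auto& chain : buckets_) longest = std::max(longest, chain.size());
        return longest;
    }

private:
    std::vector<std::list<Entry>> buckets_;
};
```

With seven slots and 50K distinct hosts, at least one chain holds more than 7,000 entries, and every call to prune() inspects all 50K entries even when none of them has expired.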

With as many hosts as in my situation, such an implementation cannot work well by design. I would even like to disable the DNS cache and use already resolved IP addresses instead of FQDNs, but there is no way to do this. In the libcurl code, I only see the possibility of disabling cache invalidation.

Are there any plans to add functionality to
 1) allow disabling the DNS cache entirely (I mean not just invalidation, but also additions and lookups)
 2) allow configuring the initial size of the hash table used by the DNS cache
 3) use a better data structure to handle the lifetime of many cache entries (e.g., a hashed wheel timer or an ebtree)
 4) allow injecting a custom DNS cache implementation
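
As a rough illustration of point 3, the sketch below keeps cache entries indexed by expiry time, so that pruning only touches entries that have actually expired. It swaps in a plain std::multimap ordered by expiry rather than a hashed wheel timer or an ebtree, and the ExpiryOrderedCache class and its methods are hypothetical names for this example, not anything libcurl provides.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <map>
#include <string>
#include <unordered_map>

// Hypothetical cache that indexes entries both by host and by expiry time.
class ExpiryOrderedCache {
public:
    // Inserts a host, or refreshes its expiry if it is already cached.
    void add(const std::string& host, std::time_t now, std::time_t ttl) {
        assert(ttl >= 0);
        auto found = by_host_.find(host);
        if (found != by_host_.end()) {
            by_expiry_.erase(found->second);  // drop the old expiry record
            by_host_.erase(found);
        }
        by_host_[host] = by_expiry_.emplace(now + ttl, host);
    }

    bool contains(const std::string& host) const {
        return by_host_.count(host) != 0;
    }

    // Removes every entry whose expiry time is <= now.
    // Stops at the first entry that is still alive, so the work done
    // is proportional to the number of expired entries.
    std::size_t prune(std::time_t now) {
        std::size_t removed = 0;
        while (!by_expiry_.empty() && by_expiry_.begin()->first <= now) {
            by_host_.erase(by_expiry_.begin()->second);
            by_expiry_.erase(by_expiry_.begin());
            ++removed;
        }
        return removed;
    }

    std::size_t size() const { return by_host_.size(); }

private:
    using ExpiryIndex = std::multimap<std::time_t, std::string>;
    ExpiryIndex by_expiry_;                                        // ordered by expiry
    std::unordered_map<std::string, ExpiryIndex::iterator> by_host_;  // lookup by name
};
```

The trade-off is an O(log n) insert in exchange for a prune that does not scan live entries. A hashed wheel timer would likely bring the insert cost down to O(1) at the price of coarser expiry granularity.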

Best,
Sergey
Received on 2021-03-25