DNS cache performance
From: Sergey Polovko via curl-library <curl-library_at_cool.haxx.se>
Date: Thu, 25 Mar 2021 18:28:34 +0300
Hi,
I use libcurl 7.74.0 in my program (written in C++, running on Linux), which downloads metrics data from many different hosts. A single instance of my program requests information from 40K-60K hosts every 15 seconds, performing around 3K requests per second. I use the libcurl multi interface with a custom epoll poller to make these requests asynchronously. My code is based on this example https://curl.se/libcurl/c/multi-uv.html, but instead of libuv, I wrote a custom wrapper around epoll to manage socket events.
Everything works great, but I see fairly high CPU usage in libcurl for managing the DNS cache.
Here is the perf report for the single thread that handles socket events. That thread spends around 56% of its overall time in the Curl_hostcache_prune function:
https://gist.github.com/jamel/caf6007805fb1e8148d2620c10c4748b
As far as I can see, the current DNS cache implementation uses a fairly small hash table (only seven slots) that never grows (at least, I didn't find any code that resizes it):
https://github.com/curl/curl/blob/master/lib/hostip.c#L850-L851
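To illustrate why a table that never grows hurts at this scale, here is a minimal, hypothetical C sketch. It is not libcurl's lib/hash.c, and every name in it is made up for illustration. It is a chained hash table with a fixed seven-slot count. By the pigeonhole principle, N entries spread over 7 slots leave at least one chain with ceil(N/7) entries, and a lookup that lands in that slot may walk all of them.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch, NOT libcurl's lib/hash.c: a chained hash table
 * whose slot count is fixed at 7 and never grows. */
#define NUM_SLOTS 7

struct entry {
  char *key;           /* e.g. "host123.example.com" */
  struct entry *next;  /* next entry in the same slot's chain */
};

struct fixed_table {
  struct entry *slots[NUM_SLOTS];
  size_t count;
};

/* djb2 string hash; any reasonable hash shows the same effect. */
static unsigned long hash_str(const char *s)
{
  unsigned long h = 5381;
  while(*s)
    h = h * 33 + (unsigned char)*s++;
  return h;
}

static int table_insert(struct fixed_table *t, const char *key)
{
  struct entry *e;
  size_t len;
  unsigned long slot;
  assert(t != NULL && key != NULL);
  len = strlen(key) + 1;
  e = malloc(sizeof(*e));
  if(!e)
    return -1;
  e->key = malloc(len);
  if(!e->key) {
    free(e);
    return -1;
  }
  memcpy(e->key, key, len);
  /* Prepend to the chain of its slot; the table is never resized. */
  slot = hash_str(key) % NUM_SLOTS;
  e->next = t->slots[slot];
  t->slots[slot] = e;
  t->count++;
  return 0;
}

/* Hypothetical helper: insert n distinct names "host<i>.example.com". */
static int table_fill_hosts(struct fixed_table *t, size_t n)
{
  char name[64];
  for(size_t i = 0; i < n; i++) {
    snprintf(name, sizeof(name), "host%zu.example.com", i);
    if(table_insert(t, name) != 0)
      return -1;
  }
  return 0;
}

/* Length of the longest chain: the worst case a lookup may have to walk. */
static size_t longest_chain(const struct fixed_table *t)
{
  size_t longest = 0;
  for(size_t s = 0; s < NUM_SLOTS; s++) {
    size_t len = 0;
    for(const struct entry *e = t->slots[s]; e; e = e->next)
      len++;
    if(len > longest)
      longest = len;
  }
  return longest;
}

static void table_free(struct fixed_table *t)
{
  for(size_t s = 0; s < NUM_SLOTS; s++) {
    struct entry *e = t->slots[s];
    while(e) {
      struct entry *next = e->next;
      free(e->key);
      free(e);
      e = next;
    }
    t->slots[s] = NULL;
  }
  t->count = 0;
}
```

For 40K to 60K hosts, the pigeonhole bound gives a longest chain of at least 5,715 to 8,572 entries, however good the hash function is.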
Also, cache entry invalidation is implemented as a linear traversal through the whole table:
https://github.com/curl/curl/blob/master/lib/hash.c#L241-L254
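A full-scan prune costs time proportional to the number of cached entries, not to the number that actually expire. This minimal, hypothetical C sketch shows that effect. It is not libcurl code, and `struct cache_entry` and `prune_linear` are invented names. `timeout_sec` stands in for the DNS cache timeout (CURLOPT_DNS_CACHE_TIMEOUT, 60 seconds by default).

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical cache entry, NOT libcurl's struct Curl_dns_entry. */
struct cache_entry {
  time_t created;  /* when the entry was added to the cache */
  int in_use;      /* 1 = live entry, 0 = evicted */
};

/* Full-scan prune: every call walks all n slots, even when nothing has
 * expired, so each call costs O(n). Returns the number of live entries
 * examined and stores the number evicted in *removed. */
static size_t prune_linear(struct cache_entry *entries, size_t n,
                           time_t now, double timeout_sec, size_t *removed)
{
  size_t examined = 0;
  assert(entries != NULL && removed != NULL);
  *removed = 0;
  for(size_t i = 0; i < n; i++) {
    if(!entries[i].in_use)
      continue;
    examined++;
    /* Evict entries older than the timeout. */
    if(difftime(now, entries[i].created) >= timeout_sec) {
      entries[i].in_use = 0;
      (*removed)++;
    }
  }
  return examined;
}
```

If a prune like this runs on every transfer at about 3K requests per second, each call still touches every live entry, even when nothing is due to expire.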
With as many hosts as I have, such an implementation cannot work well by design. I would even like to disable the DNS cache and use already-resolved IP addresses instead of FQDNs, but there is no way to do this. In the libcurl code, I only see a way to disable cache invalidation.
Are there any plans to add functionality to
1) allow disabling the DNS cache entirely (not just invalidation, but also additions and lookups)
2) allow configuring the initial size of the hash table used by the DNS cache
3) use a better data structure to handle the lifetime of many cache entries (e.g., a hashed wheel timer or an ebtree)
4) allow injecting a custom DNS cache implementation?
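Suggestion 3 mentions a hashed wheel timer. Below is a minimal sketch of that idea in C. It is purely illustrative and not libcurl code: every name is hypothetical, and the 64-slot wheel and the abstract tick unit are assumptions. Each entry is hashed into a slot by its expiry tick. Advancing the clock then visits only the slots for the elapsed ticks, at most one full turn, instead of every cached entry.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch, NOT libcurl code: a hashed timing wheel for
 * expiring cache entries. Wheel size and tick unit are assumptions. */
#define WHEEL_SLOTS 64

struct timer_node {
  uint64_t expires_at;      /* absolute tick at which the entry expires */
  int id;                   /* stand-in for a DNS cache key */
  struct timer_node *next;  /* next node in the same slot */
};

struct timer_wheel {
  struct timer_node *slots[WHEEL_SLOTS];
  uint64_t current;         /* last tick that has been processed */
};

static void wheel_init(struct timer_wheel *w, uint64_t now)
{
  for(size_t i = 0; i < WHEEL_SLOTS; i++)
    w->slots[i] = NULL;
  w->current = now;
}

/* Schedule an entry. Expiry times that are already due are clamped to
 * the next tick so the next wheel_advance() still sees them. */
static int wheel_add(struct timer_wheel *w, int id, uint64_t expires_at)
{
  struct timer_node *n;
  size_t slot;
  assert(w != NULL);
  n = malloc(sizeof(*n));
  if(!n)
    return -1;
  if(expires_at <= w->current)
    expires_at = w->current + 1;
  n->id = id;
  n->expires_at = expires_at;
  slot = (size_t)(expires_at % WHEEL_SLOTS);
  n->next = w->slots[slot];
  w->slots[slot] = n;
  return 0;
}

/* Advance the clock to `now` and free every entry with expires_at <= now.
 * Only slots for ticks current+1 .. now are visited (at most WHEEL_SLOTS).
 * Returns the number of entries expired. */
static size_t wheel_advance(struct timer_wheel *w, uint64_t now)
{
  size_t expired = 0;
  uint64_t steps = now > w->current ? now - w->current : 0;
  if(steps > WHEEL_SLOTS)
    steps = WHEEL_SLOTS;  /* one full turn already covers every slot */
  for(uint64_t i = 1; i <= steps; i++) {
    size_t slot = (size_t)((w->current + i) % WHEEL_SLOTS);
    struct timer_node **pp = &w->slots[slot];
    while(*pp) {
      struct timer_node *n = *pp;
      if(n->expires_at <= now) {  /* due: unlink and free */
        *pp = n->next;
        free(n);
        expired++;
      }
      else {
        pp = &n->next;  /* belongs to a later turn of the wheel */
      }
    }
  }
  if(now > w->current)
    w->current = now;
  return expired;
}

static void wheel_free(struct timer_wheel *w)
{
  for(size_t i = 0; i < WHEEL_SLOTS; i++) {
    struct timer_node *n = w->slots[i];
    while(n) {
      struct timer_node *next = n->next;
      free(n);
      n = next;
    }
    w->slots[i] = NULL;
  }
}
```

Entries that share a slot but expire on a later turn of the wheel are still walked. That is the usual trade-off of a hashed wheel, and a larger wheel reduces the overlap.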
Best,
Sergey
-------------------------------------------------------------------
Unsubscribe: https://cool.haxx.se/list/listinfo/curl-library
Etiquette: https://curl.se/mail/etiquette.html
Received on 2021-03-25