best practices: c-ares vs threaded resolver
Date: Mon, 16 Sep 2013 13:34:59 -0400 (EDT)
I've been using libcurl with c-ares in our commercial application
for several years and some robustness issues are showing up in
the area of name resolution on the big three platforms (win, mac,
linux). I'm looking for suggestions/experiences on whether going
to the threaded resolver might be the better dns strategy today.
Some problems I'm seeing:
* General unreliability, with DNS timeouts on all three
platforms. Manually configuring DNS servers to override DHCP
(typically pointing them at Google) often, but not always,
fixes lookups.
* OS X is having trouble keeping /etc/resolv.conf valid. Either
the symlink is damaged or the target (/var/run/resolv.conf)
isn't being regenerated reliably on network reconfigurations.
Result is a broken server list in c-ares.
* Magical failures. I ran an instrumented adig with printing
around c-ares socket syscalls and watched a lookup go out and
the response come back on win7. The response's source address
was not the target server's, and nothing in the system
configuration explained it. So while the response was actually
correct, anti-poisoning defenses dropped it, leading to a timeout.
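
To make the last failure concrete, here is a minimal sketch of the kind of source-address check an anti-poisoning defense performs. It is not c-ares's actual code; make_addr and reply_from_queried_server are hypothetical names, and the sketch assumes IPv4 and POSIX socket headers.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: build an IPv4 socket address from dotted-quad text. */
static struct sockaddr_in make_addr(const char *ip, unsigned short port)
{
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    inet_pton(AF_INET, ip, &sa.sin_addr);
    return sa;
}

/* Accept a reply only if it arrived from the address and port that were
 * queried. A correct answer from any other source is rejected, so the
 * caller keeps waiting and the lookup eventually times out. */
static bool reply_from_queried_server(const struct sockaddr_in *queried,
                                      const struct sockaddr_in *from)
{
    return queried->sin_family == from->sin_family &&
           queried->sin_port == from->sin_port &&
           queried->sin_addr.s_addr == from->sin_addr.s_addr;
}
```

With a check like this, a reply whose source differs from the queried server is indistinguishable from a spoofed packet, which matches the observed behavior: valid data, dropped anyway.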
So these and other issues make me want to use the system resolver
library. When it fails, there's a general system/network
configuration issue that doesn't become a support issue for us.
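
For reference, going through the system resolver library on POSIX platforms roughly means calling getaddrinfo(), as in the sketch below. system_resolve is a hypothetical wrapper name, and the hints chosen here (any address family, stream sockets) are assumptions rather than what libcurl passes.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical wrapper around the system resolver.
 * Blocks the calling thread until the lookup succeeds or fails.
 * Returns 0 and fills *out (release it with freeaddrinfo), or an
 * EAI_* error code that gai_strerror() can translate. */
static int system_resolve(const char *host, int flags, struct addrinfo **out)
{
    struct addrinfo hints;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* IPv4 or IPv6, whatever the system offers */
    hints.ai_socktype = SOCK_STREAM; /* one result per address instead of one per socket type */
    hints.ai_flags = flags;          /* e.g. AI_NUMERICHOST to skip DNS entirely */
    return getaddrinfo(host, NULL, &hints, out);
}
```

The call is synchronous, which is why a resolver built on it needs a thread per outstanding lookup to keep one slow request from stalling the rest.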
But looking at the implementation of the threaded resolver makes
me ask a few questions. It's a thread-per-request scheme. That's
good for avoiding a stall behind a request that will time out or
be answered slowly, but it makes unbounded demands on thread count.
Although in practice the threads will be few and short-lived, in
an app that makes hundreds of thousands of HTTP requests to
hundreds of hosts in bursty patterns, that looks more fragile
with respect to resources and quotas.
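
One way an application could bound that demand is an admission gate in front of the resolver, sketched below with C11 atomics. This is not something the threaded resolver provides; lookup_gate and its functions are hypothetical names, and the limit is whatever cap the application picks.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical admission gate: caps how many lookups may be in flight. */
struct lookup_gate {
    atomic_int in_flight; /* lookups currently holding a slot */
    int limit;            /* maximum allowed at once */
};

static void gate_init(struct lookup_gate *g, int limit)
{
    atomic_init(&g->in_flight, 0);
    g->limit = limit;
}

/* Try to claim a slot. Returns false when the cap is reached, so the
 * caller can queue the request and retry instead of spawning a thread. */
static bool gate_try_enter(struct lookup_gate *g)
{
    int cur = atomic_load(&g->in_flight);
    while (cur < g->limit) {
        /* On failure cur is refreshed with the current count and the
         * bound is checked again before the next attempt. */
        if (atomic_compare_exchange_weak(&g->in_flight, &cur, cur + 1))
            return true;
    }
    return false;
}

/* Give the slot back once the lookup has finished. */
static void gate_leave(struct lookup_gate *g)
{
    atomic_fetch_sub(&g->in_flight, 1);
}
```

The caller would enter the gate before starting a transfer that needs a fresh lookup and leave it once resolution completes, trading some latency during bursts for a fixed ceiling on resolver threads.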
The c-ares we're using *is* old, 1.7.1, and that will get bumped
up but maybe it's time to change. (And curse the bsd folks for
sticking the world with synchronous APIs...) Has anyone else
given both schemes a real-world test in a million-seat application?
-- 
Monty Brandenberg
-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette: http://curl.haxx.se/mail/etiquette.html
Received on 2013-09-17