Buy commercial curl support from WolfSSL. We help you work
out your issues, debug your libcurl applications, use the API, port to new
platforms, add new features and more. With a team lead by the curl founder
himself.
Re: Question about DNS timeout in libCurl
- Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]
From: Timothe Litt <litt_at_acm.org>
Date: Thu, 16 Dec 2021 19:53:59 -0500
To be clear, my point is that a DNS server performance cache belongs
("inband") in c-ares/libcurl, not the client code. There is no need to
update system files to implement this. It's pretty simple - "avoid
timeouts by retaining what you learn".
The objective is to connect to the functional/best server as often as
possible. Once a timeout happens - however long it is, you've already
lost. And every time you retry a slow/broken server and encounter the
same timeout, you lose again.
Keeping a resolver-wide cache of which servers to use for a domain
maximizes the chance that a resolution will be fast. (And caching the
result is even better, TTL permitting.) Unless you're implementing a
web crawler or enterprise/ISP-scale users, the required server cache
will be surprisingly small. You can trade some performance for even
less memory by caching only the "bad" servers.
The other point, as I and others have made previously, is that the best
(no new code) solution is to incorporate an existing caching nameserver
in your product - bind, powerdns, systemd-resolved, and others. They
have all the necessary logic & support.
Where that isn't possible, the next best approach is to upgrade the
libraries that do their own resolution to do the same thing.
Another useful project would be for someone to make a standard library
that implements the happy eyeballs algorithm, which would simplify life
for everyone - and greatly reduce the case for c-ares to exist. I'm
surprised that so many products have rolled their own...
Tweaking timeouts will not solve your problem.
Timothe Litt
ACM Distinguished Engineer
--------------------------
This communication may not represent the ACM or my employer's views,
if any, on the matters discussed.
On 16-Dec-21 16:37, Dmitry Karpov wrote:
>
> * The main thing that would help, especially in the scenario that
> started this discussion, is having a way to cache & persist server
> performance so that each query / activation of the library doesn't
> start from scratch.
>
> While I agree that such “out-of-band” approach might help in general,
> it is not easy to implement, and sometimes maybe not even possible if
> the client doesn’t have any way to affect the system level – i.e.
> modifying system files like resolv.conf.
>
> So, both libcurl and c-ares should provide a good way to quickly skip
> over not working name servers when they have “read only” permissions
> for system DNS settings.
> You are right that this means that the libraries always have to do
> name resolutions without dependencies on previous name resolution
> failures, so they should try potentially failing servers again.
>
> But in my opinion, for most practical scenarios it is better to
> optimize libcurl/c-ares interactions with available DNS settings than
> build complex “out-of-band” caching solutions, with locking protocol
> and other complexities.
>
> And nothing prevents clients/projects having full access to their
> systems implement advanced caching/name server filtering solutions on
> top of libcurl/c-ares optimizations.
> So, I would rather focus this discussion only on libcurl/c-ares
> improvements, which once they are done, will be sufficient to cover
> typical use cases and most problems with name servers, especially in
> dual-stack area.
>
> Thanks,
> Dmitry Karpov
>
> *From:* curl-library <curl-library-bounces_at_lists.haxx.se> *On Behalf
> Of *Timothe Litt via curl-library
> *Sent:* Thursday, December 16, 2021 12:30 PM
> *To:* curl-library_at_lists.haxx.se
> *Cc:* Timothe Litt <litt_at_acm.org>
> *Subject:* Re: Question about DNS timeout in libCurl
>
> On 16-Dec-21 15:07, Dmitry Karpov via curl-library wrote:
>
> How do other common getaddrinfo implementations handle (timeouts for) non-responding name reservers? It seems like behavior we should be able to mimick.
>
> The regular getaddrinfo() on Linux systems follows the resolv.conf spec and uses 5s timeout by default when switching from one name server to the other.
>
> C-ares currently follows the same convention, so both c-ares and getaddrinfo() provide the same expected behavior if name servers in the resolv.conf don't use the 'timeout' option.
>
> The 'timeout' name server option in the resolv.conf is supposed to specify a timeout for some specific name server, but c-ares currently ignores it.
>
> That's probably where c-ares can also improve and start honoring this option in the future releases.
>
> If we can't improve c-ares to do this better then I think this is a change to consider, yes. I want us to first explore fixing this in the resolver code.
>
> Yes, c-ares definitely can be improved, but I think that it can't probably ignore the resolv.conf spec and change the default 5s timeout imposed by the spec.
>
> Doing so will create a difference between regular getaddrinfo() and c-ares behaviors, be against the expected resolver behavior and can be considered as c-ares regression from the apps that rely on resolver compliance with the resolv.conf spec.
>
> So, c-ares team may have all good reasons to reject the idea of decreasing the default timeout, I am afraid, just for the sake of libcurl, arguing that c-ares does provide API to change the timeout if some app doesn't like its default value, but by default they should follow the resolv.conf expectations.
>
> Let me summarize the c-ares improvements which might help libcurl to better handle resolution issues, so we can compile a list of suggestions/feature requests for the c-ares mailing list.
>
>
>
> 1. Honor 'timeout' option in the resolv.conf.
>
> - This will make c-ares fully compliant with the resolv.conf spec and allow to specify DNS timeouts for some specific name servers on a system level.
>
> This may be enough to work around some DNS timeout issues if some project has control over resolv.conf
>
> 2. Provide c-ares option to run name resolution queries for different name servers in parallel instead of doing it sequentially, which should help to find a good name resolution much faster.
>
> One flavor of this approach, if it can help to simplify things, can be running queues for IPv4 and IPv6 name servers in parallel, while iterating each family sequentially - thus following Happy Eyeballs philosophy.
>
> These changes will keep the default c-ares timeout (as resolv.conf prescribes) but should let libcurl to better handle problems with name servers without doing some explicit timeout manipulation stuff.
>
> Please comment/add more to the suggestions, and I will put them into the c-ares mailing list.
>
> Thanks,
>
> Dmitry Karpov
>
> The main thing that would help, especially in the scenario that
> started this discussion, is having a way to cache & persist server
> performance so that each query / activation of the library doesn't
> start from scratch.
>
> Currently, any application that runs a command & exits loses knowledge
> of slow/bad servers, so if the list(s) happen to put the slowest
> server(s) first, you get the worst case behavior. Persisting a cache
> of server response times would allow queries to be issued to the
> fastest (functional) server first. (Of course you have to timestamp
> the entries & retry once in a while...)
>
> I provided the rough outline of how caching servers do this in a
> previous note. For a loadable library such as curl/c-ares, you need
> to have a persistent store for the performance cache - perhaps a
> memory-mapped file that can be shared across an users of the library.
> You need a locking protocol - but you can use one of the optimistic
> ones if you're careful. And fairly fine-grained locks. The state of
> a remote server should not change frequently, so updates will be
> driven by first-accesses to a zone cut, with retry attempts and
> timeouts being minor contributors.
>
> Getting this right reduces the need for tuning the timeouts - if a
> timeout happens after first access, your selection criteria and/or
> cache have failed.
>
> P.S. You don't need to over-engineer the performance cache - if it
> becomes corrupt or the file format changes, starting over isn't the
> end of the world.
>
> Timothe Litt
> ACM Distinguished Engineer
> --------------------------
> This communication may not represent the ACM or my employer's views,
> if any, on the matters discussed.
Received on 2021-12-17
Date: Thu, 16 Dec 2021 19:53:59 -0500
To be clear, my point is that a DNS server performance cache belongs
("inband") in c-ares/libcurl, not the client code. There is no need to
update system files to implement this. It's pretty simple - "avoid
timeouts by retaining what you learn".
The objective is to connect to the functional/best server as often as
possible. Once a timeout happens - however long it is, you've already
lost. And every time you retry a slow/broken server and encounter the
same timeout, you lose again.
Keeping a resolver-wide cache of which servers to use for a domain
maximizes the chance that a resolution will be fast. (And caching the
result is even better, TTL permitting.) Unless you're implementing a
web crawler or enterprise/ISP-scale users, the required server cache
will be surprisingly small. You can trade some performance for even
less memory by caching only the "bad" servers.
The other point, as I and others have made previously, is that the best
(no new code) solution is to incorporate an existing caching nameserver
in your product - bind, powerdns, systemd-resolved, and others. They
have all the necessary logic & support.
Where that isn't possible, the next best approach is to upgrade the
libraries that do their own resolution to do the same thing.
Another useful project would be for someone to make a standard library
that implements the happy eyeballs algorithm, which would simplify life
for everyone - and greatly reduce the case for c-ares to exist. I'm
surprised that so many products have rolled their own...
Tweaking timeouts will not solve your problem.
Timothe Litt
ACM Distinguished Engineer
--------------------------
This communication may not represent the ACM or my employer's views,
if any, on the matters discussed.
On 16-Dec-21 16:37, Dmitry Karpov wrote:
>
> * The main thing that would help, especially in the scenario that
> started this discussion, is having a way to cache & persist server
> performance so that each query / activation of the library doesn't
> start from scratch.
>
> While I agree that such “out-of-band” approach might help in general,
> it is not easy to implement, and sometimes maybe not even possible if
> the client doesn’t have any way to affect the system level – i.e.
> modifying system files like resolv.conf.
>
> So, both libcurl and c-ares should provide a good way to quickly skip
> over not working name servers when they have “read only” permissions
> for system DNS settings.
> You are right that this means that the libraries always have to do
> name resolutions without dependencies on previous name resolution
> failures, so they should try potentially failing servers again.
>
> But in my opinion, for most practical scenarios it is better to
> optimize libcurl/c-ares interactions with available DNS settings than
> build complex “out-of-band” caching solutions, with locking protocol
> and other complexities.
>
> And nothing prevents clients/projects having full access to their
> systems implement advanced caching/name server filtering solutions on
> top of libcurl/c-ares optimizations.
> So, I would rather focus this discussion only on libcurl/c-ares
> improvements, which once they are done, will be sufficient to cover
> typical use cases and most problems with name servers, especially in
> dual-stack area.
>
> Thanks,
> Dmitry Karpov
>
> *From:* curl-library <curl-library-bounces_at_lists.haxx.se> *On Behalf
> Of *Timothe Litt via curl-library
> *Sent:* Thursday, December 16, 2021 12:30 PM
> *To:* curl-library_at_lists.haxx.se
> *Cc:* Timothe Litt <litt_at_acm.org>
> *Subject:* Re: Question about DNS timeout in libCurl
>
> On 16-Dec-21 15:07, Dmitry Karpov via curl-library wrote:
>
> How do other common getaddrinfo implementations handle (timeouts for) non-responding name reservers? It seems like behavior we should be able to mimick.
>
> The regular getaddrinfo() on Linux systems follows the resolv.conf spec and uses 5s timeout by default when switching from one name server to the other.
>
> C-ares currently follows the same convention, so both c-ares and getaddrinfo() provide the same expected behavior if name servers in the resolv.conf don't use the 'timeout' option.
>
> The 'timeout' name server option in the resolv.conf is supposed to specify a timeout for some specific name server, but c-ares currently ignores it.
>
> That's probably where c-ares can also improve and start honoring this option in the future releases.
>
> If we can't improve c-ares to do this better then I think this is a change to consider, yes. I want us to first explore fixing this in the resolver code.
>
> Yes, c-ares definitely can be improved, but I think that it can't probably ignore the resolv.conf spec and change the default 5s timeout imposed by the spec.
>
> Doing so will create a difference between regular getaddrinfo() and c-ares behaviors, be against the expected resolver behavior and can be considered as c-ares regression from the apps that rely on resolver compliance with the resolv.conf spec.
>
> So, c-ares team may have all good reasons to reject the idea of decreasing the default timeout, I am afraid, just for the sake of libcurl, arguing that c-ares does provide API to change the timeout if some app doesn't like its default value, but by default they should follow the resolv.conf expectations.
>
> Let me summarize the c-ares improvements which might help libcurl to better handle resolution issues, so we can compile a list of suggestions/feature requests for the c-ares mailing list.
>
>
>
> 1. Honor 'timeout' option in the resolv.conf.
>
> - This will make c-ares fully compliant with the resolv.conf spec and allow to specify DNS timeouts for some specific name servers on a system level.
>
> This may be enough to work around some DNS timeout issues if some project has control over resolv.conf
>
> 2. Provide c-ares option to run name resolution queries for different name servers in parallel instead of doing it sequentially, which should help to find a good name resolution much faster.
>
> One flavor of this approach, if it can help to simplify things, can be running queues for IPv4 and IPv6 name servers in parallel, while iterating each family sequentially - thus following Happy Eyeballs philosophy.
>
> These changes will keep the default c-ares timeout (as resolv.conf prescribes) but should let libcurl to better handle problems with name servers without doing some explicit timeout manipulation stuff.
>
> Please comment/add more to the suggestions, and I will put them into the c-ares mailing list.
>
> Thanks,
>
> Dmitry Karpov
>
> The main thing that would help, especially in the scenario that
> started this discussion, is having a way to cache & persist server
> performance so that each query / activation of the library doesn't
> start from scratch.
>
> Currently, any application that runs a command & exits loses knowledge
> of slow/bad servers, so if the list(s) happen to put the slowest
> server(s) first, you get the worst case behavior. Persisting a cache
> of server response times would allow queries to be issued to the
> fastest (functional) server first. (Of course you have to timestamp
> the entries & retry once in a while...)
>
> I provided the rough outline of how caching servers do this in a
> previous note. For a loadable library such as curl/c-ares, you need
> to have a persistent store for the performance cache - perhaps a
> memory-mapped file that can be shared across an users of the library.
> You need a locking protocol - but you can use one of the optimistic
> ones if you're careful. And fairly fine-grained locks. The state of
> a remote server should not change frequently, so updates will be
> driven by first-accesses to a zone cut, with retry attempts and
> timeouts being minor contributors.
>
> Getting this right reduces the need for tuning the timeouts - if a
> timeout happens after first access, your selection criteria and/or
> cache have failed.
>
> P.S. You don't need to over-engineer the performance cache - if it
> becomes corrupt or the file format changes, starting over isn't the
> end of the world.
>
> Timothe Litt
> ACM Distinguished Engineer
> --------------------------
> This communication may not represent the ACM or my employer's views,
> if any, on the matters discussed.
-- Unsubscribe: https://lists.haxx.se/listinfo/curl-library Etiquette: https://curl.haxx.se/mail/etiquette.html
- application/pgp-signature attachment: OpenPGP digital signature