Re: Question about DNS timeout in libCurl
From: Timothe Litt <litt_at_acm.org>
Date: Thu, 16 Dec 2021 15:30:28 -0500
On 16-Dec-21 15:07, Dmitry Karpov via curl-library wrote:
>> How do other common getaddrinfo implementations handle (timeouts for) non-responding name servers? It seems like behavior we should be able to mimic.
> The regular getaddrinfo() on Linux systems follows the resolv.conf spec and uses a 5 s timeout by default when switching from one name server to the next.
> c-ares currently follows the same convention, so c-ares and getaddrinfo() provide the same expected behavior if the name servers in resolv.conf don't use the 'timeout' option.
>
> The 'timeout' name server option in the resolv.conf is supposed to specify a timeout for some specific name server, but c-ares currently ignores it.
> That's probably another area where c-ares can improve, by honoring this option in future releases.
>
>> If we can't improve c-ares to do this better then I think this is a change to consider, yes. I want us to first explore fixing this in the resolver code.
> Yes, c-ares can definitely be improved, but I don't think it can ignore the resolv.conf spec and change the default 5 s timeout the spec imposes.
> Doing so would create a difference between the behavior of regular getaddrinfo() and c-ares, go against the expected resolver behavior, and could be considered a c-ares regression by apps that rely on resolver compliance with the resolv.conf spec.
>
> So I am afraid the c-ares team may have good reasons to reject decreasing the default timeout just for the sake of libcurl, arguing that c-ares already provides an API to change the timeout if an app doesn't like the default value, and that by default it should follow the resolv.conf expectations.
>
> Let me summarize the c-ares improvements that might help libcurl better handle resolution issues, so we can compile a list of suggestions/feature requests for the c-ares mailing list.
>
> 1. Honor 'timeout' option in the resolv.conf.
> - This would make c-ares fully compliant with the resolv.conf spec and allow DNS timeouts for specific name servers to be set at the system level.
> This may be enough to work around some DNS timeout issues if a project has control over resolv.conf.
>
> 2. Provide a c-ares option to run name resolution queries against different name servers in parallel instead of sequentially, which should find a good answer much faster.
> One flavor of this approach, if it helps to simplify things, could be running the queues for IPv4 and IPv6 name servers in parallel while iterating through each family sequentially, thus following the Happy Eyeballs philosophy.
>
> These changes would keep the default c-ares timeout (as resolv.conf prescribes) but should let libcurl better handle problems with name servers without explicit timeout manipulation.
> Please comment on or add to these suggestions, and I will post them to the c-ares mailing list.
>
> Thanks,
> Dmitry Karpov
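Item 2 is easiest to picture with a toy model. The Python sketch below is not c-ares code: `fake_query`, `resolve_sequential` and `resolve_parallel` are hypothetical names, servers are simulated with `asyncio.sleep`, and a dead server simply never answers. It shows only the plain all-at-once form of the idea, not the per-address-family Happy Eyeballs variant.

```python
import asyncio


async def fake_query(server, delay, answer):
    """Hypothetical stand-in for one DNS query to `server`."""
    if delay is None:
        await asyncio.Event().wait()  # a dead server never answers
    else:
        await asyncio.sleep(delay)    # a live server answers after `delay` s
    return server, answer


async def resolve_sequential(servers, timeout):
    """Try each server in turn, waiting up to `timeout` for each one."""
    for name, delay, answer in servers:
        try:
            return await asyncio.wait_for(fake_query(name, delay, answer), timeout)
        except asyncio.TimeoutError:
            continue  # full timeout spent; fall through to the next server
    raise LookupError("no name server answered")


async def resolve_parallel(servers, timeout):
    """Query all servers at once and keep whichever answers first."""
    tasks = [asyncio.create_task(fake_query(n, d, a)) for n, d, a in servers]
    try:
        for next_done in asyncio.as_completed(tasks, timeout=timeout):
            return await next_done  # first answer wins
    except asyncio.TimeoutError:
        pass  # nobody answered within `timeout`
    finally:
        for task in tasks:
            task.cancel()  # stop waiting on slower or dead servers
    raise LookupError("no name server answered")
```

In this model the sequential cost grows by one full timeout for every dead server ahead of the first working one, while the parallel cost is roughly the fastest server's response time, paid for with extra queries on the network.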
The main thing that would help, especially in the scenario that started
this discussion, is having a way to cache & persist server performance
so that each query / activation of the library doesn't start from scratch.
Currently, any application that runs a command & exits loses knowledge
of slow/bad servers, so if the list(s) happen to put the slowest
server(s) first, you get the worst-case behavior. Persisting a cache of
server response times would allow queries to be issued to the fastest
(functional) server first. (Of course you have to timestamp the entries
& retry once in a while...)
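A minimal sketch of the selection side of such a cache, assuming a hypothetical `ServerPerfCache` class kept in memory only: each measurement is timestamped, entries older than `max_age` are treated as unknown so the server is eventually retried, and a timeout is recorded as a large round-trip time so that server sorts last. The default values are placeholders, not tuned numbers, and persisting the cache between runs, which the message turns to next, is left out.

```python
import time


class ServerPerfCache:
    """Hypothetical in-memory cache of name-server response times."""

    def __init__(self, max_age=3600.0, unknown_rtt=1.0, timeout_rtt=30.0):
        self.max_age = max_age          # seconds before a measurement is stale
        self.unknown_rtt = unknown_rtt  # assumed RTT for unmeasured or stale servers
        self.timeout_rtt = timeout_rtt  # RTT charged when a server timed out
        self.entries = {}               # server -> (rtt_seconds, timestamp)

    def record(self, server, rtt, now=None):
        """Store a measured round-trip time, stamped with the current time."""
        self.entries[server] = (rtt, time.time() if now is None else now)

    def record_timeout(self, server, now=None):
        """Treat a timeout as a very slow answer so the server sorts last."""
        self.record(server, self.timeout_rtt, now)

    def expected_rtt(self, server, now):
        """Measured RTT if fresh; otherwise the placeholder for unknown servers."""
        entry = self.entries.get(server)
        if entry is None or now - entry[1] > self.max_age:
            return self.unknown_rtt  # never measured, or too old to trust
        return entry[0]

    def order(self, servers, now=None):
        """Return servers fastest-first; ties keep the configured order."""
        now = time.time() if now is None else now
        return sorted(servers, key=lambda s: self.expected_rtt(s, now))
```

Because Python's `sorted` is stable, servers with equal expected RTT (all of them, on a cold start) stay in resolv.conf order, so an empty cache behaves like today's sequential scheme.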
I outlined how caching servers do this in a
previous note. For a loadable library such as curl/c-ares, you need
a persistent store for the performance cache, perhaps a
memory-mapped file that can be shared across all users of the library.
You need a locking protocol, but you can use one of the optimistic ones
if you're careful, with fairly fine-grained locks. The state of a
remote server should not change frequently, so updates will be driven by
first accesses to a zone cut, with retry attempts and timeouts being
minor contributors.
Getting this right reduces the need to tune the timeouts: if a
timeout happens after the first access, your selection criteria and/or cache
have failed.
P.S. You don't need to over-engineer the performance cache - if it
becomes corrupt or the file format changes, starting over isn't the end
of the world.
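In that spirit, a loader can simply discard anything it does not recognize. A minimal sketch, assuming a hypothetical JSON layout with a `format` field that is bumped whenever the layout changes: a missing, unreadable, corrupt or differently-versioned file yields an empty cache, and later runs rebuild it from fresh measurements.

```python
import json

CACHE_FORMAT = 2  # hypothetical; bump whenever the on-disk layout changes


def load_perf_cache(path):
    """Load the cache, or return an empty one if it is missing, corrupt or outdated."""
    try:
        with open(path, encoding="utf-8") as f:
            doc = json.load(f)
    except (OSError, ValueError):
        return {}  # missing, unreadable or not valid JSON: start over
    if not isinstance(doc, dict) or doc.get("format") != CACHE_FORMAT:
        return {}  # written in another format version: start over
    servers = doc.get("servers")
    return servers if isinstance(servers, dict) else {}
```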
Timothe Litt
ACM Distinguished Engineer
--------------------------
This communication may not represent the ACM or my employer's views,
if any, on the matters discussed.
Received on 2021-12-16