curl-library

Re: dns cache doesn't refresh entries while they are in use

From: Stefan Bühler <buehler_at_teamviewer.com>
Date: Fri, 20 Mar 2015 08:02:38 +0000

Hi Ray,

> -----Original Message-----
> From: curl-library [mailto:curl-library-bounces_at_cool.haxx.se] On Behalf Of
> Ray Satiro via curl-library
> Sent: Friday, 20 March 2015 08:10
> To: curl-library_at_cool.haxx.se
> Subject: Re: dns cache doesn't refresh entries while they are in use

> On 3/17/2015 7:14 AM, Stefan Bühler wrote:
> > the dns cache doesn't refresh entries while they are in use, which can lead
> > to really bad behavior: your requests time out because the service isn't
> > available anymore, and you try them again (too fast or too many requests),
> > and there is always at least one request using the entry, and it keeps
> > looping...
>
> Hi Stefan thanks for your work on this. I don't suppose there is an easy way
> to reproduce the failure to refresh entries? Can you give us a step by step
> example of how this may happen (or happened) to you? I assume you had
> some easy handles in a multi handle?

Yes, we are working with multi handles under a boost::asio wrapper. We usually limit the number of concurrent requests ("easy handles") for one multi handle to 2 (and queue the others), as was the case when we hit this bug.
In this case we were using it to post data to a webservice, and we really want to get it through, so we always retry requests basically forever. We have some timeout before a request gets retried.

Then the webservice had a problem and stopped handling requests (the connect() probably timed out after a few seconds), and requests queued up. So although no single request was retried for some time, there were probably always two requests active.

The webservice was then redeployed to a new IP address, but because the cached address was always in use, the cache never refreshed it.

What resolved it was blocking the outgoing TCP port at the firewall: requests then failed faster, until all of them were waiting for the retry timeout and none was active anymore.
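To make the suspected mechanism concrete, here is a minimal sketch in C. It is a toy model and not libcurl code: the struct, the function names and the 60 second maximum age are all assumptions chosen for illustration. It only shows why an entry that is skipped whenever its use count is non-zero never gets refreshed as long as retries overlap.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified cache entry; not libcurl's actual struct. */
struct cache_entry {
  long timestamp;  /* time the entry was stored, in seconds */
  long inuse;      /* number of transfers currently holding the entry */
};

/* Decide whether a lookup should trigger a fresh resolve.
   skip_if_inuse == true models the behaviour described in this thread:
   an entry that is still in use is never treated as stale. */
static bool is_stale(const struct cache_entry *e, long now, long max_age,
                     bool skip_if_inuse)
{
  if(skip_if_inuse && e->inuse > 0)
    return false;
  return (now - e->timestamp) >= max_age;
}

/* Simulate `duration` seconds of overlapping retries: every second a new
   transfer looks up the host before an older one gives up, so the lookup
   always sees the entry in use. Returns how often the entry was refreshed. */
static int count_refreshes(bool skip_if_inuse, long duration, long max_age)
{
  struct cache_entry e = { 0, 1 };  /* stored at t=0, one transfer active */
  int refreshes = 0;
  long t;

  assert(duration >= 0 && max_age > 0);

  for(t = 1; t <= duration; t++) {
    /* the retry looks up the host; inuse is still 1 at this point */
    if(is_stale(&e, t, max_age, skip_if_inuse)) {
      e.timestamp = t;  /* pretend we resolved again and stored the result */
      refreshes++;
    }
    e.inuse++;  /* the new transfer now holds the entry */
    e.inuse--;  /* an older transfer times out and releases it */
  }
  return refreshes;
}
```

In this model, count_refreshes(true, 300, 60) never refreshes the entry, while count_refreshes(false, 300, 60) refreshes it once per 60 seconds. The firewall block corresponds to the use count finally dropping to zero, at which point even the skip_if_inuse variant treats the old entry as stale.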

(A lot of this analysis is based on guesses; we have not yet tried to reproduce the problem or verified that the fix actually works.)

> > See https://github.com/tvbuehler/curl/commits/fix-dns-cache-refresh
> > (especially the "fix refreshing of obsolete dns cache entries") for a possible
> > fix, which also contains CURLOPT_RESOLVE related bug fixes.
> >
> > It would be nice to get feedback for this specific commit, and maybe for
> > the others as well.
>
> I read through it on github (right now on the web, I didn't compile the
> changes yet) and I think they look good. The version I looked at is the one
> where the refcount bug is fixed. The only thing that concerns me is artifacts,
> like maybe there is somewhere in the code base that depends on timestamp
> 0 meaning the entry is not in the cache (as it did prior to your commit). I will
> check on that though.

I did a full text search for "inuse" and "timestamp" (that is why I removed the unused "timestamp" var in another commit, and tried to move all "inuse" modifications to hostip.c).

> As you know I'm working on the resolve removal and I rewrote that function
> to handle individual removals. Your implementation is pretty straightforward
> and I don't see a problem with it, but if my changes are later accepted yours
> will end up being replaced.
>
> If a maintainer adds your 3 commits I will rebase off your work. I cannot add
> tests for your work though, I have enough to do with what I'm working on.

I hope my work on inuse simplifies your patch. I like your is_user_specified member, and I see no problem with removing the special timestamp == 0 value completely.
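As a rough illustration of why an explicit member reads better than a magic timestamp, here is a small sketch in C. Only the name is_user_specified comes from this thread; the struct layout, the helper names and the assumption that the flag marks entries supplied by the application (for example via CURLOPT_RESOLVE) and exempts them from expiry are mine, not taken from either patch.

```c
#include <stdbool.h>

/* Hypothetical entry layout; only the is_user_specified name is from the
   thread, everything else is assumed for illustration. */
struct entry_with_flag {
  long timestamp;          /* always a real point in time */
  bool is_user_specified;  /* assumed: set for entries added by the app */
};

/* Sentinel style: timestamp == 0 is overloaded to mean "special entry",
   so an entry that really was stored at time 0 can never expire. */
static bool expires_sentinel(long timestamp, long now, long max_age)
{
  if(timestamp == 0)
    return false;
  return (now - timestamp) >= max_age;
}

/* Flag style: the special case is spelled out, and timestamp keeps a
   single meaning. */
static bool expires_flag(const struct entry_with_flag *e, long now,
                         long max_age)
{
  if(e->is_user_specified)
    return false;
  return (now - e->timestamp) >= max_age;
}
```

With the flag, code that only cares about the age of an entry does not have to know about a reserved timestamp value, which is the kind of leftover dependency Ray wanted to check for.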

The sscanf call looks weird somehow, but I guess it works.

Best regards,
Stefan Bühler

-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette: http://curl.haxx.se/mail/etiquette.html
Received on 2015-03-20