cURL / Mailing Lists / curl-library / Single Mail

curl-library

Re: Threaded resolver spins bug + patch

From: Constantine Sapuntzakis <csapuntz_at_gmail.com>
Date: Sun, 9 Aug 2009 11:56:47 -0700

Yep, we're seeing it across all of our XP machines. It's a busy wait and
name resolution is usually fast (Windows caches name resolutions) so it's
hard to notice the CPU spike. We saw it because we were running at early
boot before the network stack was initialized, so name resolution was taking
seconds to time out and the CPU was spinning, slowing down XP boot.

I'm liking Jamie's suggestion of using two local TCP sockets as a pipe, as
it makes CURLMOPT_NODUMMY is unnecessary. We would return one end of the
socket to the application and give the other end to the resovler thread. The
resolver thread would write a byte to the pipe to indicate that it's done
resolving, which would wake up select which is selecting on the other
socket. That way, everybody could keep their old select() code and there
would always be some sort of socket associated with the curl handle.

I have some ideas for simplifying the threaded resolver and will fix bug #64
at the same time. One of the
reasons that code is so hairy is that the resolver thread is doing too much.
As a result, lots of synchronization
is needed. Here is how to simplify:

  * the resolver thread should not access the connectdata struct at all. It
should only modify variables in thread_data
  * the resolver thread should not call Curl_addrinfo*_callback . It should
just return what the OS gave it and notify the main thread which will then
call Curl_is_resolved/Curl_wait_resolved which will then call the callbacks.
The one trickiness is that the hostent structure return by gethostbyname is
thread local and gets deallocated when the thread ends. The solution is to
copy the contents of the structure into the thread_data.
The getaddrinfo API does not suffer from this problem.
  * the struct thread_data should be reference counted. The connection and
the thread will both keep a reference. Whoever takes the reference count to
zero deallocates the structure.
  * use a critical section instead of Windows Mutex to protect fields

This should get rid of the need for all the event variables and Windows
mutexes currently in the code. A critical
section should be introduced though to serialize accesses to the fields of
the thread_data structure (e.g. the reference count, a 'bool done' flag).

One nastiness with the threaded async resolver is that there is no way to
abort a name lookup. Let's say the caller is using curl_multi and decides to
abandon their easy op (curl_easy_cleanup()). curl_easy_cleanup
could

   * block and wait for the background resolver thread to finish. This would
prevent other handles in the curl_multi from being serviced but is the
simplest approach. This is what the current code does.

   * not block and let the thread run in the background and terminate. Many
threads could potentially accumulate. A solution is to limit the number of
resolver threads to some number and start doing synchronous lookups after
that (again blocking servicing of other handles).

     Probably the gold plated solution here is to keep a queue of name
resolution requests and a bounded thread pool. If an operation is abandoned
while it is in the queue waiting for a thread, it gets removed from the
queue.

    There is one more annoying corner case in the not block case - in
curl_global_cleanup, we must wait for all the resolver threads to terminate
before returning - otherwise the caller of curl_global_cleanup may unload
the DLL while there are still threads in it and unhappiness will result.

Once the simplification is done, the code can be ported to Linux/Mac OS X
fairly easily:
   * critical section -> pthread_mutex
   * CreateThread/ WaitSingleObject(hThread) ->
pthread_create()/pthread_join()
   * local TCP socket pair --> Unix socketpair()

I've got othe rprojects at work, so it will be a little while before I can
generate this patch. If someone feels motivated to do it instead, they will
have my admiration and gratitude. :-)

-Costa

On Sun, Aug 9, 2009 at 1:03 AM, Daniel Stenberg <daniel_at_haxx.se> wrote:

> On Fri, 7 Aug 2009, Constantine Sapuntzakis wrote:
>
> Under Windows XP SP3, my app spins when using threaded async name
>> resolution with curl_multi. The dummy socket allocated in the threaded
>> resolver (hostthre.c) is always selectable for writing, so select always
>> returns instantly.
>>
>
> Strange. It does this _always_ ? I find it strange that nobody else has
> found or reported this before then as the code has been like this for quite
> some time now...
>
> The dummy socket was there because select() on Windows doesn't support
>> being passed no descriptors.
>>
>
> Well, it was more strictly a way to prevent the app from having to
> special-case the name resolver period. However, apps already need to do that
> since during name resolving when the default sync resolver is used on *nix
> we don't provide a "fake socket" so it gets maxfd -1 and select() on *nix
> platforms don't like that...
>
> The patch below allows an app to indicate to curl that it does not require
>> a dummy socket (CURLMOPT_NODUMMY). If the set of descriptors is empty then
>> the app will call a sleep function (e.g. SleepEx) instead.
>>
>
> If it busy-loops with the socket, is there really any particular reason to
> keep that broken code/behavior and only allow the nicer behavior with a new
> option?
>
> In other words: why not just rip out the dummy socket code?
>
> In addition, the patch implements polling for async resolv done starting
>> at 1ms interval and decaying expontentially up to 250ms intervals. It
>> retains the old behavior of not timing out name resolution.
>>
>
> That sounds... fair, assuming you can't make the resolver thread tell the
> main thread instead.
>
> The threaded resolver is KNOWN_BUG #64 already of course, so I'm a bit
> afraid that touching it will make it break even more.
>
> P.S. I considered an approach of using WSAWaitForMultipleEvents instead
>> off select and having the async name resolver call a user-defined callback
>> that woul do a QueueUserApc on the waiting thread to wake up the
>> WSAWaitForMultipleEvents when the name resolution was done. This has the
>> advantage of not polling but the disadvantage of requiring W2K or above and
>> rewriting my select-based to code to use
>> WSACreateEvent/WSAEventSelect/WSAWaitForMultipleEvents.
>>
>
> But XP is after W2K, right? Maybe that requirement isn't that bad if it
> makes the code better? And also, can't the code be made #ifdef'ed on the
> correct version so that you could still build with the old approach on older
> windows?
>
> I also considered using ARES but my app requires religiously following the
>> Windows DNS suffix rules for non-FQDN names.
>>
>
> Ah. We'd of course like c-ares to work as a proper replacement so we would
> like to have those things work in c-ares as well. Although of course
> "someone" needs to do the job... :-O
>
> --
>
> / daniel.haxx.se
>
Received on 2009-08-09