curl-library

Re: problem with NSS backend and threads

From: Kamil Dudka <kdudka_at_redhat.com>
Date: Thu, 08 Sep 2016 08:18:29 +0200

On Wednesday, September 07, 2016 21:07:25 K. Frank wrote:
> Hi Kamil!
>
> On Mon, Aug 22, 2016 at 9:27 AM, Kamil Dudka <kdudka_at_redhat.com> wrote:
> > On Thursday, August 18, 2016 11:45:24 Peter Wang wrote:
> >> ...
> >> which is due to this branch in nssSlot_IsTokenPresent()
> >>
> >> /* avoid repeated calls to check token status within set interval */
> >> if (within_token_delay_period(slot)) {
> >>
> >> return ((slot->ckFlags & CKF_TOKEN_PRESENT) != 0);
> >>
> >> }
> >>
> >> Peter
> >
> > Thank you for analyzing the cause of it! After having a brief look at the
> > code, I believe the race condition is obvious:
> >
> > PK11_FindSlotByName() acquires only a "read" lock, which does not ensure
> > mutual exclusion of readers. So multiple calls of PK11_FindSlotByName()
> > can be executed in parallel. As there is no additional lock on the way,
> > nssSlot_IsTokenPresent() can also be executed in parallel.
> >
> > If there is a schedule where one thread has assigned slot->lastTokenPing
> > but not yet set the CKF_TOKEN_PRESENT flag in slot->ckFlags, all other
> > threads will keep returning SEC_ERROR_NO_TOKEN (without actually checking
> > presence of the token) until the first thread continues (or the specified
> > timeout elapses).
> >
> > In order to fix it, we need to add a mutex to nssSlot_IsTokenPresent() or
> > at least reorder mutation of slot's state such that slot->lastTokenPing is
> > not updated before the CKF_TOKEN_PRESENT flag in slot->ckFlags is updated.
>
> A quick caution:
>
> (First, a disclaimer: I have not looked at the code, and I am making some
> assumptions about the details of the code and what you mean by "reorder
> mutation." That is, I am guessing what you mean, and I apologize if I have
> guessed wrongly.)
>
> Simply requiring -- in the source code -- that (pseudocode):
>
> slot->ckFlags.CKF_TOKEN_PRESENT = 1; // this happens first
> slot->lastTokenPing = new_value; // this happens second
>
> will not eliminate your race condition in a multithreaded environment. This
> is because a second thread is not guaranteed to see these two updates in
> the desired order.

Yes. That is basically what the comment in the proposed patch [1] is about:

@@ -201,8 +205,11 @@
     if (nssrv != PR_SUCCESS) {
         slot->token->base.name[0] = 0; /* XXX */
         slot->ckFlags &= ~CKF_TOKEN_PRESENT;
+        /* TODO: insert a barrier here to avoid reordering of the assignments */
+        slot->lastTokenPing = PR_IntervalNow();
         return PR_FALSE;
     }
+    slot->lastTokenPing = PR_IntervalNow();
     return PR_TRUE;
 }

So you are correct. The patch does not guarantee the ordering in all
environments in all possible situations. But compared to the status quo
(where the assignments are already far apart from each other, and in the
wrong order, in the source code), it makes the spurious failure very
unlikely to happen.

Kamil

[1] https://bug1297397.bmoattachments.org/attachment.cgi?id=8787677

> In more detail, this is because, although the sequence points at the end of
> these statements "guarantee" that the first statement happens before the
> second, the as-if rule permits the compiler (and hardware) to reorder or
> even optimize away these statements as long as the "observable behavior" of
> the program doesn't change ***when observed from the standpoint of the
> single-threaded "abstract machine"***.
>
> The language does not guarantee that a second thread will see this same
> ordering, and, in the real world, many compilers will emit code (and
> hardware will process that code in a way) such that a second thread will
> see a different ordering of the two assignments.
>
> The only (portable) way to fix such a race condition is to use some sort of
> explicit thread synchronization, be it a mutex, finer-grained locks, memory
> barriers, etc.
>
> The simplest way to do this would be your suggestion of using a mutex.
> Unless this code is in a highly efficiency-critical location, that's what
> I would do. If efficiency is a big concern, you could probably eliminate
> some of the overhead of a mutex by using memory barriers, but that tends
> to be trickier.
>
> I don't mean to belabor this, and I apologize if I've misunderstood what you
> were saying. I just wanted to point out that merely reordering in the
> source code the updates of the variables that make up the slot's state
> won't fix the race condition (absent some additional thread
> synchronization). And, as I'm sure you know, these kinds of race-condition
> bugs can lie dormant for quite a while and be missed by even rather
> thorough testing, so it's worth taking care to not let the bug sneak back
> in.
>
> > ...
> > Kamil
>
> Happy Multi-Threaded Hacking!
>
>
> K. Frank
> -------------------------------------------------------------------
> List admin: https://cool.haxx.se/list/listinfo/curl-library
> Etiquette: https://curl.haxx.se/mail/etiquette.html
Received on 2016-09-08