Bugs item #1403932, was opened at 2006-01-12 14:45
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=100976&aid=1403932&group_id=976
Category: compile or build problem
Group: crash
Status: Open
Resolution: None
Priority: 5
Submitted By: Andrew Benham (adsbenham)
Assigned to: Daniel Stenberg (bagder)
Summary: Race in test suite kills many processes
Initial Comment:
There's a nasty race condition in runtests.pl.
For some reason, our Solaris 10 x86 build machine gets
hit by the race every time: instead of killing only the
test HTTP server process, the code actually kills
the test HTTP server, runtests.pl, and our build system.
In the source for curl 7.15.1, the startnew() function
in runtests.pl reads:
....
    my $count=5;
    while($count--) {
        if(-f $pidfile) {
            open(PID, "<$pidfile");
            $pid2 = 0 + <PID>;
            close(PID);
            if(kill(0, $pid2)) {
                # make sure this pid is alive, as otherwise it is just likely
                # to be the _previous_ pidfile or similar!
                last;
            }
        }
        sleep(1);
    }
    return ($child, $pid2);
The intention is to return the PID of the server
just started.
Later, the stopserver() function does a 'kill (9, pid)'
to stop the server.
However, there's a race in startnew(). What if $pidfile
exists but hasn't been written to yet? server/sws
writes the pidfile with:
fopen()
fprintf()
fclose()
If the scheduler runs between the fopen() and the
fprintf(), $pidfile exists but is an empty file.
The startnew() code reads from the empty file, adds 0
to undef (giving 0), and returns this as the pid.
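The effect of reading an empty pidfile can be reproduced in isolation.
The following is a minimal sketch, not code from runtests.pl: the helper
name read_pid_unchecked() is hypothetical, and an empty temporary file
stands in for a pidfile caught between fopen() and fprintf().

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper mirroring the read in startnew(); not part of runtests.pl.
sub read_pid_unchecked {
    my ($pidfile) = @_;
    open(my $fh, '<', $pidfile) or return 0;
    no warnings 'uninitialized';   # <$fh> is undef at EOF on an empty file
    my $pid = 0 + <$fh>;           # undef + 0 evaluates to 0
    close($fh);
    return $pid;
}

# Simulate the window between fopen() and fprintf(): the file exists but is empty.
my ($tmpfh, $pidfile) = tempfile(UNLINK => 1);
close($tmpfh);

my $pid = read_pid_unchecked($pidfile);
print "pid read from empty pidfile: $pid\n";   # prints "pid read from empty pidfile: 0"
```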
Later, in stopserver():
    sub stopserver {
        my ($pid) = @_;
        if($pid <= 0) {
            return; # this is not a good pid
        }
        if($pid =~ / /) {
            # if it contains space, it might be more than one pid
            my @pids = split(" ", $pid);
            for (@pids) {
                kill (9, $_); # die!
            }
        }
        my $res = kill (9, $pid); # die!
        if($verbose) {
            logmsg "RUN: Test server pid $pid signalled to die\n";
        }
    }
Now, if $pid is "1234 0" for example, the '$pid <= 0' check
passes, we do a valid kill(9, 1234), and then a 'kill (9, 0)',
which (on Unix systems) sends SIGKILL to all processes in the
current process group! It doesn't kill the server, it
kills the runtests.pl process, whatever process called
that, whatever called that, and so on.
This is bad.
The fix is in startnew(), on the lines marked '*':
....
    my $count=5;
    while($count--) {
        if(-f $pidfile) {
            open(PID, "<$pidfile");
*           $pid2 = <PID>;
            close(PID);
*           if($pid2 && kill(0, $pid2)) {
                # make sure this pid is alive, as otherwise it is just likely
                # to be the _previous_ pidfile or similar!
                last;
            }
        }
        sleep(1);
    }
....
So we only accept the contents of $pidfile if it is
true (i.e. not blank, and not zero). This gets around
the race, returns the true pid of the server process,
and we don't end up killing the process group.
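As a complementary guard on the stopserver() side, the pid string could
be filtered before any signal is sent. This is only a sketch under
assumptions: safe_pids() is a hypothetical helper, not part of
runtests.pl, and it does not call kill() itself. It keeps only plain
positive decimal integers, so 0, negative values and non-numeric text
are dropped.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of runtests.pl: split a pid string and keep
# only entries that are plain positive decimal integers.
sub safe_pids {
    my ($pidstr) = @_;
    return () unless defined $pidstr;
    # split(' ', ...) splits on any whitespace and drops leading whitespace;
    # the pattern rejects "0", negative numbers and anything non-numeric.
    return grep { /^[1-9][0-9]*$/ } split(' ', $pidstr);
}

my @pids = safe_pids("1234 0 -1");
print "would signal: @pids\n";   # prints "would signal: 1234"
```

With such a helper, stopserver() could loop over safe_pids($pid) and
call kill(9, $_) on each entry, so that a stray 0 never reaches kill().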
----------------------------------------------------------------------
Received on 2006-01-12