

Re: A tale of two SPARC servers wherein one gets wedged on the testsuite

From: dev <>
Date: Fri, 16 Oct 2015 03:32:37 -0400 (EDT)

> > The compile goes smooth but the tests get wedged .. badly. However only
> > at test 575 and only on this old SPARC server.

Hello Rainer. There are so few of us doing reasonable work with open
source on SPARC systems that I now recognize your name in various
projects. GCC, for example, but let's just stay on Curl for the moment.

> Just to give another data point: on our also old but stable 1280 MHz
> UltraSPARC-IIIi system (8GB Memory), the test suite runs 431 seconds
> ("TESTDONE: 994 tests were considered during 431 seconds.") on a local
> file system.

I had similar results on a Niagara/Fujitsu system.

> The "remaining" output around test 575 does not indicate, that this test
> took especially long.

I agree.

> A closer look reveals we are building differently, e.g. we are using
> OpenSSL instead of GnuTLS and we are not using libidn etc.

Actually I am using OpenSSL here. At least I thought I was. I better
take a closer look at this second system :-\

> How much worse did it get compared to the latest release you had built
> before? Could it be that the system reached a resource bottleneck and
> started paging? Maybe it would make sense to look at "sar" output to
> check CPU consumption, I/O and paging during the long time test 575 was
> processing or hanging?

I was running prstat for a while and saw that the system was 97% idle.
No particular load of any sort and certainly no scan-rate for free pages
in memory via vmstat. It just looked stalled. Very strange. Even truss
showed me nothing happening. I mean really nothing.
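
One way to put a number on "nothing happening" is to sample the cumulative CPU time of the suspect process twice and see whether it moved. Below is a rough Python sketch, assuming a POSIX-style `ps` that accepts `-o time=` and `-p`. The function names are made up for illustration and are not part of any curl tooling. A process that is merely blocked on a socket or pipe will also show no CPU growth, so a flat reading only says "not computing", not "deadlocked".

```python
import subprocess
import time


def parse_cpu_time(text):
    """Convert a ps TIME field of the form [[dd-]hh:]mm:ss to seconds."""
    text = text.strip()
    days = 0
    if "-" in text:
        # Optional leading day count, e.g. "1-02:03:04"
        day_part, text = text.split("-", 1)
        days = int(day_part)
    fields = text.split(":")
    seconds = float(fields[-1])
    minutes = int(fields[-2]) if len(fields) >= 2 else 0
    hours = int(fields[-3]) if len(fields) >= 3 else 0
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def cpu_seconds(pid):
    """Ask ps for the cumulative CPU time of one process (assumes POSIX ps)."""
    out = subprocess.check_output(["ps", "-o", "time=", "-p", str(pid)], text=True)
    return parse_cpu_time(out)


def is_burning_cpu(pid, interval=5.0):
    """Return True if the process accumulated CPU time over the interval."""
    before = cpu_seconds(pid)
    time.sleep(interval)
    return cpu_seconds(pid) > before
```

Calling `is_burning_cpu(pid, interval=60)` on the stuck test process would be the intended use. Since ps typically reports whole seconds, a short interval can miss small amounts of work.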

> Did you try to check whether the test made
> progress during that time by looking at the running processes?

Yes .. I actually watched it for a few hours. Sort of like watching
paint dry to be honest except the paint would be more exciting :-)

> If you can reproduce, you could use pstack to check the stack the
> hanging/looping/whatever process is in.

I started over from the beginning and did a fresh extract from sources,
configure, build and just ran the testsuite. Results are slightly
different from the Fujitsu/Niagara server :

test 2039...OK (978 out of 984, remaining: 00:06)
test 2040...OK (979 out of 984, remaining: 00:05)
test 2044...OK (983 out of 984, remaining: 00:01)
test 2045...Terminated
OK (984 out of 984, remaining: 00:00)
TESTDONE: 748 tests out of 750 reported OK: 99%
TESTFAIL: These test cases failed: 1060 1061
TESTDONE: 994 tests were considered during 996 seconds.
gmake[1]: *** [quiet-test] Error 1
gmake[1]: Leaving directory
gmake: *** [test] Error 2
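
For comparing runs between the two servers, the summary lines above can be pulled out of a saved log mechanically. This is a minimal Python sketch. The regular expressions are inferred only from the TESTDONE and TESTFAIL lines shown in this mail, and the function name and dictionary keys are hypothetical, so other versions of the test suite may need different patterns.

```python
import re

# Summary lines copied from the run above.
SAMPLE = """\
TESTDONE: 748 tests out of 750 reported OK: 99%
TESTFAIL: These test cases failed: 1060 1061
TESTDONE: 994 tests were considered during 996 seconds.
"""


def summarize_test_log(log_text):
    """Collect pass counts, duration and failed test numbers from a log."""
    summary = {"ok": None, "ran": None, "considered": None,
               "seconds": None, "failed": []}
    for line in log_text.splitlines():
        m = re.match(r"TESTDONE: (\d+) tests out of (\d+) reported OK", line)
        if m:
            summary["ok"] = int(m.group(1))
            summary["ran"] = int(m.group(2))
            continue
        m = re.match(r"TESTDONE: (\d+) tests were considered during (\d+) seconds", line)
        if m:
            summary["considered"] = int(m.group(1))
            summary["seconds"] = int(m.group(2))
            continue
        m = re.match(r"TESTFAIL: These test cases failed: (.*)", line)
        if m:
            summary["failed"] = [int(n) for n in m.group(1).split()]
    return summary


if __name__ == "__main__":
    print(summarize_test_log(SAMPLE))
```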

real 1009.91
user 0.00
sys 0.00

Total time 1010 secs but look at the user and sys time. Zero?

I don't even know how that is possible.
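
As a cross-check on those numbers, the same three figures can be collected independently of the time utility. The Python sketch below is only an illustration: it assumes a Unix-like system where the standard `resource` module is available, and the function name is my own. It does not explain the zeros. It reports the CPU time of children that were waited for, so work done by descendants nobody waited on would not be counted.

```python
import resource
import subprocess
import sys
import time


def time_command(argv):
    """Run argv; return (returncode, real, user, sys) with times in seconds."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    returncode = subprocess.call(argv)  # blocks until the child exits
    real = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # RUSAGE_CHILDREN is cumulative, so subtract the earlier snapshot.
    user = after.ru_utime - before.ru_utime
    system = after.ru_stime - before.ru_stime
    return returncode, real, user, system


if __name__ == "__main__":
    # Example: time whatever command is given on the command line.
    rc, real, user, system = time_command(sys.argv[1:] or [sys.executable, "-c", "pass"])
    print("real %.2f\nuser %.2f\nsys %.2f" % (real, user, system))
```

Saved under a hypothetical name such as `timeit_child.py`, it could be run as `python timeit_child.py gmake test` and the output compared with what time printed.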

This has been a strange day to be sure and I feel like starting a whole
new toolchain rebuild on this server. I have been very careful and
ultimately I was going to get to GCC 5.2.0 and even Apache 2.14.x but as
you know OpenSSL and Curl are darn critical pieces.

I am scratching my head and thinking what odd things are happening here.

Received on 2015-10-16