Re: A tale of two SPARC servers wherein one gets wedged on the testsuite
Date: Fri, 16 Oct 2015 03:32:37 -0400 (EDT)
> > The compile goes smooth but the tests get wedged .. badly. However
> > only
> > at test 575 and only on this old SPARC server.
Hello Rainer. There are so few of us doing reasonable work with open
source on SPARC systems that I now recognize your name in various
projects. Like GCC for example, but let's just stay on Curl for the moment.
> Just to give another data point: on our also old but stable 1280 MHz
> UltraSPARC-IIIi system (8GB Memory), the test suite runs 431 seconds
> ("TESTDONE: 994 tests were considered during 431 seconds.") on a local
> file system.
I had similar results on a Niagara/Fujitsu system.
> The "remaining" output around test 575 does not indicate, that this
> took especially long.
> A closer look reveals we are building differently, e.g. we are using
> OpenSSL instead of GnuTLS and we are not using libidn etc.
Actually I am using OpenSSL here. At least I thought I was. I better
take a closer look at this second system :-\
> How much worse did it get compared to the latest release you had built
> before? Could it be, that the system reached a resource bottleneck and
> started paging? Maybe it would make sense to look at "sar" output to
> check CPU consumption, I/O and paging during the long time test 575
> processing or hanging?
I was running prstat for a while and saw that the system was 97% idle.
No particular load of any sort and certainly no scan-rate for free pages
in memory via vmstat. It just looked stalled. Very strange. Even truss
showed me nothing happening. I mean really nothing.
> Did you try to check, whether the test made
> progress during that time by looking at the running processes?
Yes .. I actually watched it for a few hours. Sort of like watching
paint dry to be honest except the paint would be more exciting :-)
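
Rather than watching a stalled test by hand, one option is to run it under a hard time limit. The following is only a minimal sketch, not part of the curl test harness: the function name run_with_timeout and the command passed to it are hypothetical placeholders for whatever invocation runs a single test on the machine in question.

```python
import subprocess
import sys
import time


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (status, elapsed_seconds).

    status is the child's exit code, or None if the child was killed
    because it ran longer than timeout_s seconds.
    """
    start = time.monotonic()
    try:
        # subprocess.run kills the child and reaps it when the timeout expires.
        proc = subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        status = proc.returncode
    except subprocess.TimeoutExpired:
        status = None
    return status, time.monotonic() - start
```

A status of None then marks a test that never finished within the limit, which is easier to spot in a log than a run that simply sits there for hours.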
> If you can
> reproduce, you could use pstack to check the stack, the
> hanging/looping/whatever process is in.
I started over from the beginning and did a fresh extract from sources,
configure, build and just ran the testsuite. Results are slightly
different from the Fujitsu/Niagara server :
test 2039...OK (978 out of 984, remaining: 00:06)
test 2040...OK (979 out of 984, remaining: 00:05)
test 2044...OK (983 out of 984, remaining: 00:01)
OK (984 out of 984, remaining: 00:00)
TESTDONE: 748 tests out of 750 reported OK: 99%
TESTFAIL: These test cases failed: 1060 1061
TESTDONE: 994 tests were considered during 996 seconds.
gmake: *** [quiet-test] Error 1
gmake: Leaving directory
gmake: *** [test] Error 2
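
To put the two runs side by side, the TESTDONE summary lines can be compared directly. This is a small sketch that assumes the summary line always has the wording shown above; the helper name parse_testdone is made up for illustration.

```python
import re

# Matches the summary line the test suite prints at the end of a run.
TESTDONE_RE = re.compile(
    r"TESTDONE: (\d+) tests were considered during (\d+) seconds")


def parse_testdone(text):
    """Return (tests_considered, seconds), or None if no summary line is found."""
    m = TESTDONE_RE.search(text)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


# The two summary lines quoted in this thread.
fast = parse_testdone("TESTDONE: 994 tests were considered during 431 seconds.")
slow = parse_testdone("TESTDONE: 994 tests were considered during 996 seconds.")

ratio = slow[1] / fast[1]
print("%d tests: %d s vs %d s, ratio %.2f" % (slow[0], slow[1], fast[1], ratio))
```

Both runs considered the same 994 tests, so the difference is purely in elapsed time: this server took a bit more than twice as long as the UltraSPARC-IIIi run.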
Total time 1010 secs but look at the user and sys time. Zero?
I don't even know how that is possible.
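
Whether this explains the numbers above is not established, but wall-clock time and CPU time are accounted separately, so a process that is asleep or blocked can run up a long elapsed time while user and sys stay near zero. A minimal sketch of that effect on a Unix-like system follows; the function name wall_and_cpu is hypothetical and the sleeping child is only a stand-in for a blocked process.

```python
import os
import subprocess
import sys
import time


def wall_and_cpu(cmd):
    """Run cmd to completion; return (wall, user, system) seconds for the child.

    The CPU figures come from os.times(), which reports the accumulated
    user and system time of child processes that have been waited for.
    """
    before = os.times()
    start = time.monotonic()
    subprocess.run(cmd, check=False)
    wall = time.monotonic() - start
    after = os.times()
    user = after.children_user - before.children_user
    system = after.children_system - before.children_system
    return wall, user, system


if __name__ == "__main__":
    # Stand-in for a blocked process: a child that just sleeps for a second.
    wall, user, system = wall_and_cpu(
        [sys.executable, "-c", "import time; time.sleep(1)"])
    print("wall %.2f s, user %.2f s, sys %.2f s" % (wall, user, system))
```

A large gap between wall time and user plus sys would at least be consistent with prstat showing the machine 97% idle and truss showing no activity.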
This has been a strange day to be sure and I feel like starting a whole
new toolchain rebuild on this server. I have been very careful and
ultimately I was going to get to GCC 5.2.0 and even Apache 2.4.x but as
you know OpenSSL and Curl are darn critical pieces.
I am scratching my head and thinking what odd things are happening here.
List admin: http://cool.haxx.se/list/listinfo/curl-library
Received on 2015-10-16