curl-library

Re: Parallelizing tests

From: Dan Fandrich via curl-library <curl-library_at_cool.haxx.se>
Date: Mon, 1 Oct 2018 14:52:54 +0200

On Mon, Oct 01, 2018 at 11:58:25AM +0200, Daniel Gustafsson via curl-library wrote:
> I was poking a little at parallelizing the test suite in order to try and shave
> some time off the total runtime. But before sinking time into that I wanted to
> ask if there are/have been any other attempts at this? Has anyone hacked on
> this and if so, are there any learnings that can be shared?

I have a local branch I started after the discussions during curl://up Nürnberg
on the topic. I got 80% of the way to running tests on different protocols in
parallel (I could do it but it required manually starting two test harnesses
and selecting nonconflicting test ranges). Most of the changes involved making
each test independent in its use of input and output files so there would be no
conflicts at run-time. The test code has changed in the 1.5 years since, so it
would be a bit of work to rebase it all to the current code. In retrospect, I
probably should have checked in each part as it was complete, but since few of
the changes on the way really helped improve curl without the entire thing
being in place, I didn't.

The other problem with the approach I was working on, namely, parallelizing by
protocol but coordinating all the tests from a single test harness with a
single set of test servers (as is mostly done today), is that the speedup
would be limited. While it would be relatively straightforward to implement,
you wouldn't see more than a 2× speedup since more than half the tests involve
a single protocol, HTTP. A slightly different approach would involve starting N
entire test harnesses in parallel, each responsible for its own suite of test
servers running on its own range of test ports and running in its own
tests/log/ directory. Since nothing would be shared between the test servers
besides the input files, there would be no limit to the number of test
harnesses (and therefore parallel tests) that could be run at once.
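
As a rough illustration of the "N entire test harnesses" idea, here is a minimal Python sketch of how each harness could be handed its own port range and its own log directory. Everything in it is an assumption made for illustration: the names HarnessSlot and plan_harnesses, the base port 20000, the 100 ports per harness, and the harnessN subdirectory layout are all hypothetical and are not taken from the curl test suite.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class HarnessSlot:
    """Resources reserved for one independent test harness (hypothetical)."""
    index: int
    first_port: int
    last_port: int
    log_dir: Path


def plan_harnesses(count, base_port=20000, ports_per_harness=100,
                   log_root="tests/log"):
    """Give each harness a disjoint port range and a private log directory.

    base_port, ports_per_harness and the directory naming are assumptions
    for this sketch, not values used by the real test suite.
    """
    if count < 1:
        raise ValueError("need at least one harness")
    slots = []
    for i in range(count):
        first = base_port + i * ports_per_harness
        last = first + ports_per_harness - 1
        if last > 65535:
            raise ValueError("ran out of TCP ports")
        slots.append(HarnessSlot(i, first, last, Path(log_root) / f"harness{i}"))
    return slots


if __name__ == "__main__":
    for slot in plan_harnesses(4):
        print(slot.index, slot.first_port, slot.last_port, slot.log_dir)
```

Because the ranges are computed from the harness index alone, no coordination between harnesses is needed once each one knows its index.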

If I were looking at the problem again, that's the approach I would take. It
might even end up requiring fewer changes than the approach I was working on.
To give a quick sketch, the test harness would open a pipe and fork() itself
fairly soon after startup, putting the forked copies into slave mode and
listening on the pipe for instructions. The master test harness would determine
the tests to run as it does today, but rather than running them itself, it
would send a message to one of the slaves that would actually run the test. The
master would gather the result of each test run and display it to the user as
it completes. The user wouldn't notice any difference except that test results
would no longer necessarily appear in sequential order (a faster test's result
would be shown before that of a slower one that starts at the same time).
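
The fork-and-pipe model described above could look roughly like the following Python sketch (POSIX only). It is not the real harness, which would have to start servers and run actual tests. The function names, the one-line text messages, and the stub run_one_test are all hypothetical. The coordinating process hands one test number at a time to each forked worker over a per-worker pipe and collects results from a shared result pipe as they complete.

```python
import os
import time


def run_one_test(test_num):
    """Stand-in for running a real test (hypothetical stub)."""
    time.sleep(0.01 * (test_num % 3))
    return "OK"


def worker_loop(worker_id, cmd_fd, result_fd):
    """Forked copy: read test numbers from the pipe, report each result."""
    try:
        with os.fdopen(cmd_fd, "r") as commands:
            for line in commands:
                test_num = int(line)
                status = run_one_test(test_num)
                # Short writes to a pipe are atomic, so lines don't interleave.
                os.write(result_fd, f"{worker_id} {test_num} {status}\n".encode())
    finally:
        os._exit(0)  # never return into the coordinator's code


def run_parallel(tests, num_workers=2):
    """Dispatch tests to forked workers; return results in completion order."""
    result_r, result_w = os.pipe()
    cmd_writers = {}
    pids = []
    for wid in range(num_workers):
        cmd_r, cmd_w = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: drop every descriptor that belongs to the coordinator.
            os.close(cmd_w)
            os.close(result_r)
            for fd in cmd_writers.values():
                os.close(fd)
            worker_loop(wid, cmd_r, result_w)
        os.close(cmd_r)
        cmd_writers[wid] = cmd_w
        pids.append(pid)
    os.close(result_w)

    pending = list(tests)
    results = []
    in_flight = 0

    def dispatch(wid):
        nonlocal in_flight
        if pending:
            os.write(cmd_writers[wid], f"{pending.pop(0)}\n".encode())
            in_flight += 1

    for wid in cmd_writers:  # give every worker its first test
        dispatch(wid)
    with os.fdopen(result_r, "r") as incoming:
        while in_flight:
            line = incoming.readline()
            if not line:  # all workers exited unexpectedly
                break
            wid, test_num, status = line.split()
            in_flight -= 1
            results.append((int(test_num), status))
            dispatch(int(wid))  # keep that worker busy
    for fd in cmd_writers.values():
        os.close(fd)  # EOF tells each worker to exit
    for pid in pids:
        os.waitpid(pid, 0)
    return results


if __name__ == "__main__":
    for test_num, status in run_parallel(range(1, 8), num_workers=3):
        print(f"test {test_num:04d}... {status}")
```

Results come back in completion order rather than test order, which is the one visible difference mentioned above.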

Some of the infrastructure changes I made would be really useful in this
approach as well, as I refactored the main test loop to make such a model easy
to fit in. I can publish my branch somewhere if someone would find it useful,
or even rebase the more generally useful cleanups if you're not in a
hurry.

>>> Dan
-------------------------------------------------------------------
Received on 2018-10-01