Re: Experimenting with parallel tests on Debian
From: Dan Fandrich via curl-library <curl-library_at_lists.haxx.se>
Date: Fri, 12 Jan 2024 03:38:31 -0800
On Thu, Jan 11, 2024 at 10:44:42AM -0300, Samuel Henrique via curl-library wrote:
> I have recently pushed an experimental build of curl with parallel test
> execution on Debian. This was done with the hopes of helping reporting issues
> and understanding if it's feasible to enable it for non-experimental builds.
>
> We have quite a diverse set of supported architectures and different build
> hosts[0].
>
> There were a few failures that went away after retries. I have not done any
> investigation other than noting the failed tests were not always the same and
> at least one failure occurred on a host with a high number of CPU threads (16,
> high-ish for non-server standards nowadays).
You've discovered why we haven't turned on parallel tests by default yet.
They're quite reliable when run on an unloaded machine, such as a developer's
PC, but CI and build machines (especially in the free CI tiers) tend to be
heavily oversubscribed. This results in highly variable timing and task
scheduling, and, unfortunately, some of the tests are fairly sensitive to this.
Some of the worst ones have keywords "flaky" and "timing-dependent" so they can
be easily skipped if desired.
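
A minimal sketch of how a build script might combine the two knobs discussed in this thread, the per-CPU worker count and skipping tests by keyword. It assumes that runtests.pl accepts `-jN` for N parallel workers and `!keyword` to exclude tests carrying that keyword; check the test suite documentation for your curl version before relying on either. The helper name `runtests_args` and its parameters are hypothetical.

```python
import os


def runtests_args(workers_per_cpu=7,
                  skip_keywords=("flaky", "timing-dependent"),
                  cpus=None):
    """Build a runtests.pl argument list (hypothetical helper).

    Assumption: runtests.pl takes -jN for N parallel workers and
    !keyword to exclude tests tagged with that keyword.
    """
    if cpus is None:
        cpus = os.cpu_count() or 1
    # Never go below one worker, even if the caller passes 0.
    jobs = max(1, workers_per_cpu * cpus)
    return ["./runtests.pl", f"-j{jobs}"] + [f"!{kw}" for kw in skip_keywords]


# Example: the 16-thread host mentioned in the thread, 7 workers per thread.
print(" ".join(runtests_args(cpus=16)))
```

Lowering `workers_per_cpu` is the easy experiment on an oversubscribed host. Note that the `!` needs quoting if the resulting command line is pasted into an interactive shell.
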
A couple of classes of issues remain in the tests that, if solved, would
eliminate some timing dependencies and make the tests more reliable.
For example, one of them has to do with sending data immediately before closing
a connection, which tends to make the final "QUIT" command in FTP tests
disappear. The reasons for most of these are hard to figure out, though, given
that they almost never fail locally when you try (although icing has a theory
about this particular one).
> All the builds were done following the suggestion of using 7 workers per CPU
> thread [1] and without valgrind.
>
> Do note that I did not try a lower number of workers and I'm only sending this
> in case someone is interested in finding possible bugs. I plan to keep testing
> future releases and me or someone else from Debian might report something more
> concrete in the future.
I've found that reducing the number of workers makes things better, but even at
only 2 workers, you still see failures on the most oversubscribed hosts. If
someone could figure out how to consistently make one or more tests fail
locally, it would go a long way toward finding and fixing the cause.
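
One possible way to approximate an oversubscribed host locally is to run the test command while busy-loop processes compete for every CPU thread. The sketch below is only an illustration of that idea, not something the curl test suite provides: `run_under_load` and its `burners` parameter are hypothetical, and there is no guarantee that plain CPU contention reproduces the failures seen on CI.

```python
import os
import subprocess
import sys


def run_under_load(cmd, burners=None):
    """Run cmd while `burners` busy-loop processes compete for CPU time.

    Hypothetical helper: defaults to two burners per CPU thread so the
    scheduler always has more runnable tasks than cores.
    Returns the exit code of cmd.
    """
    if burners is None:
        burners = 2 * (os.cpu_count() or 1)
    # Each burner is a separate Python interpreter spinning forever.
    spin = [sys.executable, "-c", "while True: pass"]
    procs = [subprocess.Popen(spin) for _ in range(burners)]
    try:
        return subprocess.run(cmd).returncode
    finally:
        # Always stop the burners, even if cmd could not be started.
        for p in procs:
            p.terminate()
        for p in procs:
            p.wait()
```

A call such as `run_under_load(["./runtests.pl", "-j2"])` would then run the suite with 2 workers under contention, assuming `-j2` is how your runtests.pl selects the worker count.
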
Dan
--
Unsubscribe: https://lists.haxx.se/mailman/listinfo/curl-library
Etiquette: https://curl.se/mail/etiquette.html
Received on 2024-01-12