autotools: reduce brute-force when detecting recv/send arg list #9591

vszakats · 2022-09-25T11:51:50Z

autotools uses brute-force to detect recv/send/select argument lists, by interating through all argument type combinations on each ./configure run. This logic exists since 01fa02d (from 2006) and was a bit later extended with Windows support.

This results in a worst-case number of compile + link cycles as below:

recv: 96
send: 192
select: 60
Total: 348 (the number of curl C source files is 195, for comparison)

Notice that e.g. curl-for-win autotools builds require two ./configure invocations, doubling these numbers.

recv on Windows was especially unlucky because SOCKET (the correct choice there) was listed last in one of the outer trial loops. This resulted in lengthy waits while autotools was trying all invalid combinations first, wasting cycles, disk writes and slowing down iteration.

This patch reduces the amount of idle work by reordering the tests in a way to succeed first on a well-known platform such as Windows, and also on non-Windows by testing for POSIX prototypes first, on the assumption that these are the most likely candidates these days. (We do not touch select, where the order was already optimal for these platforms.)

For non-Windows, this means to try a return value of ssize_t first, then int, reordering the buffer argument type to try void * first, then byte *, and prefer the const flavor with send. If we are here, also stop testing for SOCKET type in non-Windows builds.

After the patch, detection on Windows is instantaneous. It should also be faster on popular platforms such as Linux and BSD-based ones.

If there are known-good variations for other platforms, they can also be fast-tracked like above, given a way to check for that platform inside the autotools logic.

Closes #9591

autotools uses brute-force to detect recv/send/select argument lists, by interating through _all_ argument type combinations on each ./configure run. This logic exists since 01fa02d (from 2006) and was a bit later extended with Windows support. This results in a worst-case number of compile + link cycles as below: - `recv`: 96 - `send`: 192 - `select`: 60 Total: 348 (the number of curl C source files is 195, for comparison) Notice that e.g. curl-for-win autotools builds require two ./configure invocations, doubling these numbers. `recv()` on Windows was especially unlucky because `SOCKET` (the correct choice there) was listed last in one of the outer trial loops. This resulted in lengthy waits while autotools was trying all invalid combinations first, wasting cycles, disk writes and slowing down iteration. This patch reduces the amount of idle work, by reordering the tests in a way to succeed first on a well-known platform such as Windows, and also on non-Windows by testing for POSIX prototypes first, on the assumption that these are the most likely candidates these days. (Except for `select`, where the order was already optimal for these platforms.) For non-Windows, this means to try a return value of `ssize_t` first, then `int`, reordering the buffer argument type to try `void *` first, then `byte *`, and prefer the `const` flavor with `send`. After the patch, detection on Windows is instantaneous. It should also be faster on popular platforms such as Linux and BSD-based ones. Closes #xxxx

bagder

Nice work!

I have thought about ripping this out completely from cmake/autoconf and instead replace it with a basic #ifdef sequence but it hasn't become more than thoughts in my head so I think this is a step forward!

vszakats · 2022-09-25T21:46:02Z

Thanks! Ripping it out would definitely be the best. Perf is fine now, but there is a significant complexity in the code and build systems around this, most of it probably overkill.

Data is in: The patch shaved off almost 5 minutes from curl-for-win builds. A 10% improvement.

with patch: https://ci.appveyor.com/project/curlorg/curl-for-win/builds/44877232
without: https://ci.appveyor.com/project/curlorg/curl-for-win/builds/44876993

It reduced CI run times by almost 5 minutes, or 10%. with patch: https://ci.appveyor.com/project/curlorg/curl-for-win/builds/44877232 without: https://ci.appveyor.com/project/curlorg/curl-for-win/builds/44876993 Ref: curl/curl@6adcff6 Ref: curl/curl#9591

bagder · 2022-09-26T06:46:50Z

I took a first step in #9592. I imagine it might need a while and polish before we can switch.

Switch from `curl-gnumake.sh` to `curl-cmake.sh` with upcoming curl release v8.5.0. cmake builds are now _faster_ for Windows builds than raw gnumake (aka `Makefile.mk`). They also use 'unity' mode, which makes builds run fast with the side-effect of also creating potentially more efficient binaries by allowing better compiler optimizations. This also makes curl-for-win use the same build system for all target platforms (`Makefile.mk` is not available for *nix platforms). Initially on 2022-07-04 cmake was 25% slower than gnumake. By 2022-09-26 this reduced to 20%, by 2023-07-29 to 18% and after the latest round of updates gnumake came out 7% slower than cmake. This is for Windows, for a triple x64, arm64 and x86 build. In absolute times this is 27m22s for cmake and 29m11s for gnumake. (FWIW autotools builds are 52% slower than cmake unity builds now.) Making cmake builds fast was a multi-year effort with these major steps: 1. add support for cmake builds in curl-for-win. 420e73c 2. bring it in-line with gnumake and autotools builds and tidy-up as much as possible. Scattered to a many commits. 3. delete a whole bunch of unused feature detections. curl/curl@4d73854 curl/curl#9044 (and a lot more smaller commits) 4. speed up picky warning option initialization by avoiding brute-force all options. (first in libssh2, then in curl, then applied also ngtcp2, nghttp3, nghttp2) curl/curl@9c543de curl/curl#10973 5. implement single build run for shared and static libcurl + tool (first in libssh2) curl/curl@1199308 curl/curl#11505 53dcd49 6. implement single build pass for shared and static libcurl (for Windows initially) curl/curl@2ebc74c curl/curl#11546 7. extend above to non-Windows (e.g. macOS) curl/curl@fc9bfb1 curl/curl#11627 bafa77d (mac) 1b27304 (linux) 8. implement unity builds: single compiler invocation for libcurl + tool curl/curl@3f8fc25 curl/curl#11095 curl/curl@f42a279 curl/curl#11928 67d7fd3 9. speed up 4x the cmake initialization phase (for Windows) curl/curl@2100d9f curl/curl#12044 10. speed up cmake initialization even more by pre-filling detection results for our well-known mingw-w64 environments. 74dd967 5a43c61 +1: speed up autotools initialization by mitigating a brute-force feature detection on Windows. This reduced total build times by 5 minutes at the time, for the 3 Windows targets in total. curl/curl@6adcff6 curl/curl#9591 Also update build times for cmake-unity and gnumake, based on runs: cmake-unity: https://ci.appveyor.com/project/curlorg/curl-for-win/builds/48357875 gnumake: https://ci.appveyor.com/project/curlorg/curl-for-win/builds/48358005

vszakats added build performance labels Sep 25, 2022

bagder approved these changes Sep 25, 2022

View reviewed changes

vszakats closed this in 6adcff6 Sep 25, 2022

vszakats deleted the autotools-reduce-brute-force branch September 25, 2022 21:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

autotools: reduce brute-force when detecting recv/send arg list #9591

autotools: reduce brute-force when detecting recv/send arg list #9591

vszakats commented Sep 25, 2022 •

edited

bagder left a comment

vszakats commented Sep 25, 2022

bagder commented Sep 26, 2022

autotools: reduce brute-force when detecting recv/send arg list #9591

autotools: reduce brute-force when detecting recv/send arg list #9591

Conversation

vszakats commented Sep 25, 2022 • edited

bagder left a comment

Choose a reason for hiding this comment

vszakats commented Sep 25, 2022

bagder commented Sep 26, 2022

vszakats commented Sep 25, 2022 •

edited