New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
autotools: reduce brute-force when detecting recv/send arg list #9591
Conversation
autotools uses brute-force to detect recv/send/select argument lists, by interating through _all_ argument type combinations on each ./configure run. This logic exists since 01fa02d (from 2006) and was a bit later extended with Windows support. This results in a worst-case number of compile + link cycles as below: - `recv`: 96 - `send`: 192 - `select`: 60 Total: 348 (the number of curl C source files is 195, for comparison) Notice that e.g. curl-for-win autotools builds require two ./configure invocations, doubling these numbers. `recv()` on Windows was especially unlucky because `SOCKET` (the correct choice there) was listed last in one of the outer trial loops. This resulted in lengthy waits while autotools was trying all invalid combinations first, wasting cycles, disk writes and slowing down iteration. This patch reduces the amount of idle work, by reordering the tests in a way to succeed first on a well-known platform such as Windows, and also on non-Windows by testing for POSIX prototypes first, on the assumption that these are the most likely candidates these days. (Except for `select`, where the order was already optimal for these platforms.) For non-Windows, this means to try a return value of `ssize_t` first, then `int`, reordering the buffer argument type to try `void *` first, then `byte *`, and prefer the `const` flavor with `send`. After the patch, detection on Windows is instantaneous. It should also be faster on popular platforms such as Linux and BSD-based ones. Closes #xxxx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
I have thought about ripping this out completely from cmake/autoconf and instead replace it with a basic #ifdef
sequence but it hasn't become more than thoughts in my head so I think this is a step forward!
Thanks! Ripping it out would definitely be the best. Perf is fine now, but there is a significant complexity in the code and build systems around this, most of it probably overkill. Data is in: The patch shaved off almost 5 minutes from curl-for-win builds. A 10% improvement. with patch: https://ci.appveyor.com/project/curlorg/curl-for-win/builds/44877232 |
It reduced CI run times by almost 5 minutes, or 10%. with patch: https://ci.appveyor.com/project/curlorg/curl-for-win/builds/44877232 without: https://ci.appveyor.com/project/curlorg/curl-for-win/builds/44876993 Ref: curl/curl@6adcff6 Ref: curl/curl#9591
I took a first step in #9592. I imagine it might need a while and polish before we can switch. |
Switch from `curl-gnumake.sh` to `curl-cmake.sh` with upcoming curl release v8.5.0. cmake builds are now _faster_ for Windows builds than raw gnumake (aka `Makefile.mk`). They also use 'unity' mode, which makes builds run fast with the side-effect of also creating potentially more efficient binaries by allowing better compiler optimizations. This also makes curl-for-win use the same build system for all target platforms (`Makefile.mk` is not available for *nix platforms). Initially on 2022-07-04 cmake was 25% slower than gnumake. By 2022-09-26 this reduced to 20%, by 2023-07-29 to 18% and after the latest round of updates gnumake came out 7% slower than cmake. This is for Windows, for a triple x64, arm64 and x86 build. In absolute times this is 27m22s for cmake and 29m11s for gnumake. (FWIW autotools builds are 52% slower than cmake unity builds now.) Making cmake builds fast was a multi-year effort with these major steps: 1. add support for cmake builds in curl-for-win. 420e73c 2. bring it in-line with gnumake and autotools builds and tidy-up as much as possible. Scattered to a many commits. 3. delete a whole bunch of unused feature detections. curl/curl@4d73854 curl/curl#9044 (and a lot more smaller commits) 4. speed up picky warning option initialization by avoiding brute-force all options. (first in libssh2, then in curl, then applied also ngtcp2, nghttp3, nghttp2) curl/curl@9c543de curl/curl#10973 5. implement single build run for shared and static libcurl + tool (first in libssh2) curl/curl@1199308 curl/curl#11505 53dcd49 6. implement single build pass for shared and static libcurl (for Windows initially) curl/curl@2ebc74c curl/curl#11546 7. extend above to non-Windows (e.g. macOS) curl/curl@fc9bfb1 curl/curl#11627 bafa77d (mac) 1b27304 (linux) 8. implement unity builds: single compiler invocation for libcurl + tool curl/curl@3f8fc25 curl/curl#11095 curl/curl@f42a279 curl/curl#11928 67d7fd3 9. speed up 4x the cmake initialization phase (for Windows) curl/curl@2100d9f curl/curl#12044 10. speed up cmake initialization even more by pre-filling detection results for our well-known mingw-w64 environments. 74dd967 5a43c61 +1: speed up autotools initialization by mitigating a brute-force feature detection on Windows. This reduced total build times by 5 minutes at the time, for the 3 Windows targets in total. curl/curl@6adcff6 curl/curl#9591 Also update build times for cmake-unity and gnumake, based on runs: cmake-unity: https://ci.appveyor.com/project/curlorg/curl-for-win/builds/48357875 gnumake: https://ci.appveyor.com/project/curlorg/curl-for-win/builds/48358005
autotools uses brute-force to detect
recv
/send
/select
argument lists, by interating through all argument type combinations on each./configure
run. This logic exists since 01fa02d (from 2006) and was a bit later extended with Windows support.This results in a worst-case number of compile + link cycles as below:
recv
: 96send
: 192select
: 60Total: 348 (the number of curl C source files is 195, for comparison)
Notice that e.g. curl-for-win autotools builds require two
./configure
invocations, doubling these numbers.recv
on Windows was especially unlucky becauseSOCKET
(the correct choice there) was listed last in one of the outer trial loops. This resulted in lengthy waits while autotools was trying all invalid combinations first, wasting cycles, disk writes and slowing down iteration.This patch reduces the amount of idle work by reordering the tests in a way to succeed first on a well-known platform such as Windows, and also on non-Windows by testing for POSIX prototypes first, on the assumption that these are the most likely candidates these days. (We do not touch
select
, where the order was already optimal for these platforms.)For non-Windows, this means to try a return value of
ssize_t
first, thenint
, reordering the buffer argument type to tryvoid *
first, thenbyte *
, and prefer theconst
flavor withsend
. If we are here, also stop testing forSOCKET
type in non-Windows builds.After the patch, detection on Windows is instantaneous. It should also be faster on popular platforms such as Linux and BSD-based ones.
If there are known-good variations for other platforms, they can also be fast-tracked like above, given a way to check for that platform inside the autotools logic.
Closes #9591