Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

autotools: reduce brute-force when detecting recv/send arg list #9591

Closed
wants to merge 1 commit into from

Conversation

vszakats
Copy link
Member

@vszakats vszakats commented Sep 25, 2022

autotools uses brute-force to detect recv/send/select argument lists, by interating through all argument type combinations on each ./configure run. This logic exists since 01fa02d (from 2006) and was a bit later extended with Windows support.

This results in a worst-case number of compile + link cycles as below:

  • recv: 96
  • send: 192
  • select: 60
    Total: 348 (the number of curl C source files is 195, for comparison)

Notice that e.g. curl-for-win autotools builds require two ./configure invocations, doubling these numbers.

recv on Windows was especially unlucky because SOCKET (the correct choice there) was listed last in one of the outer trial loops. This resulted in lengthy waits while autotools was trying all invalid combinations first, wasting cycles, disk writes and slowing down iteration.

This patch reduces the amount of idle work by reordering the tests in a way to succeed first on a well-known platform such as Windows, and also on non-Windows by testing for POSIX prototypes first, on the assumption that these are the most likely candidates these days. (We do not touch select, where the order was already optimal for these platforms.)

For non-Windows, this means to try a return value of ssize_t first, then int, reordering the buffer argument type to try void * first, then byte *, and prefer the const flavor with send. If we are here, also stop testing for SOCKET type in non-Windows builds.

After the patch, detection on Windows is instantaneous. It should also be faster on popular platforms such as Linux and BSD-based ones.

If there are known-good variations for other platforms, they can also be fast-tracked like above, given a way to check for that platform inside the autotools logic.

Closes #9591

autotools uses brute-force to detect recv/send/select argument lists, by
interating through _all_ argument type combinations on each ./configure
run. This logic exists since 01fa02d
(from 2006) and was a bit later extended with Windows support.

This results in a worst-case number of compile + link cycles as below:
- `recv`: 96
- `send`: 192
- `select`: 60
Total: 348 (the number of curl C source files is 195, for comparison)

Notice that e.g. curl-for-win autotools builds require two ./configure
invocations, doubling these numbers.

`recv()` on Windows was especially unlucky because `SOCKET` (the correct
choice there) was listed last in one of the outer trial loops. This
resulted in lengthy waits while autotools was trying all invalid
combinations first, wasting cycles, disk writes and slowing down
iteration.

This patch reduces the amount of idle work, by reordering the tests in
a way to succeed first on a well-known platform such as Windows, and
also on non-Windows by testing for POSIX prototypes first, on the
assumption that these are the most likely candidates these days. (Except
for `select`, where the order was already optimal for these platforms.)

For non-Windows, this means to try a return value of `ssize_t` first,
then `int`, reordering the buffer argument type to try `void *` first,
then `byte *`, and prefer the `const` flavor with `send`.

After the patch, detection on Windows is instantaneous. It should also
be faster on popular platforms such as Linux and BSD-based ones.

Closes #xxxx
Copy link
Member

@bagder bagder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

I have thought about ripping this out completely from cmake/autoconf and instead replace it with a basic #ifdef sequence but it hasn't become more than thoughts in my head so I think this is a step forward!

@vszakats
Copy link
Member Author

Thanks! Ripping it out would definitely be the best. Perf is fine now, but there is a significant complexity in the code and build systems around this, most of it probably overkill.

Data is in: The patch shaved off almost 5 minutes from curl-for-win builds. A 10% improvement.

with patch: https://ci.appveyor.com/project/curlorg/curl-for-win/builds/44877232
without: https://ci.appveyor.com/project/curlorg/curl-for-win/builds/44876993

@vszakats vszakats closed this in 6adcff6 Sep 25, 2022
@vszakats vszakats deleted the autotools-reduce-brute-force branch September 25, 2022 21:59
@bagder
Copy link
Member

bagder commented Sep 26, 2022

I took a first step in #9592. I imagine it might need a while and polish before we can switch.

vszakats added a commit to curl/curl-for-win that referenced this pull request Oct 25, 2023
Switch from `curl-gnumake.sh` to `curl-cmake.sh` with upcoming curl
release v8.5.0.

cmake builds are now _faster_ for Windows builds than raw gnumake
(aka `Makefile.mk`). They also use 'unity' mode, which makes builds
run fast with the side-effect of also creating potentially more
efficient binaries by allowing better compiler optimizations.

This also makes curl-for-win use the same build system for all target
platforms (`Makefile.mk` is not available for *nix platforms).

Initially on 2022-07-04 cmake was 25% slower than gnumake. By
2022-09-26 this reduced to 20%, by 2023-07-29 to 18% and after the
latest round of updates gnumake came out 7% slower than cmake.
This is for Windows, for a triple x64, arm64 and x86 build. In
absolute times this is 27m22s for cmake and 29m11s for gnumake.

(FWIW autotools builds are 52% slower than cmake unity builds now.)

Making cmake builds fast was a multi-year effort with these major steps:

1. add support for cmake builds in curl-for-win.
   420e73c
2. bring it in-line with gnumake and autotools builds and tidy-up
   as much as possible. Scattered to a many commits.
3. delete a whole bunch of unused feature detections.
   curl/curl@4d73854
   curl/curl#9044
   (and a lot more smaller commits)
4. speed up picky warning option initialization by avoiding brute-force
   all options. (first in libssh2, then in curl, then applied also
   ngtcp2, nghttp3, nghttp2)
   curl/curl@9c543de
   curl/curl#10973
5. implement single build run for shared and static libcurl + tool
   (first in libssh2)
   curl/curl@1199308
   curl/curl#11505
   53dcd49
6. implement single build pass for shared and static libcurl
   (for Windows initially)
   curl/curl@2ebc74c
   curl/curl#11546
7. extend above to non-Windows (e.g. macOS)
   curl/curl@fc9bfb1
   curl/curl#11627
   bafa77d (mac)
   1b27304 (linux)
8. implement unity builds: single compiler invocation for libcurl + tool
   curl/curl@3f8fc25
   curl/curl#11095
   curl/curl@f42a279
   curl/curl#11928
   67d7fd3
9. speed up 4x the cmake initialization phase (for Windows)
   curl/curl@2100d9f
   curl/curl#12044
10. speed up cmake initialization even more by pre-filling
   detection results for our well-known mingw-w64 environments.
   74dd967
   5a43c61

+1: speed up autotools initialization by mitigating a brute-force
   feature detection on Windows. This reduced total build times
   by 5 minutes at the time, for the 3 Windows targets in total.
   curl/curl@6adcff6
   curl/curl#9591

Also update build times for cmake-unity and gnumake, based on runs:
cmake-unity: https://ci.appveyor.com/project/curlorg/curl-for-win/builds/48357875
gnumake: https://ci.appveyor.com/project/curlorg/curl-for-win/builds/48358005
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

2 participants