mysterious failures and IPv6 problems for macOS GitHub actions jobs #13284

Closed
bagder opened this issue Apr 4, 2024 · 6 comments
Labels
CI Continuous Integration

Comments

@bagder
Member

bagder commented Apr 4, 2024

I did this

Over the last several days, CI jobs running on GitHub Actions have turned flaky. Sometimes one of the macOS jobs fails, sometimes 10 of them fail. Rerunning the same code may produce a different outcome.

The failures seem IPv6 related somehow. example job, another example

The most obvious symptom is that when a job has problems, it can't "resolve" ::1. Note that on macOS we call getaddrinfo even for plain numerical IP addresses.

failures

When this problem happens, usually the following test cases fail: 1085 1400 1401 1402 1403 1404 1405 1406 1407 1420 1465 1467 1468 2100

Looking at the test logs in a failed run, we can see that all HTTP-IPv6 tests are skipped because curl cannot verify the IPv6 server:

2024-04-02T19:43:36.1057020Z  * Added connection 0. The cache now contains 1 members
2024-04-02T19:43:36.1057900Z  * STATE: CONNECT => RESOLVING handle 0x7fc26980be08; line 1954
2024-04-02T19:43:36.1058690Z  * Could not resolve host: ::1

test 1085

Tries to use an IPv6 address for --interface but the resolving of the host (::1) fails, which makes the test return an unexpected return code.

test 1400-1468

These tests verify --libcurl, which outputs generated source code that uses libcurl for a given curl command line.

These tests mysteriously get an extra line of libcurl code added:

curl_easy_setopt(hnd, CURLOPT_IPRESOLVE, 1L);

The only way CURLOPT_IPRESOLVE is set by the curl command line tool is when the -4/--ipv4 option is used. And it is not used in any of these tests.

test 2100

Uses DoH. When failing, the request asks only for an IPv4 address and not for an IPv6 one, which makes the protocol check fail. Is that because it deems IPv6 not working?

macOS runner image

Image: macos-12
  Version: 20240329.1
  Included Software: https://github.com/actions/runner-images/blob/macOS-12/20240329.1/images/macos/macos-12-Readme.md
  Image Release: https://github.com/actions/runner-images/releases/tag/macOS-12%2F20240329.1

I expected the following

The tests should just work.

curl/libcurl version

current git

operating system

macOS only

@vszakats
Member

vszakats commented Apr 4, 2024

image changes on 20240329:
actions/runner-images@6cb51f9

@vszakats
Member

vszakats commented Apr 4, 2024

Possibly related:
actions/runner-images#9626
actions/runner-images#9628

@bagder
Member Author

bagder commented Apr 4, 2024

aaaaaah

actions/runner-images#9586

This explains a lot:

echo "Configuring curl to resolve names with IPv4..."
echo '--ipv4' >> ~/.curlrc

@bagder
Member Author

bagder commented Apr 4, 2024

What puzzles me is how that was not always causing those tests to fail...

@erik-bershel

What puzzles me is how that was not always causing those tests to fail...

Did the green tests run on updated images? It takes some time to update all the agents in prod, so you should have seen an ever-increasing number of errors until all new jobs landed only on updated agents with the curl config. If the green runs took place on updated agents, this would be a very interesting case.

@bagder
Member Author

bagder commented Apr 5, 2024

Did the green tests run on updated images?

Yes, that's exactly it - or rather it still worked fine on the old images. I went back to check a few of the older PRs and you are right. The green builds are still on Version: 20240218.1. All the builds on the new image fail. It all makes sense now.

@bagder bagder closed this as completed in 5ae7255 Apr 5, 2024
bagder added a commit that referenced this issue Apr 16, 2024
To reduce the risk that the user running the tests has a .curlrc present
that messes things up.

Ref: #13284
bagder added a commit that referenced this issue Apr 16, 2024
To reduce the risk that the user running the tests has a .curlrc present
that messes things up.

Support 'option="no-q"' for the <command> tag to switch it off on demand.
Use this new feature in test 433 and 436.

Ref: #13284
Closes #13387