Test 433 suddenly started randomly failing on Azure Ubuntu CI #6739

Closed
mback2k opened this issue Mar 12, 2021 · 19 comments
Labels
CI Continuous Integration tests

Comments

@mback2k mback2k added CI Continuous Integration tests labels Mar 12, 2021
@mback2k
Member Author

mback2k commented Mar 12, 2021

I just queued Azure DevOps Ubuntu Linux CI runs for all commits in the range mentioned above to bisect the root cause.

@bagder
Member

bagder commented Mar 12, 2021

None of the commits in that range seem to have anything to do with such a failure, and I cannot reproduce it on my Debian machine.

@mback2k
Member Author

mback2k commented Mar 13, 2021

I guessed so. Then something has probably changed in the Ubuntu version that Azure uses. I will try to verify this by running previous commits there later today.

@mback2k
Member Author

mback2k commented Mar 13, 2021

It seems the test started failing randomly on 2021-03-11. I re-triggered this build from 2021-03-10, which failed due to a network connectivity issue in its first run; let's see how it turns out:
https://dev.azure.com/daniel0244/curl/_build/results?buildId=4996&view=results

@mback2k
Member Author

mback2k commented Mar 13, 2021

Okay, the one above worked on the first try. Let's try the last commit from 2021-03-10 again:
https://dev.azure.com/daniel0244/curl/_build/results?buildId=5057&view=results

@mback2k
Member Author

mback2k commented Mar 13, 2021

So the one above failed as well: https://dev.azure.com/daniel0244/curl/_build/results?buildId=5057&view=logs&j=9d58b9ac-e1e6-53b6-f83a-1f9f1d912522&t=51a4b246-0fa1-595b-3cc7-3e7aeb2b7380&l=1824

I guess something in the Azure CI environment has changed, probably the switch to Ubuntu 20.04 LTS, and that is causing this. @bagder do you happen to have such a VM around for reproduction purposes? It would probably require running the test suite multiple times until it fails randomly.

@bagder
Member

bagder commented Mar 13, 2021

I have an 18.04 VM here. Do we know if it requires 20.04 to fail? I could probably set one up; it might be useful in the future as well.

@mback2k
Member Author

mback2k commented Mar 13, 2021

Do we know if it requires 20.04 to fail?

That is the theory to investigate here. I suspect it because Azure DevOps recently changed ubuntu-latest to 20.04.

@bagder
Member

bagder commented Mar 14, 2021

Took me a while but now I have a 20.04 VM with a fresh curl build and... test 433 works fine here! 😞

@mback2k
Member Author

mback2k commented Mar 14, 2021

I guess this is a behavioral issue that only appears if the whole test suite is run (multiple times).

@jay
Member

jay commented Mar 14, 2021

This is a weird one. It looks like the last line of the curlrc file is not read, but it's not clear why. It could be a bug in curl. Try adding DEBUG_CONFIG to the preprocessor macros to enable this debug output:

curl/src/tool_parsecfg.c

Lines 187 to 189 in 2f33be8

#ifdef DEBUG_CONFIG
fprintf(stderr, "GOT: %s\n", option);
#endif

curl/src/tool_parsecfg.c

Lines 240 to 242 in 2f33be8

#ifdef DEBUG_CONFIG
fprintf(stderr, "PARAM: \"%s\"\n",(param ? param : "(null)"));
#endif

@bagder
Member

bagder commented Mar 14, 2021

In the most recent commits on master I don't see any test 433 failures on azure.

@bagder
Member

bagder commented Mar 15, 2021

@jay: it also lacks the header in the request, so I think it rather looks like it doesn't read the .curlrc file at all, which then makes it look like the XDG_CONFIG_HOME=%PWD/log environment variable setting doesn't work...

@bagder bagder mentioned this issue Mar 16, 2021
jay added a commit that referenced this issue Mar 17, 2021
@jay
Member

jay commented Mar 17, 2021

Use #6756 to run experiments

@jay
Member

jay commented Mar 17, 2021

XDG_CONFIG_HOME is set by perl to /home/vsts/work/1/s/tests/log, but when curl retrieves it via getenv it is /.config, so curl tries to open /.config/.curlrc, which is wrong of course.

full output

@jay
Member

jay commented Mar 17, 2021

What puzzles me is that it looks like curl finds CURL_HOME set and uses that, even though the test is supposed to clear that... ?

The curl tool is called several times and there's debug output each time it's called. What I am looking at is the debug output in the stderr logfile, which should be the right one:

2021-03-17T07:13:06.0977860Z === Start of file stderr433
2021-03-17T07:13:06.0978102Z  DEBUG: parseconfig "(null)"
2021-03-17T07:13:06.0978329Z  DEBUG: homedir(".curlrc")
2021-03-17T07:13:06.0978574Z  DEBUG: CURL_HOME: (null)
2021-03-17T07:13:06.0978834Z  DEBUG: XDG_CONFIG_HOME: /.config
2021-03-17T07:13:06.0979092Z  DEBUG: c: /.config/.curlrc
2021-03-17T07:13:06.0979412Z  DEBUG: fd: -1
2021-03-17T07:13:06.0979641Z  DEBUG: errno: 2
2021-03-17T07:13:06.0979890Z  DEBUG: pathalloc: /home/vsts/.curlrc
2021-03-17T07:13:06.0980127Z  DEBUG: file: (nil)

I've modified the debug output to dump the command line arguments as well, to be sure. My guess is the shell sets XDG_CONFIG_HOME.
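The DEBUG lines in stderr433 trace the order in which the curl tool looks for its config file: CURL_HOME first, then XDG_CONFIG_HOME (only if a `.curlrc` can be opened there), then HOME. As a rough illustration, here is a simplified C model of that order, inferred from the trace. `pick_curlrc()` is a hypothetical helper, not curl's actual `homedir()` code, and it takes the environment values as parameters so it can be exercised without touching the real environment.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* True when `path` exists and is readable by this process. */
static bool readable(const char *path)
{
  return access(path, R_OK) == 0;
}

/* Hypothetical helper: a simplified model of the lookup order seen in the
   DEBUG trace (NOT curl's actual implementation):
     1. CURL_HOME, if set, is used as-is;
     2. XDG_CONFIG_HOME is used only if <XDG_CONFIG_HOME>/.curlrc is readable;
     3. otherwise HOME is used.
   The chosen "<dir>/.curlrc" path is written to `out`.
   Returns 0 on success, -1 if no directory applies or `out` is too small. */
static int pick_curlrc(const char *curl_home, const char *xdg_config_home,
                       const char *home, char *out, size_t outlen)
{
  const char *dir = curl_home;
  int n;

  assert(out && outlen > 0);
  if(!dir && xdg_config_home) {
    /* Probe <XDG_CONFIG_HOME>/.curlrc; in the CI trace this was
       /.config/.curlrc, which failed with errno 2 (ENOENT). */
    n = snprintf(out, outlen, "%s/.curlrc", xdg_config_home);
    if(n >= 0 && (size_t)n < outlen && readable(out))
      dir = xdg_config_home;
  }
  if(!dir)
    dir = home;   /* fallback, like the "pathalloc" line in the trace */
  if(!dir)
    return -1;

  n = snprintf(out, outlen, "%s/.curlrc", dir);
  return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}
```

With the values from the CI trace (CURL_HOME unset, XDG_CONFIG_HOME=/.config, HOME=/home/vsts) and no /.config/.curlrc on disk, this model falls through to /home/vsts/.curlrc, matching the `pathalloc` line. That is consistent with the test's own curlrc in the log directory never being read once XDG_CONFIG_HOME has been overridden.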

@bagder
Member

bagder commented Mar 17, 2021

My guess is the shell sets XDG_CONFIG_HOME

Yes, clearly something is setting it anyway after runtests.pl has set it. The question is then what we should do about it. I figure we have a few different alternatives. In order of my preference, I can think of:

  1. research whether we can stop this practice in these CI machines (perhaps by removing/editing some files in the CI setup)
  2. try to detect that it can't be set (<precheck> ?) and skip the test if so
  3. make the test debug-only and, for such builds, actually use a different environment variable
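Alternative 2 could be approached with a small precheck program that checks whether the value runtests.pl exported actually reaches the child process. The sketch below assumes, as the test-suite file format documentation describes for `<precheck>`, that a precheck which prints a reason and exits non-zero makes the test get skipped. `env_mismatch()` is a hypothetical helper, not part of curl's test suite.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical precheck helper: reports whether environment variable
   `name` holds exactly `expected` (e.g. the %PWD/log path runtests.pl
   sets for XDG_CONFIG_HOME).
   Returns 0 if it matches. On mismatch, writes a one-line reason into
   `reason` and returns 1. */
static int env_mismatch(const char *name, const char *expected,
                        char *reason, size_t len)
{
  const char *actual;

  assert(name && expected && reason && len > 0);
  actual = getenv(name);
  if(actual && strcmp(actual, expected) == 0)
    return 0; /* the variable survived unchanged: the test can run */

  /* something overrode or cleared it: describe why the test is skipped */
  snprintf(reason, len, "%s is \"%s\", expected \"%s\"",
           name, actual ? actual : "(unset)", expected);
  return 1;
}
```

A precheck binary built around this would take the expected path as an argument, call `env_mismatch("XDG_CONFIG_HOME", argv[1], ...)`, and on mismatch print the reason and return 1. Comparing against the exact expected value, rather than merely checking that the variable is set, also catches the case seen on Azure, where XDG_CONFIG_HOME was set but to the wrong directory.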

@bagder
Member

bagder commented Mar 22, 2021

BTW, in my Ubuntu 20.04 VM I don't get XDG_CONFIG_HOME at all in my standard shell after a very standard install:

$ env | grep XDG
XDG_CONFIG_DIRS=/etc/xdg/xdg-ubuntu:/etc/xdg
XDG_MENU_PREFIX=gnome-
XDG_SESSION_DESKTOP=ubuntu
XDG_SESSION_TYPE=x11
XDG_CURRENT_DESKTOP=ubuntu:GNOME
XDG_SESSION_CLASS=user
XDG_RUNTIME_DIR=/run/user/1000
XDG_DATA_DIRS=/usr/share/ubuntu:/usr/local/share/:/usr/share/:/var/lib/snapd/desktop

bagder added a commit that referenced this issue Mar 22, 2021
Something in that environment sets XDG_CONFIG_HOME for us in a way that
breaks the test.

Fixes #6739
@bagder bagder closed this as completed in 45d1e24 Mar 23, 2021

3 participants