Skip to content

AppVeyor CI msys2/cygwin debug jobs permafail for test2600 after #11690 #11767

Closed
@vszakats

Description

@vszakats

I did this

Pushed a PR. These AppVeyor CI jobs always fail for test2600:

  • autotools, msys2, Debug, no Proxy, no SSL
  • autotools, msys2, Debug, no SSL
  • autotools, cygwin, Debug, no SSL

All 64-bit builds.
The MSYS2 Release build is always OK however.
Release build has AsynchDNS enabled, while Debug ones don't.

Happening since:
https://ci.appveyor.com/project/curlorg/curl/builds/47840460

Original report: #11690 (comment)
Ref: 8627416 #11690

I expected the following

AppVeyor CI jobs green. Or, green after a rebuild in case of flaky results.

curl/libcurl version

master

operating system

Windows MSYS2/Cygwin

Activity

changed the title [-]AppVeyor CI msys2/cygwin debug jobs permafail for test2600 after 862741637[/-] [+]AppVeyor CI msys2/cygwin debug jobs permafail for test2600 after 862741637bd31f7c621a988e47c5452ebbb58cb4[/+] on Aug 30, 2023
changed the title [-]AppVeyor CI msys2/cygwin debug jobs permafail for test2600 after 862741637bd31f7c621a988e47c5452ebbb58cb4[/-] [+]AppVeyor CI msys2/cygwin debug jobs permafail for test2600 after #11690[/+] on Aug 30, 2023
jay

jay commented on Aug 30, 2023

@jay
Member

I've looked at a few and afaics it's always the happy eyeballs subtests that fail.

 5: test case took 403ms
 unit2600.c:309 test failed: '5: expected ipv6 to start delayed after 150ms, instead first attempt made after 201ms'

...

 6: test case took 403ms
 unit2600.c:309 test failed: '6: expected ipv4 to start delayed after 150ms, instead first attempt made after 201ms'

The error message is misleading because USE_ALARM_TIMEOUT is defined in this case (which may be the issue?) and that means the happy eyeballs timeout is set to 1000ms (instead of he_timeout which is 150ms). So it's failing because it expects a delay of 1s and it's not getting it. As you noted in your original report these fails are all with synchronous dns on Windows.

curl/tests/unit/unit2600.c

Lines 329 to 337 in 226d042

#ifdef USE_ALARM_TIMEOUT
/* we only 1sec timer resolution here */
curl_easy_setopt(easy, CURLOPT_HAPPY_EYEBALLS_TIMEOUT_MS,
(tc->he_timeout_ms > tc->connect_timeout_ms)?
3000L : 1000L);
#else
curl_easy_setopt(easy, CURLOPT_HAPPY_EYEBALLS_TIMEOUT_MS,
(long)tc->he_timeout_ms);
#endif

curl/tests/unit/unit2600.c

Lines 300 to 310 in 226d042

#ifndef USE_ALARM_TIMEOUT
if(stats2->first_created < tc->he_timeout_ms) {
#else
if(stats2->first_created < 1000) {
#endif
curl_msprintf(msg, "%d: expected ip%s to start delayed after %dms, "
"instead first attempt made after %dms",
tc->id, stats2->family, (int)tc->he_timeout_ms,
(int)stats2->first_created);
fail(msg);
}

/cc @icing

vszakats

vszakats commented on Sep 5, 2023

@vszakats
MemberAuthor

Closing due to no interest.

dfandrich

dfandrich commented on Sep 5, 2023

@dfandrich
Contributor

There's definitely interest in dealing with failing tests. The problem is lack of developers with access to Windows. We can't have tests permanently failing.

dfandrich

dfandrich commented on Sep 14, 2023

@dfandrich
Contributor

It turns out there is no CI build that 1) enables debug mode, 2) disables threaded resolver, 3) runs in non-event-based mode, 4) runs in non-torture mode, and 5) runs on a non-Windows platform. If there were, we'd be seeing this test fail there as well, as it's easily reproducible in that configuration locally. This basically confirms the two previous commenters' assumptions that it isn't actually Windows related but rather synchronous resolver related.

dfandrich

dfandrich commented on Sep 14, 2023

@dfandrich
Contributor

I tried disabling the special USE_ALARM_TIMEOUT support and sent it to CI and 2600 is passing everywhere so far. Those three Appveyor jobs have been permafailing on 2600 since #11690 (commit 8627416), and, ironically, it looks to me like the whole reason for special-casing USE_ALARM_TIMEOUT was removed in that same commit. That commit increased the minimum CURLOPT_CONNECTTIMEOUT_MS used in the test well beyond the 1 second threshold that the alarm timeouts can handle. Since that's the only thing that is handled differently in USE_ALARM_TIMEOUT mode, it's no longer required. Even Solaris 11 works when removing that special case.

I'm going to clean it up into a PR and see how that fares.

added a commit that references this issue on Sep 14, 2023
8a16ee2

3 remaining items

Loading
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Participants

      @dfandrich@jay@vszakats

      Issue actions

        AppVeyor CI msys2/cygwin debug jobs permafail for test2600 after #11690 · Issue #11767 · curl/curl