Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test 596 flakiness #12823

Closed
icing opened this issue Jan 31, 2024 · 1 comment
Closed

test 596 flakiness #12823

icing opened this issue Jan 31, 2024 · 1 comment

Comments

@icing
Copy link
Contributor

icing commented Jan 31, 2024

I did this

I ran:

make -j20 && (cd tests && make -j20 && ./runtests.pl --repeat=100 596)

which eventually fails the test with:

 596: protocol FAILED:
--- log/check-expected	2024-01-31 10:29:38
+++ log/check-generated	2024-01-31 10:29:38
@@ -5,4 +5,3 @@
 TYPE I[CR][LF]
 SIZE 596[CR][LF]
 RETR 596[CR][LF]
-QUIT[CR][LF]

 - abort tests

what I see then in the log file is:

0000: 150 Binary data connection for 596 () (4 bytes).
10:29:38.694577 == Info: Maxdownload = -1
10:29:38.694686 == Info: Getting file with size: 4
10:29:38.694807 == Info: Preparing for accepting server on data port
10:29:38.694969 == Info: Checking for server connect
10:29:38.695129 == Info: Ctrl conn has data while waiting for data conn
10:29:38.695299 <= Recv header, 28 bytes (0x1c)
0000: 226 File transfer complete
10:29:38.695516 == Info: [FTP] recv 0 bytes, ftpcode=226
10:29:38.695667 == Info: ftp AllowServerConnect() -> 8
...

which is the error code CURLE_WEIRD_SERVER_REPLY on seeing the 226 response on the control connection before curl has retrieved the file. See ftp.c:421.

What I believe happens here is that the POLL event on the control socket arrives sometimes before the event on the data socket. The server probably has sent the 4 data bytes of the "file" and directly wrote the 226 response on the control. Since the OS makes no guarantee in what order socket event arrive at the peer, this seems to point to a bug in the curl code.

I expected the following

The test to work reliably.

curl/libcurl version

8.7.0-DEV

operating system

macOS

@icing
Copy link
Contributor Author

icing commented Jan 31, 2024

After looking at the code, one way to fix this is to do a "peek" at the response in ftp.c:415 instead of consuming the response. If the "peek" finds a 226, the accepting should continue and keep the 226 around for when it succeeds.

bagder added a commit that referenced this issue Feb 1, 2024
For active mode transfers.

Due to some interesting timing, curl can sometimes get the 226 (transfer
complete) over the control channel first, before the data connection
signals readability. If this happens, use that as a signal to check the
data connection.

Additionally, set the socket filter in listen mode *before* the
PORT/EPRT command is issued, to reduce the risk that the little time gap
could interfere.

This issue never reproduced for me on Debian and takes several hundred
rounds for me to trigger on my mac.

Reported-by: Stefan Eissing
Fixes #12823
@bagder bagder closed this as completed in 1049195 Feb 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

2 participants