Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Weird mixup of test data in test suite run #6058

Closed
mback2k opened this issue Oct 10, 2020 · 20 comments
Closed

Weird mixup of test data in test suite run #6058

mback2k opened this issue Oct 10, 2020 · 20 comments
Assignees
Labels
CI Continuous Integration tests

Comments

@mback2k
Copy link
Member

mback2k commented Oct 10, 2020

I did this

While working on #6049 this weird test suite run occurred.

For some unknown reason the tests 2055 3002 3003 3004 3005 3006 3007 failed while showing irrelevant test data.

I expected the following

No irrelevant test data, like MQTT related things, for these test cases.

cc @bagder any idea?

@mback2k mback2k added tests CI Continuous Integration labels Oct 10, 2020
@emilengler
Copy link
Contributor

Maybe b8e4d1c is the trouble maker? The weird data looks like a repetition

@bagder
Copy link
Member

bagder commented Oct 10, 2020

runtests.pl:cleardir($LOGDIR) is invoked first thing in singletest(), and that function is supposed to remove all leftover log files from previous test rounds. As this happens on Windows, I would presume that something strange is causing the unlink to fail on those files and the only way I can think of that can do that is if the files are still "in use" somehow by the process that creates them. But I can't say I understand exactly how.

What might be good though, is that it might not be a big problem other than the fact that this output includes irrelevant files.

@mback2k
Copy link
Member Author

mback2k commented Oct 10, 2020

files are still "in use" somehow by the process that creates them. But I can't say I understand exactly how.

I guess this is the main root cause of all CI troubles with Windows, nested test suite processes lurking around still "using"/locking files. My changes in #6049 at least reduce the number of times a test server is "unexpectedly alive".

@bagder
Copy link
Member

bagder commented Oct 10, 2020

Is there a way to check which process is still locking a file? Could be fun to try if unlink of a log fails...

@mback2k
Copy link
Member Author

mback2k commented Oct 10, 2020

In previous attempts to solve this I tried to use tasklist.exe and netstat.exe between every test case to identify hang/used processes and ports by comparing successful and failed test runs. Unfortunately that didn't show any relevant difference.

Is there a way to check which process is still locking a file? Could be fun to try if unlink of a log fails...

I don't think there is a native utility on Windows anymore, previous versions had openfiles, but this is no longer available for 64-bit Windows. I will try to implement something into runtests.pl using the SysInternals Handle tool to identify the relevant processes (if the tool is found). I will implement this and put the tool into the Windows containers used for CI builds probably tomorrow.

@mback2k
Copy link
Member Author

mback2k commented Oct 10, 2020

Maybe b8e4d1c is the trouble maker? The weird data looks like a repetition

I don't think so. This feature is used for multiple test cases and just affects the generated output.

@mback2k
Copy link
Member Author

mback2k commented Oct 21, 2020

Just FYI: this is still on my agenda, just didn't have the time for this yet.

@mback2k mback2k self-assigned this Oct 27, 2020
mback2k added a commit to mback2k/curl that referenced this issue Nov 5, 2020
For now only required and implemented for Windows.

Ref: curl#6058
@mback2k
Copy link
Member Author

mback2k commented Nov 6, 2020

Seems like the TFTP tests are the first during the run causing files to be locked:

2020-11-05T22:32:05.2087979Z test 0285...[TFTP send]
2020-11-05T22:32:07.4259895Z Warning: server logs lock timeout (2 seconds) expired
2020-11-05T22:32:07.4261212Z --p-u--e--- OK (285 out of 1375, remaining: 11:34, took 2.229s, duration: 03:01)
2020-11-05T22:32:07.5467722Z Inter result: "log"
2020-11-05T22:32:07.8999171Z test 0286...[TFTP send of boundary case 512 byte file]
2020-11-05T22:32:12.4676923Z Warning: server logs lock timeout (2 seconds) expired
2020-11-05T22:32:12.4677802Z --p-u--e--- OK (286 out of 1375, remaining: 11:50, took 4.580s, duration: 03:06)
2020-11-05T22:32:12.4678339Z Inter result: "log"

mback2k added a commit to mback2k/curl that referenced this issue Nov 6, 2020
For now only required and implemented for Windows.

Requires Sysinternals handle[64].exe to be on PATH.

Ref: curl#6058
@mback2k
Copy link
Member Author

mback2k commented Nov 9, 2020

@bagder mqttd.exe seems to be locking files after a single test has finished:

2020-11-08T13:27:15.3438069Z test 1190...[MQTT SUBSCRIBE]
2020-11-08T13:27:15.4986640Z --p----e--- OK (915 out of 1376, remaining: 05:43, took 1.389s, duration: 11:21)
2020-11-08T13:27:15.4987653Z Inter result: "log"
2020-11-08T13:28:07.9659079Z mqttd.exe          pid: 5716   type: File            EC: C:\__w\1\s\tests\log\test1190
2020-11-08T13:28:07.9659622Z 
2020-11-08T13:28:07.9659834Z 
2020-11-08T13:28:07.9660653Z Found lock: mqttd.exe          pid: 5716   type: File            EC: C:\__w\1\s\tests\log\test1190
2020-11-08T13:28:07.9661361Z 
2020-11-08T13:28:07.9661533Z 
2020-11-08T13:28:07.9661782Z test 1191...[MQTT PUBLISH]
2020-11-08T13:28:08.1268057Z --p----e--- OK (916 out of 1376, remaining: 06:08, took 0.165s, duration: 12:13)
2020-11-08T13:28:08.1268767Z Inter result: "log"
2020-11-08T13:29:00.6190636Z mqttd.exe          pid: 5716   type: File            EC: C:\__w\1\s\tests\log\test1190
2020-11-08T13:29:00.6191268Z 
2020-11-08T13:29:00.6191863Z 
2020-11-08T13:29:00.6193465Z Found lock: mqttd.exe          pid: 5716   type: File            EC: C:\__w\1\s\tests\log\test1190
2020-11-08T13:29:00.6194846Z 
2020-11-08T13:29:00.6196033Z 
2020-11-08T13:29:00.6196846Z test 1192...[MQTT SUBSCRIBE 2k topic]
2020-11-08T13:29:00.8166392Z --p----e--- OK (917 out of 1376, remaining: 06:33, took 0.207s, duration: 13:06)
2020-11-08T13:29:00.8167206Z Inter result: "log"
2020-11-08T13:29:53.4109109Z mqttd.exe          pid: 5716   type: File            80: C:\__w\1\s\tests\log\test1192
2020-11-08T13:29:53.4112549Z 
2020-11-08T13:29:53.4114278Z 
2020-11-08T13:29:53.4116427Z Found lock: mqttd.exe          pid: 5716   type: File            80: C:\__w\1\s\tests\log\test1192
2020-11-08T13:29:53.4117176Z 
2020-11-08T13:29:53.4117822Z 
2020-11-08T13:29:53.4122431Z mqttd.exe          pid: 5716   type: File            EC: C:\__w\1\s\tests\log\test1190
2020-11-08T13:29:53.4125878Z 
2020-11-08T13:29:53.4126329Z 
2020-11-08T13:29:53.4131838Z Found lock: mqttd.exe          pid: 5716   type: File            EC: C:\__w\1\s\tests\log\test1190
2020-11-08T13:29:53.4132878Z 
2020-11-08T13:29:53.4137887Z 
2020-11-08T13:29:53.4138729Z test 1193...[MQTT PUBLISH 2k payload]

@bagder
Copy link
Member

bagder commented Nov 9, 2020

For all I can see, the mqtt server is using the same logging function as several of the other test servers and it makes fopen() - fprintf() - fclose() for each line it logs, which for all I can see shouldn't lock anything:

curl/tests/server/util.c

Lines 133 to 136 in fa6bbbe

logfp = fopen(serverlogfile, "ab");
if(logfp) {
fprintf(logfp, "%s %s\n", timebuf, buffer);
fclose(logfp);

@bagder
Copy link
Member

bagder commented Nov 9, 2020

Uhm, it is the test file that is locked, not the log...

@bagder
Copy link
Member

bagder commented Nov 9, 2020

PR pending

bagder added a commit that referenced this issue Nov 9, 2020
Reported-by: Marc Hörsken
Bug: #6058
bagder added a commit that referenced this issue Nov 9, 2020
Reported-by: Marc Hörsken
Reviewed-by: Jay Satiro
Bug: #6058
Closes #6189
mback2k added a commit to mback2k/curl that referenced this issue Nov 11, 2020
For now only required and implemented for Windows.

Requires Sysinternals handle[64].exe to be on PATH.

Ref: curl#6058
Closes curl#6179
@mback2k
Copy link
Member Author

mback2k commented Nov 12, 2020

@bagder it seems like the TFTP server is locking the uploaded file even after the test has finished:

test 0285...[TFTP send]
Warning: server logs lock timeout (2 seconds) expired
--p-u--e--- OK (285 out of 1376, remaining: 10:16, took 2.157s, duration: 02:41)
Found File lock of 'C:\__w\1\s\tests\log\upload.285' (E0) by tftpd.exe (1284)

@bagder
Copy link
Member

bagder commented Nov 12, 2020

Given a first look at the code that seems accurate. I'll fire up a PR asap.

@mback2k
Copy link
Member Author

mback2k commented Nov 12, 2020

Thanks, I am currently trying to understand the relevant code, but I guess you are much faster than myself regarding this.

mback2k added a commit to mback2k/curl that referenced this issue Nov 12, 2020
Make sure the log file is not logged once a test has
finished and align with the behaviour of our logmsg.

Rename curl_test_data.py to be a general util.py.

Bug: curl#6058
@mback2k
Copy link
Member Author

mback2k commented Nov 12, 2020

Found a similar issue in the Python-based tests servers and for now added a commit to PR #6179 about that. Will split that one out after testing.

@bagder
Copy link
Member

bagder commented Nov 12, 2020

I was wrong. I have not found the reason for the locking after a completed test,

mback2k added a commit to mback2k/curl that referenced this issue Nov 13, 2020
Make sure the log file is not logged once a test has
finished and align with the behaviour of our logmsg.

Rename curl_test_data.py to be a general util.py.

Bug: curl#6058
mback2k added a commit to mback2k/curl that referenced this issue Nov 14, 2020
Make sure the log file is not logged once a test has
finished and align with the behaviour of our logmsg.

Rename curl_test_data.py to be a general util.py.

Bug: curl#6058
mback2k added a commit to mback2k/curl that referenced this issue Nov 14, 2020
Make sure the log file is not locked once a test has
finished and align with the behavior of our logmsg.

Rename curl_test_data.py to be a general util.py.

Bug: curl#6058
Closes curl#6206
mback2k added a commit to mback2k/curl that referenced this issue Nov 14, 2020
For now only required and implemented for Windows.

Requires Sysinternals handle[64].exe to be on PATH.

Ref: curl#6058
Closes curl#6179
mback2k added a commit to mback2k/curl that referenced this issue Nov 14, 2020
Make sure the log file is not logged once a test has
finished and align with the behaviour of our logmsg.

Rename curl_test_data.py to be a general util.py.

Bug: curl#6058
mback2k added a commit to mback2k/curl that referenced this issue Nov 14, 2020
Make sure uploaded file is not locked after transfer.

Bug: curl#6058
mback2k added a commit to mback2k/curl that referenced this issue Nov 14, 2020
Make sure the log file is not locked once a test has
finished and align with the behavior of our logmsg.

Rename curl_test_data.py to be a general util.py.

Bug: curl#6058
Closes curl#6206
mback2k added a commit to mback2k/curl that referenced this issue Nov 14, 2020
Make sure the log file is not locked once a test has
finished and align with the behavior of our logmsg.

Rename curl_test_data.py to be a general util.py.
Format and sort Python imports with isort/VSCode.

Bug: curl#6058
Closes curl#6206
mback2k added a commit to mback2k/curl that referenced this issue Nov 14, 2020
Make sure the log file is not locked once a test has
finished and align with the behavior of our logmsg.

Rename curl_test_data.py to be a general util.py.
Format and sort Python imports with isort/VSCode.

Bug: curl#6058
Closes curl#6206
mback2k added a commit to mback2k/curl that referenced this issue Nov 14, 2020
For now only required and implemented for Windows.

Requires Sysinternals handle[64].exe to be on PATH.

Ref: curl#6058
Closes curl#6179
mback2k added a commit to mback2k/curl that referenced this issue Nov 14, 2020
Make sure uploaded file is not locked after transfer.

Bug: curl#6058
mback2k added a commit to mback2k/curl that referenced this issue Nov 15, 2020
Make sure uploaded file is not locked after transfer.

Bug: curl#6058
mback2k added a commit to mback2k/curl that referenced this issue Nov 15, 2020
For now only required and implemented for Windows.

Requires Sysinternals handle[64].exe to be on PATH.

Ref: curl#6058
Closes curl#6179
mback2k added a commit to mback2k/curl that referenced this issue Nov 16, 2020
For now only required and implemented for Windows.
Ignore stunnel logs due to long running processes.

Requires Sysinternals handle[64].exe to be on PATH.

Ref: curl#6058
Closes curl#6179
mback2k added a commit to mback2k/curl that referenced this issue Nov 20, 2020
Make sure uploaded file is no longer locked after the
transfer while waiting for the final ACK to be handled.

Reviewed-by: Daniel Stenberg

Bug: curl#6058
Closes curl#6209
mback2k added a commit to mback2k/curl that referenced this issue Nov 20, 2020
For now only required and implemented for Windows.
Ignore stunnel logs due to long running processes.

Requires Sysinternals handle[64].exe to be on PATH.

Ref: curl#6058
Closes curl#6179
mback2k added a commit that referenced this issue Nov 20, 2020
Make sure uploaded file is no longer locked after the
transfer while waiting for the final ACK to be handled.

Assisted-by: Daniel Stenberg

Bug: #6058
Closes #6209
@bagder
Copy link
Member

bagder commented Dec 20, 2020

Is this issue still valid? It is simply too generic for anyone to understand what it refers to unless we have specific examples pointed out. Do we have any such?

@mback2k
Copy link
Member Author

mback2k commented Dec 21, 2020

The examples were talked above at the top of this issue and I think the most prominent ones regarding MQTT and TFTP were fixed. #6179 will also help avoid this issue in the future. We can also already close this issue or keep it open until the PR is ready and merged, which I still plan to do soon once I have the Unix variant implemented.

@bagder
Copy link
Member

bagder commented Dec 21, 2020

Issues work best when they identify an issue and thus also indirectly when the issue is fixed (when the issue/bad thing no longer occurs). The way this issue is phrased it is more of a placeholder for you as nobody else knows what it is about nor when it is done (since the only explicitly mentioned ones are fixed already). As such, I don't think it serves much purposes to be left open.

@mback2k mback2k closed this as completed Dec 21, 2020
mback2k added a commit to mback2k/curl that referenced this issue Dec 26, 2020
For now only required and implemented for Windows.
Ignore stunnel logs due to long running processes.

Requires Sysinternals handle[64].exe to be on PATH.

Ref: curl#6058
Closes curl#6179
mback2k added a commit to mback2k/curl that referenced this issue Feb 27, 2021
For now only required and implemented for Windows.
Ignore stunnel logs due to long running processes.

Requires Sysinternals handle[64].exe to be on PATH.

Ref: curl#6058
Closes curl#6179
mback2k added a commit to mback2k/curl that referenced this issue Feb 28, 2021
For now only required and implemented for Windows.
Ignore stunnel logs due to long running processes.

Requires Sysinternals handle[64].exe to be on PATH.

Reviewed-by: Jay Satiro

Ref: curl#6058
Closes curl#6179
mback2k added a commit to mback2k/curl that referenced this issue Feb 28, 2021
Introduce a new runtests.pl command option: -rm

For now only required and implemented for Windows.
Ignore stunnel logs due to long running processes.

Requires Sysinternals handle[64].exe to be on PATH.

Reviewed-by: Jay Satiro

Ref: curl#6058
Closes curl#6179
mback2k added a commit that referenced this issue Mar 1, 2021
Introduce a new runtests.pl command option: -rm

For now only required and implemented for Windows.
Ignore stunnel logs due to long running processes.

Requires Sysinternals handle[64].exe to be on PATH.

Reviewed-by: Jay Satiro

Ref: #6058
Closes #6179
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI Continuous Integration tests
Development

No branches or pull requests

3 participants