Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Poor upload performance in Windows #6146

Closed
jay opened this issue Oct 30, 2020 · 26 comments
Closed

Poor upload performance in Windows #6146

jay opened this issue Oct 30, 2020 · 26 comments

Comments

@jay
Copy link
Member

jay commented Oct 30, 2020

User Jeffrey McKay has reported on the curl-library mailing list that after upgrading to 7.73.0 the upload performance decreased drastically. What was typically a 10 second upload took over 5 minutes.

(Click to expand his report) In summary, he describes it as "large chunks of data seem to be taking 1 second to upload".

I maintain a Windows application (written in VC 2017) that basically
uploads thousands of email messages into Microsoft Exchange/Ofice 365,
via their EWS protocol. I have been successfully using libcurl 7.64 for
a while.

Recently we did have some problem with logging in to some servers, and I
determined that the issue could be fixed by updating to libcurl 7.71
(not sure why). However I ran into some other problem with this version,
where for some reason after several thousand messages, libcurl seemed to
be returning null data in response to a GET. However I am *not* asking
about that problem in this message.

The above problem went away when I switched to libcurl 7.73. However,
this version has its own problem, a serious performance issue, where
large chunks of data seem to be taking 1 second to upload, vs a fraction
of a second seen with the previous version.

Take a look at the attached two log files, slow_upload.txt and
fast_upload.txt. These are logs of the debug function after I do a POST.
In the slow version, each 64k chunk of data is taking approximately 1
second. In the fast version, each chunk is a fraction of a second or not
even measurable. These results are consistent when doing multiple tests
at different times of the day. I don't think it is related to EWS server
variability.

Keep in mind that the exact same main program binary executable code is
running in each test, only the libcurl.dll has been changed.

Any idea what could be causing this? I've also included some of my code
that performs the POST operation (cleaned up of a lot of extraneous
stuff). Hopefully there is some curl option that I can set or change
that fixes this.

I am able to reproduce in Windows and bisected it to d2a7d7c with the PostUpload example.

17f58c8 (parent of d2a7d7c)
Transfer rate: 4103 KB/sec (52429310 bytes in 12 seconds)
Transfer rate: 4121 KB/sec (52429310 bytes in 12 seconds)
Transfer rate: 4067 KB/sec (52429310 bytes in 13 seconds)
Transfer rate: 4078 KB/sec (52429310 bytes in 13 seconds)
Transfer rate: 4074 KB/sec (52429310 bytes in 13 seconds)
Transfer rate: 4100 KB/sec (52429310 bytes in 12 seconds)

d2a7d7c ("multi: implement wait using winsock events")
Transfer rate: 503 KB/sec (52429310 bytes in 102 seconds)
Transfer rate: 503 KB/sec (52429310 bytes in 102 seconds)
Transfer rate: 971 KB/sec (52429310 bytes in 53 seconds)
Transfer rate: 503 KB/sec (52429310 bytes in 102 seconds)
Transfer rate: 507 KB/sec (52429310 bytes in 101 seconds)
Transfer rate: 503 KB/sec (52429310 bytes in 102 seconds)

curl 7.73.0-DEV (i386-pc-win32) libcurl/7.73.1-DEV OpenSSL/1.0.2t nghttp2/1.40.0

The 1 second delay is coming from this call to WSAWaitForMultipleEvents in multi_wait:

curl/lib/multi.c

Lines 1278 to 1282 in 315ee3f

#ifdef USE_WINSOCK
WSAWaitForMultipleEvents(1, &multi->wsa_event, FALSE, timeout_ms, FALSE);
#else
int pollrc = Curl_poll(ufds, nfds, timeout_ms);
#endif

Debug code:

diff --git a/lib/multi.c b/lib/multi.c
index c8bba47..d71a573 100644
--- a/lib/multi.c
+++ b/lib/multi.c
@@ -1275,7 +1275,17 @@ static CURLMcode Curl_multi_wait(struct Curl_multi *multi,
 #ifdef USE_WINSOCK
     if(already_writable > 0)
       timeout_ms = 0;
-    WSAWaitForMultipleEvents(1, &multi->wsa_event, FALSE, timeout_ms, FALSE);
+    {
+      DWORD ret;
+      struct curltime before, after;
+      before = Curl_now();
+      ret = WSAWaitForMultipleEvents(1, &multi->wsa_event, FALSE, timeout_ms,
+                                     FALSE);
+      after = Curl_now();
+      fprintf(stderr,
+        "WSAWaitForMultipleEvents timeout_ms: %d, ret: %d, elapsed: %I64d\n",
+        timeout_ms, ret, Curl_timediff(after, before));
+    }
 #else
     int pollrc = Curl_poll(ufds, nfds, timeout_ms);
 #endif

Debug output (ret 258 is WSA_WAIT_TIMEOUT):

WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 1000, ret: 0, elapsed: 2
WSAWaitForMultipleEvents timeout_ms: 1000, ret: 258, elapsed: 1000
  4% uploaded...WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 1000, ret: 0, elapsed: 2
WSAWaitForMultipleEvents timeout_ms: 1000, ret: 258, elapsed: 1000
  6% uploaded...WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0
WSAWaitForMultipleEvents timeout_ms: 1000, ret: 0, elapsed: 4
WSAWaitForMultipleEvents timeout_ms: 1000, ret: 258, elapsed: 1000
  8% uploaded...WSAWaitForMultipleEvents timeout_ms: 0, ret: 258, elapsed: 0

If that WSAWaitForMultipleEvents line is commented out then the transfer completes in the expected time (but then of course eats CPU in the interim since there's no poll wait).

According to the commit message in d2a7d7c, a previous version of that commit patch would erroneously wait the full timeout for the socket to become writable. It looks like that issue may not be fully fixed?

Ref: #5397
Ref: #5631
Ref: #5634

/cc @rcombs @mback2k @MarcelRaad @Togtja @ngg @rmja

@jay jay added performance Windows Windows-specific labels Oct 30, 2020
@gvanem
Copy link
Contributor

gvanem commented Oct 30, 2020

Trying Jay's PostUpload example I noted there were 3202 calls to:

WSAIoctl(sockfd, SIO_IDEAL_SEND_BACKLOG_QUERY ...);
setsockopt (784, SOL_SOCKET, SO_SNDBUF, ...);

The Ideal Send Backlog value started at 64kB, a few at 130kB and the remaining at 256 kB. Simply disabling the win_update_buffer_size() call, the transfer speed increased a bit. Somewhat related?

But I got around 9.3 MBit/s which IMHO is very good speed. In Google Chrome it reports 10.2 MBit/s blank to the same Frankfurt, DE server at https://testmy.net/upload. Using a 50 MByte test file for each test.

@jay
Copy link
Member Author

jay commented Oct 30, 2020

The Ideal Send Backlog value started at 64kB, a few at 130kB and the remaining at 256 kB. Simply disabling the win_update_buffer_size() call, the transfer speed increased a bit. Somewhat related?

I see the opposite, performance suffers without it.

17f58c8 (parent of d2a7d7c) with win_update_buffer_size (build default)
Transfer rate: 3886 KB/sec (52429310 bytes in 13 seconds)
Transfer rate: 3182 KB/sec (52429310 bytes in 16 seconds)
Transfer rate: 4103 KB/sec (52429310 bytes in 12 seconds)

17f58c8 (parent of d2a7d7c) without win_update_buffer_size
Transfer rate: 1871 KB/sec (52429310 bytes in 27 seconds)
Transfer rate: 1934 KB/sec (52429310 bytes in 26 seconds)
Transfer rate: 1908 KB/sec (52429310 bytes in 27 seconds)

d2a7d7c with win_update_buffer_size (build default)
Transfer rate: 197 KB/sec (52429310 bytes in 260 seconds)
Transfer rate: 502 KB/sec (52429310 bytes in 102 seconds)
Transfer rate: 255 KB/sec (52429310 bytes in 201 seconds)

d2a7d7c without win_update_buffer_size
Transfer rate: 63 KB/sec (52429310 bytes in 815 seconds)
Transfer rate: 63 KB/sec (52429310 bytes in 813 seconds)
Transfer rate: 63 KB/sec (52429310 bytes in 816 seconds)

curl 7.73.0-DEV (i386-pc-win32) libcurl/7.73.1-DEV OpenSSL/1.0.2t nghttp2/1.40.0

Edit: When I run the 50MB test via the browser I get mbps results similar to what I get from 17f58c8, which is expected.

@mback2k
Copy link
Member

mback2k commented Nov 2, 2020

If the 1 second delay comes from the timeout, the main question should be: why does it run into that timeout?

@mback2k
Copy link
Member

mback2k commented Nov 15, 2020

@jay would you mind performing the same tests for the follow up commits to multi.c which should actually have already improved the situation?

  1. 3334ee6
  2. 003e81e
  3. 1060955

Especially the code section regarding already_writable was replaced after the commits you identified.

@jay
Copy link
Member Author

jay commented Nov 22, 2020

All of them are slow. I think we should work from master for further testing.

If the 1 second delay comes from the timeout, the main question should be: why does it run into that timeout?

That is the question alright and I don't know the answer. I assume the event is supposed to be signaled at that point since there is data, which would mean no wait. One strange thing I've noticed is in some of the results the speed is sometimes doubled or halved. For example, usually 100 seconds but sometimes 50 or 200.

Transfer rate: 489 KB/sec (52429310 bytes in 105 seconds)
Transfer rate: 503 KB/sec (52429310 bytes in 102 seconds)
Transfer rate: 503 KB/sec (52429310 bytes in 102 seconds)
Transfer rate: 255 KB/sec (52429310 bytes in 201 seconds)
Transfer rate: 503 KB/sec (52429310 bytes in 102 seconds)
Transfer rate: 968 KB/sec (52429310 bytes in 53 seconds)
Transfer rate: 255 KB/sec (52429310 bytes in 201 seconds)
Transfer rate: 503 KB/sec (52429310 bytes in 102 seconds)
Transfer rate: 503 KB/sec (52429310 bytes in 102 seconds)

5c8849c (master 2020-11-22)

Transfer rate: 497 KB/sec (52429310 bytes in 103 seconds)
Transfer rate: 503 KB/sec (52429310 bytes in 102 seconds)
Transfer rate: 968 KB/sec (52429310 bytes in 53 seconds)

@mback2k
Copy link
Member

mback2k commented Nov 23, 2020

I had one idea for an alternative implementation of this waiting in multi.c for Windows which could potentially improve the situation again, but avoid the fake socket pairs being created:

  1. revert back to using Curl_poll also on Windows, but just use it for polling with no timeout and exclude the wakeup socket here.
  2. check if any of the sockets in Curl_poll are already ready and skip the next part if that is the case.
  3. if nothing is ready yet, use the new implementation to wait on the sockets including the new wakeup event.

This way we should be able to reduce again the overall number of syscalls, by doing one Curl_poll as readiness pre-check with the old logic and then just perform the waiting with the new logic including the new wakeup event instead of the old wakeup socket pair. This could potentially also reduce the number of ifdef's in the code again.

@bagder
Copy link
Member

bagder commented Nov 23, 2020

How does that reduce the number of syscalls? It'd either be poll or objectwait, right?

More importantly: that logic doesn't work if there's a timeout:

In state (1) it only waits for socket and let's say we have a 1000ms timeout. During this period the wakeup event isn't checked and thus doesn't work. We delay the wakeup with this time.

@mback2k
Copy link
Member

mback2k commented Nov 23, 2020

As I said, step 1 (" with no timeout") wouldn't use a timeout, it would just poll with a timeout of 0 as a pre-check. At the moment a pre-check is done with Curl_socket_check per socket in the wait setup loop, but we could also just use the final poll structure (which is currently disabled for Windows) to do the pre-check without caring for the wakeup socket/event. The wakeup event would only be checked in case of actually waiting. To summarize:

  1. poll with timeout 0 as pre-check on sockets
  2. if nothing ready yet: objectwait with timeout on sockets and wakeup event

mback2k added a commit to mback2k/curl that referenced this issue Nov 24, 2020
This is an attempt to restructure the code in Curl_multi_wait
in such a way that less syscalls are made by removing individual
calls to Curl_socket_check via SOCKET_READABLE/SOCKET_WRITABLE.

Bug: curl#6146
@bagder
Copy link
Member

bagder commented Dec 3, 2020

We're only 6 days from release now. Can we revert this regression and work on a proper fix long-term? The only PR is still draft...

@noloader
Copy link

noloader commented Dec 5, 2020

If that WSAWaitForMultipleEvents line is commented out then the transfer completes in the expected time (but then of course eats CPU in the interim since there's no poll wait).

One comment about this... In the old days of Win32 programming, you could Sleep(0) to give up the remainder of your time slice. A time slice was 30 milliseconds (2 slices) on a client, and 180 milliseconds (12 slices) on a server.

It looks like the behavior has carried through to modern versions of Windows. See Sleep function (synchapi.h).

If you would like to play nicely without a Wait function, then maybe try a Sleep(0). Sleep(0) will yield to other threads. It may work well enough to get you through without spinning on the cpu until you have time to work up a proper fix.

Also be aware the Sleep function yields the remainder of the time slice. SleepEx actually puts the process in a Wait state, so be careful of SleepEx. SleepEx may be just as bad as a Wait function.

Russinovich, Solomon and Ionescu discuss time slices in the Windows Internal books. Find the section on Quantums under Processes and Threads.

@rodwiddowson
Copy link
Contributor

So I know nothing about networking and nothing about usermode programming.

But cold reading the code and the fine documentation

  • At the beginning the function is being used to poll the events (timeout of zero). This is kind of weird but I guess it is expected
  • The next interesting this is that when there is data there the function still says "timeout". So I'm wondering whether the IO completion fires an APC back to this thread in case maybe the final parameter (fAlertable) simply need to be true.
  • Of course you need to be pretty happy that you are safe against any aynchrony that might happen as a result (a big deal in K-mode). If that was the case then you'd need to do the wait in a different thread and get that thread to signal another event and do a non interruptable wait with infinite timeout.

But this is about 10 minutes speculation based on zero insight either into cUrl or WSA.

@tpodom
Copy link

tpodom commented Dec 6, 2020

This may not be on the right track for why the timeout is being hit because I know nothing about the internals of curl.

The WSAWaitForMultipleEvents is being used to wait until the socket is able to receive writes again correct? But, the FD_WRITE is only triggered if there has been a write attempt that was blocked. So, if the buffer sends without ever blocking wouldn't the WSAWaitForMultipleEvents always wait the timeout_ms duration for the next loop iteration in easy_transfer?

I set a breakpoint in Curl_send_plain and I'm never hitting a case where the bytes_written is -1 so I don't think the FD_WRITE would ever be signaled for my scenario.

This SO answer links to the relevant section about the behavior of FD_WRITE:
https://stackoverflow.com/questions/16479106/how-network-event-fd-write-is-generated-when-using-event-driven-sockets

@mback2k
Copy link
Member

mback2k commented Dec 6, 2020

This information is gold, thank you @tpodom. I will try to incorporate this into the next attempt, potentially via #6245.

@tpodom
Copy link

tpodom commented Dec 6, 2020

@mback2k awesome, just tag me if there’s anything you’d like me to test!

mback2k added a commit to mback2k/curl that referenced this issue Dec 6, 2020
This is an attempt to restructure the code in Curl_multi_wait
in such a way that less syscalls are made by removing individual
calls to Curl_socket_check via SOCKET_READABLE/SOCKET_WRITABLE.

Bug: curl#6146
@mback2k
Copy link
Member

mback2k commented Dec 6, 2020

Actually the pre-checking with Curl_poll/select should make it never check/wait for FD_WRITE on a socket via WSAWaitForMultipleEvents since select on Windows should emulate standard behavior and signal readiness for writing every time a send call would not block. So I would appreciate your help with testing #6245 which I just updated. cc @jay

mback2k added a commit to mback2k/curl that referenced this issue Dec 6, 2020
Do not enumerate WinSock network events unless required to
calculate return code in case Curl_poll pre-check failed.

Bug: curl#6146
@bagder bagder closed this as completed in 2260e0e Dec 6, 2020
@curl curl deleted a comment from sak06372509 Dec 6, 2020
mback2k added a commit to mback2k/curl that referenced this issue Dec 9, 2020
This is an attempt to restructure the code in Curl_multi_wait
in such a way that less syscalls are made by removing individual
calls to Curl_socket_check via SOCKET_READABLE/SOCKET_WRITABLE.

Bug: curl#6146
mback2k added a commit to mback2k/curl that referenced this issue Dec 13, 2020
This is an attempt to restructure the code in Curl_multi_wait
in such a way that less syscalls are made by removing individual
calls to Curl_socket_check via SOCKET_READABLE/SOCKET_WRITABLE.

Bug: curl#6146
mback2k added a commit to mback2k/curl that referenced this issue Dec 13, 2020
Reset FD_WRITE by sending zero bytes which is permissible
and will be treated by implementations as successful.

Bug: curl#6146
Closes curl#6245
mback2k added a commit to mback2k/curl that referenced this issue Dec 15, 2020
This is an attempt to restructure the code in Curl_multi_wait
in such a way that less syscalls are made by removing individual
calls to Curl_socket_check via SOCKET_READABLE/SOCKET_WRITABLE.

Bug: curl#6146
mback2k added a commit to mback2k/curl that referenced this issue Dec 15, 2020
Reset FD_WRITE by sending zero bytes which is permissible
and will be treated by implementations as successful.

Bug: curl#6146
Closes curl#6245
mback2k added a commit to mback2k/curl that referenced this issue Feb 25, 2021
This is an attempt to restructure the code in Curl_multi_wait
in such a way that less syscalls are made by removing individual
calls to Curl_socket_check via SOCKET_READABLE/SOCKET_WRITABLE.

Bug: curl#6146
mback2k added a commit to mback2k/curl that referenced this issue Feb 25, 2021
Reset FD_WRITE by sending zero bytes which is permissible
and will be treated by implementations as successful.

Bug: curl#6146
Closes curl#6245
mback2k added a commit to mback2k/curl that referenced this issue Mar 3, 2021
This is an attempt to restructure the code in Curl_multi_wait
in such a way that less syscalls are made by removing individual
calls to Curl_socket_check via SOCKET_READABLE/SOCKET_WRITABLE.

Bug: curl#6146
mback2k added a commit to mback2k/curl that referenced this issue Mar 3, 2021
Reset FD_WRITE by sending zero bytes which is permissible
and will be treated by implementations as successful send.

Reviewed-by: Jay Satiro

Bug: curl#6146
Closes curl#6245
@Eliyahu-Machluf
Copy link

I have a question, as I use libcurl 7.73.0 for our product on Windows.
I see the report, and the suggested fix or code revert applies to multi.c source file and to the function Curl_multi_wait.
If I use only the easy interface, will I also be effected by this issue, or is it relevant to the multi interface solely

@bagder
Copy link
Member

bagder commented Mar 8, 2021

@Eliyahu-Machluf this issue is already fixed in later releases.

It doesn't matter which public API you use to do transfers, they all use the same unified multi code underneath.

@Eliyahu-Machluf
Copy link

@bagder, thanks for the quick response.

Some more questions:

  1. Our product uses libcurl as http client to upload files to hadoop (using its webhdfs interface).
    If I want to check if the product is effected from this issue, which scenario should I run? simply upload large files (using PUT http verb) and compare performance to built kit with older curl version (7.54)? it is related to specific http verb? to other specific parameters?

  2. I saw also curl 7.73.0 for Windows upload performance is extremely bad when server only supports 64KB TCP windows. #6276 which is marked as duplicate of this issue. Does this issue relates to specific tcp windows size? The Windows default is 64KB, so I should see this issue on default Windows configuration, right? We use Windows-2016-Server and Windows-10 machines.

Thanks.

@bagder
Copy link
Member

bagder commented Mar 8, 2021

Since this issue is already fixed, I will not spend time on it anymore.

@Eliyahu-Machluf
Copy link

OK. Thanks.

@jay
Copy link
Member Author

jay commented Mar 9, 2021

@Eliyahu-Machluf if your product is using libcurl 7.73.0 on Windows then it is affected. The performance slowdown is not constant, it depends on network conditions and even then is somewhat arbitrary. I would skip the metrics and upgrade as soon as you can. Sorry for the inconvenience.

mback2k added a commit to mback2k/curl that referenced this issue Mar 12, 2021
1. Consolidate pre-checks into a single Curl_poll call:

This is an attempt to restructure the code in Curl_multi_wait
in such a way that less syscalls are made by removing individual
calls to Curl_socket_check via SOCKET_READABLE/SOCKET_WRITABLE.

2. Avoid resetting the WinSock event multiple times:

We finally call WSAResetEvent anyway, so specifying it as
an optional parameter to WSAEnumNetworkEvents is redundant.

3. Wakeup directly in case no sockets are being monitoring:

Fix the WinSock based implementation to skip extra waiting by
not sleeping in case no sockets are to be waited on and just
the WinSock event is being monitored for wakeup functionality.

Assisted-by: Tommy Odom
Assisted-by: Jay Satiro

Bug: curl#6146
Part of curl#6245
mback2k added a commit to mback2k/curl that referenced this issue Mar 12, 2021
Reset FD_WRITE by sending zero bytes which is permissible
and will be treated by implementations as successful send.

Without this we won't be notified in case a socket is still
writable if we already received such a notification and did
not send any data afterwards on the socket. This would lead
to waiting forever on a writable socket being writable again.

Assisted-by: Tommy Odom
Assisted-by: Jay Satiro
Tested-by: Tommy Odom
Tested-by: tmkk on github

Bug: curl#6146
Part of curl#6245
mback2k added a commit to mback2k/curl that referenced this issue Mar 15, 2021
1. Consolidate pre-checks into a single Curl_poll call:

This is an attempt to restructure the code in Curl_multi_wait
in such a way that less syscalls are made by removing individual
calls to Curl_socket_check via SOCKET_READABLE/SOCKET_WRITABLE.

2. Avoid resetting the WinSock event multiple times:

We finally call WSAResetEvent anyway, so specifying it as
an optional parameter to WSAEnumNetworkEvents is redundant.

3. Wakeup directly in case no sockets are being monitoring:

Fix the WinSock based implementation to skip extra waiting by
not sleeping in case no sockets are to be waited on and just
the WinSock event is being monitored for wakeup functionality.

Assisted-by: Tommy Odom
Assisted-by: Jay Satiro

Bug: curl#6146
Part of curl#6245
mback2k added a commit to mback2k/curl that referenced this issue Mar 15, 2021
Reset FD_WRITE by sending zero bytes which is permissible
and will be treated by implementations as successful send.

Without this we won't be notified in case a socket is still
writable if we already received such a notification and did
not send any data afterwards on the socket. This would lead
to waiting forever on a writable socket being writable again.

Assisted-by: Tommy Odom
Assisted-by: Jay Satiro
Tested-by: Tommy Odom
Tested-by: tmkk on github

Bug: curl#6146
Part of curl#6245
@nirmalavhcl
Copy link

nirmalavhcl commented Mar 23, 2021

@mback2k - Will I be able to just take multi.c changes for fixing this performance issue on my libcurl 7.73 source branch and rebuild? It will be too risky now to take 7.74 release into our code. Are there special instructions to enable the performance changes in this module

@mback2k
Copy link
Member

mback2k commented Mar 23, 2021

You can just try to apply this PR, yes.

@nirmalavhcl
Copy link

ok - 6146 is the issue #, where is the associated PR? Iam seeing a lot of threads for related issues in this page

@mback2k
Copy link
Member

mback2k commented Mar 24, 2021

So 7.74 and 7.75 have been fixed by reverting the problematic commits. And #6245 is the next attempt to implement the changes correctly, so you would have to apply all commits from that PR except the first one which reverts the reverting from 7.74.

@bagder
Copy link
Member

bagder commented Mar 24, 2021

There's likely a much smaller risk to upgrade to the next release rather than to have you custom-patch code with patches you don't seem to have full control or understanding of. I would strongly advice against doing that. Use released versions instead.

mback2k added a commit to mback2k/curl that referenced this issue Apr 5, 2021
1. Consolidate pre-checks into a single Curl_poll call:

This is an attempt to restructure the code in Curl_multi_wait
in such a way that less syscalls are made by removing individual
calls to Curl_socket_check via SOCKET_READABLE/SOCKET_WRITABLE.

2. Avoid resetting the WinSock event multiple times:

We finally call WSAResetEvent anyway, so specifying it as
an optional parameter to WSAEnumNetworkEvents is redundant.

3. Wakeup directly in case no sockets are being monitoring:

Fix the WinSock based implementation to skip extra waiting by
not sleeping in case no sockets are to be waited on and just
the WinSock event is being monitored for wakeup functionality.

Assisted-by: Tommy Odom
Assisted-by: Jay Satiro

Bug: curl#6146
Part of curl#6245
mback2k added a commit to mback2k/curl that referenced this issue Apr 5, 2021
Reset FD_WRITE by sending zero bytes which is permissible
and will be treated by implementations as successful send.

Without this we won't be notified in case a socket is still
writable if we already received such a notification and did
not send any data afterwards on the socket. This would lead
to waiting forever on a writable socket being writable again.

Assisted-by: Tommy Odom
Assisted-by: Jay Satiro
Tested-by: Tommy Odom
Tested-by: tmkk on github

Bug: curl#6146
Part of curl#6245
mback2k added a commit to mback2k/curl that referenced this issue Apr 20, 2021
1. Consolidate pre-checks into a single Curl_poll call:

This is an attempt to restructure the code in Curl_multi_wait
in such a way that less syscalls are made by removing individual
calls to Curl_socket_check via SOCKET_READABLE/SOCKET_WRITABLE.

2. Avoid resetting the WinSock event multiple times:

We finally call WSAResetEvent anyway, so specifying it as
an optional parameter to WSAEnumNetworkEvents is redundant.

3. Wakeup directly in case no sockets are being monitoring:

Fix the WinSock based implementation to skip extra waiting by
not sleeping in case no sockets are to be waited on and just
the WinSock event is being monitored for wakeup functionality.

Assisted-by: Tommy Odom
Assisted-by: Jay Satiro

Bug: curl#6146
Part of curl#6245
mback2k added a commit to mback2k/curl that referenced this issue Apr 20, 2021
Reset FD_WRITE by sending zero bytes which is permissible
and will be treated by implementations as successful send.

Without this we won't be notified in case a socket is still
writable if we already received such a notification and did
not send any data afterwards on the socket. This would lead
to waiting forever on a writable socket being writable again.

Assisted-by: Tommy Odom
Assisted-by: Jay Satiro
Tested-by: Tommy Odom
Tested-by: tmkk on github

Bug: curl#6146
Part of curl#6245
mback2k added a commit to mback2k/curl that referenced this issue Apr 21, 2021
1. Consolidate pre-checks into a single Curl_poll call:

This is an attempt to restructure the code in Curl_multi_wait
in such a way that less syscalls are made by removing individual
calls to Curl_socket_check via SOCKET_READABLE/SOCKET_WRITABLE.

2. Avoid resetting the WinSock event multiple times:

We finally call WSAResetEvent anyway, so specifying it as
an optional parameter to WSAEnumNetworkEvents is redundant.

3. Wakeup directly in case no sockets are being monitoring:

Fix the WinSock based implementation to skip extra waiting by
not sleeping in case no sockets are to be waited on and just
the WinSock event is being monitored for wakeup functionality.

Assisted-by: Tommy Odom
Assisted-by: Jay Satiro

Bug: curl#6146
Part of curl#6245
mback2k added a commit to mback2k/curl that referenced this issue Apr 21, 2021
Reset FD_WRITE by sending zero bytes which is permissible
and will be treated by implementations as successful send.

Without this we won't be notified in case a socket is still
writable if we already received such a notification and did
not send any data afterwards on the socket. This would lead
to waiting forever on a writable socket being writable again.

Assisted-by: Tommy Odom
Assisted-by: Jay Satiro
Tested-by: Tommy Odom
Tested-by: tmkk on github

Bug: curl#6146
Part of curl#6245
mback2k added a commit that referenced this issue Apr 21, 2021
1. Consolidate pre-checks into a single Curl_poll call:

This is an attempt to restructure the code in Curl_multi_wait
in such a way that less syscalls are made by removing individual
calls to Curl_socket_check via SOCKET_READABLE/SOCKET_WRITABLE.

2. Avoid resetting the WinSock event multiple times:

We finally call WSAResetEvent anyway, so specifying it as
an optional parameter to WSAEnumNetworkEvents is redundant.

3. Wakeup directly in case no sockets are being monitoring:

Fix the WinSock based implementation to skip extra waiting by
not sleeping in case no sockets are to be waited on and just
the WinSock event is being monitored for wakeup functionality.

Assisted-by: Tommy Odom
Reviewed-by: Jay Satiro
Reviewed-by: Marcel Raad

Bug: #6146
Closes #6245
mback2k added a commit that referenced this issue Apr 21, 2021
Reset FD_WRITE by sending zero bytes which is permissible
and will be treated by implementations as successful send.

Without this we won't be notified in case a socket is still
writable if we already received such a notification and did
not send any data afterwards on the socket. This would lead
to waiting forever on a writable socket being writable again.

Assisted-by: Tommy Odom
Reviewed-by: Jay Satiro
Reviewed-by: Marcel Raad
Tested-by: tmkk on github

Bug: #6146
Closes #6245
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

9 participants