curl on Windows is slow when output destination is on SMB #12750

Closed
CaptainFlint opened this issue Jan 21, 2024 · 17 comments

@CaptainFlint

I did this

Make sure you have a file URL with a high download speed, a local disk, and a network SMB resource with a high upload speed (either mapped to a drive letter or accessible via a network share path).
First, download the file with the --output option pointing to a file on the local disk, and note the speed (it might need a few seconds to stabilize).
Second, do the same with --output pointing to the SMB share, and compare the speed.

In my example, I set up a local HTTP server using python -m http.server for a directory with some Linux ISO images on my local hard drive (max download speed is 250-300 Mbytes/s), and for the destination I used an SMB share on a CentOS 7 virtual machine running locally (transfer speed varies, but is about 100-250 Mbytes/s). This way I made sure I'm not throttled by my ISP or the upstream server. However, my friend found this issue when downloading a file from the Internet over a 500 Mbps connection onto his NAS, so the setup doesn't have to be all local.

My results:

C:\Programs\curl>curl "http://127.0.0.1:8000/ubuntu-22.04.2-desktop-amd64.iso" -o c:\Temp\a.iso
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 4699M  100 4699M    0     0   244M      0  0:00:19  0:00:19 --:--:--  255M
C:\Programs\curl>curl "http://127.0.0.1:8000/ubuntu-22.04.2-desktop-amd64.iso" -o \\192.168.56.101\flint\tmp\a.iso
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 4699M  100 4699M    0     0  22.6M      0  0:03:27  0:03:27 --:--:-- 22.7M

In other words, the speed is over 10 times lower when saving onto SMB. To confirm that the SMB share itself is not slow, here is a measurement of a direct copy of another ISO from the local disk to the same SMB location:

C:\Programs\curl>dir D:\Distribs\OS\ubuntu-20.04.6-live-server-amd64.iso & echo %TIME% & copy D:\Distribs\OS\ubuntu-20.04.6-live-server-amd64.iso \\192.168.56.101\flint\tmp\a.iso & cmd /v:on /c "echo !TIME!"
 Volume in drive D has no label.
 Volume Serial Number is 2461-9C46

 Directory of D:\Distribs\OS

25.07.2023  23:56     1 487 339 520 ubuntu-20.04.6-live-server-amd64.iso
               1 File(s)  1 487 339 520 bytes
               0 Dir(s)  1 267 068 116 992 bytes free
20:52:03,98
        1 file(s) copied.
20:52:15,63

which gives us about 122 Mbytes/s. (And it's not because of caching; I monitored for network activity, and it stopped as soon as the file copying finished.)
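
The 122 Mbytes/s figure can be checked from the transcript above: 1 487 339 520 bytes were copied between the two %TIME% stamps. A minimal Python sketch of that arithmetic follows. The helper name is made up for illustration, and it assumes both stamps fall on the same day and that "Mbytes" means MiB (1024 * 1024 bytes).

```python
def parse_cmd_time(stamp):
    """Convert a cmd.exe %TIME% value such as '20:52:03,98' to seconds."""
    hours, minutes, seconds = stamp.replace(",", ".").split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


size_bytes = 1_487_339_520  # file size from the dir listing
elapsed = parse_cmd_time("20:52:15,63") - parse_cmd_time("20:52:03,98")
mib_per_s = size_bytes / elapsed / (1024 * 1024)
print(f"{elapsed:.2f} s, {mib_per_s:.0f} MiB/s")  # prints "11.65 s, 122 MiB/s"
```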

I expected the following

The download speed onto SMB should be roughly 100-200 Mbytes/s, as the hardware permits.

curl/libcurl version

Reproduced on several different builds:

  • Downloaded from curl.se:
curl 8.5.0 (x86_64-w64-mingw32) libcurl/8.5.0 LibreSSL/3.8.2 (Schannel) zlib/1.3 brotli/1.1.0 zstd/1.5.5 WinIDN libssh2/1.11.0 nghttp2/1.58.0 ngtcp2/1.1.0 nghttp3/1.1.0
Release-Date: 2023-12-06
Protocols: dict file ftp ftps gopher gophers http https imap imaps ldap ldaps mqtt pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp ws wss
Features: alt-svc AsynchDNS brotli HSTS HTTP2 HTTP3 HTTPS-proxy IDN IPv6 Kerberos Largefile libz MultiSSL NTLM SPNEGO SSL SSPI threadsafe UnixSockets zstd
  • Preinstalled in Windows 11:
curl 8.4.0 (Windows) libcurl/8.4.0 Schannel WinIDN
Release-Date: 2023-10-11
Protocols: dict file ftp ftps http https imap imaps pop3 pop3s smtp smtps telnet tftp
Features: AsynchDNS HSTS HTTPS-proxy IDN IPv6 Kerberos Largefile NTLM SPNEGO SSL SSPI threadsafe Unicode UnixSockets
  • Built from source using Visual Studio:
curl 8.5.0 (x86_64-pc-win32) libcurl/8.5.0 OpenSSL/3.2.0 zlib/1.3
Release-Date: 2023-12-06
Protocols: dict file ftp ftps gopher gophers http https imap imaps ldap ldaps mqtt pop3 pop3s rtsp smb smbs smtp smtps telnet tftp
Features: alt-svc AsynchDNS HSTS HTTPS-proxy IPv6 Largefile libz NTLM SSL threadsafe UnixSockets

operating system

Windows 10 22H2 Pro x64 (10.0.19045.3570)
Windows 11 23H2 Pro x64 (10.0.22631.3007)

jay added a commit to jay/curl that referenced this issue Jan 21, 2024
This branch will measure how long fwrite, fflush and fclose take to
write data to output files.

To override the default FILE buffer size (which I think internally in
Microsoft's CRT is probably 4k) use CURL_OUTFILE_BUFSIZE.

Example:

REM create a sparse 1GB file
REM
fsutil file createnew c:\temp\input 1073741824
fsutil sparse setflag c:\temp\input
fsutil sparse setrange c:\temp\input 0 1073741824

REM write from input to output using curl with default FILE buffer size
REM for me the total fwrite time is about 600k microseconds on average
REM
set CURL_OUTFILE_BUFSIZE=
curld -o output file://c:/temp/input

REM write from input to output using curl with 1MB FILE buffer size
REM for me the total fwrite time is about 350k microseconds on average
REM
set CURL_OUTFILE_BUFSIZE=1048576
curld -o output file://c:/temp/input

Ref: curl#12750
@jay
Member

jay commented Jan 21, 2024

Maybe the fwrite default buffer size is too small? IIRC it's 4k on Windows. However, I suspect that in most cases Microsoft's CRT functions call Windows API functions which have their own buffering, and I'm sure there would be another layer of buffering for SMB so that it doesn't try to write 4k chunks repeatedly...

I made a test branch that monitors the cumulative time it takes to fwrite/fflush/fclose the output files in Windows, get it here:

https://github.com/curl/curl/compare/master...jay:curl:win32_measure_fwrite_time?expand=1

Try outputting to a file via SMB with different values (as seen in the example in the commit message):

set CURL_OUTFILE_BUFSIZE=
curl -o \\192.168.56.101\flint\tmp\a.iso file://c:/temp/a.iso

set CURL_OUTFILE_BUFSIZE=1048576
curl -o \\192.168.56.101\flint\tmp\a.iso file://c:/temp/a.iso

I used file:// input above to rule out a server issue retrieving the input
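
For readers who cannot build the test branch, the same kind of experiment can be approximated outside curl. The sketch below is not curl code: it uses the buffering argument of Python's open() as a stand-in for the CRT FILE buffer, and the function name, the 16384-byte chunk size and the 8 MiB payload are assumptions chosen for illustration. It reports the time spent opening, writing, flushing and closing the output file for each buffer size.

```python
import os
import tempfile
import time


def timed_write(dest_path, total_bytes, buffer_size, chunk_size=16384):
    """Write total_bytes of zeros to dest_path and return the elapsed seconds.

    buffer_size is the stream buffer passed to open(); chunk_size is the size
    of each write issued by the caller (an assumed value, not curl's).
    """
    chunk = bytes(chunk_size)
    remaining = total_bytes
    start = time.perf_counter()
    out = open(dest_path, "wb", buffering=buffer_size)
    try:
        while remaining > 0:
            piece = chunk if remaining >= chunk_size else chunk[:remaining]
            out.write(piece)
            remaining -= len(piece)
        out.flush()
    finally:
        out.close()
    return time.perf_counter() - start


# Demo against a local temporary file. To compare buffer sizes over the
# network, pass a path on the SMB share as dest instead.
with tempfile.TemporaryDirectory() as tmp:
    dest = os.path.join(tmp, "out.bin")
    results = {}
    for size in (4096, 1048576):
        results[size] = timed_write(dest, 8 * 1024 * 1024, size)
        written = os.path.getsize(dest)
        print(f"buffer {size}: {results[size]:.3f} s for {written} bytes")
```

On a local disk the two timings will usually be close; the difference discussed in this thread only shows up when the destination is on a network share.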

jay added Windows Windows-specific SMB labels Jan 21, 2024
@CaptainFlint
Author

I applied the patch on top of 8.5.0, but could not compile it, because in my VS 2019 the internals of the FILE type are completely obscured; the definition is only:

    typedef struct _iobuf
    {
        void* _Placeholder;
    } FILE;

and therefore I get compile errors about _bufsiz not being a member of _iobuf.
However, since _bufsiz is only accessed in some precautionary check code that does not affect any functionality, I commented out that whole if block and compiled successfully without it. And I can confirm that the buffer size does indeed do the trick!

C:\Temp\curl-build\builds\libcurl-vc-x64-release-static-ssl-static-zlib-static-ipv6\bin>set CURL_OUTFILE_BUFSIZE=
C:\Temp\curl-build\builds\libcurl-vc-x64-release-static-ssl-static-zlib-static-ipv6\bin>curl -o \\192.168.56.101\flint\tmp\a.iso file://d:/Distribs/OS/ubuntu-20.04.6-live-server-amd64.iso
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1418M  100 1418M    0     0  34.8M      0  0:00:40  0:00:40 --:--:-- 33.9M


Spent 40 seconds (40269933 microseconds) writing to (outs)->filename \\192.168.56.101\flint\tmp\a.iso




Spent 0 seconds (0 microseconds) writing to (&per->heads)->filename stdout




Spent 0 seconds (0 microseconds) writing to (&per->etag_save)->filename stdout
C:\Temp\curl-build\builds\libcurl-vc-x64-release-static-ssl-static-zlib-static-ipv6\bin>set CURL_OUTFILE_BUFSIZE=1048576
C:\Temp\curl-build\builds\libcurl-vc-x64-release-static-ssl-static-zlib-static-ipv6\bin>curl -o \\192.168.56.101\flint\tmp\a.iso file://d:/Distribs/OS/ubuntu-20.04.6-live-server-amd64.iso
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0


set file buffer size to 1048576 for filename \\192.168.56.101\flint\tmp\a.iso

100 1418M  100 1418M    0     0   178M      0  0:00:07  0:00:07 --:--:--  161M


Spent 5 seconds (5598418 microseconds) writing to (outs)->filename \\192.168.56.101\flint\tmp\a.iso
 (NOTE: FILE stream buffer size was 1048576)




Spent 0 seconds (0 microseconds) writing to (&per->heads)->filename stdout




Spent 0 seconds (0 microseconds) writing to (&per->etag_save)->filename stdout
C:\Temp\curl-build\builds\libcurl-vc-x64-release-static-ssl-static-zlib-static-ipv6\bin>set CURL_OUTFILE_BUFSIZE=10485760
C:\Temp\curl-build\builds\libcurl-vc-x64-release-static-ssl-static-zlib-static-ipv6\bin>curl -o \\192.168.56.101\flint\tmp\a.iso file://d:/Distribs/OS/ubuntu-20.04.6-live-server-amd64.iso
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0


set file buffer size to 10485760 for filename \\192.168.56.101\flint\tmp\a.iso

100 1418M  100 1418M    0     0   345M      0  0:00:04  0:00:04 --:--:--  378M


Spent 3 seconds (3852967 microseconds) writing to (outs)->filename \\192.168.56.101\flint\tmp\a.iso
 (NOTE: FILE stream buffer size was 10485760)




Spent 0 seconds (0 microseconds) writing to (&per->heads)->filename stdout




Spent 0 seconds (0 microseconds) writing to (&per->etag_save)->filename stdout
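
Taking the three "Spent ... microseconds" lines for the SMB output file at face value, the speedup from the larger buffers works out as follows. This is only arithmetic on the numbers printed above; the variable names are made up for illustration.

```python
# Cumulative write times reported by the test build, in microseconds,
# keyed by the CURL_OUTFILE_BUFSIZE value used for that run.
default_us = 40_269_933  # variable unset, default FILE buffer
tuned_us = {1_048_576: 5_598_418, 10_485_760: 3_852_967}

speedups = {bufsize: default_us / micros for bufsize, micros in tuned_us.items()}
for bufsize, factor in speedups.items():
    print(f"{bufsize} byte buffer: {tuned_us[bufsize] / 1e6:.1f} s, "
          f"{factor:.1f}x faster than the default")
```

This gives roughly 7.2x for the 1 MB buffer and 10.5x for the 10 MB buffer.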

jay added a commit to jay/curl that referenced this issue Jan 23, 2024
@jay
Member

jay commented Jan 23, 2024

Thanks for testing. I have amended the commit to remove the _bufsiz check.

> Spent 40 seconds (40269933 microseconds) writing to (outs)->filename \\192.168.56.101\flint\tmp\a.iso

> Spent 3 seconds (3852967 microseconds) writing to (outs)->filename \\192.168.56.101\flint\tmp\a.iso
> (NOTE: FILE stream buffer size was 10485760)

That's quite a difference. I tested with a 1GB file and I also see a difference. 38 seconds vs 11 seconds if I set a 10MB FILE buffer.

I also monitored the Windows API calls to see what the Microsoft CRT is actually calling:

The file is opened as I would expect with CreateFile:

CreateFileW ( "\\192.168.x.x\sambashare\output", GENERIC_WRITE, FILE_SHARE_READ | FILE_SHARE_WRITE, 0x001ff820, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL )	

Then it's written:
With default FILE buffer: ~77,000 calls to WriteFile
With 10MB FILE buffer: ~1500 calls to WriteFile

That's staggering. I can't believe Windows doesn't handle this automatically.

Edit: actually, I guess that number of calls is to be expected; however, I'm surprised Windows doesn't bundle small writes into some larger intermediate buffer before sending them over the network.
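
The relationship between stream buffer size and the number of low-level writes can be illustrated without any network involved. The sketch below is an analogy only: it uses Python's io.BufferedWriter in place of the CRT FILE stream and a counting sink in place of WriteFile, so the numbers it prints are not the WriteFile counts quoted above. The class and function names, the 1024-byte chunks and the two buffer sizes are assumptions for illustration.

```python
import io


class CountingSink(io.RawIOBase):
    """Raw stream that discards data and counts how often write() is called."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.total = 0

    def writable(self):
        return True

    def write(self, data):
        self.calls += 1
        self.total += len(data)
        return len(data)


def count_raw_writes(total_bytes, chunk_size, buffer_size):
    """Push total_bytes through a buffered writer in chunk_size pieces."""
    sink = CountingSink()
    writer = io.BufferedWriter(sink, buffer_size=buffer_size)
    chunk = bytes(chunk_size)
    for _ in range(total_bytes // chunk_size):
        writer.write(chunk)
    writer.flush()
    return sink.calls, sink.total


small_calls, small_total = count_raw_writes(1 << 20, 1024, 4096)
large_calls, large_total = count_raw_writes(1 << 20, 1024, 1 << 16)
print(small_calls, large_calls)
```

The same amount of data reaches the sink in both runs, but the larger buffer gets it there in far fewer calls. When every call is a round trip to a file server, that call count is what dominates the transfer time.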

@jay
Member

jay commented Jan 25, 2024

Can anyone else on Windows reproduce this?

@edmcln

edmcln commented Jan 25, 2024

I did a few tests with the official curl build. The results are as follows:

First Method (Using --upload-file):

PS C:\Users\???\Desktop\curl> Measure-Command { .\curl.exe -T C:\test\GRMSDK_EN_DVD.iso -u "domain\\???:lol" smb://192.168.x.x/Shared/ }
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  567M    0     0  100  567M      0  97.0M  0:00:05  0:00:05 --:--:-- 94.8M

TotalSeconds      : 5.9021101

Second Method (using --output):

PS C:\Users\???\Desktop\curl> Measure-Command { .\curl.exe file://C:/test/GRMSDK_EN_DVD.iso -o \\192.168.x.x\Shared\GRMSDK_EN_DVD.iso }
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  567M  100  567M    0     0  52.6M      0  0:00:10  0:00:10 --:--:-- 52.4M

TotalSeconds      : 10.8728373

Third Method (Using internal Copy command):

PS C:\Users\???\Desktop\curl> Measure-Command { copy C:\test\GRMSDK_EN_DVD.iso \\192.168.x.x\Shared\ }

TotalSeconds      : 2.0368835

The initial results make sense, and I think the SMB protocol version, buffer size, caching, etc. play some role. However, a more in-depth analysis (User+Kernel mode) is required to narrow down the root causes.

Note: The First Method uses SMBv1 (presumably the curl project's own implementation), whereas the other Methods use SMBv2/3 (depending on the OS).

@jay
Member

jay commented Jan 30, 2024

Thanks for testing. I think we can say that this is a Windows issue, not a curl issue. We could mitigate the effect of Windows' smaller SMB buffer by writing larger blocks for network files. However, I'm hesitant to do that because, even if we can reliably detect HANDLEs as network files, there may be unwanted performance effects on other protocols. Also, it would use a lot more memory per file, which would be a problem if many files were open in parallel mode.

@edmcln

edmcln commented Jan 31, 2024

I'm not sure what's wrong with the new build (8.6.0), but the write speed is roughly 3 times slower than with 8.5.0 or 8.4.0. I monitored the ReadFile and WriteFile API calls, and the buffer sizes are indeed different: the read buffer is 8192 bytes vs 102400 in previous versions, and the write buffer is 4096 bytes vs 16384.

PS C:\Users\???\Desktop\curl> Measure-Command { .\curl.exe file://C:/test/GRMSDK_EN_DVD.iso -o \\192.168.x.x\Shared\GRMSDK_EN_DVD.iso }
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  567M  100  567M    0     0  17.7M      0  0:00:32  0:00:32 --:--:-- 17.9M

TotalSeconds      : 32.0689169
PS C:\Users\???\Desktop\curl> .\curl.exe -V
curl 8.6.0 (i686-w64-mingw32) libcurl/8.6.0 LibreSSL/3.8.2 zlib/1.3.1 brotli/1.1.0 zstd/1.5.5 WinIDN libpsl/0.21.5 libssh2/1.11.0 nghttp2/1.59.0 ngtcp2/1.2.0 nghttp3/1.1.0
Release-Date: 2024-01-31
Protocols: dict file ftp ftps gopher gophers http https imap imaps ipfs ipns ldap ldaps mqtt pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp ws wss
Features: alt-svc AsynchDNS brotli HSTS HTTP2 HTTP3 HTTPS-proxy IDN IPv6 Largefile libz NTLM PSL SSL threadsafe UnixSockets zstd
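
To put the reported buffer sizes in perspective, the number of ReadFile and WriteFile calls needed for this file can be estimated. This is a back-of-the-envelope sketch only. It assumes the file is exactly 567 MiB (the progress meter only shows the rounded value) and that every call transfers one full buffer. The dictionary keys are labels made up for illustration.

```python
import math

FILE_SIZE = 567 * 1024 * 1024  # assumption: exact size is not given in the thread

# Buffer sizes in bytes as observed via API monitoring in the comment above.
buffers = {
    "read, 8.5.0": 102400,
    "read, 8.6.0": 8192,
    "write, 8.5.0": 16384,
    "write, 8.6.0": 4096,
}

calls = {name: math.ceil(FILE_SIZE / size) for name, size in buffers.items()}
for name, count in calls.items():
    print(f"{name}: about {count} calls")
```

Under these assumptions 8.6.0 issues four times as many writes as 8.5.0 for the same file, which points in the same direction as the measured slowdown from 52.6M to 17.7M per second.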

@jay
Member

jay commented Jan 31, 2024

> I'm not sure what's wrong with the new build (8.6.0), but the write speed is roughly 3 times slower than with 8.5.0 or 8.4.0. I monitored the ReadFile and WriteFile API calls, and the buffer sizes are indeed different: the read buffer is 8192 bytes vs 102400 in previous versions, and the write buffer is 4096 bytes vs 16384.

@icing could this be from your changes to the file protocol?

@bagder
Member

bagder commented Jan 31, 2024

Probably mostly because of a6c9a33.

@bagder
Member

bagder commented Jan 31, 2024

But I don't see us reverting this change just because SMB on Windows is silly.

@jay
Member

jay commented Jan 31, 2024

> But I don't see us reverting this change just because SMB on Windows is silly.

Yes, but why are we doing that with the smaller read buffer?

@bagder
Member

bagder commented Jan 31, 2024

Because we no longer use the download buffer; it is now stack based. Because it does not need to be big to do fast transfers everywhere else. And of course because we did not anticipate that this particular case would get even worse.

@CaptainFlint
Author

I was going to suggest adding an option for changing the buffer size. Basically what @jay implemented earlier for testing (via CURL_OUTFILE_BUFSIZE), only now officially, with a command line option. This way, users who need to download onto SMB would be able to get full speed without affecting others. But if you have decided to switch to a buffer located on the stack, that may not be possible anymore...

@icing
Contributor

icing commented Feb 1, 2024

@bagder we could replace the stack buffers with allocated ones, using data->set.buffer_size. They only need to exist for the duration of the loops and can be freed directly afterwards. If you like that, I can make a PR for this.

@bagder
Member

bagder commented Feb 1, 2024

> @bagder we could replace the stack buffers with allocated ones, using data->set.buffer_size.

Since file:// transfers are also transfers, we could also make use of the download buffer for this...

icing added a commit to icing/curl that referenced this issue Feb 13, 2024
- refs curl#12750
- borrow multi handle's transfer buffer for up- and downloads
  of file: urls.
- restores the buffer size behaviour of curl 8.5.0
@icing
Contributor

icing commented Feb 13, 2024

I made #12932 which uses the multi's transfer buffer for file: operations. This should restore the buffer sizes and behaviour exactly as it was in curl 8.5.0.
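
The general idea discussed in the last few comments, a copy loop whose buffer size is chosen at run time instead of being a fixed stack array, can be sketched as follows. This is not the code from #12932 or from libcurl. It is a hypothetical Python illustration, and the function name and the sizes used in the demo are assumptions.

```python
import os
import tempfile


def copy_file(src_path, dst_path, buffer_size):
    """Copy src_path to dst_path through one buffer of buffer_size bytes.

    The buffer is allocated once per copy and reused for every read, so the
    caller decides how large each read and write is.
    """
    buf = bytearray(buffer_size)
    view = memoryview(buf)
    copied = 0
    # buffering=0 yields raw file objects, so each iteration maps to one read
    # and normally one write of up to buffer_size bytes.
    with open(src_path, "rb", buffering=0) as src, \
         open(dst_path, "wb", buffering=0) as dst:
        while True:
            n = src.readinto(buf)
            if not n:
                break
            offset = 0
            while offset < n:  # a raw write may be partial
                offset += dst.write(view[offset:n])
            copied += n
    return copied


# Small self-check on temporary files.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "in.bin")
    dst = os.path.join(tmp, "out.bin")
    with open(src, "wb") as f:
        f.write(os.urandom(300_000))
    copied = copy_file(src, dst, buffer_size=102_400)
    with open(src, "rb") as a, open(dst, "rb") as b:
        identical = a.read() == b.read()
    print(copied, identical)
```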

jay pushed a commit that referenced this issue Feb 18, 2024
- For file:// transfers use the multi handle's transfer buffer for
  up- and downloads.

Prior to this change a6c9a33 (precedes 8.6.0) changed the file://
transfers to use a smaller stack based buffer, and that caused a
significant performance decrease in Windows.

Bug: #12750 (comment)
Reported-by: edmcln@users.noreply.github.com

Closes #12932
@bagder
Member

bagder commented Feb 18, 2024

Fixed via #12932

bagder closed this as completed Feb 18, 2024