Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tool: large file range causes OOM error #4393

Closed
SumatraPeter opened this issue Sep 21, 2019 · 11 comments
Closed

tool: large file range causes OOM error #4393

SumatraPeter opened this issue Sep 21, 2019 · 11 comments

Comments

@SumatraPeter
Copy link

A parameterized URI with a numeric sequence like curl "https://example.com/file-[1234567890-9876543210].jpg" -o "file-#1.jpg" fails with the following error:

curl: (3) bad range in URL position 38:
https://example.com/file-[1234567890-9876543210].jpg
                                     ^

Reducing the range to curl "https://example.com/file-[123456789-987654321].jpg" -o "file-#1.jpg" resulted in my system hanging for a while, followed by a brief flash of the command prompt window with curl repeatedly spitting out an Out of Memory error, followed by my monitor going blank and the system requiring a hard reset to recover!

  1. Can you use a larger integer data type (at least 4 bytes if not more) for the range start and end values?

  2. Please fix this bug that's making curl go into an infinite loop of OOM error messages and causing my system to hang so badly. If the range is too large then there should be a single error message printed before the program exits gracefully.


curl -V output:

curl 7.66.0 (x86_64-pc-win32) libcurl/7.66.0 OpenSSL/1.1.1d (Schannel) zlib/1.2.11 brotli/1.0.7 WinIDN libssh2/1.9.0 nghttp2/1.39.2
Release-Date: 2019-09-11
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: AsynchDNS HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile MultiSSL NTLM SPNEGO SSL SSPI TLS-SRP brotli libz

OS: Windows 10 x64 1809 (Build 17763.737)

@jay
Copy link
Member

jay commented Sep 21, 2019

curl should not continue on out of memory, that's certainly a bug. However even though that range is nonsensical to you it is technically allowed afaik. As to why your computer is rebooting I don't think curl is at fault. I tried that command and it didn't reboot my computer but it did eat up a lot of memory briefly.

@bagder
Copy link
Member

bagder commented Sep 21, 2019

I did briefly consider this when I rewrote the internals to handle parallel transfers, as now curl will allocate handles for every transfer before they start. In the case 1234567890-9876543210, that means 8641975320 transfers and each handle will require maybe 6K or so. That equals 49449 GB of memory for that amount of transfers...

@bagder
Copy link
Member

bagder commented Sep 21, 2019

We should probably set a (soft?) maximum number of transfers to allow so that we can catch mistakes. Genuinely asking for 8641975320 transfers is probably rare.

@SumatraPeter
Copy link
Author

SumatraPeter commented Sep 21, 2019

However even though that range is nonsensical to you it is technically allowed afaik.

@jay: Where do you see me saying that the range is 'nonsensical'? If you read my initial write-up again I was actually asking for a larger integer data type for the range start and end values, but from what Daniel has written it seems my problem lies not with the max range values supported but curl's mem allocation strategy in such cases.

As to why your computer is rebooting I don't think curl is at fault.

Again, I never said curl made my PC reboot. I clearly said that curl seemed to go into an infinite loop of OOM error messages, my monitor went blank and the system required a hard reset to recover, which means I myself had to press the reset button to reboot. Given Daniel's explanation that curl ate up all my free mem, that's hardly a surprise.

as now curl will allocate handles for every transfer before they start. In the case 1234567890-9876543210, that means 8641975320 transfers and each handle will require maybe 6K or so. That equals 49449 GB of memory for that amount of transfers...

@bagder: Ah, so curl obviously needs to be fixed so that it quits when it hits the (free mem - some safe buffer) limit. However, this brings up the larger question of whether things can be fundamentally improved here. Is it really necessary to allocate handles for every transfer before they start?

I'm basically looking to use curl to run through a range of file numbers between 0 and 9,999,999 (i.e. range start and end values should support 7 digit integers at least), with a step value of 10 or 100. It would be nice if I can manage that with curl without requiring 60-odd GB of free RAM.

@bagder
Copy link
Member

bagder commented Sep 21, 2019

Is it really necessary to allocate handles for every transfer before they start?

No, that should be improved.

But still, you asked for 8641975320 transfers.If each takes a 100milliseconds, it would take 27 years to complete that operation.

@SumatraPeter
Copy link
Author

SumatraPeter commented Sep 21, 2019

But still, you asked for 8641975320 transfers.If each takes a 100milliseconds, it would take 27 years to complete that operation.

That was just me attempting to test the limits of the range values that curl would accept. :)

As I said above, I need to check <=1 million or <=100,000 values (depending on step count being 10 or 100 and given max range of 0 to 9,999,999), and not every value will lead to a valid file. Still, even max 1 million transfers (100% hit rate) at 100ms each would work out to ~27 hours if I've got my math right, and I'm ok with that.

So, will it be possible to pull this off with a future version of curl alone?

bagder added a commit that referenced this issue Sep 21, 2019
When looping around the ranges and given URLs to create transfers, all
errors should exit the loop and return. Previously it would keep
looping.

Reported-by: SumatraPeter on github
Bug: #4393
@bagder
Copy link
Member

bagder commented Sep 21, 2019

So, will it be possible to pull this off with a future version of curl alone?

If you use a version from before 7.66.0 it was possible and I should hope we can fix this again in a future version as there's really no reason why we shouldn't...

bagder added a commit that referenced this issue Sep 22, 2019
When looping around the ranges and given URLs to create transfers, all
errors should exit the loop and return. Previously it would keep
looping.

Reported-by: SumatraPeter on github
Bug: #4393
Closes #4396
@bagder
Copy link
Member

bagder commented Sep 22, 2019

So I've now at least landed a fix that will abort the loop immediately on the first error. Fixing the handling of large ranges will probably take a little longer, but is certainly worthwhile.

@SumatraPeter
Copy link
Author

Thanks for all your work on this, Daniel.

@bagder bagder changed the title Large file range causes OOM error requiring hard reset tool: large file range causes OOM error Sep 26, 2019
bagder added a commit that referenced this issue Sep 27, 2019
This should again enable crazy-large download ranges of the style
[1-10000000] that otherwise easily ran out of memory starting in 7.66.0
when this new handle allocating scheme was introduced.

Reported-by: Peter Sumatra
Fixes #4393
bagder added a commit that referenced this issue Sep 30, 2019
This should again enable crazy-large download ranges of the style
[1-10000000] that otherwise easily ran out of memory starting in 7.66.0
when this new handle allocating scheme was introduced.

Reported-by: Peter Sumatra
Fixes #4393
bagder added a commit that referenced this issue Oct 1, 2019
This should again enable crazy-large download ranges of the style
[1-10000000] that otherwise easily ran out of memory starting in 7.66.0
when this new handle allocating scheme was introduced.

Reported-by: Peter Sumatra
Fixes #4393
@bagder bagder closed this as completed in e59371a Oct 2, 2019
@SumatraPeter
Copy link
Author

Thanks again, Daniel. Any idea when we can expect Win binaries for the next stable release?

@bagder
Copy link
Member

bagder commented Oct 2, 2019

We can expect them to become available not long after the release (to be called 7.67.0). Which is planned to happen on November 6, 2019.

@lock lock bot locked as resolved and limited conversation to collaborators Jan 1, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Development

Successfully merging a pull request may close this issue.

3 participants