Document lack of method to stop uncompressed downloads #7516

Closed
jidanni opened this issue Jul 30, 2021 · 9 comments

@jidanni
Contributor

jidanni commented Jul 30, 2021

The man page says:

   --compressed
          (HTTP) Request a compressed response using one of the algorithms
          curl supports, and automatically decompress the content. Headers
          are not modified.

          If this option is used and the server sends an unsupported
          encoding, curl will report an error.

Well, I ran

curl --verbose --compressed --remote-name https://www.ncc.gov.tw/chinese/files/opendata/radio.csv

and --compressed was ignored (the server did not compress the response), but no error was reported.

So maybe the man page should say, as it does elsewhere:

This is a request, not an order; the server may or may not do it.

In other words, after this paragraph:

          If this option is used and the server sends an unsupported
          encoding, curl will report an error.

add: "But receiving no encoding at all is not an error."
That is, you had better have a bigger disk ready, because curl gives you no way to stop an uncompressed download.

curl 7.74.0 (x86_64-pc-linux-gnu)

@dfandrich
Contributor

dfandrich commented Jul 30, 2021 via email

bagder added a commit that referenced this issue Jul 30, 2021
Clarified

Reported-by: Dan Jacobson
Fixes #7516
@bagder
Member

bagder commented Jul 30, 2021

If you want to store the data compressed then this is not the option for you.

@jidanni
Contributor Author

jidanni commented Jul 30, 2021

All I want to do is "wget the file".
But the file sitting on the server is just a file.csv, not a file.csv.gz.
I told the website that they should offer .gz, but they said government regulations prevent it.
So I thought, OK, I am still willing to download the big file, but I will make sure it is compressed as it goes over the wire.
No, I am not especially trying to have it end up compressed when it is finally stored on my disk.
I just want to make sure that only 2 MB instead of 46 MB goes over the wire.

But my bug stands! There is no way to tell curl "bomb out if the server does not agree to compress the file when sending over the wire."

That is, with curl and wget you had better have bigger disks ready, because compression is treated as a luxury that can be skipped whenever it is not available.

@jidanni
Contributor Author

jidanni commented Jul 30, 2021

One probably needs a complicated two-step shell script that first starts the download with every flag that might encourage compression,
then stops it after a few bytes and checks whether compression occurred,
then restarts the download if it did, and bails out if it didn't.

@danielgustafsson
Member

danielgustafsson commented Jul 30, 2021 via email

@jidanni
Contributor Author

jidanni commented Jul 30, 2021

"If you would like to be able to do this then that would be a new feature, and you're welcome to supply a PR for that for us to consider."

OK, I hereby submit an FR (feature request), not a PR.

@danielgustafsson
Member

danielgustafsson commented Jul 30, 2021 via email

@jay
Member

jay commented Aug 1, 2021

I don't think we should add such a feature. This seems like something so obscure that it's not going to be of use to anyone else, and we'd take on a maintenance cost (as we do with anything like this). The reporter could, in their own fork, modify Curl_build_unencoding_stack to flag whether an encoding the server sent will be decompressed, like this:

diff --git a/lib/content_encoding.c b/lib/content_encoding.c
index a84ff54..52f090d 100644
--- a/lib/content_encoding.c
+++ b/lib/content_encoding.c
@@ -1065,6 +1065,9 @@ CURLcode Curl_build_unencoding_stack(struct Curl_easy *data,
       if(!encoding)
         encoding = &error_encoding;  /* Defer error at stack use. */
 
+      if(encoding != &identity_encoding && encoding != &error_encoding)
+        k->writer_will_decompress = true;
+
       /* Stack the unencoding stage. */
       writer = new_unencoding_writer(data, encoding, k->writer_stack);
       if(!writer)
diff --git a/lib/urldata.h b/lib/urldata.h
index 1d99112..2195003 100644
--- a/lib/urldata.h
+++ b/lib/urldata.h
@@ -668,6 +668,8 @@ struct SingleRequest {
 
   /* Content unencoding stack. See sec 3.5, RFC2616. */
   struct contenc_writer *writer_stack;
+  bool writer_will_decompress;
+
   time_t timeofdoc;
   long bodywrites;
   char *location;   /* This points to an allocated version of the Location:

.... and then what? I don't know. Maybe return an error from Curl_client_write if !data->req.writer_will_decompress and data->set.str[STRING_ENCODING] was set to "" (or parse the list and check whether it requests compression). It seems incomplete.

@jidanni
Contributor Author

jidanni commented Aug 1, 2021

Maybe the blame lies with the HTTP designers / RFC authors.

"I said Grandma can only eat cooked meat,
but you sent raw meat over the feeder tubes anyway."

"I can only require 'meat' and advise 'cooked', but I have no defense
against 'raw' coming down the tube anyway." And now Grandma is dead.
