
Deflate error after all content was received #2719

Closed
clbr opened this issue Jul 7, 2018 · 22 comments

@clbr
Contributor

clbr commented Jul 7, 2018

I did this

curl -o test -k https://stooq.com/q/?s=cbl_e.us -v --compressed

Ever since dbcced8, this download fails with "curl: (23) Failed writing data".

I expected the following

Successful download. It may be curl's or Apache's bug, but in either case curl should handle it gracefully.

curl/libcurl version

curl 7.61.0-DEV (x86_64-unknown-linux-gnu) libcurl/7.61.0-DEV OpenSSL/1.0.2o zlib/1.2.3 librtmp/2.3
Release-Date: [unreleased]
Protocols: dict file ftp ftps gopher http https imap imaps pop3 pop3s rtmp rtsp smb smbs smtp smtps telnet tftp
Features: Largefile NTLM NTLM_WB SSL libz TLS-SRP UnixSockets HTTPS-proxy

operating system

Linux x86_64

@bagder
Member

bagder commented Jul 7, 2018

in either case curl should handle it gracefully

If this is a broken stream then curl should report an error and not silently ignore it. (I'm not saying it is, as I haven't researched it deeper.)

When I try that command line I get a test file in my directory that is 146222 bytes big, which looks like it could be the whole file, so the error could be something in the final bits of the stream...

@bagder bagder added the HTTP label Jul 7, 2018
@clbr
Contributor Author

clbr commented Jul 8, 2018

The error is annoying in a browser, since going to that page then throws up an error page instead of the page contents. Ignoring that error would be bad in contexts where it's more serious.

I added some printfs, and it looks like it's because that commit removed the restart support for when the state was ZLIB_UNINIT. Is it legal to have multiple deflate streams?

@bagder
Member

bagder commented Jul 8, 2018

@monnerat, can you figure out if there's anything we can improve here?

@monnerat
Contributor

monnerat commented Jul 9, 2018

This particular case uses the "deflate" algorithm in its deviant form (raw data), with 2 extra bytes appended (0x03, 0x00). The semantics of these bytes are unknown to me, but I can see they are constant even when the page content changes.

This was not detected before the commit mentioned above, since all trailing data was silently ignored, even if it consisted of thousands of bytes.

Curl deals with 3 kinds of zlib-supported data:

  • Gzip data with header and trailer.
  • Normal deflate data (probably containing header and trailer).
  • Raw deflate data, as a fallback when parsing as normal deflate initially fails. This has no header/trailer. I read long ago that this was implemented to support lame M$ (IIS?) encoding, without further explanation (see the sketch just after this list).
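
To make the distinction concrete, here is a minimal sketch of how these framings map onto zlib's windowBits parameter, including the raw-deflate fallback described in the last bullet. This is only an illustration of the zlib convention under a one-shot decode assumption, not curl's actual decoder; the function name and buffer handling are made up for the example.

#include <string.h>
#include <zlib.h>

/* windowBits selects the framing zlib expects:
 *   15 + 16 -> gzip (header + CRC32/size trailer)
 *   15      -> zlib-wrapped deflate (2-byte header + Adler-32 trailer)
 *   -15     -> raw deflate (no header or trailer at all)
 */
static int inflate_deflate_payload(const unsigned char *in, size_t inlen,
                                   unsigned char *out, size_t outlen)
{
  z_stream z;
  int rc;

  memset(&z, 0, sizeof(z));
  rc = inflateInit2(&z, 15);            /* assume zlib-wrapped "deflate" first */
  if(rc != Z_OK)
    return rc;

  z.next_in = (Bytef *)in;
  z.avail_in = (uInt)inlen;
  z.next_out = out;
  z.avail_out = (uInt)outlen;

  rc = inflate(&z, Z_FINISH);
  if(rc == Z_DATA_ERROR) {
    /* not a valid zlib header: retry as raw deflate, the deviant form */
    inflateEnd(&z);
    memset(&z, 0, sizeof(z));
    if(inflateInit2(&z, -15) != Z_OK)
      return Z_STREAM_ERROR;
    z.next_in = (Bytef *)in;
    z.avail_in = (uInt)inlen;
    z.next_out = out;
    z.avail_out = (uInt)outlen;
    rc = inflate(&z, Z_FINISH);
  }
  inflateEnd(&z);
  return rc;                             /* Z_STREAM_END on a fully decoded stream */
}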

What is strange here is that this seems to come from an Apache server: I suspect it is a reverse proxy in front of M$ software, or the encoding is performed by the application itself.

Is it another deviant form of data? Do we have to support it? Since I can't tell what these additional bytes are, we need details about what to support, and we should write these down in our own documentation since these forms are not "official".

In any case, unexpected trailing data should not be ignored without an error, since this may result in considering the received data complete when it is not.

If you have details about expected data in deviant forms we should support, I can investigate for an implementation. Else I have no serious solution.

There is no support for multiple streams in curl, since it is hard to figure out how to process them. In our current case, the trailing data is obviously not a second stream.

@clbr
Contributor Author

clbr commented Jul 9, 2018 via email

@bagder
Member

bagder commented Jul 10, 2018

It's sometimes hard to mimic browsers, since they are often very lenient with errors and never very strict about enforcing what's right or wrong (if being strict would annoy users).

In this case, where browsers show the full content without errors and even older curls did too, I agree it feels highly annoying that it now shows an error instead.

This specific content is delivered with chunked-encoding. Can we take the fact that it has delivered the final terminating chunk as an additional clue that we have indeed seen the end of the "document" and then be lenient with gzip errors?

@monnerat
Contributor

This specific content is delivered with chunked-encoding. Can we take the fact that it has delivered the final terminating chunk as an additional clue that we have indeed seen the end of the "document" and then be lenient with gzip errors?

I don't like it at all. First, because chunked encoding processing is done at the input level, while the other encodings (gzip, deflate, brotli) are handled at the output level (they really should not be...). Next, because there can be a lot of unprocessed data (up to the buffer size) at write time. In addition, nothing tells us we cannot get this without chunked encoding... and maybe without any data size indication too.

I would prefer a solution that is tolerant about the trailer length (i.e. <= 4 bytes), and only in the case of deviant deflate data.
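
As a rough illustration of this proposal (a sketch only, not the change that eventually landed in curl), once zlib has reported Z_STREAM_END for a raw deflate stream one could accept a small, bounded amount of leftover input instead of failing. The constant and the helper name are invented for the example:

#include <zlib.h>

#define DEFLATE_TRAILER_TOLERANCE 4  /* hypothetical limit, per the proposal above */

/* Return non-zero if the decode should still be treated as successful:
 * the raw deflate stream ended and at most a few unknown trailer bytes
 * were left unconsumed by zlib. */
static int raw_deflate_trailer_ok(const z_stream *z, int zstatus)
{
  if(zstatus != Z_STREAM_END)
    return 0;                          /* stream did not finish: keep the error */
  /* after Z_STREAM_END, avail_in counts the bytes zlib did not consume */
  return z->avail_in <= DEFLATE_TRAILER_TOLERANCE;
}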

The best solution remains to identify the meaning of these 2 additional bytes, in order to keep control of what curl does: without it, we are navigating in the fog, without GPS :-)

Maybe the browser code can tell us details of their implementation, but that is beyond my skills.

@bagder
Member

bagder commented Jul 10, 2018

Maybe browser code can tell us details of their implementation

As I said, browsers are very lenient when it comes to detecting the end of a document; they refrain from triggering an error or alert and rather just hope all the data got in, and if in doubt they simply mark the connection as not suitable for reuse.

A while ago I worked on trying to make Firefox's treatment of content-length and "framing" stricter, but I ended up having to surrender because the web is already broken in so many regards in that area. In the end we basically had to let it keep ignoring chunked-encoded streams that are cut off before the last chunk, weirdo gzip stream problems, gzip streams without the trailer, etc.

If we were to act like the browsers, we would probably ignore all gzip problems, stop reading at the point of the error, and mark the connection for non-reuse.

@clbr
Contributor Author

clbr commented Jul 17, 2018

stooq.com works, but I hit another site that fails with master: ebay.

curl -o test -k 'https://www.ebay.com/sch/i.html?LH_CAds=&_ex_kw=&_fpos=&_fspt=1&_mPrRngCbx=1&_nkw=150+in+1+gba&_sacat=&_sadis=&_sop=12&_udhi=&_udlo=&rmvSB=true&_fosrp=1' --compressed

@clbr
Contributor Author

clbr commented Jul 17, 2018

Oh, even the front page reproduces it; no need for the long URL:
curl -o test -k 'https://www.ebay.com' --compressed

@monnerat
Contributor

I cannot reproduce with the ebay home page: this works as expected here.
The long ebay URL fails, and it already failed with 7.55.1.
This is not the same problem as the stooq.com one: the encoding is not a deviant deflate but is supposed to be conformant gzip. By dumping the server data, I can see some clear text appended to the gzip stream by the server:

<!-- RcmdId srpnweb,RlogId t6pwvit%60d%3D9vjdpwvit%60d*2%3D40212-164aaeaf815-0x604 --><!-- SiteId: 0, Environment: production, AppName: srpnweb, PageId: 2334524 -->

This is clearly a server bug.

@clbr
Contributor Author

clbr commented Jul 18, 2018 via email

@monnerat
Contributor

Would it be possible to add a curl option to ignore any such buggy trailing data?

I'm not convinced this is the right thing to do, therefore I'd like to hear other developers' opinions.

Not being able to access ebay is bad.

You're still able to access it uncompressed, and I think that is the best thing to do when facing a server compression bug.
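
For an application that wants to apply this advice automatically, a sketch of such a fallback with existing libcurl options could look like the following. Retrying on these two particular error codes, and the wrapper itself, are assumptions made for the example, not behaviour built into curl:

#include <curl/curl.h>

static CURLcode fetch_with_fallback(const char *url)
{
  CURL *h = curl_easy_init();
  CURLcode rc;

  if(!h)
    return CURLE_OUT_OF_MEMORY;

  curl_easy_setopt(h, CURLOPT_URL, url);
  curl_easy_setopt(h, CURLOPT_ACCEPT_ENCODING, "");  /* ask for all built-in codings */
  rc = curl_easy_perform(h);

  if(rc == CURLE_BAD_CONTENT_ENCODING || rc == CURLE_WRITE_ERROR) {
    /* the compressed response could not be decoded: retry uncompressed */
    curl_easy_setopt(h, CURLOPT_ACCEPT_ENCODING, NULL);
    rc = curl_easy_perform(h);
  }

  curl_easy_cleanup(h);
  return rc;
}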

@clbr
Contributor Author

clbr commented Jul 18, 2018

I cannot tell it apart from legit (23) write errors.

@monnerat
Contributor

Would it be satisfactory for you to get CURLE_BAD_CONTENT_ENCODING (61: Unrecognized or bad HTTP Content or Transfer-Encoding) in our cases?

@clbr
Contributor Author

clbr commented Jul 18, 2018

Not sure whether that would also cover legit errors. A new "retry without compression" error would work.

However, forcing a page reload without compression would give a worse experience than other browsers, so ideally I'd want these trailers to be ignored. It would mean a visible delay of up to one second, plus the time taken by an uncompressed download, and then having to keep track of which sites need compression disabled.

@clbr
Contributor Author

clbr commented Jul 18, 2018

(well, a CURLE_RETRY_WITHOUT_COMPRESSION would also achieve the ideal goal, since I could then ignore that error without fear of false positives)

@monnerat
Contributor

monnerat commented Jul 18, 2018

The following situations may currently return CURLE_BAD_CONTENT_ENCODING:

  • Unrecognized encoding token in header Content-Encoding or Transfer-Encoding
  • Base64 decoding error (wrong input data)
  • Real deflate, gzip or brotli decode error returned by the corresponding libraries.
  • A WWW-Authenticate or Proxy-Authenticate header that should be Digest but is not.

@clbr
Contributor Author

clbr commented Jul 19, 2018

Okay, all those are errors I want to show to the user, so they would be false positives from the ignoring POV.

@bagder
Member

bagder commented Jul 20, 2018

CURLE_RETRY_WITHOUT_COMPRESSION

That would not be an error code name I would approve of. We identify the problem with the error codes; the correct next action is then up to the application. I don't think the error code itself should suggest one, as we can't possibly know what the application actually wants to do.

I think we should rather increase our efforts to figure out when we get these errors after all "relevant" data has already been returned so that we can ignore the error (but possibly mark the connection for close since it's in a dubious state).

@clbr
Contributor Author

clbr commented Jul 25, 2018

Should I open a new issue so this doesn't get forgotten, or can this be reopened?

@bagder bagder reopened this Jul 25, 2018
@bagder bagder changed the title Deflate regression Deflate error after all content was received Jul 25, 2018
@AGenchev

AGenchev commented Aug 20, 2018

I was trying to find out what that broken server software is, and while ebay hides their server header, I found that this site, https://dir.bg, reports nginx (without a version).
curl -o test.htm -k 'https://dir.bg' --compressed
fails, and the decoded page is not complete.

falconindy pushed a commit to falconindy/curl that referenced this issue Sep 10, 2018
…ate data

Some servers issue raw deflate data that may be followed by an undocumented trailer. This commit makes curl tolerate such a trailer of up to 4 bytes before considering the data to be in error.

Reported-by: clbr on github
Fixes curl#2719
@bagder bagder closed this as completed in 917b1bb Feb 14, 2019
@lock lock bot locked as resolved and limited conversation to collaborators May 15, 2019