curl-library

Re: Problem with CONTENT-ENCODING

From: Dobromir Velev <diadomraz_at_gmail.com>
Date: Tue, 13 Dec 2005 11:25:40 +0200

Hi,

On Monday 12 December 2005 21:51, Dan Fandrich wrote:
> On Mon, Dec 12, 2005 at 08:50:01PM +0200, Dobromir Velev wrote:
> > Here is a small patch that fixes the problem with the missing deflate
> > header in the IIS response. It checks the first byte of the content and
> > if it doesn't look like a header it tells zlib inflate() to process the
> > raw deflate data without looking for headers.
> >
> > --- content_encoding.c 2005-03-31 10:02:03.000000000 +0300
> > +++ content_encoding.new.c 2005-12-12 20:29:23.389573722 +0200
> > @@ -149,14 +149,21 @@
> > z_stream *z = &k->z; /* zlib state structure */
> >
> > /* Initialize zlib? */
> > - if (k->zlib_init == ZLIB_UNINIT) {
> > + if (k->zlib_init == ZLIB_UNINIT && nread) {
> > z->zalloc = (alloc_func)Z_NULL;
> > z->zfree = (free_func)Z_NULL;
> > z->opaque = 0;
> > z->next_in = NULL;
> > z->avail_in = 0;
> > - if (inflateInit(z) != Z_OK)
> > - return process_zlib_error(data, z);
> > + if((k->str[0] & 0xf) != Z_DEFLATED){
> > + if(inflateInit2(z,-MAX_WBITS) != Z_OK)
> > + return process_zlib_error(data, z);
> > + }
> > + else{
> > + if(inflateInit(z) != Z_OK)
> > + return process_zlib_error(data, z);
> > + }
> > +
> > k->zlib_init = ZLIB_INIT;
> > }
>
> This patch will probably cause a crash if Curl_unencode_deflate_write
> is ever called with nread==0. I'm not sure if that will ever happen,
> but I suppose it could if the TCP segment ends right after the HTTP
> headers and before the first byte of data. In any case, this makes the
> code more brittle. Also, the if((k->str[0] & 0xf) != Z_DEFLATED) hack
> should be well commented so the next maintainer understands the reason
> for the brokenness.
That's why I added nread to the if clause on line 152. The only possible problem
here is that z will not be initialized until some bytes have been received, but I
thought that was OK. I will send another patch with more comments if needed.
I just saw that zlib uses a similar test in inflate.c to check the
headers, and that was where the IIS deflate encoding was failing.
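
A minimal sketch of a stricter form of that first-byte test, assuming the decoder can look at the first two bytes of the body. The helper name looks_like_zlib_header is hypothetical and is not part of libcurl or zlib. It applies the zlib header rules from RFC 1950 (compression method 8, window size field at most 7, header check value divisible by 31) and reports separately when there is not yet enough data to decide, which covers the nread==0 case.

```c
#include <stddef.h>

/* Hypothetical helper, not part of libcurl or zlib.
 * Returns  1 if buf starts with a valid zlib (RFC 1950) header,
 *          0 if it does not (the data may then be raw deflate),
 *         -1 if fewer than two bytes are available, so no decision
 *            can be made yet. */
static int looks_like_zlib_header(const unsigned char *buf, size_t len)
{
  unsigned int cmf, flg;

  if(len < 2)
    return -1;                      /* wait for more data */

  cmf = buf[0];
  flg = buf[1];

  if((cmf & 0x0f) != 8)             /* CM must be 8, i.e. Z_DEFLATED */
    return 0;
  if((cmf >> 4) > 7)                /* CINFO above 7 is not allowed */
    return 0;
  if(((cmf << 8) | flg) % 31 != 0)  /* FCHECK: CMF*256+FLG is a multiple of 31 */
    return 0;

  return 1;
}
```

A caller could then use inflateInit(z) when the result is 1, inflateInit2(z, -MAX_WBITS) when it is 0, and postpone initialization when it is -1. A raw deflate stream can still happen to start with two bytes that pass all three checks, so this remains a heuristic.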

>
> > While testing it I encountered one more problem with broken server
> > encoding. Some server when asked explicitly for deflate encoding will
> > respond with "Content-encoding: deflate" header but will GZIP encode the
> > data so the decoding will fail.
> >
> > This is definitely not a curl issue but I just wanted to let you know
> > about it.
>
> I wasn't aware of this issue. Do you know which server does this? Have
> you reported it to the maintainer?
The problem is that I don't have many different setups to test with, so I
tried several servers on the Internet. Most of the failing servers
ignore the order in the Accept-Encoding header: when an
"Accept-Encoding: deflate, gzip" request is sent they ignore deflate and
use gzip instead. The problem appears only when an "Accept-Encoding: deflate"
request header is sent. Here are two sample sessions that fail:

> GET / HTTP/1.1
> User-Agent: curl/7.15.1 (i686-pc-linux-gnu) libcurl/7.15.1 OpenSSL/0.9.7a zlib/1.2.1.2 libidn/0.5.6
> Host: www.port80software.com
> Accept: */*
> Accept-Encoding: deflate
>
< HTTP/1.1 200 OK
< Date: Mon, 12 Dec 2005 14:43:51 GMT
< Server: Yes we are using ServerMask!
< Set-Cookie: countrycode=BG; path=/
< Set-Cookie: ALT.COOKIE.NAME.2=8MQ21,..S801T.0.QFN4M04M0O,.6M050; path=/
< Cache-control: private
< Content-Length: 6206
< Content-Type: text/html
< Content-Encoding: deflate
< Vary: Accept-Encoding
Error while processing content unencoding: invalid block type

> GET / HTTP/1.1
> User-Agent: curl/7.15.1 (i686-pc-linux-gnu) libcurl/7.15.1 OpenSSL/0.9.7a zlib/1.2.1.2 libidn/0.5.6
> Host: www.the-gdf.org
> Accept: */*
> Accept-Encoding: deflate
>
< HTTP/1.1 302 Found
< Date: Tue, 13 Dec 2005 08:46:55 GMT
< Server: Apache/2.0.40 (Red Hat Linux)
< Location: http://www.the-gdf.org/wiki/index.php?title=Main_Page
< Content-Length: 311
< Content-Type: text/html; charset=iso-8859-1
* Ignoring the response-body
* Connection #0 to host www.the-gdf.org left intact
* Issue another request to this URL: 'http://www.the-gdf.org/wiki/index.php?title=Main_Page'
* Re-using existing connection! (#0) with host www.the-gdf.org
* Connected to www.the-gdf.org (209.208.199.67) port 80
> GET /wiki/index.php?title=Main_Page HTTP/1.1
> User-Agent: curl/7.15.1 (i686-pc-linux-gnu) libcurl/7.15.1 OpenSSL/0.9.7a zlib/1.2.1.2 libidn/0.5.6
> Host: www.the-gdf.org
> Accept: */*
> Accept-Encoding: deflate
>
< HTTP/1.1 200 OK
< Date: Tue, 13 Dec 2005 08:46:55 GMT
< Server: Apache/2.0.40 (Red Hat Linux)
< Accept-Ranges: bytes
< X-Powered-By: PHP/4.2.2
< Content-language: en
< Vary: Accept-Encoding
< Expires: -1
< Cache-Control: private, must-revalidate, max-age=0
< Content-Encoding: deflate
< Content-Length: 3758
< Content-Type: text/html; charset=utf-8
Error while processing content unencoding: invalid block type
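
A small sketch of how a body labelled "deflate" could be checked for gzip framing, assuming its first bytes are available. Both helper names are hypothetical and not part of libcurl. The gzip magic bytes (0x1f 0x8b, followed by compression method 8) come from RFC 1952. The second helper extracts the block type that a raw deflate decoder would read from the first byte (RFC 1951: bit 0 is BFINAL, bits 1-2 are BTYPE). For 0x1f it yields the reserved value 3, which would be consistent with the "invalid block type" message if a gzip body were decoded as raw deflate.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if buf starts with the gzip magic
 * bytes and the deflate compression method (RFC 1952), 0 otherwise. */
static int looks_like_gzip(const unsigned char *buf, size_t len)
{
  return len >= 3 && buf[0] == 0x1f && buf[1] == 0x8b && buf[2] == 8;
}

/* Hypothetical helper: the BTYPE field a raw deflate decoder would read
 * from the first byte of a stream. Values 0-2 are valid block types,
 * 3 is reserved and is rejected by decoders. */
static unsigned int first_block_type(unsigned char first)
{
  return (first >> 1) & 0x3;
}
```

This only inspects the framing bytes. Whether the two servers in the sessions above actually sent gzip data is not visible in the logs.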

>
> >>> Dan
Received on 2005-12-13