
HTTP Raw option on win64 doubles amount of data when used with HTTP Chunks #2303

Closed
driekus77 opened this issue Feb 10, 2018 · 19 comments

@driekus77

I did this

I'm investigating chunked vs. non-chunked HTTP responses using NancyFX 2.0 on the Kestrel web server. During this investigation I noticed large differences in output file size when using cURL's --raw option: for both binary and text content, the output doubles in size when I use cURL on Win64 against a chunked response.
When I use cURL on my MacBook against the same server URL, I get the expected size and result.

I expected the following

Same result on Win64 as on my MacBook!

curl/libcurl version

curl/7.58.0 WRONG (Win64)
curl/7.43.0 GOOD (Mac High Sierra)

On my MacBook, where it works as expected:

curl -ivs --no-keepalive --raw -o raw_linecounts_txt_chunked.mac.bin   http://192.168.1.88:8088/tests/stream/LineCounts.txt
*   Trying 192.168.1.88...
* Connected to 192.168.1.88 (192.168.1.88) port 8088 (#0)
> GET /tests/stream/LineCounts.txt HTTP/1.1
> Host: 192.168.1.88:8088
> User-Agent: curl/7.43.0
> Accept: */*

On my 64-bit Windows 10 HP ProBook, where it does not work as expected:

curl -ivs --no-keepalive --raw -o raw_linecounts_text_chunks.win64.bin  http://192.168.1.88:8088/tests/stream/LineCounts.txt
*   Trying 192.168.1.88...
* TCP_NODELAY set
* Connected to 192.168.1.88 (192.168.1.88) port 8088 (#0)
> GET /tests/stream/LineCounts.txt HTTP/1.1
> Host: 192.168.1.88:8088
> User-Agent: curl/7.58.0
> Accept: */*

operating system

The problem showed up on 64-bit Windows 10 on my HP ProBook.

Attached are four files (zipped) that show the problem when you diff them:
cURL_Raw_Chunked_vs_NotChunked.zip

I'm not that into networking, so I may be doing something wrong, but the difference in file size is strange.

Kind regards,

Henry Roeland

@jay
Member

jay commented Feb 10, 2018

Can you please upload LineCounts.txt?

jay added the HTTP label Feb 10, 2018
@driekus77
Author

Ah, sorry about that! According to Notepad++, this file is UTF-8 without a byte order mark (BOM).
LineCounts.txt

@bagder
Member

bagder commented Feb 10, 2018

If you use --trace-ascii dumpfile, you'll see exactly what curl receives (and sends). Can you attach the 'dumpfile' of the problematic case here for us?

To me this looks like your server sends curl something weird. What does curl say about this response when you don't use --raw?

@driekus77
Author

driekus77 commented Feb 10, 2018

curl version 7.53.1 does not have this issue:

curl -ivs --no-keepalive --raw -o raw2_linecounts_text_chunks.win64.bin  http://192.168.1.88:8088/tests/stream/LineCounts.txt
*   Trying 192.168.1.88...
* TCP_NODELAY set
* Connected to 192.168.1.88 (192.168.1.88) port 8088 (#0)
> GET /tests/stream/LineCounts.txt HTTP/1.1
> Host: 192.168.1.88:8088
> User-Agent: curl/7.53.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Sat, 10 Feb 2018 22:50:53 GMT
< Content-Type: text/plain
< Server: Kestrel
< Transfer-Encoding: chunked
<
{ [8479 bytes data]
* Connection #0 to host 192.168.1.88 left intact

curls_7_53_1_Test.zip

@driekus77
Author

driekus77 commented Feb 10, 2018

Here are the dump/trace files for both the chunked and non-chunked cases:
trace.zip

As before, these were made with cURL 7.58.0 on Windows.

@jay
Member

jay commented Feb 10, 2018

Hm, something is up. I'll bisect it.

@driekus77
Copy link
Author

To clarify: I don't see the wrong data in the browser (Chrome) or in Wireshark when using chunked HTTP responses. It only shows up when using the --raw option in cURL 7.58.0 on 64-bit Windows.

@driekus77
Author

driekus77 commented Feb 11, 2018

For anybody interested, I found a public URL that serves a chunked image:

curl -ivs --no-keepalive --raw  -o raw_chunkedimage.jpg http://www.httpwatch.com/httpgallery/chunked/chunkedimage.aspx

Difference in file sizes between cURL version 7.53.1 and 7.58.0 on Win64:

> User-Agent: curl/7.53.1
02/11/2018  12:25 PM            34,196 raw_chunkedimage.jpg

> User-Agent: curl/7.58.0
02/11/2018  12:25 PM            67,849 raw_chunkedimage.jpg

@driekus77
Author

It appears to be version-related: cURL 7.58.0 built on my MacBook has the same issue.

-rw-r--r--  1 henry  staff    33K Feb 11 22:12 raw_chunkedimage_v7_43_0.jpg
-rw-r--r--  1 henry  staff    66K Feb 11 22:13 raw_chunkedimage_v7_58_0.jpg

Diff on trace files:

diff raw_chunkedimage_v7_43_0.jpg.trdmp raw_chunkedimage_v7_58_0.jpg.trdmp 
1a2
> == Info: TCP_NODELAY set
6c7
< 004e: User-Agent: curl/7.43.0
---
> 004e: User-Agent: curl/7.58.0
30c31
< 0000: Date: Sun, 11 Feb 2018 21:17:44 GMT
---
> 0000: Date: Sun, 11 Feb 2018 21:19:15 GMT

Notice the TCP_NODELAY difference! However, even when I explicitly set --tcp-nodelay for both runs, the raw output is still doubled for v7.58.0.

Trace files:
Archive.zip

@bagder
Member

bagder commented Feb 12, 2018

TCP_NODELAY has been set by default since 7.50.2, so that is totally expected.

@driekus77
Author

I tested some older versions and found the version where the behavior changed:

-rw-r--r--  1 henry  staff    33K Feb 12 01:58 raw_chunkedimage_v7_56_0.bin
-rw-r--r--  1 henry  staff    33K Feb 12 02:03 raw_chunkedimage_v7_56_1.bin
-rw-r--r--  1 henry  staff    66K Feb 12 01:53 raw_chunkedimage_v7_57_0.bin
-rw-r--r--  1 henry  staff    66K Feb 12 01:52 raw_chunkedimage_v7_58_0.bin

Comparing the sources and checking GitHub blame points to:
dbcced8#diff-3bd07f668a09e230441f7991bc8a68ca

Good luck in fixing!
Keep up the good work with cURL!

@monnerat
Contributor

My bad: bug introduced in commit dbcced8.
I will issue a fix ASAP.

monnerat added a commit that referenced this issue Feb 12, 2018
@monnerat
Contributor

Commit 155ea88 in master should fix the issue.

@jay
Member

jay commented Feb 12, 2018

> Commit 155ea88 in master should fix the issue.

Works here.

@monnerat
Contributor

> Works here.

Thanks for testing!

@bagder
Member

bagder commented Feb 12, 2018

Can any of you think of a test we could create that would've caught this?

@monnerat
Contributor

I will try to create one.

@driekus77
Author

Test idea: the sum of all the chunk lengths should equal the resulting raw file size(?).

Question:
Are you guys into unit testing, integration testing or system testing?

Building v7.58.0 from the cloned master on my Mac, before and after the patch, shows:

-rw-r--r--  1 henry  staff  67849 Feb 12 18:17 raw_chunkedimage_v7_58_0_beforePatch.bin
-rw-r--r--  1 henry  staff  34196 Feb 12 18:32 raw_chunkedimage_v7_58_0_afterPatch.bin

So on Mac it's fine. Unfortunately I don't have time to rebuild it on Win64, but I don't think it's really necessary to check it there.

Thanks, guys, for the quick action and feedback!
I really like curl and enjoy working with it.

Kind regards,
Henry Roeland

monnerat added a commit that referenced this issue Feb 13, 2018
Test 319 checks proper raw mode data with non-chunked gzip
transfer-encoded server data.
Test 326 checks raw mode with chunked server data.

Bug: #2303
Closes #2308
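On bagder's question about a test that might have caught this: one possible check is to parse a --raw body as chunked framing and require that it ends exactly at the terminating zero-size chunk, with the decoded data matching the expected payload. The Python sketch below is a hypothetical illustration, not how curl's tests 319 and 326 are written; `decode_chunked` is an invented helper, it assumes the response headers (from -i) have already been stripped, and it does not support chunk extensions beyond discarding them, or trailer fields. If extra data ends up after the final chunk, or the framing is otherwise broken, the check fails.

```python
def decode_chunked(raw: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked body, rejecting any bytes after the final chunk.

    Hypothetical helper. Simplifications: chunk extensions (after ';') are
    discarded and trailer fields are not supported, so the body must end with
    the zero-size chunk followed by an empty line.
    """
    data = bytearray()
    pos = 0
    while True:
        eol = raw.find(b"\r\n", pos)
        if eol < 0:
            raise ValueError("missing chunk-size line")
        # The chunk-size line holds a hex number, optionally followed by ';' extensions.
        size = int(raw[pos:eol].split(b";", 1)[0], 16)
        if size < 0:
            raise ValueError("negative chunk size")
        pos = eol + 2
        if size == 0:
            break
        end = pos + size
        # Every chunk's data must be followed by CRLF; a short read fails here too.
        if raw[end:end + 2] != b"\r\n":
            raise ValueError("truncated or malformed chunk")
        data += raw[pos:end]
        pos = end + 2
    if raw[pos:pos + 2] != b"\r\n":
        raise ValueError("trailer fields are not supported in this sketch")
    pos += 2
    # A correct raw body ends right here; anything left over is unexpected.
    if pos != len(raw):
        raise ValueError(f"{len(raw) - pos} unexpected bytes after the final chunk")
    return bytes(data)


good = b"3\r\nabc\r\n2\r\nde\r\n0\r\n\r\n"
print(decode_chunked(good))  # prints: b'abcde'

try:
    decode_chunked(good + b"abcde")  # simulate extra data appended to the output
except ValueError as err:
    print("rejected:", err)  # prints: rejected: 5 unexpected bytes after the final chunk
```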
@jay
Member

jay commented Feb 13, 2018

> The sum of all the chunk lengths should equal the resulting raw file size(?).

No. --raw disables all decoding. In this case you included the HTTP headers in the output (-i), so those come first. Then, because the chunked encoding is not decoded, the contents are the hex size of each chunk followed by that chunk's data, and finally a zero-size chunk (assuming the transfer completed).
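To make that layout concrete, here is a minimal Python sketch (not curl code; `chunk_encode` is a hypothetical helper) that frames a payload the way a server does with Transfer-Encoding: chunked. Each chunk is its size in hex, CRLF, the data, and CRLF, and the body ends with a zero-size chunk. It models only the body, so the -i headers that precede it in the raw file are not included, and it ignores chunk extensions and trailers. The point is that a correct --raw body is larger than the sum of the chunk lengths only by this framing overhead.

```python
def chunk_encode(payload: bytes, chunk_size: int) -> bytes:
    """Frame `payload` as an HTTP/1.1 chunked body (no extensions, no trailers)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    out = bytearray()
    for start in range(0, len(payload), chunk_size):
        chunk = payload[start:start + chunk_size]
        # Each chunk: size in hexadecimal, CRLF, the data itself, CRLF.
        out += f"{len(chunk):x}\r\n".encode("ascii")
        out += chunk
        out += b"\r\n"
    # The body ends with a zero-size chunk and an empty trailer section.
    out += b"0\r\n\r\n"
    return bytes(out)


payload = b"x" * 1000
raw_body = chunk_encode(payload, 256)
# Chunk sizes 256, 256, 256, 232 appear in the raw body as 100, 100, 100, e8.
print(len(payload), len(raw_body))  # prints: 1000 1032
```

Here the 32 extra bytes are three 7-byte frames, one 6-byte frame, and the 5-byte terminator; how large the overhead is depends on how the server splits the data into chunks.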

> So on Mac it's fine. Unfortunately I don't have time to rebuild it on Win64, but I don't think it's really necessary to check it there.

I checked Win64 and it works there. Thanks for your report and all your follow-ups.

> Are you guys into unit testing, integration testing or system testing?

curl has unit tests and also full tests using the curl tool, which also exercise libcurl; I guess you could call those system testing. Whether any of it counts as integration testing depends on how you define it. The tests are not combined; they are run sequentially. If some scenario needs to be varied, in most cases there's a libcurl test with an ifdef guard separating the two tests, or just a separate test.

@monnerat added 2 tests for this issue in e551910.

jay closed this as completed Feb 13, 2018
@lock lock bot locked as resolved and limited conversation to collaborators May 14, 2018