Fails to get the page from www.washingtonpost.com/wp-srv but wget works #12643

Closed
ghost opened this issue Jan 6, 2024 · 13 comments
Labels
HTTP not-a-bug This is not a bug in curl

Comments


ghost commented Jan 6, 2024

result_files.zip

I did this

curl -q --ipv4 -v https://www.washingtonpost.com/wp-srv/politics/special/clinton/stories/bctest092198_1.htm -o 10.html >e1_0 2>e_2_0

I expected the following

To receive the content of the web page at this address. Instead, the file 10.html has 0 (zero) bytes.

After the execution of the above command the file e_2_0 contains:

 % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
         Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* Host www.washingtonpost.com:443 was resolved.
* IPv6: (none)
* IPv4: 104.96.133.89
*   Trying 104.96.133.89:443...
* Connected to www.washingtonpost.com (104.96.133.89) port 443
* ALPN: curl offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [321 bytes data]
*  CAfile: c:\util\curl-ca-bundle.crt
*  CApath: none
* TLSv1.3 (IN), TLS handshake, Server hello (2):
{ [122 bytes data]
* TLSv1.3 (IN), TLS handshake, Unknown (8):
{ [29 bytes data]
* TLSv1.3 (IN), TLS handshake, Certificate (11):
{ [3442 bytes data]
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
{ [264 bytes data]
* TLSv1.3 (IN), TLS handshake, Finished (20):
{ [52 bytes data]
* TLSv1.3 (OUT), TLS handshake, Finished (20):
} [52 bytes data]
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384 / [blank] / UNDEF
* ALPN: server accepted h2
* Server certificate:
*  subject: C=US; ST=District of Columbia; L=Washington; jurisdictionCountryName=US; jurisdictionStateOrProvinceName=Delaware; O=The Washington Post (WP Company LLC); businessCategory=Private Organization; serialNumber=415412; CN=www.washingtonpost.com
*  start date: Apr 12 18:58:40 2023 GMT
*  expire date: May 12 18:58:40 2024 GMT
*  subjectAltName: host "www.washingtonpost.com" matched cert's "www.washingtonpost.com"
*  issuer: C=US; O=Entrust, Inc.; OU=See www.entrust.net/legal-terms; OU=(c) 2014 Entrust, Inc. - for authorized use only; CN=Entrust Certification Authority - L1M
*  SSL certificate verify ok.
*   Certificate level 0: Public key type ? (2048/112 Bits/secBits), signed using sha256WithRSAEncryption
*   Certificate level 1: Public key type ? (2048/112 Bits/secBits), signed using sha256WithRSAEncryption
*   Certificate level 2: Public key type ? (2048/112 Bits/secBits), signed using sha256WithRSAEncryption
* using HTTP/2
* [HTTP/2] [1] OPENED stream for https://www.washingtonpost.com/wp-srv/politics/special/clinton/stories/bctest092198_1.htm
* [HTTP/2] [1] [:method: GET]
* [HTTP/2] [1] [:scheme: https]
* [HTTP/2] [1] [:authority: www.washingtonpost.com]
* [HTTP/2] [1] [:path: /wp-srv/politics/special/clinton/stories/bctest092198_1.htm]
* [HTTP/2] [1] [user-agent: curl/8.5.0]
* [HTTP/2] [1] [accept: */*]
> GET /wp-srv/politics/special/clinton/stories/bctest092198_1.htm HTTP/2

> Host: www.washingtonpost.com

> User-Agent: curl/8.5.0

> Accept: */*

> 

* old SSL session ID is stale, removing

  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:02 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:03 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:04 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:05 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:06 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:07 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:08 --:--:--     0< HTTP/2 302 

< content-length: 0

< location: https://www.washingtonpost.com/wp-srv/politics/special/clinton/stories/bctest092198_1.htm

< cache-control: max-age=0

< expires: Sat, 06 Jan 2024 16:56:54 GMT

< date: Sat, 06 Jan 2024 16:56:54 GMT

< set-cookie: wp_devicetype=0; expires=Mon, 05-Feb-2024 16:56:54 GMT; path=/; domain=.washingtonpost.com; secure; SameSite=None

< set-cookie: wp_ak_pct=0|20230131; max-age= 2592000; path=/; domain=.washingtonpost.com; SameSite=None; secure

< set-cookie: wp_ak_mab=0|20230101; max-age=0; path=/; domain=.washingtonpost.com; SameSite=None; secure

< set-cookie: wp_ak_signinv2=1|20230125; max-age=2592000; path=/; domain=.washingtonpost.com; SameSite=None; secure

< set-cookie: wp_ak_om=1|20230731; max-age=2592000; path=/; domain=.washingtonpost.com; SameSite=None; secure

< set-cookie: wp_ak_wab=0|0|2|0|0|1|1|1|2|20230418; max-age=31536000; path=/; domain=.washingtonpost.com; SameSite=None; secure

< set-cookie: wp_ak_v_mab=0|0|0|1|20231130; max-age=31536000; path=/; domain=.washingtonpost.com; SameSite=None; secure

< set-cookie: wp_ak_v_ot=1; max-age=5184000; path=/; domain=.washingtonpost.com; SameSite=None; secure

< set-cookie: wp_ak_ot=1|20211012; max-age=5184000; path=/; domain=.washingtonpost.com; SameSite=None; secure

< set-cookie: wp_geo=AT||||EEA; max-age=3600; path=/; domain=.washingtonpost.com; SameSite=None; secure

< akamai-true-ttl: -1

< content-security-policy: upgrade-insecure-requests

< strict-transport-security: max-age=15768000

< x-frame-options: SAMEORIGIN

< x-wp-request-id: 0.044e1202.1704560205.f1e3b177

< server-timing: ak_p; desc="1704560205101_34754052_4058231159_898326_10936_10_16_15";dur=1

< 


  0     0    0     0    0     0      0      0 --:--:--  0:00:09 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:09 --:--:--     0
* Connection #0 to host www.washingtonpost.com left intact

Using wget, e.g.:

wget https://www.washingtonpost.com/wp-srv/politics/special/clinton/stories/bctest092198_1.htm -O w.html >w1 2>w2

stores the expected page in w.html of 29 KB.

Modifying the curl request to

curl -q --ipv4 -v -L --max-redirs 3 https://www.washingtonpost.com/wp-srv/politics/special/clinton/stories/bctest092198_1.htm -o 13.html >e1_3 2>e_2_3

The above command again fails to produce the 13.html file, and the file e_2_3 contains:

% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Dload  Upload   Total   Spent    Left  Speed

0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* Host www.washingtonpost.com:443 was resolved.
* IPv6: (none)
* IPv4: 104.96.133.89
*   Trying 104.96.133.89:443...
* Connected to www.washingtonpost.com (104.96.133.89) port 443
* ALPN: curl offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [321 bytes data]
*  CAfile: c:\util\curl-ca-bundle.crt
*  CApath: none
* TLSv1.3 (IN), TLS handshake, Server hello (2):
{ [122 bytes data]
* TLSv1.3 (IN), TLS handshake, Unknown (8):
{ [29 bytes data]
* TLSv1.3 (IN), TLS handshake, Certificate (11):
{ [3442 bytes data]
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
{ [264 bytes data]
* TLSv1.3 (IN), TLS handshake, Finished (20):
{ [52 bytes data]
* TLSv1.3 (OUT), TLS handshake, Finished (20):
} [52 bytes data]
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384 / [blank] / UNDEF
* ALPN: server accepted h2
* Server certificate:
*  subject: C=US; ST=District of Columbia; L=Washington; jurisdictionCountryName=US; jurisdictionStateOrProvinceName=Delaware; O=The Washington Post (WP Company LLC); businessCategory=Private Organization; serialNumber=415412; CN=www.washingtonpost.com
*  start date: Apr 12 18:58:40 2023 GMT
*  expire date: May 12 18:58:40 2024 GMT
*  subjectAltName: host "www.washingtonpost.com" matched cert's "www.washingtonpost.com"
*  issuer: C=US; O=Entrust, Inc.; OU=See www.entrust.net/legal-terms; OU=(c) 2014 Entrust, Inc. - for authorized use only; CN=Entrust Certification Authority - L1M
*  SSL certificate verify ok.
*   Certificate level 0: Public key type ? (2048/112 Bits/secBits), signed using sha256WithRSAEncryption
*   Certificate level 1: Public key type ? (2048/112 Bits/secBits), signed using sha256WithRSAEncryption
*   Certificate level 2: Public key type ? (2048/112 Bits/secBits), signed using sha256WithRSAEncryption
* using HTTP/2
* [HTTP/2] [1] OPENED stream for https://www.washingtonpost.com/wp-srv/politics/special/clinton/stories/bctest092198_1.htm
* [HTTP/2] [1] [:method: GET]
* [HTTP/2] [1] [:scheme: https]
* [HTTP/2] [1] [:authority: www.washingtonpost.com]
* [HTTP/2] [1] [:path: /wp-srv/politics/special/clinton/stories/bctest092198_1.htm]
* [HTTP/2] [1] [user-agent: curl/8.5.0]
* [HTTP/2] [1] [accept: */*]
> GET /wp-srv/politics/special/clinton/stories/bctest092198_1.htm HTTP/2

> Host: www.washingtonpost.com

> User-Agent: curl/8.5.0

> Accept: */*

> 

* old SSL session ID is stale, removing

0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
0     0    0     0    0     0      0      0 --:--:--  0:00:02 --:--:--     0
0     0    0     0    0     0      0      0 --:--:--  0:00:03 --:--:--     0
0     0    0     0    0     0      0      0 --:--:--  0:00:04 --:--:--     0
0     0    0     0    0     0      0      0 --:--:--  0:00:05 --:--:--     0
0     0    0     0    0     0      0      0 --:--:--  0:00:06 --:--:--     0
0     0    0     0    0     0      0      0 --:--:--  0:00:07 --:--:--     0
0     0    0     0    0     0      0      0 --:--:--  0:00:08 --:--:--     0
0     0    0     0    0     0      0      0 --:--:--  0:00:09 --:--:--     0< HTTP/2 302 

< content-length: 0

< location: https://www.washingtonpost.com/wp-srv/politics/special/clinton/stories/bctest092198_1.htm

< cache-control: max-age=0

< expires: Sat, 06 Jan 2024 16:47:15 GMT

< date: Sat, 06 Jan 2024 16:47:15 GMT

< set-cookie: wp_devicetype=0; expires=Mon, 05-Feb-2024 16:47:15 GMT; path=/; domain=.washingtonpost.com; secure; SameSite=None

< set-cookie: wp_ak_pct=0|20230131; max-age= 2592000; path=/; domain=.washingtonpost.com; SameSite=None; secure

< set-cookie: wp_ak_mab=0|20230101; max-age=0; path=/; domain=.washingtonpost.com; SameSite=None; secure

< set-cookie: wp_ak_signinv2=1|20230125; max-age=2592000; path=/; domain=.washingtonpost.com; SameSite=None; secure

< set-cookie: wp_ak_om=1|20230731; max-age=2592000; path=/; domain=.washingtonpost.com; SameSite=None; secure

< set-cookie: wp_ak_wab=0|1|2|0|0|1|1|1|0|20230418; max-age=31536000; path=/; domain=.washingtonpost.com; SameSite=None; secure

< set-cookie: wp_ak_v_mab=0|0|0|1|20231130; max-age=31536000; path=/; domain=.washingtonpost.com; SameSite=None; secure

< set-cookie: wp_ak_v_ot=1; max-age=5184000; path=/; domain=.washingtonpost.com; SameSite=None; secure

< set-cookie: wp_ak_ot=1|20211012; max-age=5184000; path=/; domain=.washingtonpost.com; SameSite=None; secure

< set-cookie: wp_geo=AT||||EEA; max-age=3600; path=/; domain=.washingtonpost.com; SameSite=None; secure

< akamai-true-ttl: -1

< content-security-policy: upgrade-insecure-requests

< strict-transport-security: max-age=15768000

< x-frame-options: SAMEORIGIN

< x-wp-request-id: 0.044e1202.1704559625.f1d308b7

< server-timing: ak_p; desc="1704559625465_34754052_4057139383_998162_9748_10_22_15";dur=1

< 


0     0    0     0    0     0      0      0 --:--:--  0:00:10 --:--:--     0
0     0    0     0    0     0      0      0 --:--:--  0:00:10 --:--:--     0
* Connection #0 to host www.washingtonpost.com left intact
* Issue another request to this URL: 'https://www.washingtonpost.com/wp-srv/politics/special/clinton/stories/bctest092198_1.htm'
* Found bundle for host: 0xdb83277da0 [can multiplex]
* Re-using existing connection with host www.washingtonpost.com
* [HTTP/2] [3] OPENED stream for https://www.washingtonpost.com/wp-srv/politics/special/clinton/stories/bctest092198_1.htm
* [HTTP/2] [3] [:method: GET]
* [HTTP/2] [3] [:scheme: https]
* [HTTP/2] [3] [:authority: www.washingtonpost.com]
* [HTTP/2] [3] [:path: /wp-srv/politics/special/clinton/stories/bctest092198_1.htm]
* [HTTP/2] [3] [user-agent: curl/8.5.0]
* [HTTP/2] [3] [accept: */*]
> GET /wp-srv/politics/special/clinton/stories/bctest092198_1.htm HTTP/2

> Host: www.washingtonpost.com

> User-Agent: curl/8.5.0

> Accept: */*

> 


0     0    0     0    0     0      0      0 --:--:--  0:00:11 --:--:--     0
0     0    0     0    0     0      0      0 --:--:--  0:00:12 --:--:--     0
0     0    0     0    0     0      0      0 --:--:--  0:00:13 --:--:--     0
0     0    0     0    0     0      0      0 --:--:--  0:00:14 --:--:--     0
0     0    0     0    0     0      0      0 --:--:--  0:00:15 --:--:--     0
0     0    0     0    0     0      0      0 --:--:--  0:00:16 --:--:--     0
0     0    0     0    0     0      0      0 --:--:--  0:00:17 --:--:--     0
0     0    0     0    0     0      0      0 --:--:--  0:00:18 --:--:--     0
0     0    0     0    0     0      0      0 --:--:--  0:00:19 --:--:--     0
0     0    0     0    0     0      0      0 --:--:--  0:00:20 --:--:--     0< HTTP/2 302 

< content-length: 0

< location: https://www.washingtonpost.com/wp-srv/politics/special/clinton/stories/bctest092198_1.htm

< cache-control: max-age=0

< expires: Sat, 06 Jan 2024 16:47:25 GMT

< date: Sat, 06 Jan 2024 16:47:25 GMT

< set-cookie: wp_devicetype=0; expires=Mon, 05-Feb-2024 16:47:25 GMT; path=/; domain=.washingtonpost.com; secure; SameSite=None

< set-cookie: wp_ak_pct=0|20230131; max-age= 2592000; path=/; domain=.washingtonpost.com; SameSite=None; secure

< set-cookie: wp_ak_mab=0|20230101; max-age=0; path=/; domain=.washingtonpost.com; SameSite=None; secure

< set-cookie: wp_ak_signinv2=1|20230125; max-age=2592000; path=/; domain=.washingtonpost.com; SameSite=None; secure

< set-cookie: wp_ak_om=1|20230731; max-age=2592000; path=/; domain=.washingtonpost.com; SameSite=None; secure

< set-cookie: wp_ak_wab=0|0|0|1|1|1|0|1|2|20230418; max-age=31536000; path=/; domain=.washingtonpost.com; SameSite=None; secure

< set-cookie: wp_ak_v_mab=0|0|0|1|20231130; max-age=31536000; path=/; domain=.washingtonpost.com; SameSite=None; secure

< set-cookie: wp_ak_v_ot=1; max-age=5184000; path=/; domain=.washingtonpost.com; SameSite=None; secure

< set-cookie: wp_ak_ot=1|20211012; max-age=5184000; path=/; domain=.washingtonpost.com; SameSite=None; secure

< set-cookie: wp_geo=AT||||EEA; max-age=3600; path=/; domain=.washingtonpost.com; SameSite=None; secure

< akamai-true-ttl: -1

< content-security-policy: upgrade-insecure-requests

< strict-transport-security: max-age=15768000

< x-frame-options: SAMEORIGIN

< x-wp-request-id: 0.044e1202.1704559635.f1d3544d

< server-timing: ak_p; desc="1704559635514_34754052_4057158733_998124_9446_10_0_15";dur=1

< 


0     0    0     0    0     0      0      0 --:--:--  0:00:20 --:--:--     0
* Connection #0 to host www.washingtonpost.com left intact
* Issue another request to this URL: 'https://www.washingtonpost.com/wp-srv/politics/special/clinton/stories/bctest092198_1.htm'
* Found bundle for host: 0xdb83277da0 [can multiplex]
* Re-using existing connection with host www.washingtonpost.com
* [HTTP/2] [5] OPENED stream for https://www.washingtonpost.com/wp-srv/politics/special/clinton/stories/bctest092198_1.htm
* [HTTP/2] [5] [:method: GET]
* [HTTP/2] [5] [:scheme: https]
* [HTTP/2] [5] [:authority: www.washingtonpost.com]
* [HTTP/2] [5] [:path: /wp-srv/politics/special/clinton/stories/bctest092198_1.htm]
* [HTTP/2] [5] [user-agent: curl/8.5.0]
* [HTTP/2] [5] [accept: */*]
> GET /wp-srv/politics/special/clinton/stories/bctest092198_1.htm HTTP/2

> Host: www.washingtonpost.com

> User-Agent: curl/8.5.0

> Accept: */*

> 


0     0    0     0    0     0      0      0 --:--:--  0:00:21 --:--:--     0
0     0    0     0    0     0      0      0 --:--:--  0:00:22 --:--:--     0
0     0    0     0    0     0      0      0 --:--:--  0:00:23 --:--:--     0
0     0    0     0    0     0      0      0 --:--:--  0:00:24 --:--:--     0
0     0    0     0    0     0      0      0 --:--:--  0:00:25 --:--:--     0
0     0    0     0    0     0      0      0 --:--:--  0:00:26 --:--:--     0
0     0    0     0    0     0      0      0 --:--:--  0:00:27 --:--:--     0
0     0    0     0    0     0      0      0 --:--:--  0:00:28 --:--:--     0< HTTP/2 302 

< content-length: 0

< location: https://www.washingtonpost.com/wp-srv/politics/special/clinton/stories/bctest092198_1.htm

< cache-control: max-age=0

< expires: Sat, 06 Jan 2024 16:47:33 GMT

< date: Sat, 06 Jan 2024 16:47:33 GMT

< set-cookie: wp_devicetype=0; expires=Mon, 05-Feb-2024 16:47:33 GMT; path=/; domain=.washingtonpost.com; secure; SameSite=None

< set-cookie: wp_ak_pct=0|20230131; max-age= 2592000; path=/; domain=.washingtonpost.com; SameSite=None; secure

< set-cookie: wp_ak_mab=0|20230101; max-age=0; path=/; domain=.washingtonpost.com; SameSite=None; secure

< set-cookie: wp_ak_signinv2=1|20230125; max-age=2592000; path=/; domain=.washingtonpost.com; SameSite=None; secure

< set-cookie: wp_ak_om=1|20230731; max-age=2592000; path=/; domain=.washingtonpost.com; SameSite=None; secure

< set-cookie: wp_ak_wab=1|1|1|0|1|1|1|0|1|20230418; max-age=31536000; path=/; domain=.washingtonpost.com; SameSite=None; secure

< set-cookie: wp_ak_v_mab=0|0|0|1|20231130; max-age=31536000; path=/; domain=.washingtonpost.com; SameSite=None; secure

< set-cookie: wp_ak_v_ot=1; max-age=5184000; path=/; domain=.washingtonpost.com; SameSite=None; secure

< set-cookie: wp_ak_ot=1|20211012; max-age=5184000; path=/; domain=.washingtonpost.com; SameSite=None; secure

< set-cookie: wp_geo=AT||||EEA; max-age=3600; path=/; domain=.washingtonpost.com; SameSite=None; secure

< akamai-true-ttl: -1

< content-security-policy: upgrade-insecure-requests

< strict-transport-security: max-age=15768000

< x-frame-options: SAMEORIGIN

< x-wp-request-id: 0.044e1202.1704559645.f1d3a322

< server-timing: ak_p; desc="1704559645541_34754052_4057178914_798453_10572_10_0_15";dur=1

< 


0     0    0     0    0     0      0      0 --:--:--  0:00:28 --:--:--     0
* Connection #0 to host www.washingtonpost.com left intact
* Issue another request to this URL: 'https://www.washingtonpost.com/wp-srv/politics/special/clinton/stories/bctest092198_1.htm'
* Found bundle for host: 0xdb83277da0 [can multiplex]
* Re-using existing connection with host www.washingtonpost.com
* [HTTP/2] [7] OPENED stream for https://www.washingtonpost.com/wp-srv/politics/special/clinton/stories/bctest092198_1.htm
* [HTTP/2] [7] [:method: GET]
* [HTTP/2] [7] [:scheme: https]
* [HTTP/2] [7] [:authority: www.washingtonpost.com]
* [HTTP/2] [7] [:path: /wp-srv/politics/special/clinton/stories/bctest092198_1.htm]
* [HTTP/2] [7] [user-agent: curl/8.5.0]
* [HTTP/2] [7] [accept: */*]
> GET /wp-srv/politics/special/clinton/stories/bctest092198_1.htm HTTP/2

> Host: www.washingtonpost.com

> User-Agent: curl/8.5.0

> Accept: */*

> 


0     0    0     0    0     0      0      0 --:--:--  0:00:29 --:--:--     0
0     0    0     0    0     0      0      0 --:--:--  0:00:30 --:--:--     0
0     0    0     0    0     0      0      0 --:--:--  0:00:31 --:--:--     0
0     0    0     0    0     0      0      0 --:--:--  0:00:32 --:--:--     0
0     0    0     0    0     0      0      0 --:--:--  0:00:33 --:--:--     0
0     0    0     0    0     0      0      0 --:--:--  0:00:34 --:--:--     0
0     0    0     0    0     0      0      0 --:--:--  0:00:35 --:--:--     0
0     0    0     0    0     0      0      0 --:--:--  0:00:36 --:--:--     0
0     0    0     0    0     0      0      0 --:--:--  0:00:37 --:--:--     0< HTTP/2 302 

< content-length: 0

< location: https://www.washingtonpost.com/wp-srv/politics/special/clinton/stories/bctest092198_1.htm

< cache-control: max-age=0

< expires: Sat, 06 Jan 2024 16:47:42 GMT

< date: Sat, 06 Jan 2024 16:47:42 GMT

< set-cookie: wp_devicetype=0; expires=Mon, 05-Feb-2024 16:47:42 GMT; path=/; domain=.washingtonpost.com; secure; SameSite=None

< set-cookie: wp_ak_pct=0|20230131; max-age= 2592000; path=/; domain=.washingtonpost.com; SameSite=None; secure

< set-cookie: wp_ak_mab=0|20230101; max-age=0; path=/; domain=.washingtonpost.com; SameSite=None; secure

< set-cookie: wp_ak_signinv2=1|20230125; max-age=2592000; path=/; domain=.washingtonpost.com; SameSite=None; secure

< set-cookie: wp_ak_om=1|20230731; max-age=2592000; path=/; domain=.washingtonpost.com; SameSite=None; secure

< set-cookie: wp_ak_wab=0|0|1|0|0|0|1|0|2|20230418; max-age=31536000; path=/; domain=.washingtonpost.com; SameSite=None; secure

< set-cookie: wp_ak_v_mab=0|0|0|1|20231130; max-age=31536000; path=/; domain=.washingtonpost.com; SameSite=None; secure

< set-cookie: wp_ak_v_ot=1; max-age=5184000; path=/; domain=.washingtonpost.com; SameSite=None; secure

< set-cookie: wp_ak_ot=1|20211012; max-age=5184000; path=/; domain=.washingtonpost.com; SameSite=None; secure

< set-cookie: wp_geo=AT||||EEA; max-age=3600; path=/; domain=.washingtonpost.com; SameSite=None; secure

< akamai-true-ttl: -1

< content-security-policy: upgrade-insecure-requests

< strict-transport-security: max-age=15768000

< x-frame-options: SAMEORIGIN

< x-wp-request-id: 0.044e1202.1704559653.f1d3da9c

< server-timing: ak_p; desc="1704559653569_34754052_4057193116_898357_10218_11_0_15";dur=1

< 


0     0    0     0    0     0      0      0 --:--:--  0:00:37 --:--:--     0
* Connection #0 to host www.washingtonpost.com left intact
* Maximum (3) redirects followed
curl: (47) Maximum (3) redirects followed

Adding --http1.1 doesn't change the outcome either, even though curl then uses HTTP/1.1, so it's not HTTP/2 related.

The original resulting files attached:
result_files.zip

curl/libcurl version

curl 8.5.0 (x86_64-w64-mingw32) libcurl/8.5.0 LibreSSL/3.8.2 (Schannel) zlib/1.3 brotli/1.1.0 zstd/1.5.5 WinIDN libssh2/1.11.0 nghttp2/1.58.0 ngtcp2/1.1.0 nghttp3/1.1.0
Release-Date: 2023-12-06
Protocols: dict file ftp ftps gopher gophers http https imap imaps ldap ldaps mqtt pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp ws wss
Features: alt-svc AsynchDNS brotli HSTS HTTP2 HTTP3 HTTPS-proxy IDN IPv6 Kerberos Largefile libz MultiSSL NTLM SPNEGO SSL SSPI threadsafe UnixSockets zstd

operating system

Windows 8.1

@ghost ghost changed the title fails to get the page from www.washingtonpost.com/wp-srv, wget works fails to get the page from www.washingtonpost.com/wp-srv but wget works Jan 6, 2024
@ghost ghost changed the title fails to get the page from www.washingtonpost.com/wp-srv but wget works Fails to get the page from www.washingtonpost.com/wp-srv but wget works Jan 6, 2024
Contributor

dfandrich commented Jan 6, 2024 via email

Author

ghost commented Jan 6, 2024

Add the -L option to have curl follow it.

See the first message: I've tried that too, and it doesn't help:

 Modifying the curl request to

 curl -q --ipv4 -v -L --max-redirs 3 https://www.washingtonpost.com/wp-srv/politics/special/clinton/stories/bctest092198_1.htm -o 13.html >e1_3 2>e_2_3

 The above command results again in not producing the 13.html file and the file e_2_3 is

@bagder bagder added HTTP not-a-bug This is not a bug in curl labels Jan 6, 2024
Member

bagder commented Jan 6, 2024

curl is acting correctly here, doing what the server tells it to do.

You might get it to do what you want by also enabling cookies with -b.
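The log in the first message is consistent with a server that sets cookies on the 302 and only serves the page once a cookie comes back. The sketch below reproduces that pattern in Python against a local stand-in server. The handler, the cookie name `seen`, and the `fetch` helper are all hypothetical and say nothing about the real site's logic. A client without a cookie jar keeps getting redirected to the same URL until it gives up, while a client with a cookie jar gets the page, which is roughly what `curl -L -b ""` enables.

```python
import http.server
import threading
import urllib.error
import urllib.request
from http.cookiejar import CookieJar


class CookieGate(http.server.BaseHTTPRequestHandler):
    """Hypothetical server: 302 to the same URL until the cookie comes back."""

    def do_GET(self):
        if "seen=1" in (self.headers.get("Cookie") or ""):
            body = b"the page"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(302)
            self.send_header("Location", self.path)
            self.send_header("Set-Cookie", "seen=1; Path=/")
            self.send_header("Content-Length", "0")
            self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch(url, with_cookies):
    """Follow redirects, optionally with a cookie jar, and report the result."""
    # An empty ProxyHandler keeps proxy environment variables out of the demo.
    handlers = [urllib.request.ProxyHandler({})]
    if with_cookies:
        handlers.append(urllib.request.HTTPCookieProcessor(CookieJar()))
    opener = urllib.request.build_opener(*handlers)
    try:
        with opener.open(url, timeout=5) as response:
            return response.read().decode()
    except urllib.error.HTTPError as err:
        # urllib stops when the same URL is redirected to repeatedly.
        return f"gave up: HTTP {err.code}"


server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), CookieGate)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/page.htm"

without_cookies = fetch(url, with_cookies=False)
with_cookies = fetch(url, with_cookies=True)
server.shutdown()
server.server_close()

print("no cookie jar:", without_cookies)
print("cookie jar:   ", with_cookies)
```

The client-side difference is only whether cookies received on the redirect are sent back on the next request. The server and URL are identical in both runs.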

Author

ghost commented Jan 6, 2024

Thanks, I didn't know how to tell that this was needed. I assume the current behavior of curl is "by design" and that users are expected to understand it (I wasn't able to figure it out from the output). On that assumption I'm closing the issue.

@ghost ghost closed this as completed Jan 6, 2024
Author

ghost commented Jan 6, 2024

One more question: What also confuses me now is that curl takes

0m 9.46s

of real time to fetch the page but wget gets the page in

0m 0.08s

real time, that is, curl is still 118 times slower than wget. Can that slow get be avoided with some switch too?

Member

jay commented Jan 6, 2024

real time, that is, curl is still 118 times slower than wget. Can that slow get be avoided with some switch too?

That is because of the server. I checked in Wireshark to confirm. Probably it is some CDN server that does not cache the page for some agents or waits x number of times etc. I see a 9s delay with both curl and wget, probably because mine has a different user agent string. If I run the curl command with -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" the response is instant: real 0m0.143s
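To make the user-agent effect concrete, here is a minimal Python sketch that times the same request with two different User-Agent strings. Everything in it is an assumption for illustration: the local `AgentSensitive` server, its rule of delaying agents that do not contain "Mozilla", and the shortened `SLOW_SECONDS` delay are hypothetical stand-ins, not the CDN's real behavior.

```python
import http.server
import threading
import time
import urllib.request

SLOW_SECONDS = 0.5  # hypothetical stand-in for the ~9 s delay in the thread


class AgentSensitive(http.server.BaseHTTPRequestHandler):
    """Hypothetical server: answers slowly unless the agent looks like a browser."""

    def do_GET(self):
        if "Mozilla" not in (self.headers.get("User-Agent") or ""):
            time.sleep(SLOW_SECONDS)
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def timed_fetch(url, user_agent):
    """Return the seconds one GET takes when sent with the given User-Agent."""
    # An empty ProxyHandler keeps proxy environment variables out of the demo.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    start = time.perf_counter()
    with opener.open(request, timeout=10) as response:
        response.read()
    return time.perf_counter() - start


server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), AgentSensitive)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

slow = timed_fetch(url, "curl/8.5.0")
fast = timed_fetch(url, "Mozilla/5.0 (Windows NT 10.0; Win64; x64)")
server.shutdown()
server.server_close()

print(f"curl-like agent: {slow:.2f}s, browser-like agent: {fast:.2f}s")
```

Comparing timings while changing only the User-Agent header is a quick way to check whether a delay comes from the server rather than from the client.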

Author

ghost commented Jan 6, 2024

Yes, that's it, the server changes its reactions, many thanks! I also see the consistent behavior when I match the user agent strings.

One additional source of confusion for me during these experiments:

a bare -b on the command line "activates the cookie engine" even without specifying the input file

but

adding just -b to the .curlrc results in a warning, and "the cookie engine" probably remains inactive?

I had to see more failures and think more until I finally tried putting an explicit null file name in the .curlrc. Then the behavior matched the one on the command line, where I simply had

curl --something -b --somethingelse 

and it "worked" but not with

 -b

in the .curlrc. Now I know that too: on Windows it has to be

-b NUL

etc. Thanks!

Member

jay commented Jan 6, 2024

The argument that comes after -b is the required option value, -b <data|filename>. You can enable the cookie engine with an empty string like -b "".

Author

ghost commented Jan 7, 2024

I don't doubt that it's a "required option value", I'm just reporting that

  • from the command line it "worked" without specifying anything at all

  • but "didn't" from the .curlrc

I haven't investigated why the behaviors differ or whether it is platform specific (I've tried on Windows). That is what I believe I've observed while experimenting with this site, which, if I understand correctly, insists on the "cookie engine" being on for that redirection step to "work".

Specifically, I believe that instead of

curl -L -b "" http://example.com

one can, from the command line, just write:

curl -b -L  http://example.com

without noticing that the value is officially required. And I, personally, like the possibility to write:

curl -b -L  http://example.com

Member

jay commented Jan 7, 2024

curl -b -L http://example.com

The option value is required. That will open cookie filename -L if it exists. In other words -L is treated as an option value and not an option.
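The same "next token becomes the value" behavior can be seen in other getopt-style parsers. The sketch below uses Python's standard `getopt` module as an analogy only. The option spec `"b:L"` is a hypothetical two-option subset chosen for illustration, and this is not curl's actual parser.

```python
import getopt

# Hypothetical spec for illustration: "b:" means -b requires a value,
# "L" means -L is a plain flag. This is an analogy, not curl's own parser.
SPEC = "b:L"


def parse(argv):
    """Return (options, remaining arguments) for a command line."""
    return getopt.getopt(argv, SPEC)


# -b takes the next token as its value, even though it looks like an option.
swallowed = parse(["-b", "-L", "http://example.com"])

# With an explicit empty value, -L is recognized as a flag again.
explicit = parse(["-L", "-b", "", "http://example.com"])

print(swallowed)
print(explicit)
```

In the first call `-L` never shows up as an option because it was consumed as the value of `-b`. In the second call both options are seen, and `-b` carries an empty string.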

Author

ghost commented Jan 7, 2024

Thanks a lot Jay! That's a completely new concept to me: "-L is treated as an option value and not an option". It never occurred to me. I naively thought that options are recognized before any attempt to read their parameters, and that -L therefore couldn't be taken as a file name. Now that I think about why, I guess the probable rationale was that one should be able to use file names starting with '-' for these parameters without needing some additional mechanism. I also have no idea how many other programs use the same approach, simply because I've never thought about it.

jay added a commit to jay/curl that referenced this issue Jan 7, 2024
- Explain that --cookie "" can be used to enable the cookie engine
  without reading any initial cookies.

As is documented in CURLOPT_COOKIEFILE.

Ref: https://curl.se/libcurl/c/CURLOPT_COOKIEFILE.html

Bug: curl#12643 (comment)
Reported-by: janko-js@users.noreply.github.com

Closes #xxxx
Member

jay commented Jan 7, 2024

I've submitted #12646 to add --cookie "" to the manpage.

Author

ghost commented Jan 7, 2024

Thanks!

Now that the man page is a topic:

I also didn't know that, by the curl man page's convention, < > brackets mark a required option argument. Now that I think about it, maybe they did lead me to expect some "optionality", as I'm more used to reading things like:

   -e sub-extension, --extension=sub-extension

(from man man), with an argument in both cases and without < >, than

  -c, --cookie-jar <filename>

from man curl, where the short version is also shown without the argument. That could have contributed to the confusion. I also have no idea which other man pages use curl-like conventions.

jay added a commit that referenced this issue Jan 9, 2024
- Explain that --cookie "" can be used to enable the cookie engine
  without reading any initial cookies.

As is documented in CURLOPT_COOKIEFILE.

Ref: https://curl.se/libcurl/c/CURLOPT_COOKIEFILE.html

Bug: #12643 (comment)
Reported-by: janko-js@users.noreply.github.com

Closes #12646
This issue was closed.