curl-users

Re: simple/weird question

From: bruce <badouglas_at_gmail.com>
Date: Mon, 30 Nov 2015 10:20:53 -0500

Hey Von..

You do get the file! But, and here's the sly part, if you look at the
page in the browser, the Quarter/Term (fall/winter) select/option
block shows up in full.

However, if you run curl and look at the output in the terminal,
you'll see what appears to be corrupted output. Now, if you run curl,
store the output in a text file, and then open it with something like
gedit or vim, you'll see the complete data.

The issue: the content from the server is malformed, in that not all
lines are terminated with '\r\n' or '\n'. A couple of lines in the
select/option block are terminated with a bare '\r' (0x0d) instead.

The terminal handles those stray terminators in a way that makes the
display appear corrupted, so all that needs to be done is a simple
replace of those characters with '\n'. Problem solved.
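
For the archives, here is a rough sketch of that replace as a shell
filter. It assumes the stray terminators are bare carriage returns
(0x0d), which is my reading of the symptom and not something verified
byte-by-byte here. The function name normalize_eol is made up for
illustration.

```shell
# Normalize line endings on stdin and write the result to stdout.
# Assumption: the odd terminators are bare carriage returns (0x0d).
normalize_eol() {
    # Hold a literal CR so the sed pattern does not rely on the
    # GNU-only '\r' escape.
    cr=$(printf '\r')
    # Step 1: strip the CR from CRLF pairs, leaving a plain LF.
    # Step 2: turn every remaining (bare) CR into LF.
    sed "s/${cr}\$//" | tr '\r' '\n'
}

# Example use (not run here, it needs network access):
#   curl -s http://www.foothill.edu/schedule/schedule.php | normalize_eol
```

To confirm the carriage returns are really there before replacing
anything, piping the curl output through "cat -v" should show them as
^M.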

On Mon, Nov 30, 2015 at 10:04 AM, Von Hawkins via curl-users
<curl-users_at_cool.haxx.se> wrote:
> Bruce,
>
>> Trying to fetch a simple page.
>>
>> The target is:
>> http://www.foothill.edu/schedule/schedule.php
>>
>> In different browsers, without javascript or referer, results come
>> back as expected.
>>
>> Using different curl settings, can't seem to get the complete
>> returned content. The content seems to be missing the select for the
>> terms!!
>> Almost as though the returned data is corrupted.. But it happens
>> consistently, regardless of trying different headers, etc..
>>
>> my test curl is:
>> curl -vvv -k -A "User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0)
>> Gecko/20100101 Firefox/38.0" -H 'Accept:
>> text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' -H
>> 'Accept-Language: en-US,en;q=0.5' -H 'Accept-Encoding: gzip, deflate'
>> --cookie-jar aa.lwp --cookie aa.lwp -L
>> "http://www.foothill.edu/schedule/schedule.php"
>>
>
> I cannot replicate your issue. Using curl version:
> $ curl -V
> curl 7.29.0 (x86_64-redhat-linux-gnu) libcurl/7.29.0 NSS/3.15.4 zlib/1.2.7
> libidn/1.28 libssh2/1.4.3
> Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3
> pop3s rtsp scp sftp smtp smtps telnet tftp
> Features: AsynchDNS GSS-Negotiate IDN IPv6 Largefile NTLM NTLM_WB SSL libz
>
>
> Using either the simple:
> curl http://www.foothill.edu/schedule/schedule.php > junk.php
>
> or the exact command you provided above always results in a text file that
> contains the same contents I get from Chrome when I "view page source". In
> other words, the file ends with:
>
> <script type="text/javascript">
> _uacct = "UA-1477111-1";
> urchinTracker();
> </script></body>
> </html>
>
>
> What version and environment are you running? I'm just using a CentOS 7 VM
> with the stock curl version.
>
> --obivon
>
>
> -------------------------------------------------------------------
> List admin: http://cool.haxx.se/list/listinfo/curl-users
> FAQ: http://curl.haxx.se/docs/faq.html
> Etiquette: http://curl.haxx.se/mail/etiquette.html
>
Received on 2015-11-30