curl-users
Re: simple/weird question
Date: Tue, 1 Dec 2015 13:23:35 +1000
Bruce,
I managed to take another look at this one today, and my original gut
feeling about special characters was close. If you look again at the curl'ed
output file in vi, you will notice that there are two ^M characters
(DOS/Windows carriage returns) just after each of the select options that
have been the source of your concerns!
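For anyone who wants to confirm this without opening vi, here is a rough sketch of one way to count the affected lines. The function name count_cr_lines is just something I made up for illustration, and it assumes the curl output has already been saved to a file.

```shell
# Hypothetical helper: print how many lines of a file contain at least one
# carriage return (0x0d, shown as ^M by vi).
count_cr_lines() {
  # printf emits a literal CR; command substitution only strips trailing
  # newlines, so the CR survives and becomes the grep pattern.
  grep -c "$(printf '\r')" "$1"
}

# Usage sketch (file name taken from this thread):
#   count_cr_lines output.html
# "cat -v output.html" should also display each carriage return as ^M.
```

A count of zero would mean the file has no carriage returns at all and the display problem lies elsewhere.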
It appears our terminals prefer plain LF line feeds, so I went ahead,
removed those DOS carriage returns, and saved the file.
I then ran "more output.html" to stream the data to the terminal,
and voila, happy days!!! =)
Now that we know what we are dealing with, the problem solving moves on to
finding a solution. I haven't read the curl man page to see whether there is
an option that rewrites incoming line endings to the Unix line feed
standard, hehehehehe =P, but maybe this is something the curl experts
have had to deal with previously and already know a workaround or a
solution for? (Maybe piping curl's output through a replacement program?)
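As a minimal sketch of the piping idea, assuming the stray characters are plain carriage returns (0x0d) that can simply be dropped, the standard tr utility could sit between curl and the terminal or output file. The function name strip_cr is hypothetical, and I have not checked whether curl has a built-in option for this.

```shell
# Hypothetical helper: delete every carriage return (0x0d) from stdin and
# pass everything else through unchanged.
strip_cr() {
  tr -d '\r'
}

# Usage sketch (URL from the original post; needs network access):
#   curl -s "http://www.foothill.edu/schedule/schedule.php" | strip_cr > output.html
#   more output.html
```

Note that this deletes the carriage returns rather than converting them, so if a server used a bare CR as its only line terminator the lines would be joined; in that case something like tr '\r' '\n' might be a better fit.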
Cheers,
Bundy
P.S. Sure is an interesting way of masking data from the terminal. =P
On 1 December 2015 at 01:20, bruce <badouglas_at_gmail.com> wrote:
> Hey Von..
>
> You'll get most of the file! But, and here's the slyness, if you look
> at the output in the browser for the Quarter/Term (fall/winter)
> select/option, you'll see the full page..
>
> However, if you do the curl, and look at the output in the term,
> you'll see what appears to be a corrupted output. Now, if you do the
> curl, store the output in a text/output file, and then use something
> like gedit/vim/etc.. you'll see the complete data....!!!
>
> The issue! -- the file/contents from the server is malformed, in that
> not all lines are terminated with a '\r\n'... or a '\n'.. a couple of
> lines in the select/option block are terminated with a '\r' or 0x0d.
>
> This gets handled by the term in a manner that makes the display
> appear to be corrupted, so what needs to be done, is to do a simple
> replace... prob solved...
>
>
>
>
> On Mon, Nov 30, 2015 at 10:04 AM, Von Hawkins via curl-users
> <curl-users_at_cool.haxx.se> wrote:
> > Bruce,
> >
> >> Trying to fetch a simple page.
> >>
> >> The target is:
> >> http://www.foothill.edu/schedule/schedule.php
> >>
> >> In different browsers, without javascript or referer, results come
> >> back as expected.
> >>
> >> Using different curls settings, can't seem to get the complete
> >> returned content. The content seems to be missing the select for the
> >> terms!!
> >> Almost as though the returned data is corrupted.. But it happens
> >> consistently, regardless of trying different headers, etc..
> >>
> >> my test curl is:
> >> curl -vvv -k -A "User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0)
> >> Gecko/20100101 Firefox/38.0" -H 'Accept:
> >> text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' -H
> >> 'Accept-Language: en-US,en;q=0.5' -H 'Accept-Encoding: gzip, deflate'
> >> --cookie-jar aa.lwp --cookie aa.lwp -L
> >> "http://www.foothill.edu/schedule/schedule.php"
> >>
> >
> > I cannot replicate your issue. Using curl version:
> > $ curl -V
> > curl 7.29.0 (x86_64-redhat-linux-gnu) libcurl/7.29.0 NSS/3.15.4 zlib/1.2.7 libidn/1.28 libssh2/1.4.3
> > Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smtp smtps telnet tftp
> > Features: AsynchDNS GSS-Negotiate IDN IPv6 Largefile NTLM NTLM_WB SSL libz
> >
> >
> > Using either the simple:
> > curl http://www.foothill.edu/schedule/schedule.php > junk.php
> >
> > or the exact command you provided above always results in a text file that
> > contains the same contents I get from Chrome when I "view page source". In
> > other words, the file ends with:
> >
> > <script type="text/javascript">
> > _uacct = "UA-1477111-1";
> > urchinTracker();
> > </script></body>
> > </html>
> >
> >
> > What version and environment are you running? I'm just using a CentOS 7 VM
> > with the stock curl version.
> >
> > --obivon
> >
> >
> > -------------------------------------------------------------------
> > List admin: http://cool.haxx.se/list/listinfo/curl-users
> > FAQ: http://curl.haxx.se/docs/faq.html
> > Etiquette: http://curl.haxx.se/mail/etiquette.html
> >