curl-users
Re: simple/weird question
Date: Tue, 1 Dec 2015 13:56:07 +1000
Bruce,
This appeared to work for me (stripping the ^M carriage returns with GNU sed):
curl http://www.foothill.edu/schedule/schedule.php | sed -e 's/\r//g'
But again, it would be nice to know whether curl already has this replace
or convert built in.
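A minimal sketch of a more portable filter, assuming a POSIX shell: tr -d '\r' deletes every carriage return (byte 0x0d) from the stream, so there is no need to type a literal ^M (Ctrl-V Ctrl-M) into a sed expression or rely on GNU sed's \r escape. This is an external filter applied to curl's output, not a curl option. strip_cr is a hypothetical helper name.

```shell
# strip_cr: hypothetical helper that deletes every carriage return (\r, 0x0d)
# from standard input, leaving plain LF (Unix) line endings.
strip_cr() {
    tr -d '\r'
}

# Typical use (needs network access, so it is only shown as a comment):
#   curl -s http://www.foothill.edu/schedule/schedule.php | strip_cr > output.html

# Offline demonstration: one DOS-style line, then a line with an extra stray CR.
printf '<option>Fall</option>\r\n<option>Winter</option>\r\r\n' | strip_cr
```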
Cheers,
Bundy
On 1 December 2015 at 13:23, Bundy <blazenbundy_at_gmail.com> wrote:
> Bruce,
>
> I managed to take another look at this one today, and my original gut
> feeling about special characters was close. If you look at the curl'ed
> output file in vi, you will notice two ^M characters (DOS/Windows-style
> carriage returns) just after each of the select options that have been
> the source of your concerns!!!
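A quick way to confirm this without opening vi, sketched under the assumption of a POSIX shell with grep and cat: grep with a literal CR counts the lines that contain one, and cat -v prints each CR as ^M. count_cr_lines and show_cr_lines are hypothetical helper names, and the sample file stands in for the saved page.

```shell
# count_cr_lines FILE: hypothetical helper that prints how many lines of FILE
# contain at least one carriage return (\r, shown as ^M in vi).
count_cr_lines() {
    cr=$(printf '\r')          # a literal carriage return character
    grep -c "$cr" "$1"
}

# show_cr_lines FILE: print the offending lines, numbered, with CRs made
# visible as ^M by cat -v.
show_cr_lines() {
    cr=$(printf '\r')
    grep -n "$cr" "$1" | cat -v
}

# Offline demonstration with a small sample file.
sample=$(mktemp)
printf '<select>\n<option>Fall</option>\r\r\n</select>\n' > "$sample"
count_cr_lines "$sample"   # prints 1
show_cr_lines "$sample"    # prints 2:<option>Fall</option>^M^M
rm -f "$sample"
```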
>
> It appears our terminals expect plain LF (Unix) line feeds, so I went
> ahead, removed those DOS carriage returns and saved the file.
>
> I then ran "more output.html" to stream the data to the terminal and
> voilà, happy days!!! =)
>
> Now that we know what we are dealing with, the job moves from problem
> solving to conceiving a solution. I haven't read the curl man page to see
> whether there is an option to rewrite incoming line endings to Unix line
> feed standards, hehehehehe =P, but maybe this is something the curl experts
> have had to deal with before and already know a workaround or a solution
> for? (Maybe piping curl's output through some replacement program.)
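One way to package the "pipe curl through a replacement program" idea is a small shell function. This is a sketch, assuming a POSIX shell with curl and tr on the PATH; fetch_unix is a hypothetical name, not a curl feature. It downloads to a temporary file first, so a failed transfer does not leave a half-written output file, then writes the body with all carriage returns removed.

```shell
# fetch_unix URL OUTFILE: hypothetical wrapper that downloads URL with curl
# and saves it to OUTFILE with all carriage returns stripped.
fetch_unix() {
    url=$1
    out=$2
    tmp=$(mktemp) || return 1
    # -s: no progress meter, -S: still report errors,
    # -f: fail on HTTP errors (>= 400), -L: follow redirects.
    if curl -sSfL -o "$tmp" "$url"; then
        tr -d '\r' < "$tmp" > "$out"
        status=$?
    else
        status=$?                  # curl's exit code
    fi
    rm -f "$tmp"
    return "$status"
}

# Example (needs network access):
#   fetch_unix http://www.foothill.edu/schedule/schedule.php output.html && more output.html
```

Using a temporary file instead of a direct pipe is a design choice: in a plain pipeline the shell reports the exit status of tr, not curl, so a failed download would go unnoticed.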
>
> Cheers,
>
> Bundy
>
> P.S. Sure is an interesting way of masking data from the terminal. =P
>
>
> On 1 December 2015 at 01:20, bruce <badouglas_at_gmail.com> wrote:
>
>> Hey Von..
>>
>> You'll get most of the file! But, and here's the slyness, if you look
>> at the output in the browser for the Quarter/Term (fall/winter)
>> select/option, you'll see the full page..
>>
>> However, if you run the curl and look at the output in the terminal,
>> you'll see what appears to be corrupted output. Now, if you run the
>> curl, store the output in a text/output file, and then open it with
>> something like gedit/vim/etc., you'll see the complete data....!!!
>>
>> The issue! -- the file/contents from the server have inconsistent line
>> endings: not all lines are terminated with '\r\n' or '\n'. A couple of
>> lines in the select/option block contain stray carriage returns ('\r',
>> 0x0d).
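For reference, od shows which byte each escape really is: a DOS/Windows line ending '\r\n' is 0d 0a, a Unix one '\n' is 0a, and '\a' is the BEL character 07, not a line ending. A minimal sketch, assuming a POSIX shell; hexbytes is a hypothetical helper name and output.html a hypothetical file name.

```shell
# hexbytes: print the bytes read from standard input as space-separated hex.
hexbytes() {
    od -An -tx1 | xargs
}

printf '\r\n' | hexbytes   # DOS/Windows line ending: 0d 0a
printf '\n'   | hexbytes   # Unix line ending:        0a
printf '\a'   | hexbytes   # BEL, not a line ending:  07

# To look at the raw line endings of a saved page:
#   head -c 2000 output.html | od -c     # CRs appear as \r
```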
>>
>> The terminal treats each bare carriage return as "go back to the start
>> of the line", so the text that follows overwrites what was already
>> printed and the display appears corrupted. So all that needs to be done
>> is a simple replace... problem solved...
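The masking effect is easy to reproduce without the site. A minimal demonstration, assuming a POSIX shell and an ordinary terminal; "Fall 2015" and "Winter 2016" are made-up sample text. Because the text after the CR is longer than the text before it, the first option vanishes completely from the screen, while cat -v shows what was actually sent.

```shell
# On a terminal, the CR returns the cursor to column 0, so "Winter 2016"
# is drawn over "Fall 2015" and only "Winter 2016" is visible.
printf 'Fall 2015\rWinter 2016\n'

# cat -v reveals the hidden character: prints Fall 2015^MWinter 2016
printf 'Fall 2015\rWinter 2016\n' | cat -v
```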
>>
>> On Mon, Nov 30, 2015 at 10:04 AM, Von Hawkins via curl-users
>> <curl-users_at_cool.haxx.se> wrote:
>> > Bruce,
>> >
>> >> Trying to fetch a simple page.
>> >>
>> >> The target is:
>> >> http://www.foothill.edu/schedule/schedule.php
>> >>
>> >> In different browsers, without javascript or referer, results come
>> >> back as expected.
>> >>
>> >> Using different curl settings, I can't seem to get the complete
>> >> returned content. The content seems to be missing the select for the
>> >> terms!!
>> >> Almost as though the returned data is corrupted.. But it happens
>> >> consistently, regardless of trying different headers, etc..
>> >>
>> >> my test curl is:
>> >> curl -vvv -k -L \
>> >>   -A 'Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0' \
>> >>   -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' \
>> >>   -H 'Accept-Language: en-US,en;q=0.5' \
>> >>   --compressed \
>> >>   --cookie-jar aa.lwp --cookie aa.lwp \
>> >>   "http://www.foothill.edu/schedule/schedule.php"
>> >>
>> >
>> > I cannot replicate your issue. Using curl version:
>> > $ curl -V
>> > curl 7.29.0 (x86_64-redhat-linux-gnu) libcurl/7.29.0 NSS/3.15.4 zlib/1.2.7 libidn/1.28 libssh2/1.4.3
>> > Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smtp smtps telnet tftp
>> > Features: AsynchDNS GSS-Negotiate IDN IPv6 Largefile NTLM NTLM_WB SSL libz
>> >
>> >
>> > Using either the simple:
>> > curl http://www.foothill.edu/schedule/schedule.php > junk.php
>> >
>> > or the exact command you provided above always results in a text file
>> > that contains the same contents I get from Chrome when I "view page
>> > source". In other words, the file ends with:
>> >
>> > <script type="text/javascript">
>> > _uacct = "UA-1477111-1";
>> > urchinTracker();
>> > </script></body>
>> > </html>
>> >
>> >
>> > What version and environment are you running? I'm just using a CentOS
>> > 7 VM with the stock curl version.
>> >
>> > --obivon
>> >
>> >
>> > -------------------------------------------------------------------
>> > List admin: http://cool.haxx.se/list/listinfo/curl-users
>> > FAQ: http://curl.haxx.se/docs/faq.html
>> > Etiquette: http://curl.haxx.se/mail/etiquette.html
>> >
>>
>
>