curl-and-php
RE: curl-and-php Digest, Vol 29, Issue 3
Date: Fri, 4 Jan 2008 15:53:36 +0000
Thank you Richard and Colleen for your replies.
First I should say that I went back to my code, simplified it to the bare minimum, and ran it against an online PDF, and the code actually worked perfectly. I present this code below. The problem was that I had been running the PDF through a complex series of routines written to parse HTML, and those routines were choking on the PDF format.
In response to Richard: I do have code that converts a PDF to text using a shell program called pdftotext, available from http://www.bluem.net/downloads/pdftotext_en/ . The program requires that the PDF first be written to disk.
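For readers who want to script that conversion step, here is a minimal sketch. It assumes a `pdftotext` binary is on the PATH and accepts an input PDF path followed by an output text path; the function names `build_pdftotext_command` and `pdf_file_to_text` are hypothetical and do not come from the original post.

```php
<?php
// Sketch only: assumes `pdftotext input.pdf output.txt` works on this machine.
// Both function names are made up for illustration.

// Build the shell command, quoting both paths so spaces or odd characters are safe.
function build_pdftotext_command($pdf_path, $txt_path)
{
    return "pdftotext " . escapeshellarg($pdf_path) . " " . escapeshellarg($txt_path);
}

// Run the conversion and return the extracted text, or FALSE on failure.
function pdf_file_to_text($pdf_path, $txt_path)
{
    exec(build_pdftotext_command($pdf_path, $txt_path), $output, $exit_code);
    if ($exit_code !== 0 || !is_file($txt_path)) {
        return FALSE;
    }
    return file_get_contents($txt_path);
}
```

Once the PDF is on disk, a call such as `pdf_file_to_text("downloaded.pdf", "downloaded.txt")` would return the text, provided the binary behaves as assumed.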
Below is my code to download the PDF and write it to disk. It is a pretty basic curl download followed by a disk write.
Ralph
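<?php
# open curl session
$s = curl_init();
# configure curl command
curl_setopt($s, CURLOPT_URL, "http://www.ire.org/training/nettour/pdf/PDFTOTEXT.pdf"); // target pdf
curl_setopt($s, CURLOPT_RETURNTRANSFER, TRUE); // return the body as a string instead of printing it
# execute curl command & send contents of target pdf to string
$downloaded_page = curl_exec($s);
# curl_exec() returns FALSE on failure, so check before writing anything
if ($downloaded_page === FALSE) {
    die("Error downloading file: " . curl_error($s) . "\n");
}
# close php/curl session
curl_close($s);
# write the string to disk; "wb" keeps the binary pdf data intact on every platform
$filename = "downloaded.pdf";
$outfile = fopen($filename, "wb") or die("Error opening file\n");
# fwrite() returns FALSE on error; compare strictly so a zero-byte write is not treated as a failure
if (fwrite($outfile, $downloaded_page) === FALSE) {
    die("Error writing to file.\n");
}
fclose($outfile);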
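As a possible variation on the code above, curl can write the response straight to a file handle through `CURLOPT_FILE`, so a large PDF is never held in a string. This is only a sketch: the helper name `download_to_file` and its parameters are hypothetical and are not part of the original post.

```php
<?php
// Sketch only: download_to_file() is a made-up helper for illustration.
// It streams the response body to disk instead of returning it as a string.
function download_to_file($url, $filename)
{
    $outfile = fopen($filename, "wb");           // binary-safe write mode
    if ($outfile === FALSE) {
        return FALSE;
    }
    $s = curl_init();
    curl_setopt($s, CURLOPT_URL, $url);
    curl_setopt($s, CURLOPT_FILE, $outfile);     // write the body to this file handle
    $ok = curl_exec($s);                         // TRUE on success, FALSE on failure
    curl_close($s);
    fclose($outfile);
    if ($ok === FALSE) {
        unlink($filename);                       // do not leave a partial or empty file behind
        return FALSE;
    }
    return TRUE;
}
```

A call such as `download_to_file("http://www.ire.org/training/nettour/pdf/PDFTOTEXT.pdf", "downloaded.pdf")` would then replace the separate download and `fwrite()` steps.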
> From: curl-and-php-request_at_cool.haxx.se
> Subject: curl-and-php Digest, Vol 29, Issue 3
> To: curl-and-php_at_cool.haxx.se
> Date: Fri, 4 Jan 2008 12:00:02 +0100
>
> Today's Topics:
>
> 1. Re: Problem with redirection (Douglas Fonseca)
> 2. PDF links (Ralph Seward)
> 3. Re: PDF links (Richard Lynch)
> 4. Re: PDF links (Colleen R. Dick)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 3 Jan 2008 15:01:31 -0300
> From: "Douglas Fonseca" <dglsbr_at_gmail.com>
> Subject: Re: Problem with redirection
> To: curl-and-php_at_cool.haxx.se
> Message-ID:
> <abbefc8b0801031001j9df51c0u1791993f99d0f3a6_at_mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi,
> I have the same problem.
> I'm trying to log in to orkut with cURL. The login works, but cURL doesn't
> follow the redirect to the Orkut home page; it just shows a header.
> I'm already using COOKIEJAR.
> I hope someone can help us.
> Thank you,
> Douglas Fonseca
>
>
> 2008/1/1, Richard Lynch <ceo_at_l-i-e.com>:
> >
> > On Mon, December 17, 2007 9:34 am, Werner Hofer wrote:
> > > I would like to use the curl library to fetch the content of a
> > > page
> > > (page x: http://www.travelan.net/module/20316/more-gesamt/).
> > > To do that I call a page (page y:
> > > http://www.getyourstock.com/alfa13.php), and
> > > this page calls page x (see the example alfa13.php).
> > > The problem is this: there is a redirection within page x,
> > > and I do
> > > not know how to get the content of the redirected page x.
> > > The content of the redirected page x should finally be displayed in
> > > the
> > > browser.
> > > Result: I only get the header, but not the content of the page
> > > itself.
> > >
> > > The header I get is the following:
> > >
> > > HTTP/1.1 302 Found
> > > Date: Mon, 17 Dec 2007 15:20:17 GMT
> > > Server: Apache/2.0.54 (Debian GNU/Linux) PHP/5.2.3 with Suhosin-Patch DAV/2 mod_ssl/2.0.54 OpenSSL/0.9.7e
> > > X-Powered-By: PHP/5.2.3
> > > Set-Cookie: PHPSESSID=1vmqk1rfjgau42pvki54kl8je5; path=/
> > > Expires: Thu, 19 Nov 1981 08:52:00 GMT
> > > Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
> > > Pragma: no-cache
> > > location: http://www.holidayandmore.de/index.asp?Agentur=50376&AgentID=20316
> > > Transfer-Encoding: chunked
> > > Content-Type: text/html
> >
> > Since you are getting cookies in the headers, perhaps you need to
> > provide a COOKIEJAR and COOKIEFILE for the redirects to work.
> >
> > --
> > Some people have a "gift" link here.
> > Know what I want?
> > I want you to buy a CD from some indie artist.
> > http://cdbaby.com/from/lynch
> > Yeah, I get a buck. So?
> >
>
> ------------------------------
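The cookie suggestion in Message 1 could look roughly like the sketch below. It additionally assumes that the 302 response is not being followed and therefore sets `CURLOPT_FOLLOWLOCATION`; the thread itself does not confirm that this was the missing piece. The function name `fetch_following_redirects` and the `$cookie_file` parameter are hypothetical.

```php
<?php
// Sketch only: fetch_following_redirects() and $cookie_file are made-up names.
function fetch_following_redirects($url, $cookie_file)
{
    $s = curl_init();
    curl_setopt($s, CURLOPT_URL, $url);
    curl_setopt($s, CURLOPT_RETURNTRANSFER, TRUE);     // return the final page as a string
    curl_setopt($s, CURLOPT_FOLLOWLOCATION, TRUE);     // follow "Location:" headers such as a 302
    curl_setopt($s, CURLOPT_MAXREDIRS, 10);            // stop after 10 hops to avoid redirect loops
    curl_setopt($s, CURLOPT_COOKIEJAR, $cookie_file);  // save cookies here when the handle is released
    curl_setopt($s, CURLOPT_COOKIEFILE, $cookie_file); // enable the cookie engine and load saved cookies
    $page = curl_exec($s);                             // FALSE on failure
    curl_close($s);
    return $page;
}
```

With these options the session cookie set by the first response is sent along when curl requests the redirect target, which is what the COOKIEJAR and COOKIEFILE advice is aiming for.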
>
> Message: 2
> Date: Thu, 3 Jan 2008 19:24:19 +0000
> From: Ralph Seward <rj_seward_at_hotmail.com>
> Subject: PDF links
> To: <curl-and-php_at_cool.haxx.se>
> Message-ID: <BAY123-W349DEDE5DA833B25B236839E530_at_phx.gbl>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Dear Folks,
>
> I am currently developing a web bot using php/curl and I have a question to throw out. Many times I will come across a link to a pdf file that appears just like a link to a web page. For example, http://www.somesite/healthcenter/ImmunizationForm.pdf. Click on this link, and in Firefox a popup-like window will appear asking "What should Firefox do with this file?" with the options of Open or Save to Disk.
> Now, is it possible to follow such a link through curl and have the pdf file saved to disk? Has anyone ever succeeded in doing anything with a pdf through curl?
> Thanks in advance.
> Ralph J Seward
>
>
> ------------------------------
>
> Message: 3
> Date: Thu, 3 Jan 2008 16:03:53 -0600 (CST)
> From: "Richard Lynch" <ceo_at_l-i-e.com>
> Subject: Re: PDF links
> To: "curl with PHP" <curl-and-php_at_cool.haxx.se>
> Message-ID: <37113.98.193.37.55.1199397833.squirrel_at_www.l-i-e.com>
> Content-Type: text/plain;charset=iso-8859-1
>
> You can get it just as you would with an HTML document.
>
> There's nothing particularly fancy involved.
>
> If you want to actually analyze what's IN the PDF, then things get a
> bit more complicated, as the PDF format itself has a bewildering array
> of ways in which it can obfuscate content...
>
> But there are projects/products "out there" for tearing apart a PDF
> into its parts and analyzing them to varying degrees.
>
>
>
>
> ------------------------------
>
> Message: 4
> Date: Thu, 03 Jan 2008 15:09:11 -0800
> From: "Colleen R. Dick" <platypus_at_proaxis.com>
> Subject: Re: PDF links
> To: curl with PHP <curl-and-php_at_cool.haxx.se>
> Message-ID: <477D6B17.7070908_at_proaxis.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> I didn't see him indicate that he cares about analyzing what's in the
> PDF; he just wants the bot to be able to download it to the host that
> it's running on.
>
> I could say the same about CSV files. Of course, I'm having trouble
> getting the app to produce them from curl in the first place. It calls
> some JavaScript and I have to simulate everything it does.
>
>
>
> Ralph, have you tried? What code have you tried and what is the result?
>
>
> ------------------------------
>
_______________________________________________
http://cool.haxx.se/cgi-bin/mailman/listinfo/curl-and-php
Received on 2008-01-04