curl-and-python
Re: Noob Question.
Date: Wed, 30 Jan 2013 09:25:59 -0800
Hi,
Thanks for the response - but I do not need to set headers to send, I want
to receive the headers only.
This is what I ended up using (the clue came from the --libcurl option that
Daniel mentioned in one of the previous threads):
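import pycurl

c = pycurl.Curl()
c.setopt(c.URL, 'http://www.yahoo.com')
c.setopt(c.HEADER, True)          # include response headers in the output
c.setopt(c.NOBODY, True)          # HEAD-style request: do not fetch the body
c.setopt(c.FOLLOWLOCATION, True)  # follow redirects
c.perform()
c.close()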
and it gives me just the headers that I want.
For efficiency, I am looking into the curl_multi interface. I initialize a
curl_multi object, add handles, read the returned results, delete the
curl_multi object, and start over with the next batch ... but after a few
batches the process hangs.
I am looking into how to read the status from the info_read() method of the
curl_multi object to find out what is going wrong.
info_read() gives me the handles in separate lists: the ones that succeeded
and the ones that failed. But the pycurl object's getinfo method gives me
only effective_url ... which can differ from the original url after a
redirect. How do I tie the results back to the original url? One can check
the redirect_count, but that still does not give me the original url to tie
the response back to.
What am I missing?
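
One possible way to tie results back, sketched here only as an illustration:
remember the requested URL on the handle itself as a plain Python attribute
(the retriever-multi.py example distributed with PycURL stores c.url in a
similar way). The names make_head_handle, summarize and original_url are
made up for this sketch and are not part of PycURL. The summarize helper
expects the two lists returned by info_read(): the handles that succeeded,
and (handle, errno, errmsg) tuples for the ones that failed.

```python
try:
    import pycurl
except ImportError:  # summarize() below is pure Python and works without it
    pycurl = None


def make_head_handle(url):
    """Build a header-only handle and remember the URL it was created for."""
    c = pycurl.Curl()
    c.setopt(pycurl.URL, url)
    c.setopt(pycurl.NOBODY, True)
    c.setopt(pycurl.FOLLOWLOCATION, True)
    # Hypothetical attribute name: a plain Python attribute, not a libcurl option.
    c.original_url = url
    return c


def summarize(ok_list, err_list):
    """Map the lists returned by CurlMulti.info_read() to the original URLs."""
    results = {}
    for c in ok_list:
        results[c.original_url] = ("ok", None)
    for c, errno, errmsg in err_list:
        results[c.original_url] = ("failed", "%s: %s" % (errno, errmsg))
    return results
```

With this, effective_url can still be read through getinfo for the final
location, while original_url says which request the handle belonged to.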
Also, is it better to del the curl_multi and re-initialize it, or should
one remove the handles and add new(er) handles to the same object in the
next batch?
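
For comparison, here is a rough sketch of the second variant: one curl_multi
object kept for all batches, with handles removed and closed after each
batch. It is not a tested fix for the hang, whose cause is not known here.
The helper names batches and run_batches, the batch size, and the 30 second
TIMEOUT are assumptions chosen for illustration; the timeout is only there
so that a stalled transfer cannot block a batch forever.

```python
try:
    import pycurl
except ImportError:  # batches() below is pure Python and works without it
    pycurl = None


def batches(urls, size):
    """Yield successive lists of at most `size` URLs."""
    for i in range(0, len(urls), size):
        yield urls[i:i + size]


def run_batches(urls, size=10):
    """Reuse one CurlMulti for every batch; return {original_url: status}."""
    m = pycurl.CurlMulti()
    results = {}
    for batch in batches(urls, size):
        handles = []
        for url in batch:
            c = pycurl.Curl()
            c.setopt(pycurl.URL, url)
            c.setopt(pycurl.NOBODY, True)
            c.setopt(pycurl.FOLLOWLOCATION, True)
            c.setopt(pycurl.TIMEOUT, 30)  # assumed value
            c.original_url = url          # hypothetical attribute name
            m.add_handle(c)
            handles.append(c)
        pending = len(handles)
        while pending:
            # Let libcurl do as much work as it can right now.
            while True:
                ret, _ = m.perform()
                if ret != pycurl.E_CALL_MULTI_PERFORM:
                    break
            # Collect every transfer that has finished so far.
            while True:
                queued, ok_list, err_list = m.info_read()
                for c in ok_list:
                    results[c.original_url] = c.getinfo(pycurl.RESPONSE_CODE)
                    pending -= 1
                for c, errno, errmsg in err_list:
                    results[c.original_url] = errmsg
                    pending -= 1
                if queued == 0:
                    break
            if pending:
                m.select(1.0)  # wait for socket activity, at most 1 second
        # Detach and close this batch's handles before starting the next one.
        for c in handles:
            m.remove_handle(c)
            c.close()
    m.close()
    return results
```

Calling run_batches(list_of_urls, size=20) would return a dict keyed by the
original URLs, holding the HTTP status for successes and the error message
for failures.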
SS
On Tue, Jan 29, 2013 at 10:46 AM, Sandip Shah <sandipshah_at_vthrive.com> wrote:
> Hi,
>
> I need to get the headers only from a URL (I am doing this for a lot of
> URLs) and seems like PyCURL is the fastest way to do it in Python.
>
> However, I do not see a "setopt_HEAD" (curl -I option) in PyCURL.
>
> What am I missing, and how can I get it?
>
> Thanks,
>
> SS
>
>
Received on 2013-01-30