curl-and-python
RE: Pycurl and the curl_multi_socket_action API
Date: Fri, 6 Apr 2012 15:26:00 +0530
Have you seen retriever-multi.py, the official pycurl example for the multi interface?
http://pycurl.cvs.sourceforge.net/viewvc/pycurl/pycurl/examples/retriever-multi.py?revision=1.29
But I am not sure whether pycurl's multi interface offers the same non-blocking behavior as libcurl's. I prefer using the curl easy interface in threaded mode. The following example creates 100 parent threads, and with it I was easily able to crawl around 5M URLs per day on an Amazon EC2 small instance.
https://github.com/utsavsabharwal/Web-Crawlers/blob/master/penelope/crawler.py
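A minimal sketch of the threaded easy-interface approach described above; it is not the code from the linked crawler.py. The names fetch_with_pycurl and crawl, the 30-second timeout, and the result dictionary are assumptions made for illustration. Each worker thread pulls URLs from a shared queue and runs a blocking transfer on its own pycurl.Curl handle.

```python
import queue
import threading
from io import BytesIO


def fetch_with_pycurl(url, timeout=30):
    """Fetch one URL with a blocking easy handle; returns (status, body)."""
    import pycurl  # imported lazily so the pool below also works with other fetchers

    buf = BytesIO()
    c = pycurl.Curl()
    try:
        c.setopt(pycurl.URL, url)
        c.setopt(pycurl.WRITEDATA, buf)      # collect the response body in memory
        c.setopt(pycurl.FOLLOWLOCATION, 1)
        c.setopt(pycurl.TIMEOUT, timeout)    # assumed value, tune for your workload
        c.setopt(pycurl.NOSIGNAL, 1)         # avoid signal-based timeouts in threads
        c.perform()
        return c.getinfo(pycurl.RESPONSE_CODE), buf.getvalue()
    finally:
        c.close()


def crawl(urls, fetch=fetch_with_pycurl, num_threads=100):
    """Fetch every URL using a pool of worker threads.

    Returns a dict mapping each URL to the value returned by `fetch`,
    or to the exception it raised.
    """
    urls = list(urls)
    tasks = queue.Queue()
    for url in urls:
        tasks.put(url)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                url = tasks.get_nowait()
            except queue.Empty:
                return  # no work left, let the thread exit
            try:
                outcome = fetch(url)
            except Exception as exc:  # keep one bad URL from killing the worker
                outcome = exc
            with lock:
                results[url] = outcome

    threads = [threading.Thread(target=worker)
               for _ in range(min(num_threads, len(urls)))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling crawl(list_of_urls) uses pycurl by default. Passing a different callable as fetch makes it possible to try the pool without network access.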
d_r_a_G_o_s
On Fri, Apr 6, 2012 at 3:06 PM, Utsav Sabharwal <utsavsabharwal@live.com> wrote:
> Hi everybody,
>
> I'm trying to simulate realistic web traffic to benchmark one of our
> products (a network controller).
> Performance is very important since we're targeting to simulate 10 000
> simultaneous "browsing sessions".
>
> I've read Daniel's blog post about curl's performance:
> http://daniel.haxx.se/blog/2010/08/03/curl-performance/
> He states that it is necessary to avoid select and poll as they are slow and
> he advises to rely on the curl_multi_socket_action API combined with an
> event library.
>
> I think that the C example given in
> http://curl.haxx.se/libcurl/c/hiperfifo.html is the closest thing to what
> I'm trying to achieve here.
>
> Does anyone know about a similar example based on PycURL instead?
>
> Thanks in advance!
Received on 2012-04-06