curl-and-python

Re: Using pycurl with streaming python interfaces?

From: Daniel Stenberg <daniel_at_haxx.se>
Date: Thu, 4 Dec 2008 09:32:13 +0100 (CET)

On Wed, 3 Dec 2008, johansen_at_sun.com wrote:

> The TarFile class can take a file-object, or an object that implements the
> same interfaces that a file does, and will read from that as data becomes
> available.
>
> I looked in libcurl and pycurl, but I didn't see any interface that would
> let me access data in the response body as a file-object, without first
> downloading the entire response.
>
> If one were to try to accomplish this today, is there a way to read the data
> in the response, a bit at a time, so that it may be streamed into a
> file-blocks type of interface that is provided by TarFile and gzip classes?

As I see it, your biggest "problem" is not that libcurl/pycurl has any
particular flaw, but that urllib's and libcurl's (and thus pycurl's) interfaces
differ, and your existing code is built around the way urllib works.

Allow me to throw in my opinions on this. I am primarily a libcurl guy rather
than a pycurl person, but the concepts should apply regardless of language:

- Getting data line by line from a site is an inefficient way of downloading
   anything. It is much better to fetch more data at once and then have a
   front end that traverses the downloaded data line by line.

- libcurl's easy interface is a blocking interface, designed to perform the
   entire request/transfer in one go before it returns, and it hands over
   received data (and reads data to send) through callbacks. You could achieve
   a line-by-line interface by using a separate thread that downloads the data
   and letting the line-by-line function read that data (you could even let
   the write callback sleep or slow down if the local buffer grows too big).

- libcurl provides its multi interface as a non-blocking alternative that
   lets the caller (the application) decide when to act, so you can make the
   line-by-line function work more or less exactly the way I understand your
   current code does, without additional threads. I once wrote an
   fopen()-style function (subsequently enhanced by others) that shows how it
   can be done (the example is in C using the native libcurl API):

           http://curl.haxx.se/lxr/source/docs/examples/fopen.c
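A small Python sketch of the first point: the download side hands over chunks of whatever size it likes, and a front end cuts them into lines. `iter_lines` is a hypothetical helper, not a pycurl function; how the chunks are obtained (for example, collected by a pycurl write callback) is left open here.

```python
def iter_lines(chunks):
    """Split an iterable of arbitrarily sized byte chunks into lines.

    Each yielded line keeps its trailing newline; a final line without
    one is yielded as-is once the chunks run out.
    """
    pending = bytearray()  # bytes received but not yet emitted as a full line
    for chunk in chunks:
        pending += chunk
        start = 0
        while True:
            nl = pending.find(b"\n", start)
            if nl < 0:
                break
            yield bytes(pending[start:nl + 1])
            start = nl + 1
        del pending[:start]  # drop the lines already yielded
    if pending:
        yield bytes(pending)  # last line had no trailing newline
```

The line boundaries are independent of the chunk boundaries, so the network side can use large reads while the caller still sees one line at a time.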
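A minimal Python sketch of the thread-based approach from the second point, applied to the original TarFile question. It assumes Python 3 pycurl, where the WRITEFUNCTION callback receives `bytes` and a `None` return means all bytes were taken. `CallbackStream`, `pycurl_transfer` and `iter_tar_members` are hypothetical names, not part of pycurl. A bounded `queue.Queue` plays the role of the local buffer, so a slow reader makes the write callback block, which slows the download down.

```python
import io
import queue
import tarfile
import threading


class CallbackStream(io.RawIOBase):
    """Read-only file-like object fed by a transfer running in a worker thread.

    `transfer` is a hypothetical callable taking one argument, a write
    callback, and blocking until the whole download is done.
    """

    def __init__(self, transfer, max_chunks=16):
        super().__init__()
        # Bounded queue: once max_chunks are waiting, put() blocks, which
        # stalls the write callback and so throttles the download.
        self._chunks = queue.Queue(maxsize=max_chunks)
        self._buf = b""
        self._eof = False
        self._error = None
        worker = threading.Thread(target=self._run, args=(transfer,), daemon=True)
        worker.start()

    def _run(self, transfer):
        try:
            transfer(self._chunks.put)  # put() returns None: "all bytes taken"
        except Exception as exc:  # hand the failure over to the reading side
            self._error = exc
        finally:
            self._chunks.put(None)  # sentinel marking the end of data

    def readable(self):
        return True

    def readinto(self, b):
        # Block until some bytes are buffered or the transfer has ended.
        while not self._buf and not self._eof:
            chunk = self._chunks.get()
            if chunk is None:
                self._eof = True
            else:
                self._buf = chunk
        if not self._buf:
            if self._error is not None:
                raise self._error
            return 0  # clean end of stream
        n = min(len(b), len(self._buf))
        b[:n] = self._buf[:n]
        self._buf = self._buf[n:]
        return n


def pycurl_transfer(url):
    """Build a transfer callable that downloads `url` with the easy interface."""

    def transfer(write):
        import pycurl  # imported lazily so CallbackStream works without pycurl

        c = pycurl.Curl()
        try:
            c.setopt(pycurl.URL, url)
            c.setopt(pycurl.WRITEFUNCTION, write)
            c.perform()  # blocks in the worker thread until the transfer ends
        finally:
            c.close()

    return transfer


def iter_tar_members(transfer):
    """Yield (name, data) for each member of a .tar.gz as it streams in."""
    stream = io.BufferedReader(CallbackStream(transfer))
    # Mode "r|gz" makes tarfile read strictly sequentially, so no seek() is needed.
    with tarfile.open(fileobj=stream, mode="r|gz") as tf:
        for member in tf:
            data = tf.extractfile(member).read() if member.isfile() else None
            yield member.name, data
```

With pycurl installed, `iter_tar_members(pycurl_transfer(url))` would walk a remote archive member by member without holding the whole download in memory. If the reader stops early, the worker stays blocked in `put()`; it is a daemon thread, so it does not keep the process alive, but a real program would want a way to abort the transfer.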
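A rough Python counterpart of the fopen.c idea from the third point, assuming pycurl's `CurlMulti` methods `add_handle`, `perform`, `select`, `remove_handle` and `close`. `PullStream` and `pycurl_multi_start` are hypothetical names. The read side only pumps the transfer when its buffer is empty, so no extra thread is needed. This is a sketch, not a port of fopen.c: it skips error reporting via `info_read()` and does not honour curl's own timeout hints.

```python
import io


class PullStream(io.RawIOBase):
    """fopen()-style reader: each read() drives the transfer, no extra thread.

    `start` is a hypothetical callable: start(write) sets the transfer up with
    `write` as its write callback and returns a `step` function that makes
    some progress (possibly calling `write`) and returns False once finished.
    """

    def __init__(self, start):
        super().__init__()
        self._buf = bytearray()
        # bytearray.extend returns None, which pycurl reads as "all bytes taken".
        self._step = start(self._buf.extend)
        self._running = True

    def readable(self):
        return True

    def readinto(self, b):
        # Only pump the transfer when the caller actually wants more data.
        while not self._buf and self._running:
            self._running = self._step()
        n = min(len(b), len(self._buf))
        b[:n] = self._buf[:n]
        del self._buf[:n]
        return n


def pycurl_multi_start(url):
    """Build a `start` callable that drives one pycurl handle via CurlMulti."""

    def start(write):
        import pycurl  # imported lazily so PullStream works without pycurl

        easy = pycurl.Curl()
        easy.setopt(pycurl.URL, url)
        easy.setopt(pycurl.WRITEFUNCTION, write)
        multi = pycurl.CurlMulti()
        multi.add_handle(easy)

        def step():
            multi.select(1.0)  # wait up to 1 s for socket activity
            while True:
                ret, active = multi.perform()
                if ret != pycurl.E_CALL_MULTI_PERFORM:
                    break
            if active:
                return True
            # Transfer finished; a fuller version would check multi.info_read().
            multi.remove_handle(easy)
            easy.close()
            multi.close()
            return False

        return step

    return start
```

Wrapping it as `io.TextIOWrapper(io.BufferedReader(PullStream(pycurl_multi_start(url))), encoding="utf-8")` gives an object that can be iterated line by line, close to the reading style described in the thread, while the network work only happens inside `read()`.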

-- 
  / daniel.haxx.se
Received on 2008-12-04