curl-and-python
Re: Using pycurl with streaming python interfaces?
Date: Thu, 4 Dec 2008 09:32:13 +0100 (CET)
On Wed, 3 Dec 2008, johansen_at_sun.com wrote:
> The TarFile class can take a file-object, or an object that implements the
> same interfaces that a file does, and will read from that as data becomes
> available.
>
> I looked in libcurl and pycurl, but I didn't see any interface that would
> let me access data in the response body as a file-object, without first
> downloading the entire response.
>
> If one were to try to accomplish this today, is there a way to read the data
> in the response, a bit at a time, so that it may be streamed into a
> file-blocks type of interface that is provided by TarFile and gzip classes?
In my view your biggest "problem" is that urllib's and libcurl's (and thus
pycurl's) interfaces are somewhat different and that your existing code is
built around the nature of urllib, not that libcurl/pycurl has any particular
flaws.
Allow me to throw in my opinions on this. I am primarily a libcurl guy and not
really a pycurl person, but the concepts should still apply no matter the
language:
- Getting data line-by-line from a site is an inefficient way of downloading
whatever you want. It is much smarter to get more data at once and then
have a front-end that can traverse the downloaded data line-by-line.
- libcurl's easy interface is a blocking interface that is designed and made
to do the entire request/transfer in one go before it returns, and it
provides and reads data using callbacks. You could achieve a line-by-line
interface by using a separate thread that downloads the data and have the
line-by-line function access that data (and you could even let the write
callback sleep or slow down in case the local buffer gets too big for you).
- libcurl provides its multi interface as a non-blocking alternative that
lets the caller, the application, decide when to act. With it you can make
the line-by-line function work more or less the way I understand your
current code does, without additional threads. I once wrote an fopen()
style function (subsequently enhanced by others) that shows how it could
be done (the example is in C using the native libcurl API):
http://curl.haxx.se/lxr/source/docs/examples/fopen.c
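For the thread-based variant described in the second point, a rough Python 3
sketch could look like the following. It is not taken from pycurl or from
fopen.c: the names StreamReader, pycurl_producer and max_chunks are made up for
illustration, and the pycurl wiring assumes the usual WRITEFUNCTION callback
and has not been run against a real server here. The bounded queue is what
makes the write callback wait when the local buffer gets too big.

```python
import io
import queue
import threading


class StreamReader(io.RawIOBase):
    """File-like view of data that arrives through a write callback.

    `producer` is any callable that accepts a write callback and blocks until
    the transfer is finished, passing each chunk of bytes to that callback.
    """

    def __init__(self, producer, max_chunks=16):
        super().__init__()
        # Bounded queue: put() blocks when it is full, which slows the
        # download down until the reader has caught up.
        self._chunks = queue.Queue(maxsize=max_chunks)
        self._pending = b""
        self._eof = False
        self._error = None
        self._thread = threading.Thread(
            target=self._run, args=(producer,), daemon=True)
        self._thread.start()

    def _run(self, producer):
        try:
            producer(self._on_data)
        except Exception as exc:  # reported to the reader at end of data
            self._error = exc
        finally:
            self._chunks.put(None)  # sentinel: no more data

    def _on_data(self, data):
        # Returning None tells pycurl that the whole chunk was consumed.
        self._chunks.put(bytes(data))

    def readable(self):
        return True

    def readinto(self, buf):
        # Block until there is data to hand out or the transfer has ended.
        while not self._pending and not self._eof:
            chunk = self._chunks.get()
            if chunk is None:
                self._eof = True
                if self._error is not None:
                    raise self._error
            else:
                self._pending = chunk
        n = min(len(buf), len(self._pending))
        buf[:n] = self._pending[:n]
        self._pending = self._pending[n:]
        return n  # 0 signals end of file


def pycurl_producer(url):
    """Assumed pycurl wiring for StreamReader (needs pycurl and a network)."""
    def run(write_cb):
        import pycurl  # imported lazily so the rest works without pycurl
        handle = pycurl.Curl()
        handle.setopt(pycurl.URL, url)
        handle.setopt(pycurl.WRITEFUNCTION, write_cb)
        try:
            handle.perform()  # blocks in the worker thread
        finally:
            handle.close()
    return run
```

Wrapped as io.BufferedReader(StreamReader(pycurl_producer(url))), the object
can be iterated line by line, or passed to tarfile.open(fileobj=..., mode="r|gz")
to unpack an archive while it is still downloading. One caveat of this sketch:
if the reader stops early, the download thread stays blocked on the full queue
until the process exits.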
-- 
 / daniel.haxx.se
_______________________________________________
http://cool.haxx.se/cgi-bin/mailman/listinfo/curl-and-python
Received on 2008-12-04