curl-and-python
feature req (or bug) CurlMulti should keep reference to added Curl objects
Date: Mon, 7 May 2012 09:01:16 +0300
Hi,
I recently ran into a gotcha with pycurl that took me quite a while to
get my head around. In a nutshell, consider this code:
import pycurl

multi = pycurl.CurlMulti()
for url in alist:
    c = pycurl.Curl()
    c.setopt(...)  # url, write function, header function, post, etc
    multi.add_handle(c)  # c is rebound on the next iteration
# drive all transfers until none are active
while True:
    _, active = multi.perform()
    if not active:
        break
Somehow this turned out to perform only the last request from the url
list. What I figured out a week later was that the Curl objects were
being garbage collected, all except the last one, a reference to which
was still held in `c`. That is, CurlMulti doesn't keep references to
added Curl handles.
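The effect can be reproduced without pycurl at all. The following is a
minimal sketch, assuming CPython's reference counting; Handle and
NonOwningRegistry are hypothetical stand-ins invented for illustration,
not pycurl classes. The registry only holds weak references, mimicking
a container that does not keep its members alive:

```python
import gc
import weakref


class Handle:
    """Hypothetical stand-in for pycurl.Curl (illustration only)."""

    def __init__(self, url):
        self.url = url


class NonOwningRegistry:
    """Hypothetical stand-in for a container that does not keep
    strong references to the objects added to it."""

    def __init__(self):
        self._refs = []

    def add_handle(self, handle):
        # Only a weak reference is stored, so the handle can be freed.
        self._refs.append(weakref.ref(handle))

    def alive(self):
        # URLs of the handles that have not been collected yet.
        return [r().url for r in self._refs if r() is not None]


registry = NonOwningRegistry()
for url in ["http://a.example/", "http://b.example/", "http://c.example/"]:
    c = Handle(url)  # rebinding c drops the previous Handle
    registry.add_handle(c)

gc.collect()  # not needed on CPython, but harmless
survivors = registry.alive()
print(survivors)
```

Only the handle still bound to `c` after the loop is expected to
survive, which matches the "only the last request" symptom.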
I think the existing behaviour is not very pythonic: I instinctively
assumed CurlMulti.add_handle to have semantics similar to list.append.
I would rather CurlMulti kept references to added handles. I'm not
sure when it ought to release the references. A quick but
counter-intuitive hack is to release them when the request completes;
a better solution is to keep references until the handle is explicitly
removed, which allows querying the error status per handle and so on.
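The "keep references until explicitly removed" idea can be sketched in
pure Python without changing pycurl itself. This is only a sketch under
stated assumptions: KeepAliveMulti is a hypothetical wrapper name, and
it relies only on the wrapped object providing add_handle and
remove_handle, as pycurl.CurlMulti does. Everything else is forwarded
to the wrapped object unchanged:

```python
class KeepAliveMulti:
    """Hypothetical wrapper that holds strong references to added
    handles until they are explicitly removed."""

    def __init__(self, multi):
        self._multi = multi   # e.g. a pycurl.CurlMulti instance
        self._handles = []    # strong references keep handles alive

    def add_handle(self, handle):
        self._multi.add_handle(handle)
        self._handles.append(handle)

    def remove_handle(self, handle):
        self._multi.remove_handle(handle)
        self._handles.remove(handle)

    def __getattr__(self, name):
        # Forward anything else (perform, select, info_read, close, ...)
        # to the wrapped multi object.
        return getattr(self._multi, name)
```

With this in place, the loop above would create the multi object as
KeepAliveMulti(pycurl.CurlMulti()) and could stay otherwise unchanged,
since the wrapper rather than the loop variable keeps each handle alive.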
Of course, what I propose is a semantic change, and it might break
someone's code.
I hope it doesn't break much existing code: those who keep explicit
references to Curl objects in a Python data structure can still call
Curl.close on those handles if they rely on the automatic removal from
CurlMulti, or they can remove their handles from CurlMulti explicitly.
I find it easier to discard and re-create the CurlMulti object anyway.
Thoughts, comments?
For the time being I added a workaround like this in Python:
+reqs = []
 for url in alist:
     c = pycurl.Curl()
+    reqs.append(c)
_______________________________________________
http://cool.haxx.se/cgi-bin/mailman/listinfo/curl-and-python
Received on 2012-05-07