curl-and-python

Re: Program dies on call of multi.select...

From: Kamil Dudka <kdudka_at_redhat.com>
Date: Fri, 15 Aug 2014 12:49:57 +0200

On Friday, August 15, 2014 03:30:53 Sam's Lists wrote:
> Hmmm... Is there any more information I should provide to get a response?
>
> Or would I be better off asking on the general libcurl mailing list?
>
> Anyone have a recommended next step in solving this problem?
>
> Thanks!

Please attach a self-contained program that we can run to reproduce the problem.
It is also not clear what you mean by "program dies". What exactly happens: does
the process hang, exit cleanly but early, or disappear with no traceback at all?
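
For example, something along these lines would let us run it directly. This is
only a minimal sketch of the usual CurlMulti driving loop; the URLs and the
10-second timeout are placeholders:

    import pycurl

    # Placeholder URLs; substitute whatever reproduces the problem.
    urls = ['http://example.com/', 'http://example.org/']

    multi = pycurl.CurlMulti()
    handles = []
    for url in urls:
        c = pycurl.Curl()
        c.setopt(pycurl.URL, url)
        # Discard the body; returning len(data) tells libcurl it was consumed.
        c.setopt(pycurl.WRITEFUNCTION, lambda data: len(data))
        multi.add_handle(c)
        handles.append(c)

    # Drive perform() until it no longer asks to be called right away.
    while 1:
        ret, num_handles = multi.perform()
        if ret != pycurl.E_CALL_MULTI_PERFORM:
            break

    # Wait for socket activity (or a timeout), then drive perform() again,
    # until every transfer has finished.
    while num_handles:
        multi.select(10.0)
        while 1:
            ret, num_handles = multi.perform()
            if ret != pycurl.E_CALL_MULTI_PERFORM:
                break

    for c in handles:
        multi.remove_handle(c)
        c.close()
    print("done")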

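As for the question about a signal from the parent process: a Python process
that vanishes without printing a traceback has usually been killed by a signal
or crashed inside a C extension. If the crawler runs as a forked child, the
parent can at least tell you whether the worker exited on its own and, if not,
which signal killed it. A sketch, where run_crawler() is a hypothetical
stand-in for the crawl loop below:

    import os

    def run_crawler():
        # Placeholder for the actual pycurl crawl loop.
        pass

    pid = os.fork()
    if pid == 0:
        run_crawler()
        os._exit(0)

    # Parent: block until the child ends, then decode the status.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        print("worker was killed by signal %d" % os.WTERMSIG(status))
    elif os.WIFEXITED(status):
        print("worker exited with status %d" % os.WEXITSTATUS(status))

Also note that with the standard resolver, libcurl uses SIGALRM internally for
DNS timeouts unless the NOSIGNAL option is set on each easy handle; if signals
turn out to be involved, c.setopt(pycurl.NOSIGNAL, 1) is worth trying.
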
Kamil

> On Tue, Aug 12, 2014 at 3:18 PM, Sam's Lists <samslists_at_gmail.com> wrote:
> > Whoops, sorry about that last email; I hit send too quickly somehow. Here's
> > what I meant to send:
> >
> > multi = pycurl.CurlMulti()
> > now = datetime.datetime.utcnow()
> >
> > for counter, website in enumerate(websites, 1):
> >     website.grabber = WebSite.Resource(website.next_page.original_url)
> >     multi.add_handle(website.grabber._curl)
> >
> > while 1:
> >     ret, num_handles = multi.perform()
> >     if ret != pycurl.E_CALL_MULTI_PERFORM:
> >         break
> >
> > while num_handles:
> >     multi.select(30.0)
> >
> > The multi.select line above is where it dies about 70% of the time.
> >
> > Why might it die there? Again, there are no exceptions printed, no stack
> > traces, nothing.
> >
> > Could it have something to do with a signal from the parent process?
> >
> > This is using Python 2.7 and Ubuntu 12.04.
> >
> > Pycurl is 7.19.5 and libcurl is 7.22.0-3ubuntu4.8
> >
> > Thanks!
> >
> > On Tue, Aug 12, 2014 at 3:09 PM, Sam's Lists <samslists_at_gmail.com> wrote:
> >> I have a rather complicated crawler that seems to die often - but not
> >> always at the same place.
> >>
> >> What's exasperating is that there are no exceptions, stack traces, etc.
> >> printed. I was only able to find where it died by adding lots of print
> >> statements and seeing what was the last thing to be printed.
> >>
> >> Here's a somewhat simplified version of the code:
> >> multi = pycurl.CurlMulti()
> >>
> >> print("ag2")
> >> now = datetime.datetime.utcnow()
> >> print("ag3")
> >>
> >> for counter, website in enumerate(websites, 1):
> >>     print("ag4")
> >>     assert website.crawl_type in ('standard', 'refresh', 'new')
> >>     print("ag5")
> >>     website.grabber = WebSite.Resource(website.next_page.original_url,
> >>                                        anonymous=Options.anonymous)
> >>     print("ag6")
> >>     website.next_page.crawled_ts = now
> >>     print("ag7")
> >>     multi.add_handle(website.grabber._curl)
> >>     print("ag8")
> >>
> >> print("ag9")
> >>
> >> # Number of seconds to wait for a timeout to happen
> >> if Options.test:
> >>     # Set for longer cause blicker_pierce takes forever on the
> >>     # additional start page with all the wines
> >>     SELECT_TIMEOUT = 30.0
> >> else:
> >>     SELECT_TIMEOUT = 10.0
> >>
> >> print("ag10")
> >>
> >> # To do: implement it this way:
> >> # http://www.josefassad.com/pycurl_curlmulti_mini_howto
> >>
> >> # Stir the state machine into action
> >> while 1:
> >>     print("ag11")
> >>     ret, num_handles = multi.perform()
> >>     if ret != pycurl.E_CALL_MULTI_PERFORM:
> >>         break
> >>
> >> print("ag12")
> >> #CauseError
> >>
> >> # Keep going until all the connections have terminated
> >> while num_handles:
> >>     # The select method uses fdset internally to determine which file
> >>     # descriptors to check.
> >>     # Todo: This code is looped a lot
> >>     # Should there be a sleep here???? I got no idea
> >>     print("ag12.5")
> >>     print("calling multi.select with:", SELECT_TIMEOUT)
> >>     print("Please don't die here!!!!")
> >>     multi.select(SELECT_TIMEOUT)
Received on 2014-08-15