Program dies on call of

From: Sam's Lists <>
Date: Tue, 12 Aug 2014 15:09:47 -0700

I have a rather complicated crawler that dies often, but not always in the
same place.

What's exasperating is that no exceptions, stack traces, etc. are printed.
I was only able to find where it died by adding lots of print statements
and seeing what was the last thing to be printed.
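(If the crash is happening inside C code — and pycurl is a C extension — a
hard crash like a segfault bypasses Python's exception machinery entirely,
which would explain the missing traceback. Assuming Python 3.3+, the stdlib
faulthandler module can still dump a traceback in that case; a minimal
sketch:)

```python
import faulthandler

# Dump a Python-level traceback if the process dies on SIGSEGV, SIGFPE,
# SIGABRT, SIGBUS, or SIGILL -- crashes that never raise an exception.
faulthandler.enable()

print(faulthandler.is_enabled())  # → True
```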

Here's a somewhat simplified version of the code:

    multi = pycurl.CurlMulti()
    now = datetime.datetime.utcnow()
    for counter, website in enumerate(websites, 1):
        assert website.crawl_type in ('standard', 'refresh', 'new')
        website.grabber = WebSite.Resource(website.next_page.original_url)
        website.next_page.crawled_ts = now

    # Number of seconds to wait for a timeout to happen
    if Options.test:
        # Set longer because blicker_pierce takes on the additional
        # start page with all the wines
        SELECT_TIMEOUT = 30.0
    else:
        SELECT_TIMEOUT = 10.0

    # To do: implement it this way
    # Stir the state machine into action
    while 1:
        ret, num_handles = multi.perform()
        if ret != pycurl.E_CALL_MULTI_PERFORM:
            break

    # Keep going until all the connections have terminated
    while num_handles:
        # The select method uses fdset internally to determine which file
        # descriptors to check.

        # Todo: this code loops a lot.
        # Should there be a sleep here? I have no idea.

        print("calling with:", SELECT_TIMEOUT)
        print("Please don't die here!!!!")
        multi.select(SELECT_TIMEOUT)

        while 1:
            ret, num_handles = multi.perform()
            if ret != pycurl.E_CALL_MULTI_PERFORM:
                break
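For comparison, the canonical CurlMulti driving loop (as in pycurl's own
examples) has roughly this shape. It's sketched here against a stand-in
object — `FakeMulti` and the constant value are hypothetical, just so the
control flow is runnable without a network:

```python
E_CALL_MULTI_PERFORM = -1  # stand-in; real code uses pycurl.E_CALL_MULTI_PERFORM

class FakeMulti:
    """Stand-in for pycurl.CurlMulti: finishes after a few perform() calls."""
    def __init__(self, steps=3):
        self.steps = steps

    def perform(self):
        self.steps -= 1
        # Return (status, number of handles still active)
        return (0, max(self.steps, 0))

    def select(self, timeout):
        return 0  # pretend a file descriptor became ready immediately

def drive(multi, timeout=10.0):
    """Canonical multi-interface loop: perform until no handles remain."""
    # Stir the state machine into action
    while True:
        ret, num_handles = multi.perform()
        if ret != E_CALL_MULTI_PERFORM:
            break
    # Keep going until all the connections have terminated
    while num_handles:
        if multi.select(timeout) == -1:
            continue  # select() timed out; try again
        while True:
            ret, num_handles = multi.perform()
            if ret != E_CALL_MULTI_PERFORM:
                break
    return num_handles

print(drive(FakeMulti()))  # → 0
```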

Received on 2014-08-13