
curl-library

Threaded download slows down after a while

From: Antonio Weber <antonio.weber_at_ds-content.de>
Date: Tue, 02 Sep 2008 13:51:36 +0200

Hello List,

I have a problem when downloading a lot of similar pages. The attached
code runs in about 50 parallel threads (using pthreads). Each thread
downloads an HTML page from which I parse a description to update a
database (about 50000 descriptions).
The download threads work well for a couple of minutes (using the full
bandwidth), but then they slow down, and in the end no traffic can be
observed at all.
When I start wget with the same URL while my program is running
(hanging), wget downloads the HTML page at full speed, so the network is
up and OK.

I also used lsof to check whether there are too many open handles or
something like that, but there are only as many network connections /
handles as there are running threads (about 50).

To the code:
The Product::downloadDescriptions method is static.

The Description class is a subclass of CurlData. CurlData::addData only
resizes a char array and appends the received data to it. CurlData::clear
frees the memory.
Memory usage stays constant the whole time.
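
Since CurlData itself is not included in the code below, here is a minimal sketch of what such a class might look like, based only on the description above. The member names buf_ and size_, the data() and size() accessors, the bool return value of addData and the empty Description body are assumptions made for illustration, not the actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical reconstruction: the real CurlData is not shown in this post.
class CurlData {
public:
    CurlData() : buf_(nullptr), size_(0) {}
    virtual ~CurlData() { clear(); }

    // The object owns a raw buffer, so copying is disabled in this sketch.
    CurlData(const CurlData &) = delete;
    CurlData &operator=(const CurlData &) = delete;

    // Resize the char array and append len bytes of received data.
    // Returns false if the allocation fails (assumed behavior).
    bool addData(const char *data, std::size_t len) {
        assert(data != nullptr || len == 0);
        if (len == 0) return true;
        char *grown = static_cast<char *>(std::realloc(buf_, size_ + len + 1));
        if (grown == nullptr) return false;  // old buffer is still valid
        buf_ = grown;
        std::memcpy(buf_ + size_, data, len);
        size_ += len;
        buf_[size_] = '\0';  // keep the buffer usable as a C string
        return true;
    }

    // Free the memory so the object can be reused for the next download.
    void clear() {
        std::free(buf_);
        buf_ = nullptr;
        size_ = 0;
    }

    const char *data() const { return buf_ ? buf_ : ""; }
    std::size_t size() const { return size_; }

private:
    char *buf_;
    std::size_t size_;
};

// Assumed to add only parsing helpers on top of CurlData.
class Description : public CurlData {};
```

With a class like this, calling clear() after each page keeps memory usage flat, which would match the constant memory usage mentioned above.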

So my question: is there an error in the way I use libcurl that causes
this behavior?

Thanks in advance,
Antonio

CODE
-----------------------------------------------------------------------

void *Product::downloadDescriptions(void *p) {
    struct DescDownloadParams *params =
        static_cast<struct DescDownloadParams *>(p);

    const Config *config = params->config;
    BlockingQueue<Product *> *queue = params->queue;

    CURL *curlHandle = curl_easy_init();

    Description d;

    // These options take a long argument, hence the L suffix.
    curl_easy_setopt(curlHandle, CURLOPT_NOPROGRESS, 1L); // need no progress info
    curl_easy_setopt(curlHandle, CURLOPT_WRITEFUNCTION, write_data);
    curl_easy_setopt(curlHandle, CURLOPT_FOLLOWLOCATION, 1L);
    curl_easy_setopt(curlHandle, CURLOPT_WRITEDATA, &d);

    while(true) {
        Product *prod = queue->dequeue();
        if(prod == NULL) { // terminate thread with a NULL item
            curl_easy_cleanup(curlHandle);
            delete connection;
            return NULL;
        }

        int retry = 3;
        string url = config->getDescriptionBaseUrl() + prod->getId();
        cout << "getting description from " << url << endl;
        CURLcode success;
        do {
            curl_easy_setopt(curlHandle, CURLOPT_URL, url.c_str());
            success = curl_easy_perform(curlHandle);

            if(success != CURLE_OK) {
                cout << "error occurred during description download, libcurl - errorcode: "
                     << success << ". Retrying (" << retry << " tries left)" << endl;
            }
        } while(success != CURLE_OK && --retry);
        ...
        d.clear();
    }
}

------------------------------------------------------------------------------

// curl callback function
size_t write_data(void *buffer, size_t s, size_t nmemb, void *userp) {
    size_t bytes = s * nmemb;
    CurlData *curlDat = static_cast<CurlData *>(userp);
    curlDat->addData((const char *)buffer, bytes);
    return bytes;
}
Received on 2008-09-02