curl-library
Threaded download slows down after a while
Date: Tue, 02 Sep 2008 13:51:36 +0200
Hello List,
I have a problem when downloading a lot of similar pages. The attached
code runs in about 50 parallel threads (using pthreads). Each thread
downloads an HTML page from which I parse descriptions to update a
database (about 50000 entries).
The download threads work well for a couple of minutes (using the full
bandwidth), but then they slow down, and in the end no traffic at all
can be observed.
When I run wget on the same URL in parallel with my running (hanging)
program, wget downloads the HTML page at full speed, so the network is
up and OK.
I also checked with lsof whether there are too many handles or the like,
but there are only as many network connections/handles as there are
running threads (about 50).
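To track the descriptor count from inside the program instead of with lsof, the following Linux-only sketch counts the entries in /proc/self/fd. It assumes /proc is mounted and C++17 is available, and the name count_open_fds is hypothetical.

```cpp
#include <filesystem>
#include <system_error>

// Counts this process's open file descriptors (sockets included), the same
// set `lsof -p <pid>` reports. The count includes the descriptor used to
// read the directory itself. Returns -1 if /proc/self/fd is unreadable.
long count_open_fds() {
    namespace fs = std::filesystem;
    std::error_code ec;
    long n = 0;
    for (fs::directory_iterator it("/proc/self/fd", ec);
         !ec && it != fs::directory_iterator(); it.increment(ec)) {
        ++n;
    }
    return ec ? -1 : n;
}
```

If this were logged periodically from a monitor thread, a steady value near the thread count would match the lsof observation.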
About the code:
The Product::downloadDescriptions method is static.
The Description class is a subclass of CurlData. CurlData::addData only
resizes a char array and appends the received data to it. CurlData::clear
frees the memory.
Memory usage stays constant the whole time.
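CurlData itself is not shown in the post. The following is a hypothetical minimal version that matches the description: addData appends to a growable buffer, and clear releases it. It uses std::vector<char> in place of a manually resized char array. The accessors data() and size() are assumptions, and Description is an empty stand-in.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reconstruction of CurlData: a byte buffer that the libcurl
// write callback appends to.
class CurlData {
public:
    // Appends len bytes received from libcurl.
    void addData(const char *data, std::size_t len) {
        buf_.insert(buf_.end(), data, data + len);
    }
    // Empties the buffer and actually frees its memory.
    void clear() { std::vector<char>().swap(buf_); }
    const char *data() const { return buf_.data(); }
    std::size_t size() const { return buf_.size(); }
private:
    std::vector<char> buf_;
};

// Stand-in for the Description subclass mentioned in the post.
class Description : public CurlData {};
```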
So my question is: is there an error in the way I use libcurl that
causes this behavior?
Thanks in advance,
Antonio
CODE
-----------------------------------------------------------------------
#include <curl/curl.h>
#include <iostream>
#include <string>
using namespace std;

// libcurl write callback, defined further below
size_t write_data(void *buffer, size_t s, size_t nmemb, void *userp);

void *Product::downloadDescriptions(void *p) {
    struct DescDownloadParams *params = static_cast<struct DescDownloadParams *>(p);
    const Config *config = params->config;
    BlockingQueue<Product *> *queue = params->queue;
    CURL *curlHandle = curl_easy_init(); // one easy handle per thread
    Description d;
    curl_easy_setopt(curlHandle, CURLOPT_NOPROGRESS, 1L); // no progress info needed
    curl_easy_setopt(curlHandle, CURLOPT_WRITEFUNCTION, write_data);
    curl_easy_setopt(curlHandle, CURLOPT_FOLLOWLOCATION, 1L);
    // pass a CurlData* because write_data casts userp back to CurlData*
    curl_easy_setopt(curlHandle, CURLOPT_WRITEDATA, static_cast<CurlData *>(&d));
    while(true) {
        Product *prod = queue->dequeue();
        if(prod == NULL) { // terminate thread with a NULL item
            curl_easy_cleanup(curlHandle);
            delete connection; // `connection` is declared outside this excerpt
            return NULL;
        }
        int retry = 3;
        string url = config->getDescriptionBaseUrl() + prod->getId();
        cout << "getting description from " << url << endl;
        CURLcode success;
        do {
            d.clear(); // discard partial data left by a failed attempt
            curl_easy_setopt(curlHandle, CURLOPT_URL, url.c_str());
            success = curl_easy_perform(curlHandle);
            if(success != CURLE_OK) {
                cout << "error occurred during description download, libcurl error code: "
                     << success << " (" << curl_easy_strerror(success) << "). Retrying ("
                     << retry - 1 << " tries left)" << endl;
            }
        } while(success != CURLE_OK && --retry);
        ...
        d.clear();
    }
}
------------------------------------------------------------------------------
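// libcurl write callback: appends each received chunk to the CurlData
// object passed via CURLOPT_WRITEDATA and reports every byte as handled
size_t write_data(void *buffer, size_t s, size_t nmemb, void *userp) {
    size_t bytes = s * nmemb;
    CurlData *curlDat = static_cast<CurlData *>(userp);
    curlDat->addData(static_cast<const char *>(buffer), bytes);
    return bytes;
}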
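The same write-callback contract can be exercised without any network access. The sketch below uses std::string as the sink instead of the post's CurlData, and the function name append_to_string is hypothetical. libcurl delivers size * nmemb bytes per call, and returning any other count makes the transfer fail with CURLE_WRITE_ERROR.

```cpp
#include <cstddef>
#include <string>

// C linkage to match the C callback type libcurl expects. Appends the chunk
// to the std::string passed via CURLOPT_WRITEDATA and returns the number of
// bytes consumed.
extern "C" std::size_t append_to_string(char *ptr, std::size_t size,
                                        std::size_t nmemb, void *userdata) {
    std::size_t bytes = size * nmemb;
    static_cast<std::string *>(userdata)->append(ptr, bytes);
    return bytes;
}
```

It would be registered with curl_easy_setopt(h, CURLOPT_WRITEFUNCTION, append_to_string) and curl_easy_setopt(h, CURLOPT_WRITEDATA, &body), where body is a std::string.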
Received on 2008-09-02