RE: Multithreading Problems (CentOS 5.1)

From: Nick George <nick.george_at_hotmail.com>
Date: Fri, 20 Jun 2008 18:39:54 +1000

More bizarre behaviour to report.

I have re-written the multithreaded test to use forking instead. I figured that if anything was expensive it would be the fork() itself, and I don't really care about the cost of process startup, just the execution. ANYWAY, I get exactly the same performance problems with multiple processes as I do with multiple threads, and I'm at a loss to explain what is going on here. Each process waits on a fifo read before it performs the HTTP GET with cURL, and when they run I get pretty much identical timings to the threaded version. I see the same behaviour on Ubuntu 8.04 i686 (curl 7.18.2) as on CentOS 5.1 (curl 7.15.5).
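
In case it helps, the gist of the forked variant is below (a trimmed sketch rather than my exact code; the fifo path, URL and process count are placeholders):

#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <curl/curl.h>

#define NPROCS 500

int main(void)
{
  const char *fifo = "/tmp/go.fifo"; /* placeholder path */
  int i, fd;

  mkfifo(fifo, 0600);
  curl_global_init(CURL_GLOBAL_ALL);
  for(i = 0; i < NPROCS; i++)
  {
    if(fork() == 0)
    {
      char go;
      CURL *curl;
      fd = open(fifo, O_RDONLY); /* blocks until the parent opens the write end */
      read(fd, &go, 1);          /* one byte == one green light */
      close(fd);
      curl = curl_easy_init();
      curl_easy_setopt(curl, CURLOPT_URL, "http://example.com/"); /* placeholder */
      curl_easy_perform(curl);
      curl_easy_cleanup(curl);
      _exit(0);
    }
  }
  sleep(2); /* crude, same as the threaded version: let the children block */
  fd = open(fifo, O_WRONLY);
  for(i = 0; i < NPROCS; i++)
    write(fd, "g", 1); /* release the children, one byte each */
  close(fd);
  while(wait(NULL) > 0)
    ; /* reap everything */
  unlink(fifo);
  return 0;
}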

What's more, I find that I'm consistently getting the same waiting times: I'll have a couple of hundred threads finish FAST, then a couple of hundred finish at 3 seconds, then a couple of hundred more at 9 seconds, then more at 21 seconds, then more at 45 seconds, and then finally more at 91 seconds! The gaps roughly double each time (3, 6, 12, 24, 48 seconds), which looks like some kind of exponential backoff. I'm finding it difficult to track down what each of these processes is actually waiting on; I'm presuming it's the same futex() call.

Maybe the OS is deliberately throttling what it considers to be a DoS? I don't have SELinux enabled, and I can't think of what else would be causing this.

Any ideas?

Regards,
Nick

> From: nick.george_at_hotmail.com
> To: curl-library_at_cool.haxx.se
> Subject: RE: Multithreading Problems (CentOS 5.1)
> Date: Fri, 20 Jun 2008 15:30:53 +1000
>
>
>
>> Using c-ares? What libcurl version? Which Linux kernel?
> Kernel Version: 2.6.18-53.1.21.el5 SMP x86_64 (CentOS 5.1)
> cURL version 7.15.5 Vendor: CentOS Release 2.el5 Arch: x86_64
> I'm not using c-ares at all.
>
> I've looked a bit further into the issue.
> If I comment out the curl_easy_perform() call, the code runs VERY quickly (probably because there's nothing to do), so it seems the cURL code must be calling some library functions that end up calling futex(). I also tried commenting out all of my custom code in the CURLOPT_WRITEFUNCTION callback (in case I was doing something stupid), to no avail.
>
> I've tried the same code on Ubuntu 8.04 (kernel 2.6.24-16-generic i686) with the latest stable version of libcurl compiled from source. I'm running into the same issue. I can't run more than 380 threads at the same time, but it's enough to see that it pauses for a number of seconds on calls to futex().
>
> I ended up giving up on the multithreaded approach; I was having other problems with my i386 box not being able to create more than 300 threads anyway. Instead, I tried extending the multi-app.c example to perform the same tests. It works fine up to about 150 simultaneous transfers, but beyond that performance starts to degrade massively, and this time it's the call to select() that pauses for many seconds at a time. I can't win! Right now I'm at a bit of a loss to explain what's going on.
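>
> For reference, the multi version keeps the usual multi-app.c shape; stripped
> down, the loop is roughly this (a sketch, not my exact code):
>
> CURLM *multi = curl_multi_init();
> /* ... one curl_multi_add_handle() per easy handle ... */
> int running;
> curl_multi_perform(multi, &running);
> while(running)
> {
>   fd_set rd, wr, ex;
>   struct timeval tv;
>   int maxfd = -1;
>   long timeout_ms = -1;
>   FD_ZERO(&rd); FD_ZERO(&wr); FD_ZERO(&ex);
>   curl_multi_fdset(multi, &rd, &wr, &ex, &maxfd);
>   curl_multi_timeout(multi, &timeout_ms);
>   if(timeout_ms < 0)
>     timeout_ms = 1000; /* fall back to 1s when libcurl suggests no timeout */
>   tv.tv_sec = timeout_ms / 1000;
>   tv.tv_usec = (timeout_ms % 1000) * 1000;
>   select(maxfd + 1, &rd, &wr, &ex, &tv); /* this is the call I see stalling */
>   curl_multi_perform(multi, &running);
> }
> curl_multi_cleanup(multi);
>
> Nothing exotic there, as far as I can tell.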
>
> I've included my multithreaded code below; maybe you could have a quick glance and let me know if you see me doing anything really dumb?
>
> Regards,
> Nick
>
> /*****************************************************************************
>  *                                  _   _ ____  _
>  *  Project                     ___| | | |  _ \| |
>  *                             / __| | | | |_) | |
>  *                            | (__| |_| |  _ <| |___
>  *                             \___|\___/|_| \_\_____|
>  *
>  ****************************************************************************/
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <errno.h>
> #include <fcntl.h>
> #include <unistd.h>
> #include <pthread.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <curl/curl.h>
>
> //GLOBALS
> pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
> pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
> unsigned char *vfile_data; //the contents of the verification file
> off_t vfile_size; //the size of the verification file
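> //vfile_data and vfile_size are written once in main() before any threads
> //start and only read afterwards, so the threads share them without locking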
>
> typedef struct {
>   pthread_t pthr;
>   char *url;
>   int tid;
>   int result;
>   int data_recvd;
>   double t_total;
>   double t_connect;
>   double t_lookup;
>   double t_pretrans;
>   double t_starttrans;
>   double t_redirect;
> } thread_data;
>
> size_t got_data(void *ptr, size_t size, size_t nmemb, void *data)
> {
>   thread_data *thr_data = data;
>   //compare with the verification file; abort the transfer (by returning 0)
>   //if the server sends more data than the file holds, or the data differs
>   if(thr_data->data_recvd + size*nmemb > (size_t)vfile_size ||
>      memcmp(vfile_data + thr_data->data_recvd, ptr, size*nmemb) != 0)
>   {
>     thr_data->result = 1;
>     return 0;
>   }
>   thr_data->data_recvd += size*nmemb;
>   if(thr_data->data_recvd == vfile_size)
>   {
>     thr_data->result = 0;
>   }
>   return size*nmemb;
> }
>
>
> void *pull_one_url(void *buf)
> {
>   thread_data *data = (thread_data *)buf;
>   pthread_mutex_lock(&mutex);
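>   //note: nothing guards this wait against the broadcast arriving before the
>   //wait starts (the sleep(2) in main() papers over that) or against spurious
>   //wakeups; a predicate-protected wait would be more robust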
>   pthread_cond_wait(&cond, &mutex);
>   pthread_mutex_unlock(&mutex);
>
>   CURL *curl = curl_easy_init();
>   curl_easy_setopt(curl, CURLOPT_URL, data->url);
>   curl_easy_setopt(curl, CURLOPT_NOSIGNAL, 1); //disable all signals arriving at this thread
>   curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, got_data);
>   curl_easy_setopt(curl, CURLOPT_WRITEDATA, data);
>   curl_easy_perform(curl); // ignores error
>   curl_easy_getinfo(curl, CURLINFO_TOTAL_TIME, &data->t_total);
>   curl_easy_getinfo(curl, CURLINFO_NAMELOOKUP_TIME, &data->t_lookup);
>   curl_easy_getinfo(curl, CURLINFO_CONNECT_TIME, &data->t_connect);
>   curl_easy_getinfo(curl, CURLINFO_PRETRANSFER_TIME, &data->t_pretrans);
>   curl_easy_getinfo(curl, CURLINFO_STARTTRANSFER_TIME, &data->t_starttrans);
>   curl_easy_getinfo(curl, CURLINFO_REDIRECT_TIME, &data->t_redirect);
>   curl_easy_cleanup(curl);
>   return NULL;
> }
>
> void read_file(char *fname)
> {
>   int fd = open(fname, O_RDONLY);
>   if(fd == -1)
>   {
>     printf("Could not open verification file. %s\n", strerror(errno));
>     exit(1);
>   }
>   ssize_t res = read(fd, vfile_data, vfile_size); //read() returns ssize_t
>   if(res != (ssize_t)vfile_size)
>   {
>     printf("Could not read entire verification file in one read. %s\n", strerror(errno));
>     exit(1);
>   }
>   close(fd);
> }
>
> int main(int argc, char **argv)
> {
>   char *url;
>   char *vfile;
>   int num_threads;
>   struct stat finfo;
>   int i;
>   int error;
>
>   if(argc != 4)
>   {
>     printf("Usage: ./www-test <verification-file> <url> <num-threads>\n");
>     exit(1);
>   }
>   vfile = argv[1];
>   url = argv[2];
>   sscanf(argv[3], "%d", &num_threads);
>
>   if(stat(vfile, &finfo) == -1)
>   {
>     printf("Problem with verification file. %s\n", strerror(errno));
>     exit(1);
>   }
>   vfile_size = finfo.st_size;
>   vfile_data = (unsigned char *)malloc(vfile_size); //risky, should doublecheck size first.
>   if(vfile_data == 0)
>   {
>     printf("Could not malloc data to read verification file\n");
>     exit(1);
>   }
>   read_file(vfile);
>
>   thread_data *thr_list = (thread_data *)malloc(num_threads*sizeof(thread_data));
>   curl_global_init(CURL_GLOBAL_ALL);
>
>   for(i = 0; i < num_threads; i++)
>   {
>     thr_list[i].url = url;
>     thr_list[i].tid = i;
>     thr_list[i].result = -1;
>     thr_list[i].data_recvd = 0;
>     error = pthread_create(&thr_list[i].pthr,
>                            NULL, /* default attributes please */
>                            pull_one_url,
>                            &thr_list[i]);
>     if(0 != error)
>     {
>       //pthread_create() returns the error code rather than setting errno
>       fprintf(stderr, "Couldn't run thread number %d, error %s\n", i, strerror(error));
>       exit(1);
>     }
>   }
>   sleep(2); /* Sleep is not a very robust way to serialize threads */
>   pthread_mutex_lock(&mutex);
>   pthread_cond_broadcast(&cond);
>   pthread_mutex_unlock(&mutex);
>
>   /* now wait for all threads to terminate */
>   printf("thread id, result, total time, lookup time, connect time, pre-trans, start-trans, redirect\n");
>   double grand_total = 0;
>   int good_count = 0;
>   for(i = 0; i < num_threads; i++)
>   {
>     error = pthread_join(thr_list[i].pthr, NULL);
>     printf("\t%d, \t%d, %f, %f, %f, %f, %f, %f\n", thr_list[i].tid, thr_list[i].result, thr_list[i].t_total, thr_list[i].t_lookup, thr_list[i].t_connect, thr_list[i].t_pretrans, thr_list[i].t_starttrans, thr_list[i].t_redirect);
>     if(thr_list[i].result == 0)
>     {
>       grand_total += thr_list[i].t_total;
>       good_count++;
>     }
>   }
>   //guard against dividing by zero when no transfer verified cleanly
>   double average_total = good_count ? grand_total/good_count : 0.0;
>   printf("Average total time, %f\n", average_total);
>   free(vfile_data);
>   free(thr_list);
>   pthread_cond_destroy(&cond);
>   pthread_mutex_destroy(&mutex);
>
>   return 0;
> }
>
>
>
>>
>>> All is well for up to about 50 threads, but even then I notice that the time required for the threads to complete increases exponentially.
>>>
>>> Threads: 1 Average time, 0.003281 seconds
>>> Threads: 5 Average time, 0.007052 seconds
>>> Threads: 10 Average time, 0.012198 seconds
>>> Threads: 50 Average time, 0.052749 seconds
>>> Threads: 100 Average time, 0.103197 seconds
>>> Threads: 500 Average time, 2.994846 seconds
>>> Threads:1000 Average time, 7.310746 seconds
>>>
>>> You can see that between 100 and 500 threads, the time required goes up by
>>> a factor of 30 when only 5 times the number of requests are made.
>>
>> These are quite surprising results. I've not seen any clear measurements on this
>> in a long time, but when we've used the multi interface and done operations
>> such as this using a single thread we've managed far better results than that.
>>
>> I don't even know what libcurl _could_ do to cause such delays!
>>
>>> I ran "strace" over the program and found that it pauses for long periods
>>> (many seconds) on a calls to "futex" with FUTEX_WAIT. Maybe the library is
>>> experiencing some kind of deadlock?
>>
>> The library is written single-threaded and uses no mutexes or locks (unless
>> you've introduced them), so I don't think it does.
>>
>> Those futex things are rather used by system functions that perhaps fight for
>> a shared resource or something?
>>
>> --
>>
>> / daniel.haxx.se
>

Received on 2008-06-20