curl-library
Bad Performance When Uploading Large Data With Libcurl
Date: Mon, 10 Mar 2014 10:38:10 -0400
We have an application written in C, running on both Windows and
Linux, which uses libcurl's easy interface to interact with an Apache
server via an HTTP API over the internet. Normally, the API
transactions are short, and bandwidth performance is inconsequential,
but recently we have had to start working with a transaction which
sends large volumes of data to the API. In our case, large consists of
up to ~10MB per HTTP POST, but could be as small as a few KB. We'll be
making many of these transactions, sending anywhere from ~100MB to
~100GB during program execution.
The problem we've encountered is that our total bandwidth (total
amount of data sent / (amount of time it takes to execute all upload
transactions - server side processing time)) is quite bad. If we take
the raw data and FTP it to the same webserver which hosts the Apache
instance, we get ~5x more bandwidth than the current setup. We
understand that sending the data in chunks will eat into performance
to some degree, but a 5x slowdown seems excessive (maybe it's not,
that's why we're asking!).
More precisely: if we take the data being sent as a single text file,
compress it using deflate, and FTP the resulting 125MB file with
FileZilla, it takes about 2 minutes (~8 Mbit/s). Our application POSTs
the data one chunk at a time, each chunk individually compressed, so it
sends 175MB of data, but takes 14.5 minutes (~1.5 Mbit/s). The
distribution of POST body sizes is as follows:
Range: 3.7 KB - 9.6 MB
Median: 55 KB
Average: 1.6 MB
Less than 10% of POST bodies are less than 1500 bytes.
We're looking for suggestions to improve performance, or an argument
as to why it can't be improved under our constraints. We must continue
to use HTTP POSTs to send the data one chunk at a time. Ideally
there'd be some way to configure libcurl to perform better with our
non-standard load (we've read about TCP_CORK, but weren't able to find
any documentation on how to use it in libcurl).
Roughly, our code currently looks like this (error checking removed
for terseness):
CURL *cur = curl_easy_init();
struct curl_slist *headers = NULL;
headers = curl_slist_append(headers, "Connection: Keep-Alive");
headers = curl_slist_append(headers, "Keep-Alive: 60");
headers = curl_slist_append(headers, "Content-Type: text/xml");
headers = curl_slist_append(headers, xff); // xff declared elsewhere
curl_easy_setopt(cur, CURLOPT_PROXY, proxyurl); // proxyurl declared elsewhere
// proxyauth declared elsewhere
curl_easy_setopt(cur, CURLOPT_PROXYUSERPWD, proxyauth);
curl_easy_setopt(cur, CURLOPT_URL, urlbuf); // urlbuf declared elsewhere
// user_agent declared elsewhere
curl_easy_setopt(cur, CURLOPT_USERAGENT, user_agent);
// callback makes a copy of the tiny (<100 byte) response, which is
// processed later
curl_easy_setopt(cur, CURLOPT_WRITEFUNCTION, callback);
curl_easy_setopt(cur, CURLOPT_WRITEDATA, ptr); // ptr declared elsewhere
// postBody is the up to ~10MB data buffer being sent
curl_easy_setopt(cur, CURLOPT_POSTFIELDS, postBody);
curl_easy_setopt(cur, CURLOPT_POSTFIELDSIZE, postBodySize); // postBody size
curl_easy_setopt(cur, CURLOPT_HTTPHEADER, headers);
curl_easy_setopt(cur, CURLOPT_BUFFERSIZE, CURL_MAX_WRITE_SIZE);
curl_easy_setopt(cur, CURLOPT_NOSIGNAL, 1L);
curl_easy_setopt(cur, CURLOPT_SSL_VERIFYPEER, 0L);
// connectTimeout declared elsewhere
curl_easy_setopt(cur, CURLOPT_CONNECTTIMEOUT, connectTimeout);
curl_easy_setopt(cur, CURLOPT_TIMEOUT, timeout); // timeout declared elsewhere
/* Turn on TCP keep-alive probes */
curl_easy_setopt(cur, CURLOPT_TCP_KEEPALIVE, 1L);
curl_easy_perform(cur);
Any and all thoughts/suggestions are very much appreciated.
-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette: http://curl.haxx.se/mail/etiquette.html
Received on 2014-03-10