curl-library

Pulling Multi Urls At Once

From: Donald Boissonneault <donnyb_at_xplornet.ca>
Date: Tue, 06 Jul 2010 20:35:23 -0600

 OK, guys and gals, I have really done my homework here, but I cannot
accomplish what I want to. All I want to do is put a bunch of URLs
into a deque of a class and then ask a function to fetch more than
one at once. The reason: because of DNS resolution time, I am held
up if I only request one URL at a time. I would like to specify how
many I am pulling at once, and here is the information I need returned:
 the IP address of the site;
 the HTTP response code of the site;
 the URL of the site;
 and the HTML body of the site.

 The last is where all of the problems come in.
 I have tried threading and I have tried the multi functions, and I keep
running into one problem: I cannot get it to process more than one HTML
body at a time, because it calls a static function.
 This first line calls the second function I show. That static function
is being very unfriendly. I would post all the code I have tried, but I
am up to my 7th or 8th try. If anyone knows how to pass a function,
say, 10 to 100 URLs at a time to maximize the internet connection, I would
be very happy. Ideally, what I would like is a separate thread that looks
through the deques in my class to see if an entry is ready to be pulled,
locks it while that data is being pulled, and then marks it as done.
 However, at this point I would be happy with any help I could get.
 Thank you in advance if it is possible.
 Thank you,
 Donald
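
The "look up the deque, lock it while pulling, mark it as done" idea
described above could be sketched roughly as follows. This is only an
illustration under assumptions: UrlJob, JobQueue, claim, mark_done and
count are hypothetical names that do not appear in the original program,
and the sketch deliberately leaves out the actual libcurl transfer. A
worker thread would claim a batch, fetch those URLs, fill in the fields,
and then mark each job as done.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical states a URL entry moves through.
enum class JobState { Pending, InProgress, Done };

// Hypothetical record holding the four pieces of information asked for.
struct UrlJob {
    std::string url;       // the URL of the site
    std::string ip;        // the IP address of the site
    long http_code = 0;    // the HTTP response code
    std::string body;      // the HTML body
    JobState state = JobState::Pending;
};

class JobQueue {
public:
    void add(const std::string &url) {
        std::lock_guard<std::mutex> lock(mutex_);
        UrlJob job;
        job.url = url;
        jobs_.push_back(job);
    }

    // Hand out up to max_count pending jobs and mark them InProgress so that
    // no other worker picks them up. The pointers stay valid because
    // push_back on a std::deque does not move existing elements.
    std::vector<UrlJob *> claim(std::size_t max_count) {
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<UrlJob *> batch;
        for (UrlJob &job : jobs_) {
            if (batch.size() == max_count) break;
            if (job.state == JobState::Pending) {
                job.state = JobState::InProgress;
                batch.push_back(&job);
            }
        }
        return batch;
    }

    // Called by the worker once url, ip, http_code and body are filled in.
    void mark_done(UrlJob *job) {
        assert(job != nullptr);
        std::lock_guard<std::mutex> lock(mutex_);
        job->state = JobState::Done;
    }

    std::size_t count(JobState state) {
        std::lock_guard<std::mutex> lock(mutex_);
        std::size_t n = 0;
        for (const UrlJob &job : jobs_) {
            if (job.state == state) ++n;
        }
        return n;
    }

private:
    std::mutex mutex_;
    std::deque<UrlJob> jobs_;
};
```

The lock is held only while claiming or marking, not during the download
itself, so several workers (or one worker driving several transfers) can
make progress at the same time.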

P.S. I am new to C and C++, but I am working very hard to learn them. I
just need to get over this problem so I can continue with my program,
which I am writing in order to learn. I am attending school this fall, but I
really want to learn as much as possible in advance. I just need
somebody to explain to me how to do this. This is the one example that does
work, but as I said, that static function is a big problem, as I cannot
pass any information to it.
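
One possible way around the "cannot pass any information to it" problem
is the fourth argument of the write callback: libcurl hands back there
whatever pointer was registered with CURLOPT_WRITEDATA on that easy
handle. The sketch below assumes a hypothetical BodyBuffer struct and a
hypothetical callback named append_body (neither is in the original
program). It gives each transfer its own buffer instead of sharing
globals, which is what allows several transfers to run at once.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical per-transfer state; one instance per URL being fetched.
struct BodyBuffer {
    std::string data;
};

// Same signature libcurl expects for CURLOPT_WRITEFUNCTION. The last
// argument is the pointer registered with CURLOPT_WRITEDATA for the handle
// that received the data.
static size_t append_body(char *ptr, size_t size, size_t nmemb, void *userdata)
{
    assert(userdata != nullptr);
    BodyBuffer *buf = static_cast<BodyBuffer *>(userdata);
    const size_t bytes = size * nmemb;  // bytes delivered in this call
    buf->data.append(ptr, bytes);
    return bytes;  // returning a different count makes libcurl abort the transfer
}

// Registration, once per easy handle (not compiled here; needs <curl/curl.h>):
//   BodyBuffer body;
//   curl_easy_setopt(handle, CURLOPT_WRITEFUNCTION, append_body);
//   curl_easy_setopt(handle, CURLOPT_WRITEDATA, &body);
```

Because the function only touches the buffer it is given, the same
callback can serve every handle, whether the handles are driven by
separate threads or by the multi interface.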

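#include <cstdio>      // printf
#include <cstdlib>     // malloc, realloc, free
#include <cstring>     // memcpy
#include <curl/curl.h> // also pulls in curl/easy.h; curl/types.h was removed from later libcurl releases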

using namespace std;

#include "MakeFile.h"

char* memory;
size_t UrlConnectionHtmlBody_size;

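static size_t write_data(char *ptr, size_t size, size_t nmemb, void *stream);

static size_t write_data(char *ptr, size_t size, size_t nmemb, void *stream)
{
    (void)stream; // unused: this version stores into the globals above
    // number of bytes delivered in this call
    size_t mem = size * nmemb;
    if (mem == 0)
    {
        return 0; // nothing to store; leave the buffer untouched
    }
    // grow the buffer to hold the new chunk (realloc(NULL, n) acts like malloc)
    char *grown = (char*)realloc(memory, UrlConnectionHtmlBody_size + mem);
    if (grown == NULL)
    {
        return 0; // out of memory: a short return makes libcurl abort the transfer
    }
    memory = grown;
    // store the data after what has been received so far
    memcpy(&(memory[UrlConnectionHtmlBody_size]), ptr, mem);
    // update the size_t that records how long the char* is
    UrlConnectionHtmlBody_size += mem;
    return mem;
}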

void UrlGetInfo(char* VarUrlToGet, bool VarGetMainUrl)
{
    const char *p = VarUrlToGet; // get const char * representation
    printf("Get Url %s\n",VarUrlToGet);
    // Reset the buffer variables before fetching data
    memory = NULL;
    UrlConnectionHtmlBody_size = 0;
    CURL *curl_handle;
    CURLcode res;
    curl_global_init(CURL_GLOBAL_ALL);

    /* init the curl session */
    curl_handle = curl_easy_init();

    /* set URL to get */
    curl_easy_setopt(curl_handle, CURLOPT_URL, p);

    /* no progress meter please */
    curl_easy_setopt(curl_handle, CURLOPT_NOPROGRESS, 1L);

    /* send all data to this function */
    curl_easy_setopt(curl_handle, CURLOPT_WRITEFUNCTION, write_data);

    /*
     * Notice here that if you want the actual data sent anywhere else but
     * stdout, you should consider using the CURLOPT_WRITEDATA option.
     */

    /* get it! */
    res = curl_easy_perform(curl_handle);
    if(CURLE_OK == res)
    {
        // set the information for the body to the UrlInfo
        CurrentUrl.UrlHtmlBody = (char*)malloc(UrlConnectionHtmlBody_size);
        if (CurrentUrl.UrlHtmlBody != NULL)
        {
            CurrentUrl.UrlHtmlBodyMalloced = true;
            memcpy(&(CurrentUrl.UrlHtmlBody[0]), memory, UrlConnectionHtmlBody_size);
            CurrentUrl.UrlHtmlBody_size = UrlConnectionHtmlBody_size;
            CurrentUrl.UrlHtmlBodyStop = CurrentUrl.UrlHtmlBody + CurrentUrl.UrlHtmlBody_size;
        }
        // pointer Redirect Site
        char *ra;
        char *ip;
        long HttpResponse;
        /* get the HTTP response code */
        res = curl_easy_getinfo(curl_handle, CURLINFO_RESPONSE_CODE, &HttpResponse);
        CurrentUrl.SetUrlHttpConnectionCode(HttpResponse);
        /* ask for the redirect address */
        res = curl_easy_getinfo(curl_handle, CURLINFO_REDIRECT_URL, &ra);
        if((CURLE_OK == res) && ra)
        {
            CurrentUrl.SetUrlAddressReDirect(ra);
        };
        // Get the IP address for the web site
        res = curl_easy_getinfo(curl_handle, CURLINFO_PRIMARY_IP, &ip);
        if((CURLE_OK == res) && ip)
        {
            CurrentUrl.SetUrlPrimaryIpAddress(ip);
        };
    }
    free (memory);
    /* cleanup curl stuff */
    curl_easy_cleanup(curl_handle);
};

Received on 2010-07-07