Re: Last-Modified header
Date: Thu, 21 May 2020 20:58:03 +0100
On Thu, May 21, 2020 at 4:18 PM Dan Fandrich via curl-library <curl-library_at_cool.haxx.se> wrote:
> On Thu, May 21, 2020 at 03:46:33PM +0100, James Read via curl-library wrote:
> > I'm implementing a simple web crawler with curl and want to retrieve the
> > Last-Modified header so I can implement a sensible recrawl policy. I've found
> > https://curl.haxx.se/libcurl/c/getinfo.html which is a nice easy way to
> > retrieve the Content-Type header. Is there a similarly easy way to retrieve
> > the Last-Modified header? Or do I need to parse the header myself?
> >
> > If I need to parse the header myself, I found
> > https://curl.haxx.se/libcurl/c/sepheaders.html which prints headers to a
> > file. Is there a way of just storing the headers in memory so I can parse
> > them there? I don't want to have to write a file just to read it again.
>
> You can use that example as a basis, then set CURLOPT_HEADERFUNCTION with a
> function like WriteMemoryCallback() in the getinmemory.c example to store the
> headers in memory instead. Or, do something more intelligent since you're only
> interested in a single header. libcurl writes to a file by default, so by
> setting your own header callback function you can process them however you
> want.
>
OK, this is as far as I got:

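#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static size_t
write_cb(void *contents, size_t size, size_t nmemb, void *p)
{
  ConnInfo *conn = (ConnInfo *)p;
  size_t realsize = size * nmemb;

  /* Grow via a temporary so the old buffer is not leaked if realloc fails. */
  void *grown = realloc(conn->data, conn->size + realsize + 1);
  if (grown == NULL) {
    /* out of memory! */
    printf("not enough memory (realloc returned NULL)\n");
    return 0; /* returning less than realsize makes libcurl abort */
  }
  conn->data = grown;

  memcpy(&(conn->data[conn->size]), contents, realsize);
  conn->size += realsize;
  conn->data[conn->size] = 0; /* keep the buffer NUL-terminated */
  return realsize;
}
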
When I print out conn->data it just prints out the body. How do I get the
header?
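
A possible direction, sketched under assumptions rather than tested against
the crawler above: the body-only output is what one would expect if write_cb
is registered with CURLOPT_WRITEFUNCTION, since header lines are delivered
separately through CURLOPT_HEADERFUNCTION. The callback below keeps only the
Last-Modified value instead of buffering every header. HeaderInfo and
header_cb are hypothetical names, and strncasecmp assumes a POSIX system.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h> /* strncasecmp (POSIX; an assumption about the platform) */

/* Hypothetical holder for the single header of interest. */
typedef struct {
  char last_modified[64]; /* stays an empty string if the header never appears */
} HeaderInfo;

/* Matches the signature libcurl expects for CURLOPT_HEADERFUNCTION.
 * It is called once per header line; buffer is NOT NUL-terminated. */
static size_t
header_cb(char *buffer, size_t size, size_t nitems, void *userdata)
{
  static const char name[] = "Last-Modified:";
  const size_t name_len = sizeof(name) - 1;
  HeaderInfo *info = (HeaderInfo *)userdata;
  size_t realsize = size * nitems;

  /* Header names are case-insensitive, so compare without regard to case. */
  if (realsize > name_len && strncasecmp(buffer, name, name_len) == 0) {
    const char *val = buffer + name_len;
    size_t len = realsize - name_len;

    /* Skip whitespace after the colon. */
    while (len > 0 && (*val == ' ' || *val == '\t')) {
      val++;
      len--;
    }
    /* Trim the trailing CRLF and any trailing whitespace. */
    while (len > 0 && (val[len - 1] == '\r' || val[len - 1] == '\n' ||
                       val[len - 1] == ' ' || val[len - 1] == '\t'))
      len--;
    /* Truncate rather than overflow the fixed-size field. */
    if (len >= sizeof(info->last_modified))
      len = sizeof(info->last_modified) - 1;

    memcpy(info->last_modified, val, len);
    info->last_modified[len] = '\0';
  }
  return realsize; /* any other value makes libcurl abort the transfer */
}

/* Registration, assuming an existing easy handle named curl:
 *
 *   HeaderInfo info = { "" };
 *   curl_easy_setopt(curl, CURLOPT_HEADERFUNCTION, header_cb);
 *   curl_easy_setopt(curl, CURLOPT_HEADERDATA, &info);
 */
```

If only the timestamp is needed, it may also be worth checking whether setting
CURLOPT_FILETIME to 1L and reading CURLINFO_FILETIME with curl_easy_getinfo()
after the transfer is enough; that reports the remote document time as a
number, or -1 when the server does not supply one.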
> Dan
-------------------------------------------------------------------
Unsubscribe: https://cool.haxx.se/list/listinfo/curl-library
Etiquette: https://curl.haxx.se/mail/etiquette.html
Received on 2020-05-21