Re: Last-Modified header
Date: Thu, 21 May 2020 17:13:44 +0200
On Thu, May 21, 2020 at 03:46:33PM +0100, James Read via curl-library wrote:
> I'm implementing a simple web crawler with curl and want to retrieve the
> Last-Modified header so I can implement a sensible recrawl policy. I've found
> https://curl.haxx.se/libcurl/c/getinfo.html, which is a nice easy way to
> retrieve the Content-Type header. Is there a similarly easy way to retrieve the
> Last-Modified header? Or do I need to parse the header myself?
>
> If I need to parse the header myself, I found
> https://curl.haxx.se/libcurl/c/sepheaders.html, which prints headers to a file.
> Is there a way of just storing
> the headers in memory so I can parse them there? I don't want to have to write
> a file just to read it again.
You can use that example as a basis, then set CURLOPT_HEADERFUNCTION to a
function like WriteMemoryCallback() from the getinmemory.c example to store the
headers in memory instead. Or do something more intelligent, since you're only
interested in a single header: have the callback look for Last-Modified and keep
just that value. libcurl's default is to write the data to a FILE *, which is why
that example ends up with a file; by setting your own header callback function
you can process the headers however you want.
Dan
Received on 2020-05-21