Re: curl and etag usage (--etag-save and --etag-compare)?

From: Timothe Litt <litt_at_acm.org>
Date: Sun, 6 Mar 2022 13:09:16 -0500

On 06-Mar-22 07:29, jacques granduel via curl-users wrote:
> Hi Curl Community,
>
> and this was another question about etag usage posted on StackOverFlow
> <https://stackoverflow.com/questions/71244361/curl-and-etag-usage-etag-save-and-etag-compare>
> without any answer.
> I reproduce the question hereafter:
>
> Somewhat related to how-to-use-curl-z-parallel-effectively
> <https://stackoverflow.com/questions/71244217/how-to-use-curl-z-parallel-effectively>,
> I need to download thousands of documents, but the server manages
> *etags*. As of version |curl-7.68|, curl has 2 options for dealing
> with etags (|--etag-save <file> and --etag-compare <file>|). So etags
> can be saved for later comparisons. But it seems that the only built-in
> way is to use an etag file per downloaded file, which is cumbersome.
> Is there a way to pass only the etag value? or a key-value file with
> all etags? Should I resort to |-H 'If-None-Match: <etag>'| as
> described curl-and-etag-usage
> <https://stackoverflow.com/questions/9920018/curl-and-etag-usage>?
>
> What's the best way to use --etag-save/--etag-compare with thousands
> of downloads?
>
> Thanks again. Best regards.
> jgran
>
You're right, this is awkward and doesn't scale to downloading a large
number of files.  curl could do a somewhat better job.

I usually keep etags in a parallel file - I use <filename>.etag.
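As a rough illustration of the parallel-file convention, here is a
minimal Python sketch that builds a curl command line for one download.
The helper names (etag_sidecar, curl_argv) are made up for this example.
It only passes --etag-compare when the etag file already exists, on the
assumption that some curl versions may object to a missing compare file.

```python
import os

def etag_sidecar(data_path):
    """Return the etag file kept next to data_path: <filename>.etag."""
    return data_path + ".etag"

def curl_argv(url, data_path):
    """Build a curl command line that saves the new etag and, if a
    previous etag is on disk, sends it for comparison."""
    etag_file = etag_sidecar(data_path)
    argv = ["curl", "--silent", "--show-error", "--location"]
    # Assumption: skip --etag-compare on the first fetch, when no etag
    # file exists yet.
    if os.path.exists(etag_file):
        argv += ["--etag-compare", etag_file]
    argv += ["--etag-save", etag_file, "--output", data_path, url]
    return argv

# Possible use (not executed here):
#   import subprocess
#   subprocess.run(curl_argv("https://example.com/doc.pdf", "doc.pdf"), check=True)
```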

Perhaps curl could modify --etag-save & --etag-compare to accept a
template, e.g.

     --etag-save '*.etag'  & --etag-compare '*.etag'

In this scheme, curl would replace the '*' with the filename that it's
fetching.  Some might like to hide the etag files (e.g. '.*.etag' =>
.<filename>.etag).

For --etag-save, this is straightforward.

For --etag-compare, this only works when the filename is known - e.g. if
it's the last component of the URL path.
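To make the proposal concrete, here is a small Python sketch of how such
a template might be expanded.  This is not something curl does today;
the function names and the exact rules (first '*' replaced, no expansion
when the URL path ends in '/') are assumptions for illustration only.

```python
import posixpath
from urllib.parse import urlsplit, unquote

def filename_from_url(url):
    """Last component of the URL path, or None if there is none."""
    name = posixpath.basename(unquote(urlsplit(url).path))
    return name or None

def expand_etag_template(template, url):
    """Replace the first '*' in template with the filename taken from url.

    A template without '*' is returned unchanged.  Returns None when the
    URL has no usable last path component.
    """
    if "*" not in template:
        return template
    name = filename_from_url(url)
    if name is None:
        return None
    return template.replace("*", name, 1)
```

With these rules, '*.etag' and https://example.com/docs/report.pdf give
report.pdf.etag, and '.*.etag' gives the hidden .report.pdf.etag.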

Unless an output file is specified to curl, it doesn't work when
fetching from a URL results in a filename that is not in the URL (e.g.
https://example.com/stuff/latest => mystuff.pdf, or worse =>
mystuff_v123.pdf).  The problem is that you have to know which etag to
send in the header before the URL is resolved.

Of course you can use a naming convention in a driver script - though
you lose connection reuse if you transfer one at a time.
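One way a driver script might keep connection reuse is to generate a
config file and hand all transfers to a single curl invocation.  The
Python sketch below does that with the <filename>.etag convention.  It
assumes that curl treats the etag options as per-transfer settings that
are reset by 'next', which is worth confirming against your curl
version; build_curl_config and quote are names invented for this example.

```python
import os

def quote(value):
    """Quote a value for a curl config file, escaping backslash and '"'."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

def build_curl_config(jobs):
    """Return config-file text for jobs, an iterable of (url, data_path).

    Each URL gets its own block with its own <data_path>.etag file;
    blocks are separated by 'next'.
    """
    blocks = []
    for url, data_path in jobs:
        etag_file = data_path + ".etag"
        lines = ["url = " + quote(url), "output = " + quote(data_path)]
        # Assumption: only compare when an etag from an earlier run exists.
        if os.path.exists(etag_file):
            lines.append("etag-compare = " + quote(etag_file))
        lines.append("etag-save = " + quote(etag_file))
        blocks.append("\n".join(lines))
    return "\nnext\n".join(blocks) + "\n"

# Possible use (not executed here): write the text to downloads.cfg and
# run: curl --config downloads.cfg
```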

One thing I do when implementing e-tags this way is to check the mtime
of the e-tag file.  If it's older than the data file, it's possible that
the etag file and local copy of the data don't match.  In rare cases
(e.g. if the server reverted a file version), this could result in
incorrectly skipping a download.  People do the strangest things, and
you can also get local version skew via backup/restore and crashes.
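The mtime check can be sketched in a few lines of Python.  This assumes
the driver script writes (or touches) the etag file after the data file
is complete; if your tool writes them in the other order, the comparison
needs adjusting.  The function name is hypothetical.

```python
import os

def etag_is_usable(data_path, etag_path):
    """Return True only if both files exist and the etag file is at
    least as new as the data file."""
    try:
        data_mtime = os.stat(data_path).st_mtime_ns
        etag_mtime = os.stat(etag_path).st_mtime_ns
    except FileNotFoundError:
        # A missing data or etag file means there is nothing to compare.
        return False
    return etag_mtime >= data_mtime
```

When this returns False, the safe choice is to fetch unconditionally
(no If-None-Match) and rewrite both files.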

Like many things, e-tags appear simple, but have a lot of corner cases.

Timothe Litt
ACM Distinguished Engineer
--------------------------
This communication may not represent the ACM or my employer's views,
if any, on the matters discussed.



Received on 2022-03-06