curl-library

Re: different "maxdownload" solutions

From: Lucas Adamski <wakked1_at_yahoo.com>
Date: Fri, 19 Oct 2001 16:03:35 -0700 (PDT)

I've been trying some of these ideas out, but I don't see how they are going
to work in a multi-threaded environment.

Why? Well, because the callbacks can only access global variables, and
currently there's no way that I can see of passing a per-instance
max-download size variable to any of these callbacks without hacking the lib
source code.

So, these options seem like even more of a hack than the original proposed
solution. I would think this would be a useful option for a lot of people...
anyone that's doing research on or crawling public, uncontrolled websites
runs the risk of running into large spam HTML pages that can oftentimes be
megabytes in size. In my particular case, I'm looking at processing log
files and such remotely via HTTP, but I only want to process the first X
bytes per file.

Given libcurl already has truly esoteric stuff like CURLOPT_INTERFACE
and CURLOPT_EGDSOCKET, I don't think a simple download size limit is
unreasonable. I'd prefer a dedicated MAX_DOWNLOADSIZE option over
hacking the callbacks, so that I can actually tell a deliberate cutoff
apart from a real callback error.

I'm willing to do a bit of work to make this happen if we come to an
agreement as to the best way to approach it and incorporate it into the lib,
but I don't want to keep having to hack each version of libcurl that comes
out to make it work. Thanks,
Lucas.

--- Daniel Stenberg <daniel_at_haxx.se> wrote:
> Hi Lucas
>
> I've been thinking about your wish to see CURLOPT_MAXDOWNLOAD incorporated
> into libcurl. While I understand your request, I hesitate to actually add
> this. It is very specialised and I don't think that very many people will
> ever find a use for it.
>
> Instead I've been thinking about other, more generic, ways that you could
> achieve the same goal: cut off larger downloads at a certain point.
>
> 1. You can just return from the WRITEFUNCTION with a value that causes
> libcurl to abort. You will get an error returned from perform(), but
> you'd know when you caused this error so you could treat the error as
> OK when this happens.
>
> 2. Similar to the approach above, you can use the PROGRESSFUNCTION callback
> as that will get repeated calls with download info, and returning a
> non-zero value back when you think enough has been downloaded will cause
> perform() to return CURLE_ABORTED_BY_CALLBACK. That too is easily
> handled by your code.
>
> Both these solutions are far better in my eyes, as they can be made without
> changes to libcurl. Of course, if you can come up with other solutions I'm
> all ears. I just don't like adding very odd specific cases into the generic
> library code.
>
> --
> Daniel Stenberg -- curl groks URLs -- http://curl.haxx.se/
>
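
The two callback approaches in the quoted list can be sketched roughly as
follows. This is only an illustration under stated assumptions, not code
from libcurl or from either poster: the names capped_sink, capped_write,
progress_cap and capped_progress are hypothetical, and it assumes a libcurl
that hands a per-handle pointer to each callback (CURLOPT_WRITEDATA and
CURLOPT_PROGRESSDATA in current releases). The libcurl wiring appears only
in a comment so that the block compiles without the library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-transfer state: one instance per easy handle,
 * so no global variables are needed. */
struct capped_sink {
    char  *buf;        /* caller-owned buffer, at least `limit` bytes */
    size_t limit;      /* maximum number of body bytes to keep */
    size_t used;       /* bytes stored so far */
    int    hit_limit;  /* set to 1 when the callback aborts on purpose */
};

/* Same signature as a libcurl write callback. Returning a count that
 * differs from size * nmemb makes libcurl abort the transfer
 * (reported as CURLE_WRITE_ERROR). */
static size_t capped_write(char *ptr, size_t size, size_t nmemb, void *userdata)
{
    struct capped_sink *sink = userdata;
    size_t incoming = size * nmemb;
    size_t room = sink->limit - sink->used;
    size_t take = incoming < room ? incoming : room;

    memcpy(sink->buf + sink->used, ptr, take);
    sink->used += take;

    if (take < incoming) {
        sink->hit_limit = 1;  /* remember that this abort was deliberate */
        return take;          /* short count: libcurl stops the transfer */
    }
    return incoming;
}

/* Hypothetical state for the progress-callback variant. */
struct progress_cap {
    double limit;      /* abort once more than this many bytes arrived */
    int    hit_limit;  /* set to 1 when the callback aborts on purpose */
};

/* Same signature as a CURLOPT_PROGRESSFUNCTION callback. A non-zero
 * return makes perform() fail with CURLE_ABORTED_BY_CALLBACK. */
static int capped_progress(void *clientp, double dltotal, double dlnow,
                           double ultotal, double ulnow)
{
    struct progress_cap *cap = clientp;
    (void)dltotal; (void)ultotal; (void)ulnow;

    if (dlnow > cap->limit) {
        cap->hit_limit = 1;
        return 1;
    }
    return 0;
}

/* Assumed wiring, per handle h (not compiled here):
 *
 *   curl_easy_setopt(h, CURLOPT_WRITEFUNCTION, capped_write);
 *   curl_easy_setopt(h, CURLOPT_WRITEDATA, &sink);
 *   rc = curl_easy_perform(h);
 *   if (rc == CURLE_WRITE_ERROR && sink.hit_limit)
 *       ... deliberate cutoff, treat as success ...
 *
 * The progress variant additionally needs CURLOPT_NOPROGRESS set to 0.
 */
```

The hit_limit flag is one possible way to tell a deliberate cutoff from a
genuine callback failure: the error code from perform() is only treated as
success when the flag for that particular handle is set.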

Received on 2001-10-20
