Re: Cacheing In

From: <CHRIS.CLARK_at_FRIENDSPROVIDENT.CO.UK>
Date: Sun, 17 Feb 2002 13:29:56 +0000

Hi Daniel, and thanks for the reply!

I have to say first that I'm amazed and impressed by the sheer volume and
quality of your input to cURL and libcurl. Not to mention the Web site, the
mailing lists, and the full-time job. I guess sorting through everyone else's
comments and contributions, and evaluating and prioritising them, must keep
you off the streets as well. I hope the vacation has recharged your batteries!

I guess the performance issue that I raised for files (certificates and keys)
more properly belongs to your much wider "sharing is caring" discussion,
since that seems to cover many of the same areas (caching, sharing objects
across threads and curl handles, portable support for multithreaded apps with
or without mutexes, and so on). And doing all of that will be no small
undertaking, I suspect.

However, in the meantime here are a couple of smaller-scale suggestions on
the particular point I raised. One option would be to use the existing APIs,
but amend the perform() code to do its own caching of any of the referenced
files (certificates and keys). It would have to check that the filenames were
still the same as last time, and reload them if they'd changed. And for those
awkward customers who are cussed enough to swap the contents of the files
between calls, but keep the same filenames, maybe a file timestamp comparison
as well. But as the whole point is to avoid hitting the file system again,
and catering for minority tastes would spoil things for the majority, they
could take a hike.
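As a rough illustration of that check, here is a minimal sketch (not libcurl code; the names and the fixed-size path buffer are purely illustrative) of a per-handle cache that reloads only when the filename or its timestamp has changed:

```c
#include <string.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical per-handle cache entry for one referenced file. */
struct file_cache {
    char path[256];     /* filename used on the previous perform() */
    time_t mtime;       /* its modification time when it was loaded */
    int loaded;         /* non-zero once the data has been read in */
};

/* Return non-zero if the cached copy is still valid: same filename as
   last time, and an unchanged timestamp (for the awkward customers who
   swap file contents but keep the names). */
int cache_is_fresh(struct file_cache *c, const char *path)
{
    struct stat st;
    if (!c->loaded)
        return 0;                     /* nothing cached yet */
    if (strcmp(c->path, path) != 0)
        return 0;                     /* filename changed: reload */
    if (stat(path, &st) != 0)
        return 0;                     /* cannot stat it: force a reload */
    return st.st_mtime == c->mtime;   /* same file, same timestamp */
}

/* Record that the file at 'path' has just been (re)loaded. */
void cache_mark_loaded(struct file_cache *c, const char *path)
{
    struct stat st;
    strncpy(c->path, path, sizeof(c->path) - 1);
    c->path[sizeof(c->path) - 1] = '\0';
    c->mtime = (stat(path, &st) == 0) ? st.st_mtime : 0;
    c->loaded = 1;
}
```

Note that the stat() call still hits the file system once per perform(), but it avoids re-reading and re-parsing the file itself.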

Another option would be to extend the existing APIs by introducing a new
(but small) set of CURLOPTs, paralleling the current filename-oriented
CURLOPTs. The pointer parameter for these new CURLOPTs would point to an
[address + length] structure, which would contain everything needed to get
the data from application memory instead of from a file. The perform() API
could be amended to get the data by preference from application memory, so it
wouldn't have to do its own caching. Applications could also use these new
CURLOPTs to reset the pointers to NULL again (or even just reset their own
pointer in the [address + length] structure), so the application could easily
switch off caching and force file reloads if it wanted to. So this way we
cater for the awkward squad as well, without impacting everyone else.
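A hypothetical version of such an [address + length] structure, and of the "application memory by preference, file as fallback" rule, might look like this (every name here is invented for illustration; none of them are real libcurl symbols):

```c
#include <stddef.h>

/* Hypothetical [address + length] structure for the suggested
   memory-oriented CURLOPTs. The application keeps ownership of the
   memory it points at. */
struct mem_blob {
    const void *data;   /* pointer into application memory, or NULL */
    size_t len;         /* number of bytes at 'data' */
};

/* Sketch of the selection rule inside perform(): use the blob when the
   application has set one, otherwise fall back to reading the file
   named by the existing filename-oriented option. */
const void *pick_source(const struct mem_blob *blob,
                        const char *filename,
                        size_t *len_out,
                        const char **file_out)
{
    if (blob && blob->data) {    /* memory wins when present */
        *len_out = blob->len;
        *file_out = NULL;
        return blob->data;
    }
    *len_out = 0;
    *file_out = filename;        /* caller reads the file instead */
    return NULL;
}
```

Resetting data to NULL is then exactly the "switch off caching, force a file reload" path described above.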

A minor variation on this theme, if you don't like the idea of introducing
new CURLOPTs, would be to have "special" values for the pointer for the
existing CURLOPTs, so that the perform() code can differentiate between a
filename reference and a memory block reference. Rather better (IMHO) than a
special value for the pointer would be a special value for the reference (e.g.,
a zero word followed by the [address + length] structure). But you'd have to
choose carefully so as not to break any existing apps.
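The tagged-reference variation might be sketched as follows (again purely illustrative). It also shows why the tag must be chosen carefully: reading a whole word from a very short filename string would itself overrun the buffer, so any real scheme would need to rule that case out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical tagged reference: the pointer handed to the existing
   CURLOPT points either at an ordinary filename string, or at a
   leading zero word followed by an [address + length] block. */
struct tagged_ref {
    size_t tag;         /* 0 marks a memory reference; a non-empty
                           filename never starts with a zero word
                           (assuming filenames of at least one word) */
    const void *data;   /* application memory holding the data */
    size_t len;         /* number of bytes at 'data' */
};

/* Non-zero if the pointer is really a memory reference rather than a
   filename. Reads one word, so the caveat above applies. */
int is_memory_ref(const void *p)
{
    size_t tag;
    memcpy(&tag, p, sizeof(tag));   /* first word of the reference */
    return tag == 0;
}
```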

Finally, if you took the second option or its variation, you'd have to worry
about all the language bindings other than C and C++. I don't know how
portable the suggested API changes would be across the whole range of
language bindings you currently support.

But either option would solve my original problem, and make libcurl HTTPS
performance in a high-volume, multithreaded environment even better.

Chris

----------------------------------------------------------------------

From: daniel_at_haxx.se
To: CHRIS.CLARK_at_FRIENDSPROVIDENT.CO.UK
Cc: curl-library_at_lists.sourceforge.net
Date: Sat, 16 Feb 2002 18:51:33 +0100
Subject: Re: Cacheing In

On Mon, 11 Feb 2002 CHRIS.CLARK_at_FRIENDSPROVIDENT.CO.UK wrote:

> Hi!

Hey. Sorry it took a while to respond, no one else seems to have grabbed this
and I was on vacation. Anyway, here's my shot...

> When each thread starts up, it creates its own curl handle, and then sets
> up filenames on that handle for a client certificate, a client private key
> and password, and a trusted server certificate, using curl_easy_setopt
> calls.

> The values of these filenames, passwords, etc, given by each thread are
> identical (each thread is using a single client certificate and key, to
> represent itself - a single client application - to the Web server, and a
> single trusted server certificate to represent the one and only Web server
> it talks to).

> At a high rate of knots after that, my threads are called (in no
> particular order, but in rapid succession) and a called thread will fire
> off a single HTTPS URL to the (one and only) Web server. When the thread
> gets the response, and processes it, it goes back to sleep for a while,
> until it's prodded into life again.

> My question is this - for each thread, how many times do the various files
> (client certificate, client key, and server certificate) get read in
> physically from disk? Is it only once each per thread (for the first time
> the thread gets called to issue a URL)? That would be really neat, as it
> would indicate a fantastic caching deal going on somewhere, either in
> libcurl, or OpenSSL, or possibly both.

> Or would it be what I fear, which is three disk reads (one per file), per
> thread, per requested URL? Yes I know that even then I'd probably get a
> good deal from a goodly-sized disk cache, but I'd like this thing to really
> fly if I can make it.
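The feared worst case is simple arithmetic: three reads, one per file, per request, per thread. A toy sketch of that pattern (the counter stands in for physical disk reads; the filenames are placeholders):

```c
/* Counts the reads the SSL layer would do with no caching at all. */
static int disk_reads;

static void load_pem(const char *path)
{
    (void)path;          /* a real implementation would fopen/fread here */
    disk_reads++;
}

/* What one HTTPS request costs if every file is re-read each time. */
static void https_request(void)
{
    load_pem("client-cert.pem");   /* client certificate */
    load_pem("client-key.pem");    /* client private key */
    load_pem("server-ca.pem");     /* trusted server certificate */
}

/* Total physical reads for the given workload: 3 * threads * requests. */
int total_reads(int threads, int requests_per_thread)
{
    disk_reads = 0;
    for (int t = 0; t < threads; t++)
        for (int r = 0; r < requests_per_thread; r++)
            https_request();
    return disk_reads;
}
```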

> Any takers? Thanks in advance for any illumination you can shed.

I won't deny that some of your fears as stated above are most likely true.
Not for every possible file, etc., but still for a fair share of them.

I'll willingly admit that only a very small percentage of my development time
on curl is spent on improving performance, and I need help and input to
improve it. Far too much of the curl development tends to get done by me
alone. That said, I'm sure we can improve several of these things to get
cached or read once instead of read every time. I would love a detailed study
of what curl does today and some suggestions on how we can improve it, like
the discussion of certificates and how we can improve the performance of that
handling that Cris Bailiff provided a few weeks ago.

--
    Daniel Stenberg -- curl groks URLs -- http://curl.haxx.se/
Received on 2002-02-17