Re: Fewer mallocs is better, episode #47
From: Timothe Litt <litt_at_acm.org>
Date: Tue, 1 Feb 2022 16:20:04 -0500
On 01-Feb-22 15:38, Thomas Dineen via curl-library wrote:
> Gentle People:
> I know that this is a bit off from the current thread but consider
> this:
> "Few mallocs is good" Why?
> One issue is that in many (all) Operating Systems malloc is a
> system call, where a system call causes one or more context switches.
> Where each context switch is considered high overhead (read that time).
> Consider this: What if you create your own local malloc, which I will
> call LMalloc(). LMalloc is initialized with a very large block via a
> single initial call to malloc, keeping in mind that this could be
> increased later, with the goal of a very very few large calls to malloc.
> When the user wants memory via the traditional malloc() he uses
> instead LMalloc() where LMalloc maintains a local memory management pool
> and allocates memory on request. At the User Program API the User sees
> LMalloc() behave EXACTLY like malloc() (WITHOUT SYSTEM CALLS).
> Thomas Dineen
This isn't quite correct for any OS I've worked with - which are a lot.
malloc() is implemented by a runtime library. The RTLs use a lot of
different strategies, but a call into the RTL generally does not involve
a context-switching syscall. Any reasonably modern RTL will maintain one
or more free lists (typically keyed to common allocation sizes); will
allocate from zones that make returning freed memory efficient; may
allocate more memory than requested to reduce fragmentation and to avoid
copies on realloc(); may request physical placement based on NUMA; and
will use granular locking (with clever locking strategies) in threaded
environments. And more.
Only when the RTL's preallocated zones and free lists are unable to
satisfy a request will a syscall to the OS be made to obtain more
memory. Unless your application is pathological, this will not be
frequent.
There was a time when the RTLs' malloc() implementations were less
sophisticated, and the strategy that you suggest could have been
beneficial - especially when the application's use pattern didn't match
the RTL's strategies. But any RTL today already does what you suggest.
Probably better. And it's debugged.
Much work - and many academic papers - have gone into the RTLs'
implementations of dynamic memory. It's rarely the case that a custom
implementation will do sufficiently better to justify the effort.
(There are a few tools that do use custom implementations for debugging
and detecting misuse.)
That said, there is a non-zero cost to allocation, deallocation, and
(especially some) resizing. It can be worthwhile to avoid unnecessary
use of heap memory. And if you expect megabytes, don't do an initial
allocation of 1 byte. If you know a reasonable upper bound on a
stream, allocate that rather than increasing your buffer a byte at a
time. Making unnecessary copies can also add up, which is why many
languages (and packages) will use reference counts instead. (There are
more papers on garbage collection algorithms...)
This area has been studied, engineered, and improved since the 60s.
Bottom line: it's not as simple as you suggest to do better than the
RTL. And heap memory is a useful tool that simplifies programming, so
it should be used. But like any sharp tool, you do need to understand
how to avoid being hurt by misuse.
Timothe Litt
ACM Distinguished Engineer
--------------------------
This communication may not represent the ACM or my employer's views,
if any, on the matters discussed.
Received on 2022-02-01