This reduces the overhead needed to manage the memory blocks as the
blocks are guaranteed to be page-aligned. Also, the superblock is now
alllocated from within one of the memory blocks it manages. While this
does slightly reduce the available cachelines within the first block (by
one or two depending on 32 vs 64 bit pointers), it removes the need for
an extra memory allocation (probably via malloc) for the superblock.
I don't know that the cache line size is 64 bytes on 32 bit systems, but
it should be ok to assume that 64-byte alignment behaves well on systems
with smaller cache lines so long as they are powers of two. This does
mean there is some waste on 32-bit systems, but it should be fairly
minimal (32 bytes per memblock, which manages page sized regions).
While the main bulk of the improvement (36s down from 42s for
gmsp3v2.bsp on my i7-6850K) comes from using a high-tide allocator for
the windings (which necessitated using a fixed size), it is ever so
slightly faster than using malloc as the back-end.
They're binned by powers of two (with in between sizes going to the
smaller bin should I make cache-line allocations NPOT (which I think
might be worthwhile). However, there seems to still be a bug somewhere
causing a nasty leak as now my hacked qfvis consumes 40G in less than a
minute.
The idea is to not search through blocks for an available allocation.
While the goal was to speed up allocation of cache lines of varying
cluster sizes, it's not enough due to fragmentation.
This was inspired by
Hoard: A Scalable Memory Allocator
for Multithreaded Applications
Emery D. Berger, Kathryn S. McKinley, Robert D. Blumofe, Paul R.
Wilson,
It's not anywhere near the same implementation, but it did take a few
basic concepts. The idea is twofold:
1) A pool of memory from which blocks can be allocated and then freed
en-mass and is fairly efficient for small (4-16 byte) blocks
2) Tread safety for use with the Vulkan renderer (and any other
multi-threaded tasks).
However, based on the Hoard paper, small allocations are cache-line
aligned. On top of that, larger allocations are page aligned.
I suspect it would help qfvis somewhat if I ever get around to tweaking
qfvis to use cmem.