This reduces the overhead needed to manage the memory blocks as the
blocks are guaranteed to be page-aligned. Also, the superblock is now
alllocated from within one of the memory blocks it manages. While this
does slightly reduce the available cachelines within the first block (by
one or two depending on 32 vs 64 bit pointers), it removes the need for
an extra memory allocation (probably via malloc) for the superblock.
I don't know that the cache line size is 64 bytes on 32 bit systems, but
it should be ok to assume that 64-byte alignment behaves well on systems
with smaller cache lines so long as they are powers of two. This does
mean there is some waste on 32-bit systems, but it should be fairly
minimal (32 bytes per memblock, which manages page sized regions).
I forgot to right-shift the value so offsets were becoming 0 or 8
instead of 0-15. This fixes the management of small objects. It turns
out that after this fix, qfvis's problems were caused by fragmentation
in the windings. Need to revisit line allocation and use POT-specialized
pools.
They're binned by powers of two (with in between sizes going to the
smaller bin should I make cache-line allocations NPOT (which I think
might be worthwhile). However, there seems to still be a bug somewhere
causing a nasty leak as now my hacked qfvis consumes 40G in less than a
minute.
The idea is to not search through blocks for an available allocation.
While the goal was to speed up allocation of cache lines of varying
cluster sizes, it's not enough due to fragmentation.
This was inspired by
Hoard: A Scalable Memory Allocator
for Multithreaded Applications
Emery D. Berger, Kathryn S. McKinley, Robert D. Blumofe, Paul R.
Wilson,
It's not anywhere near the same implementation, but it did take a few
basic concepts. The idea is twofold:
1) A pool of memory from which blocks can be allocated and then freed
en-mass and is fairly efficient for small (4-16 byte) blocks
2) Tread safety for use with the Vulkan renderer (and any other
multi-threaded tasks).
However, based on the Hoard paper, small allocations are cache-line
aligned. On top of that, larger allocations are page aligned.
I suspect it would help qfvis somewhat if I ever get around to tweaking
qfvis to use cmem.