This removes the last of the arbitrary limits from qfvis. The goal is
not so much supporting crazy maps, but more about better data usage
(cluster_t is now 24 (or 16) bytes instead of 1048 (or 528). And
passages isn't used (yet?)...
It turns out cmem is not so good for many large allocations (probably a
bug in handling the blocks), but was really meant for lots of little
churning allocations anyway. After an analysis of winding lifetimes, it
became clear that the hunk allocator would work very well. The base
windings are allocated from a global hunk (currently 1GB, plenty for
even ad_tears), and ephemeral windings are allocated from a per-thread
hunk of 1MB (seems to be way more than enough: gmsp3v2 uses a maximum of
only 56064 bytes, and ad_tears got through 30% before I gave up on it).
Any speed difference (for gmsp3v2) seems to be lost in the noise: still
completing in 38.4s on my machine.
The output fat-pvs data is the *difference* between the base pvs and fat
pvs. This currently makes for about 64kB savings for marcher.bsp, and
about 233MB savings for ad_tears.bsp (or about 50% (470.7MB->237.1MB)).
I expect using utf-8 encoding for the run lengths to make for even
bigger savings (the second output fat-pvs leaf of marcher.bsp is all 0s,
or 6 bytes in the file, which would reduce to 3 bytes using utf-8).
Extremely large maps take a very long time to process their PVS sets for
PHS or shadows, so having an off-line compiler seems like a good idea.
The data isn't written out yet, and the fat pvs code may not be optimal
for cache access, but it gets through ad_tears in about 500s (12
threads, compared to 2100s single-threaded in the qw server).
While whether it's any faster is debatable (it's slightly slower, but
many more portals are being tested due to different rounding in the base
vis stage), it's certainly easier to read.
While the main bulk of the improvement (36s down from 42s for
gmsp3v2.bsp on my i7-6850K) comes from using a high-tide allocator for
the windings (which necessitated using a fixed size), it is ever so
slightly faster than using malloc as the back-end.
This reverts commit 1ea79e8626.
Conflicts:
tools/qfvis/include/vis.h
tools/qfvis/source/flow.c
I've decided to do reentrant versions of the set allocators and I didn't
particularly like the invasiveness of allocating sets this way.
This bypasses set_new, but completely removes the use of the global lock
from within RecursiveClusterFlow. This seems to give a small speedup: 203
seconds threaded.
I think the reason I didn't think of that when I tried to improve qfvis's
performance many years ago is I just simply did not understand
ClipToSeparators. However, the difference caching the separators makes is
phenomenal. Before the change, single threaded qfvis would get stuck on one
particular portal for at least a day (I gave up waiting), but now even a
debug build will complete gmsp3v2.bsp in less than 12 minutes (4 threads on
my quad-core). And that's at level 2! Getting stuck for a day was at level
0.
While noticeably slower than the previous expanded set manipulation code,
this is much easier to read. I can worry about optimizing the set code when
I get qfvis behaving better.
Unfortunately, just because the header is there doesn't mean anything will
actually work :(. Also, the check is based on the host vendor/os for now.
Yes, it's rather lame but it will do for now.
With this, QF will build on an almost fresh ps3toolchain install. Only two
"fixes" are needed:
o In $PS3DEV/ppu/powerpc64-ps3-elf: ln -s ../include sys-include
o libsamplerate cross-built and installed.
I got rather tired of there being multiple definitions of mostly compatible
plane types (and I need a common type anyway). dplane_t still exists for
now because I want to be careful when messing with the actual bsp format.
Make the params for FreeWinding and CopyWinding consistent with those in
qfbsp. This fixes some doxygen warnings while I think about how best to
handle the duplicate code.