One more step towards BSP thread-safety. This one brought with it a very
noticeable speed boost (ie, not lost in the noise) thanks to the face
visframes being in tightly packed groups instead of 128 bytes apart,
though the sw render's boost is lost in the noise (but it's very
fill-rate limited).
This is next critical step to making BSP rendering thread-safe.
visframe was replaced with cluster (not used yet) in anticipation of BSP
cluster reconstruction (which will be necessary for dealing with large
maps like ad_tears).
The main goal was to get visframe out of mnode_t to make it thread-safe
(each thread can have its own visframe array), but moving the plane info
into mnode_t made for better data access patters when traversing the bsp
tree as the plane is right there with the child indices. Nicely, the
size of mnode_t is the same as before (64 bytes due to alignment), with
4 bytes wasted.
Performance-wise, there seems to be very little difference. Maybe
slightly slower.
The unfortunate thing about the change is the plane distance is negated,
possibly leading to some confusion, particularly since the box and
sphere culling functions were affected. However, this is so point-plane
distance calculations can be done with a single 4d dot product.
While it takes one extra step to grab the marksurface pointer,
R_MarkLeaves and R_MarkLights (the two actual users) seem to be either
the same speed or fractionally faster (by a few microseconds). I imagine
the loss gone to the extra fetch is made up for by better bandwidth
while traversing the leafs array (mleaf_t now fits in a single cache
line, so leafs are cache-aligned since hunk allocations are aligned).
This is an extremely extensive patch as it hits every cvar, and every
usage of the cvars. Cvars no longer store the value they control,
instead, they use a cexpr value object to reference the value and
specify the value's type (currently, a null type is used for strings).
Non-string cvars are passed through cexpr, allowing expressions in the
cvars' settings. Also, cvars have returned to an enhanced version of the
original (id quake) registration scheme.
As a minor benefit, relevant code having direct access to the
cvar-controlled variables is probably a slight optimization as it
removed a pointer dereference, and the variables can be located for data
locality.
The static cvar descriptors are made private as an additional safety
layer, though there's nothing stopping external modification via
Cvar_FindVar (which is needed for adding listeners).
While not used yet (partly due to working out the design), cvars can
have a validation function.
Registering a cvar allows a primary listener (and its data) to be
specified: it will always be called first when the cvar is modified. The
combination of proper listeners and direct access to the controlled
variable greatly simplifies the more complex cvar interactions as much
less null checking is required, and there's no need for one cvar's
callback to call another's.
nq-x11 is known to work at least well enough for the demos. More testing
will come.
The fact that numleafs did not include leaf 0 actually caused in many
places due to never being sure whether to add 1. Hopefully this fixes
some of the confusion. (and that comment in sv_init didn't last long :P)
Modern maps can have many more leafs (eg, ad_tears has 98983 leafs).
Using set_t makes dynamic leaf counts easy to support and the code much
easier to read (though set_is_member and the iterators are a little
slower). The main thing to watch out for is the novis set and the set
returned by Mod_LeafPVS never shrink, and may have excess elements (ie,
indicate that nonexistent leafs are visible).
This is the first step towards component-based entities.
There's still some transform-related stuff in the struct that needs to
be moved, but it's all entirely client related (rather than renderer)
and will probably go into a "client" component. Also, the current
components are directly included structs rather than references as I
didn't want to deal with the object management at this stage.
As part of the process (because transforms use simd) this also starts
the process of moving QF to using simd for vectors and matrices. There's
now a mess of simd and sisd code mixed together, but it works
surprisingly well together.
This is a big step towards a cleaner api. The struct reference in
model_t really should be a pointer, but bsp submodel(?) loading messed
that up, though that's just a matter of taking more care in the loading
code. It seems sensible to make that a separate step.
All of the nastiness is hidden in bspfile.c (including the old bsp29
specific data types). However, the conversions between bsp29 and bsp2 are
implemented but not yet hooked up properly. This commit just gets the data
structures in place and the obvious changes necessary to the rest of the
engine to get it to compile, plus a few obvious "make it work" changes.
I got rather tired of there being multiple definitions of mostly compatible
plane types (and I need a common type anyway). dplane_t still exists for
now because I want to be careful when messing with the actual bsp format.
functions when told to. also make gcc warn if it can't inline a function.
Explicitly inline several functions (including moving VectorNormalize to
mathlib.h so it /can/ be) resulting in a 5.5% speedup for spam2 (88 to 92
fps)