The lightmaps aren't updated at all yet, so everything is static.
Figuring out how lightmap data gets to the gpu was a chore thanks to the
spaghetti in the bsp data, and then I'd forgotten that I was
pre-expanding the light data to rgb so wound up with weird lightmaps,
but without water or particles, demo1 is getting 5000fps at 800x450, and
it seems to be CPU limited.
Finally, quakeworld gets its *ahem* fancy skins. I'm not happy with how
skin loading is handled, but the whole model and skin support needs a
redesign.
Closes#74.
And further clean up skin api.
It turns out that skin functions must all be in the render libs, and
this results in Skin_Set (was Skin_SetSkin) needs to be accessed via a
function pointer rather than directly :(
This takes care of the double free and also cleans up a lot of the skin
api. However, the gl renderer lost top/bottom colors (for now). Vulkan
skins still don't work yet.
That is, those with more than 65520 vertices. Not properly supported for
sw or gl, and glsl isn't rendering properly for some reason (renderdoc
does see the meshes, though, so maybe depth or winding issues).
I suspect I may have done the incorrect offset for color to get a
gradient for some testing, but forgot to put it back. Or, of course, I
just completely and utterly brain-farted when writing the attribute.
Some of them were actual leaks, but tracking memory should be a lot
easier now. However, there's a lot of room for optimization of
allocations (eg, recylcling of hierarchies. There is now 1 active
allocation (according to tracy) when nq exits: Qgetline's string buffer
(I think an api change is in order).
This makes it possible for hierarchies to clean themselves up (by
deleting their entities (though that will cause other problems later
when the hierarchy doesn't own the entities)), thus plugging a memory
leak when parsing passage text.
The main goal of this change was to make it easier to tell when a
hierarchy has been deleted, but as a side benefit, it got rid of the use
of PR_RESMAP. Also, it's easy to track the number of hierarchies.
Unfortunately, it showed how brittle the component side of the ECS is
(scene and canvas registries assumed their components were the first (no
long the case), thus the sweeping changes).
Centerprint doesn't work (but it hasn't for a while).
It was a little off-putting getting an incorrectly clipped console when
using non-unitary scale (especially since I was trying to show abbator
something).
Seems to work well. The other renderers have stubs because I don't feel
like implementing clipping for them. The gl and glsl wouldn't be too
difficult (need to handle the draw queues), but sw needs a fair bit of
work and I'm not sure it's worth the effort.
This makes a possible improvement to e1m3, only barely affects ad_tears,
but makes about 30% difference to gmsp3v2 (21fps to 27, and from 3300
leafs to 2700).
The cascade_shadow and cube_shadow names are no longer relevant thanks
to the staging images, and the output field for render passes is
optional in general and irrelevant for shadow maps.
The rendering of the shadow maps now takes the culling information into
account resulting in a drastic reduction of work. There's still more
work to be done, but demo1 peaks at over 1000fps at 640x480, gmsp3v2 now
gets 14fps (1920x1080) near the front gate (used to be 3, then 6),
ad_tears is up to 3fps, but marcher is still unhappy, but it has
infinite radius lights, so needs more culling work (clipped light
volumes will help, I think). Also, culling lights for which nothing has
moved within their volumes will help somewhat (though not as much for
most id maps, I suspect).
Using the translucency pass made it easy to have depth-tested
translucent "solid" light volumes instead of always visible lines (which
are still an option as that's useful too). Most importantly, being able
to see the surfaces helped no end in figuring out that my hulls were
created with counter-clockwise windings instead of quake's usual
clockwise windings and thus my hulls were being rendered inside-out in
the occlusion pass.
The results of the occlusion queries give the lights that don't have a
visible hull, but unfortunately that includes any lights which the
camera is inside, but simple distance checks sort that out (with a
fudge-factor for the icosahedron vertices (1.583 (3(2+p)/(2+3p), p is
golden ratio)).
My efforts (especially the collect zone (what was I thinking)) got
tracy's knickers in a twist resulting in vanishing zones in the server.
It looks like there are some synchronisation issues between cpu and gpu,
but I'm not *too* worried about it at this stage.
The info isn't used yet, but this shows that vulkan's occlusion queries
are at least somewhat useful. However, the technique isn't perfect:
infinite radius lights (1/r and 1/r^2) are difficult to cull, and all
lights can poke through thin enough walls, and then lights containing
the camera get culled incorrectly (will need a separate test). Still, it
looks like it will help once everything is tied together.
And make it callable directly (needed to be able to submit the command
buffer separately from the main commands (though this does mess with
tracy a little).
They weren't rendering properly at all due to the matrix updates getting
overwritten by the light data (I'd forgotten to advance the packet data
pointer).
This doesn't make much of a difference on the GPU, but it drastically
cuts down CPU usage, especially for ad_tears: shadow map drawing is down
from 16.3ms to 3.7ms thanks to no having to run the alias model queues
as often.
Batching shadow map rendering needs be able to reference matrices for
multiple lights in a single batch, but the only input is the view index,
so use that to look up the matrix index rather than using it to index
the matrices directly (modulo the base index that's still there).
Actually, only 29 are used because nvidia's drivers segfault when there
are more than 29 views (regardless of the exact bit pattern in the view
mask). This will allow rendering shadow maps in large batches, which
should make for better GPU utilization.
Even that's getting pretty big, but with the quanta at 128, that's a
maximum of 8 different image sizes (which is nice for my planned
"staging image" idea).
Interestingly, this caused a reduction in memory use for some maps (but
did increase marcher's again, but not as much as the bogus rounding
did). The idea was to use sparse bindings to remap shadow map layers,
but it turns out sparse bindings are insanely slow (beyond unusable).
However, the reduction in the number of shadow map images seems to be
worth it.
Since switching to the 1.2 api as a requirement, might as well use the
relevant structs instead of extension struct (for multiview). Came up
when double-checking the max views property due to running into what
appears to be an nvidia bug where > 29 views (any bit pattern) cause a
segfault when creating the pipeline.