I got a sync validation error on a scatter command (I think) thus the
setting was probably wrong. Most of the parameters are still what they
were, but I'll be able to tweak the barriers as necessary.
Unfortunately, it didn't help with the hang on fetching the light cull
query data when starting in fisheye mode (no hang when enabling fisheye
after startup). I'm not sure what's going on there other than the
queries aren't getting updated: the counts seem to be fine so maybe the
commands aren't running. I've probably got a tangled mess of
pseudo-parallel command buffers: I need to go through my system and
clean everything up.
Instead of forcing it via unnecessary alignment. I didn't feel like
using aligned malloc to silence ubsan when really the alignment was just
to force the size to be aligned.
Thanks to validation layers showing command buffer debug regions, it was
pretty easy to find the offending buffers. Did need to modify
QFV_PacketCopyBuffer to take a source barrier as well as the destination
barrier, but this is probably for the best.
Now all pipelines and any tasks that have a command buffer attached get
a region using their names (tasks use the function name). I don't know
when it happened, or if I failed to notice last time, but (sync)
validation layers now include the debug region for command buffers: very
nice.
The particle renderer uses the palette texture in the vertex shader, so
updating the palette needs the vertex shader stage included in the
barrier, but I imagine not all texture updates will need it, so add a
parameter to Vulkan_UpdateTex to select inclusion.
While every possible subsystem needs an initialization call, all that
does is add the actual initialization task to the render graph system.
This allows the render graph to be fully configurable, initializing only
those subsystems that the graph needs.
Scripted initialization is still separated from startup as render graph
creation needs various resources (eg, attachments) defined before
creating render and compute passes, but all those need to be created
before the subsystems can actually start up.
Finally. However, it has effect only when no render config is provided.
When a config is provided, things will break currently as nothing is
done yet, but getting a config in will take some work in qwaq and also
the render graph system as I want to make the startup functions
configurable.
And clean up the resulting errors. While some were tricky, there weren't
all that many: just some attachment issues and the multi-stage image
copy for scraps.
Fixing scraps required a barrier between copies. It might be overkill,
but a transfer_dst to transfer_dst image barrier worked.
Fixing attachments was a bit trickier:
- depth needed early and late fragment tests to be treated as one stage
- all attachments that were read later needed storeOp = none (using the
extension)
- and then finalLayout needed to be correct to avoid ghost transitions
- as well, for some reason the deffered gbuffer subpass needed a depth
dependency on the translucent pass even though neither one writes to
the depth attachment (possibly a validation bug, needs more
investigation).
It's not perfect (double fog on translucent surfaces, the
scatter/absorption isn't right, and no local lighting on the fog
itself), but it at least seems to look ok.
I think has been one of the biggest roadblocks to breaking free of
quake, so having dual render paths and thus the different new scene load
sequence has proven to be unexpected helpful. There's a lot more to be
done to make the render graph actually usable by anyone but me, but just
making scene load configurable frees up a lot. I think there needs to be
renderer startup/shutdown configuration too, but this seems to be enough
for now.
The lightmaps aren't updated at all yet, so everything is static.
Figuring out how lightmap data gets to the gpu was a chore thanks to the
spaghetti in the bsp data, and then I'd forgotten that I was
pre-expanding the light data to rgb so wound up with weird lightmaps,
but without water or particles, demo1 is getting 5000fps at 800x450, and
it seems to be CPU limited.
Finally, quakeworld gets its *ahem* fancy skins. I'm not happy with how
skin loading is handled, but the whole model and skin support needs a
redesign.
Closes#74.
Seems to work well. The other renderers have stubs because I don't feel
like implementing clipping for them. The gl and glsl wouldn't be too
difficult (need to handle the draw queues), but sw needs a fair bit of
work and I'm not sure it's worth the effort.
This makes a possible improvement to e1m3, only barely affects ad_tears,
but makes about 30% difference to gmsp3v2 (21fps to 27, and from 3300
leafs to 2700).
The results of the occlusion queries give the lights that don't have a
visible hull, but unfortunately that includes any lights which the
camera is inside, but simple distance checks sort that out (with a
fudge-factor for the icosahedron vertices (1.583 (3(2+p)/(2+3p), p is
golden ratio)).
My efforts (especially the collect zone (what was I thinking)) got
tracy's knickers in a twist resulting in vanishing zones in the server.
It looks like there are some synchronisation issues between cpu and gpu,
but I'm not *too* worried about it at this stage.
The info isn't used yet, but this shows that vulkan's occlusion queries
are at least somewhat useful. However, the technique isn't perfect:
infinite radius lights (1/r and 1/r^2) are difficult to cull, and all
lights can poke through thin enough walls, and then lights containing
the camera get culled incorrectly (will need a separate test). Still, it
looks like it will help once everything is tied together.
And make it callable directly (needed to be able to submit the command
buffer separately from the main commands (though this does mess with
tracy a little).
This doesn't make much of a difference on the GPU, but it drastically
cuts down CPU usage, especially for ad_tears: shadow map drawing is down
from 16.3ms to 3.7ms thanks to no having to run the alias model queues
as often.
Batching shadow map rendering needs be able to reference matrices for
multiple lights in a single batch, but the only input is the view index,
so use that to look up the matrix index rather than using it to index
the matrices directly (modulo the base index that's still there).
Since switching to the 1.2 api as a requirement, might as well use the
relevant structs instead of extension struct (for multiview). Came up
when double-checking the max views property due to running into what
appears to be an nvidia bug where > 29 views (any bit pattern) cause a
segfault when creating the pipeline.
I had missed that upping max lights to 2048 meant that up to 12288
matrices are needed for all the possible lights. This made it so the
light type could not be encoded in id_data, but the shaders never used
it anyway. This leaves one bit free.
While QFV_PacketScatterBuffer works on only one destination buffer, it
turns out it's still useful for scattering to multiple buffers, just
with multiple calls. This makes it pretty easy to combine multiple
buffer updates into a single staging buffer packet, resulting in
reducing lighting's packet use from up to 7 to just one, drastically
reducing the pressure on the stating buffer packet pool, and thus
reducing the chances of QFV_PacketAcquire stalling.
This relies on my fork of tracy: https://github.com/taniwha/tracy
on the wip-c-vulkan branch. Everything is still rather flaky though.
This necessitated the jump to vulkan 1.2 as a requirement.
I'm still not happy with it being a compile time constant, but this
takes care of the interlock between frames in flight... for now: it's
fragile and really needs the excessive small-packet use in draw and
lighting to be cleaned up.
After discussion with Darian, I've decided to go with one big staging
buffer (with lots of packets) shared between FiF as the large size will,
in the end, be more flexible.
This seems excessive, but gmsp3v2 map has 1399 lights. Worse, it has a
lot of different light sizes that go up by small increments (generally
around 10) resulting in 33 shadow map images (1 too many). Quantizing
the sizes to 32 drops this nicely to 20, and reduces memory consumption
slightly too (image buffer overhead, I guess).
This covers only the rendering of the shadow maps (actual use still
needs to be implemented). Working with orthographic projection matrices
is surprisingly difficult, partly because creating one includes the
translations needed to get the scene into the view (and depth range),
which means care needs to be taken with the view (camera) matrix in
order to avoid double-translating depending on just how the orthographic
matrix is set up (if it's set up to focus on the origin, then the camera
matrix will need translation, otherwise the camera matrix needs to avoid
translation).
Updating directional light CSM matrices made me realize I needed to be
able to send the contents of a packet to multiple locations in a buffer
(I may need to extend it to multiple buffers). Seems to work, but I have
only the one directional light with which to test.
This improves the projection API in that near clip is a parameter rather
than being taken directly from the cvar, and a far clip (ie, finite far
plane) version is available (necessary for cascaded shadow maps as it's
rather hard to fit a box to an infinite frustum).
Also, the orthographic projection matrix is now reversed as per the
perspective matrix (and the code tidied up a little), and a version that
takes min and max vectors is available.
gcc didn't like a couple of the changes (rightly so: one was actually
incorrect), and the fix for qfcc I didn't think to suggest while working
with Emily.
The general CFLAGS etc fixes mostly required just getting the order of
operations right: check for attributes after setting the warnings flags,
though those needed some care for gcc as it began warning about main
wanting the const attribute.
Fixing the imui link errors required moving the ui functions and setup
to vulkan_lighting.c, which is really the only place they're used.
Fixing a load of issues related to autoconf and some small source-level issues to re-add clang support.
autoconf feature detection probably needs some addressing - partially as -Werror is applied late.
This has resulted in some rather interesting information: it seems the
surfaces (and thus, presumably bounding boxes) for leafs have little to
do with the actual leaf node's volume.