I got a sync validation error on a scatter command (I think) thus the
setting was probably wrong. Most of the parameters are still what they
were, but I'll be able to tweak the barriers as necessary.
Unfortunately, it didn't help with the hang on fetching the light cull
query data when starting in fisheye mode (no hang when enabling fisheye
after startup). I'm not sure what's going on there other than the
queries aren't getting updated: the counts seem to be fine so maybe the
commands aren't running. I've probably got a tangled mess of
pseudo-parallel command buffers: I need to go through my system and
clean everything up.
Thanks to validation layers showing command buffer debug regions, it was
pretty easy to find the offending buffers. Did need to modify
QFV_PacketCopyBuffer to take a source barrier as well as the destination
barrier, but this is probably for the best.
And clean up the resulting errors. While some were tricky, there weren't
all that many: just some attachment issues and the multi-stage image
copy for scraps.
Fixing scraps required a barrier between copies. It might be overkill,
but a transfer_dst to transfer_dst image barrier worked.
Fixing attachments was a bit trickier:
- depth needed early and late fragment tests to be treated as one stage
- all attachments that were read later needed storeOp = none (using the
extension)
- and then finalLayout needed to be correct to avoid ghost transitions
- as well, for some reason the deffered gbuffer subpass needed a depth
dependency on the translucent pass even though neither one writes to
the depth attachment (possibly a validation bug, needs more
investigation).
I guess it's kind of UB, but it's handy for images that will be
conditionally written by the GPU but need to be in shader-read-only for
draw calls and the validation layers can't tell that the layers won't be
used.
It's a bit flaky for particles, especially at higher frame rates, but
that's due to supporting only 64 overlapping pixels. A reasonable
solution is probably switching to a priority heap for the "sort" and
upping the limit.
I got tired of writing the same 13 or so lines of code over and over (it
actually put me off experimenting with Vulkan). Thus...
QFV_PacketCopyBuffer does the work of handling barriers and a (full
packet) copy from the staging buffer to a GPU buffer.
QFV_PacketCopyImage does a similar job, but for images. However, it
still needs a lot of work, but it does make getting a basic texture onto
the GPU much less of a hassle.
Both functions should make staging data much less error-prone.
Mostly, this gets the stage flags in with the barrier, but also adds a
couple more barrier templates. It should make for slightly less verbose
code, and one less opportunity for error (mismatched barrier/stages).