Interestingly, this caused a reduction in memory use for some maps (but
did increase marcher's again, but not as much as the bogus rounding
did). The idea was to use sparse bindings to remap shadow map layers,
but it turns out sparse bindings are insanely slow (beyond unusable).
However, the reduction in the number of shadow map images seems to be
worth it.
Since switching to the 1.2 api as a requirement, might as well use the
relevant structs instead of extension struct (for multiview). Came up
when double-checking the max views property due to running into what
appears to be an nvidia bug where > 29 views (any bit pattern) cause a
segfault when creating the pipeline.
I had missed that upping max lights to 2048 meant that up to 12288
matrices are needed for all the possible lights. This made it so the
light type could not be encoded in id_data, but the shaders never used
it anyway. This leaves one bit free.
I'd added some developer output to see how the layers were distributed
between images and found the image widths to be... odd. It turns out I
was double-adding the shadow_quanta. Oops. Results in ~164MB less memory
used by marcher (for 32 pixel quanta).
This allows "large" updates to be done in a single staging buffer packet
instead of one packet per quad (or slice). Currently, they're batched
into groups of 64 (not really enough for conchars, but that's only at
init-time, so not all that bad). Nicely, this seems to simplify the
staging code.
Fixes#65.
When looking at a struct and seeing "count" and "size", I had to hunt to
see what "size" really meant. Cherno is very much right about size vs
count being bytes vs number of objects.
load_conchars and load_crosshairs were using create_quad directly (due
to make_static_quad having the wrong parameters), but this spread the
handling of which buffer and index where used through the code. Thus fix
make_static_quad to take the x, y offsets (like make_dyn_quad) and then
use it in load_conchars and load_crosshairs.
While QFV_PacketScatterBuffer works on only one destination buffer, it
turns out it's still useful for scattering to multiple buffers, just
with multiple calls. This makes it pretty easy to combine multiple
buffer updates into a single staging buffer packet, resulting in
reducing lighting's packet use from up to 7 to just one, drastically
reducing the pressure on the stating buffer packet pool, and thus
reducing the chances of QFV_PacketAcquire stalling.
This relies on my fork of tracy: https://github.com/taniwha/tracy
on the wip-c-vulkan branch. Everything is still rather flaky though.
This necessitated the jump to vulkan 1.2 as a requirement.
This gets the dynamic data closer to the gpu, so should make a
difference when there's a lot going on. However, for simple tests, it
made no difference.
I'm still not happy with it being a compile time constant, but this
takes care of the interlock between frames in flight... for now: it's
fragile and really needs the excessive small-packet use in draw and
lighting to be cleaned up.
After discussion with Darian, I've decided to go with one big staging
buffer (with lots of packets) shared between FiF as the large size will,
in the end, be more flexible.
Tracy is a frame profiler: https://github.com/wolfpld/tracy
This uses Tracy's C API to instrument the code (already added in several
places). It turns out there is something very weird with the fence
behavior between the staging buffers and render commands as the
inter-frame delay occurs in a very strangle place (in the draw code's
packet acquisition rather than the fence waiter that's there for that
purpose). I suspect some tangled dependencies.
They're not quite working (trail path offset is incorrect) but their
pixels are getting to the screen. Also, lifetimes are off for rocket
trails in that as soon as the entity dies, so does the trail.
This gets things *compiling* again, though it's still non-functional and
definitely wrong (don't want trail in renderer_t), but I need to think
about the design for getting trails as components. Also need to think
about integrating trails into the client effects system so trails can be
shared between renderers.
I'm not sure what's up, but arm gcc thinks the array isn't properly
initialized even though x86_64 gcc does. Maybe something with padding.
At least c23 makes it easy to 0-initialize VLAs.
I'm actually surprised anything worked, though I guess it was just the
one entry getting corrupted (and not 32, but I figured allocate slots
for all of the dynamic lights just in case). Or none, really, since
larger scenes (ie, those with multiple lights that fit in the same image
size) would result in not all the maps getting used and thus one spare
for dynamic lights.
This seems excessive, but gmsp3v2 map has 1399 lights. Worse, it has a
lot of different light sizes that go up by small increments (generally
around 10) resulting in 33 shadow map images (1 too many). Quantizing
the sizes to 32 drops this nicely to 20, and reduces memory consumption
slightly too (image buffer overhead, I guess).
While the gl renderer does (or did) have it's attempt at shadows, the
others don't even try, thus the onlyshadows-marked player model doesn't
work so well (looks rather goofy seeing the arms like that).
Having more than one copy of ShadowMatrices went against my plans, and I
had trouble finding the attachments set (light_attach.h wasn't such a
good idea).
This covers only the rendering of the shadow maps (actual use still
needs to be implemented). Working with orthographic projection matrices
is surprisingly difficult, partly because creating one includes the
translations needed to get the scene into the view (and depth range),
which means care needs to be taken with the view (camera) matrix in
order to avoid double-translating depending on just how the orthographic
matrix is set up (if it's set up to focus on the origin, then the camera
matrix will need translation, otherwise the camera matrix needs to avoid
translation).
Updating directional light CSM matrices made me realize I needed to be
able to send the contents of a packet to multiple locations in a buffer
(I may need to extend it to multiple buffers). Seems to work, but I have
only the one directional light with which to test.
This improves the projection API in that near clip is a parameter rather
than being taken directly from the cvar, and a far clip (ie, finite far
plane) version is available (necessary for cascaded shadow maps as it's
rather hard to fit a box to an infinite frustum).
Also, the orthographic projection matrix is now reversed as per the
perspective matrix (and the code tidied up a little), and a version that
takes min and max vectors is available.
gcc didn't like a couple of the changes (rightly so: one was actually
incorrect), and the fix for qfcc I didn't think to suggest while working
with Emily.
The general CFLAGS etc fixes mostly required just getting the order of
operations right: check for attributes after setting the warnings flags,
though those needed some care for gcc as it began warning about main
wanting the const attribute.
Fixing the imui link errors required moving the ui functions and setup
to vulkan_lighting.c, which is really the only place they're used.
Fixing a load of issues related to autoconf and some small source-level issues to re-add clang support.
autoconf feature detection probably needs some addressing - partially as -Werror is applied late.
Lines are drawn for a light's leaf, the leafs visible to it, or those in
its efrags chain. Still no idea why lights are drawing when they
shouldn't. Deek suggest holes in the map, but I think if that was the
case, there'd be something visible. My suspicion is I'm doing something
wrong in with efrags.
This has resulted in some rather interesting information: it seems the
surfaces (and thus, presumably bounding boxes) for leafs have little to
do with the actual leaf node's volume.
I really don't know what I was thinking when I wrote that code. Maybe I
was trying for a half angle. Now the rendered "cone" matches up with a
hard-clipped cone light (soft edges stick out a bit).