Batching shadow map rendering needs be able to reference matrices for
multiple lights in a single batch, but the only input is the view index,
so use that to look up the matrix index rather than using it to index
the matrices directly (modulo the base index that's still there).
Since switching to the 1.2 api as a requirement, might as well use the
relevant structs instead of extension struct (for multiview). Came up
when double-checking the max views property due to running into what
appears to be an nvidia bug where > 29 views (any bit pattern) cause a
segfault when creating the pipeline.
I had missed that upping max lights to 2048 meant that up to 12288
matrices are needed for all the possible lights. This made it so the
light type could not be encoded in id_data, but the shaders never used
it anyway. This leaves one bit free.
While QFV_PacketScatterBuffer works on only one destination buffer, it
turns out it's still useful for scattering to multiple buffers, just
with multiple calls. This makes it pretty easy to combine multiple
buffer updates into a single staging buffer packet, resulting in
reducing lighting's packet use from up to 7 to just one, drastically
reducing the pressure on the stating buffer packet pool, and thus
reducing the chances of QFV_PacketAcquire stalling.
This relies on my fork of tracy: https://github.com/taniwha/tracy
on the wip-c-vulkan branch. Everything is still rather flaky though.
This necessitated the jump to vulkan 1.2 as a requirement.
This allows tracy to clean up properly. However, Sys_Quit will use the
jump buffer (sys_exit_jmpbuf) only if it has been set, so the use of
Sys_setjmp is optional.
I'm still not happy with it being a compile time constant, but this
takes care of the interlock between frames in flight... for now: it's
fragile and really needs the excessive small-packet use in draw and
lighting to be cleaned up.
After discussion with Darian, I've decided to go with one big staging
buffer (with lots of packets) shared between FiF as the large size will,
in the end, be more flexible.
Host_Error and Host_EndGame use setjmp/longjmp to implement an exception
of sorts, but this messes with tracy's state even with cleanup
attributes. However, it turns out that those cleanup attributes are
exactly how gcc implements C++ destructors, and so the standard Unwind
api (part of libgcc) respects them (so long as -fexceptions is enabled
for C). Thus... replace longjmp with an implementation that uses Unwind
to unwind the stack and call the cleanup functions as needed. This is
actually important for more than just tracy as the cleanup attributed
vars can be thread locks.
Tracy is a frame profiler: https://github.com/wolfpld/tracy
This uses Tracy's C API to instrument the code (already added in several
places). It turns out there is something very weird with the fence
behavior between the staging buffers and render commands as the
inter-frame delay occurs in a very strangle place (in the draw code's
packet acquisition rather than the fence waiter that's there for that
purpose). I suspect some tangled dependencies.
This fixes the weird slug when running nq on windows. It turns out it
was the "friendly neighbor" sleep code activating due to bitrot. In
addition, there are cvars for enabling unfocused sleep (defaults off)
and disabling minimized sleep (defaults on).
The event handling changes take care of VagueLobster's segfaults on
startup for all renderers (vulkan will still be iffy depending on his
hardware: it dies on my GTX 965 M, probably due to memory and QF's
shadows). One nice side effect is it takes care of the broken CD audio
event handling (does anyone even care, though?).
They're not quite working (trail path offset is incorrect) but their
pixels are getting to the screen. Also, lifetimes are off for rocket
trails in that as soon as the entity dies, so does the trail.
This gets things *compiling* again, though it's still non-functional and
definitely wrong (don't want trail in renderer_t), but I need to think
about the design for getting trails as components. Also need to think
about integrating trails into the client effects system so trails can be
shared between renderers.
I think I had though it using a constructor init was ok, but it turns
out that was problematic. That, or I missed it in my recent audit. Fixes
a sys syserror during shutdown.
This fixes another segfault on shutdown (not sure just which recent
change caused it, but the listener pointer needed clearing) but while
fixing the listener issue, I noticed that binding and imt shutdown were
in the wrong order with respect to buttons and axes.
It turns out that initializing them via constructors led to their
shutdowns happening too late which resulted in problems with button and
axis cleanup.
By default. Conversion of quake strings needs to be requested (which is
done by nq and qw clients and servers, as well as qfprogs via an
option). I got tired of seeing mangled source code in the disassembly.
This gets only some very basics working:
* Algebra (multi-vector) types: eg @algebra(float(3,0,1)).
* Algebra scopes (using either the above or @algebra(TYPE_NAME) where
the above was used in a typedef.
* Basis blades (eg, e12) done via procedural symbols that evaluate to
suitable constants based on the basis group for the blade.
* Addition and subtraction of multi-vectors (only partially tested).
* Assignment of sub-algebra multi-vectors to full-algebra multi-vectors
(missing elements zeroed).
There's still much work to be done, but I thought it time to get
something into git.
I realized recently that I had made a huge mistake making Ruamoko's
based addressing use unsigned offsets as it makes stack-relative
addressing more awkward when it comes to runtime-determined stack frames
(eg, using alloca). This does put a bit of an extra limit on directly
addressable globals, but that's what the based addressing is meant to
help with anyway.
This seems excessive, but gmsp3v2 map has 1399 lights. Worse, it has a
lot of different light sizes that go up by small increments (generally
around 10) resulting in 33 shadow map images (1 too many). Quantizing
the sizes to 32 drops this nicely to 20, and reduces memory consumption
slightly too (image buffer overhead, I guess).
This covers only the rendering of the shadow maps (actual use still
needs to be implemented). Working with orthographic projection matrices
is surprisingly difficult, partly because creating one includes the
translations needed to get the scene into the view (and depth range),
which means care needs to be taken with the view (camera) matrix in
order to avoid double-translating depending on just how the orthographic
matrix is set up (if it's set up to focus on the origin, then the camera
matrix will need translation, otherwise the camera matrix needs to avoid
translation).