My efforts (especially the collect zone (what was I thinking)) got
tracy's knickers in a twist resulting in vanishing zones in the server.
It looks like there are some synchronisation issues between cpu and gpu,
but I'm not *too* worried about it at this stage.
The info isn't used yet, but this shows that vulkan's occlusion queries
are at least somewhat useful. However, the technique isn't perfect:
infinite radius lights (1/r and 1/r^2) are difficult to cull, and all
lights can poke through thin enough walls, and then lights containing
the camera get culled incorrectly (will need a separate test). Still, it
looks like it will help once everything is tied together.
And make it callable directly (needed to be able to submit the command
buffer separately from the main commands (though this does mess with
tracy a little).
They weren't rendering properly at all due to the matrix updates getting
overwritten by the light data (I'd forgotten to advance the packet data
pointer).
This doesn't make much of a difference on the GPU, but it drastically
cuts down CPU usage, especially for ad_tears: shadow map drawing is down
from 16.3ms to 3.7ms thanks to no having to run the alias model queues
as often.
Batching shadow map rendering needs be able to reference matrices for
multiple lights in a single batch, but the only input is the view index,
so use that to look up the matrix index rather than using it to index
the matrices directly (modulo the base index that's still there).
Actually, only 29 are used because nvidia's drivers segfault when there
are more than 29 views (regardless of the exact bit pattern in the view
mask). This will allow rendering shadow maps in large batches, which
should make for better GPU utilization.
Even that's getting pretty big, but with the quanta at 128, that's a
maximum of 8 different image sizes (which is nice for my planned
"staging image" idea).
Interestingly, this caused a reduction in memory use for some maps (but
did increase marcher's again, but not as much as the bogus rounding
did). The idea was to use sparse bindings to remap shadow map layers,
but it turns out sparse bindings are insanely slow (beyond unusable).
However, the reduction in the number of shadow map images seems to be
worth it.
Since switching to the 1.2 api as a requirement, might as well use the
relevant structs instead of extension struct (for multiview). Came up
when double-checking the max views property due to running into what
appears to be an nvidia bug where > 29 views (any bit pattern) cause a
segfault when creating the pipeline.
I had missed that upping max lights to 2048 meant that up to 12288
matrices are needed for all the possible lights. This made it so the
light type could not be encoded in id_data, but the shaders never used
it anyway. This leaves one bit free.
I'd added some developer output to see how the layers were distributed
between images and found the image widths to be... odd. It turns out I
was double-adding the shadow_quanta. Oops. Results in ~164MB less memory
used by marcher (for 32 pixel quanta).
This allows "large" updates to be done in a single staging buffer packet
instead of one packet per quad (or slice). Currently, they're batched
into groups of 64 (not really enough for conchars, but that's only at
init-time, so not all that bad). Nicely, this seems to simplify the
staging code.
Fixes#65.
When looking at a struct and seeing "count" and "size", I had to hunt to
see what "size" really meant. Cherno is very much right about size vs
count being bytes vs number of objects.
load_conchars and load_crosshairs were using create_quad directly (due
to make_static_quad having the wrong parameters), but this spread the
handling of which buffer and index where used through the code. Thus fix
make_static_quad to take the x, y offsets (like make_dyn_quad) and then
use it in load_conchars and load_crosshairs.
While QFV_PacketScatterBuffer works on only one destination buffer, it
turns out it's still useful for scattering to multiple buffers, just
with multiple calls. This makes it pretty easy to combine multiple
buffer updates into a single staging buffer packet, resulting in
reducing lighting's packet use from up to 7 to just one, drastically
reducing the pressure on the stating buffer packet pool, and thus
reducing the chances of QFV_PacketAcquire stalling.
This relies on my fork of tracy: https://github.com/taniwha/tracy
on the wip-c-vulkan branch. Everything is still rather flaky though.
This necessitated the jump to vulkan 1.2 as a requirement.
This gets the dynamic data closer to the gpu, so should make a
difference when there's a lot going on. However, for simple tests, it
made no difference.
This allows tracy to clean up properly. However, Sys_Quit will use the
jump buffer (sys_exit_jmpbuf) only if it has been set, so the use of
Sys_setjmp is optional.
I'm still not happy with it being a compile time constant, but this
takes care of the interlock between frames in flight... for now: it's
fragile and really needs the excessive small-packet use in draw and
lighting to be cleaned up.
After discussion with Darian, I've decided to go with one big staging
buffer (with lots of packets) shared between FiF as the large size will,
in the end, be more flexible.
Host_Error and Host_EndGame use setjmp/longjmp to implement an exception
of sorts, but this messes with tracy's state even with cleanup
attributes. However, it turns out that those cleanup attributes are
exactly how gcc implements C++ destructors, and so the standard Unwind
api (part of libgcc) respects them (so long as -fexceptions is enabled
for C). Thus... replace longjmp with an implementation that uses Unwind
to unwind the stack and call the cleanup functions as needed. This is
actually important for more than just tracy as the cleanup attributed
vars can be thread locks.
Tracy is a frame profiler: https://github.com/wolfpld/tracy
This uses Tracy's C API to instrument the code (already added in several
places). It turns out there is something very weird with the fence
behavior between the staging buffers and render commands as the
inter-frame delay occurs in a very strangle place (in the draw code's
packet acquisition rather than the fence waiter that's there for that
purpose). I suspect some tangled dependencies.
Mostly just macro conflicts (and a little white space in passing).
Commits for integrating tracy will come later when I've come up with a
wrapper-api that I like (so non-tracy builds are easy even with tracy
available).
This fixes the weird slug when running nq on windows. It turns out it
was the "friendly neighbor" sleep code activating due to bitrot. In
addition, there are cvars for enabling unfocused sleep (defaults off)
and disabling minimized sleep (defaults on).
A lot is broken, especially direct input, but things are working. Better
yet, it seems the X11 and Windows key bindings are at least mostly
compatible.
The event handling changes take care of VagueLobster's segfaults on
startup for all renderers (vulkan will still be iffy depending on his
hardware: it dies on my GTX 965 M, probably due to memory and QF's
shadows). One nice side effect is it takes care of the broken CD audio
event handling (does anyone even care, though?).
They're not quite working (trail path offset is incorrect) but their
pixels are getting to the screen. Also, lifetimes are off for rocket
trails in that as soon as the entity dies, so does the trail.
Although the model subsystem does this too, it does it too late relative
to the video shutdown, resulting in segfaults for glsl due to the
drivers having been unloaded.
Although the model subsystem does this too, it does it too late relative
to the video shutdown, resulting in segfaults for glsl due to the
drivers having been unloaded.
This gets things *compiling* again, though it's still non-functional and
definitely wrong (don't want trail in renderer_t), but I need to think
about the design for getting trails as components. Also need to think
about integrating trails into the client effects system so trails can be
shared between renderers.
I think I had though it using a constructor init was ok, but it turns
out that was problematic. That, or I missed it in my recent audit. Fixes
a sys syserror during shutdown.
This fixes another segfault on shutdown (not sure just which recent
change caused it, but the listener pointer needed clearing) but while
fixing the listener issue, I noticed that binding and imt shutdown were
in the wrong order with respect to buttons and axes.
Reloading progs (or shutting down) needs to clean up the buttons
otherwise the input system will have issues when it cleans up because
the buttons and axes have ceased to exist.
It turns out that initializing them via constructors led to their
shutdowns happening too late which resulted in problems with button and
axis cleanup.
sincos is just a wrapper around the GNU libc sincos. sincosh is the
equivalent for sinh and cosh, but there doesn't seem to be any such
function, so it's just the two wrapped. They both return their results
in a vec2/vec2d as (sih[h], cos[h]).
VM side of the work needed for #58. Tests are still only 4-component,
but the geometric algebra tests seem to have 2-component covered at
least a little bit.
While it could be emulated using a 3d cross-product, it was a hack and
required the use of a swizzle (or alias) to extract the scalar value.
This will make 2d PGA a little nicer when I get to modifying qfcc for it
While the progs engine itself implements the instructions correctly, the
opcode specs (and thus qfcc) treated the results as 32-bit (which was,
really, a hidden fixme, it seems).
Using swizzle for some of the geometric product operations was a bit
clunky and has problems with alignment. While I will still need it in
the future, it turns out that the extend instruction almost did what I
needed. However, I'm unsure that reversing components within an extended
vector is the way to go.
By default. Conversion of quake strings needs to be requested (which is
done by nq and qw clients and servers, as well as qfprogs via an
option). I got tired of seeing mangled source code in the disassembly.
This gets only some very basics working:
* Algebra (multi-vector) types: eg @algebra(float(3,0,1)).
* Algebra scopes (using either the above or @algebra(TYPE_NAME) where
the above was used in a typedef.
* Basis blades (eg, e12) done via procedural symbols that evaluate to
suitable constants based on the basis group for the blade.
* Addition and subtraction of multi-vectors (only partially tested).
* Assignment of sub-algebra multi-vectors to full-algebra multi-vectors
(missing elements zeroed).
There's still much work to be done, but I thought it time to get
something into git.
I realized recently that I had made a huge mistake making Ruamoko's
based addressing use unsigned offsets as it makes stack-relative
addressing more awkward when it comes to runtime-determined stack frames
(eg, using alloca). This does put a bit of an extra limit on directly
addressable globals, but that's what the based addressing is meant to
help with anyway.
I'm not sure what's up, but arm gcc thinks the array isn't properly
initialized even though x86_64 gcc does. Maybe something with padding.
At least c23 makes it easy to 0-initialize VLAs.
I'm actually surprised anything worked, though I guess it was just the
one entry getting corrupted (and not 32, but I figured allocate slots
for all of the dynamic lights just in case). Or none, really, since
larger scenes (ie, those with multiple lights that fit in the same image
size) would result in not all the maps getting used and thus one spare
for dynamic lights.
This seems excessive, but gmsp3v2 map has 1399 lights. Worse, it has a
lot of different light sizes that go up by small increments (generally
around 10) resulting in 33 shadow map images (1 too many). Quantizing
the sizes to 32 drops this nicely to 20, and reduces memory consumption
slightly too (image buffer overhead, I guess).
While the gl renderer does (or did) have it's attempt at shadows, the
others don't even try, thus the onlyshadows-marked player model doesn't
work so well (looks rather goofy seeing the arms like that).
Having more than one copy of ShadowMatrices went against my plans, and I
had trouble finding the attachments set (light_attach.h wasn't such a
good idea).
This covers only the rendering of the shadow maps (actual use still
needs to be implemented). Working with orthographic projection matrices
is surprisingly difficult, partly because creating one includes the
translations needed to get the scene into the view (and depth range),
which means care needs to be taken with the view (camera) matrix in
order to avoid double-translating depending on just how the orthographic
matrix is set up (if it's set up to focus on the origin, then the camera
matrix will need translation, otherwise the camera matrix needs to avoid
translation).
I found it rather confusing that the matrices were all backwards, and
the existing comments about being "horizontal" didn't really help all
that much. After spending some time with maxima, I was able to verify
that the comments were indeed correct, just transposed (horizontal),
with the final composition reversed to reflect that transposition.
Updating directional light CSM matrices made me realize I needed to be
able to send the contents of a packet to multiple locations in a buffer
(I may need to extend it to multiple buffers). Seems to work, but I have
only the one directional light with which to test.
This improves the projection API in that near clip is a parameter rather
than being taken directly from the cvar, and a far clip (ie, finite far
plane) version is available (necessary for cascaded shadow maps as it's
rather hard to fit a box to an infinite frustum).
Also, the orthographic projection matrix is now reversed as per the
perspective matrix (and the code tidied up a little), and a version that
takes min and max vectors is available.
gcc didn't like a couple of the changes (rightly so: one was actually
incorrect), and the fix for qfcc I didn't think to suggest while working
with Emily.
The general CFLAGS etc fixes mostly required just getting the order of
operations right: check for attributes after setting the warnings flags,
though those needed some care for gcc as it began warning about main
wanting the const attribute.
Fixing the imui link errors required moving the ui functions and setup
to vulkan_lighting.c, which is really the only place they're used.
Fixing a load of issues related to autoconf and some small source-level issues to re-add clang support.
autoconf feature detection probably needs some addressing - partially as -Werror is applied late.
Lines are drawn for a light's leaf, the leafs visible to it, or those in
its efrags chain. Still no idea why lights are drawing when they
shouldn't. Deek suggest holes in the map, but I think if that was the
case, there'd be something visible. My suspicion is I'm doing something
wrong in with efrags.
This has resulted in some rather interesting information: it seems the
surfaces (and thus, presumably bounding boxes) for leafs have little to
do with the actual leaf node's volume.
I really don't know what I was thinking when I wrote that code. Maybe I
was trying for a half angle. Now the rendered "cone" matches up with a
hard-clipped cone light (soft edges stick out a bit).
I spent way too long tracking down the easy teleporter disappearing only
to realize it might be the watervised map. After moving it out of the
way and using id's maps, it works just fine.
This takes care of rockets and lava balls casting shadows when they
shouldn't (rockets more because the shadow doesn't look that nice, lava
balls because they glow and thus shouldn't cast shadows). Same for
flames, though the small torches lost their cool sconce shadows (need to
split up the model into flame and sconce parts and mark each
separately).
This clears up the shadow acne, but does cause problems with lights
inside models. However, this can be fixed by setting the models to not
cast shadows.
The use of a static set makes Mod_LeafPVS not thread safe and also means
that the set is not usable with the set iterators after going to a
smaller map from a larger map.
Dynamic lights can't go directly on visible entities as one or the other
will fail to be queued. In addition, the number of lights on an entity
needs to be limited. For now, one general purpose light for various
effects (eg, quad damage etc) and one for the muzzle flash.
This also fixes the segfault in the previous commit.
Dynamic light shadow sizes are fixed, but can be controlled via the
dynlight_size cvar (defaults to 250).
While the insertion of dlights into the BSP might wind up being overly
expensive, the automatic management of the component pool cleans up the
various loops in the renderers.
Unfortunately, (current bug) lights on entities cause the entity to
disappear due to how the entity queue system works, and the doubled
efrag chain causes crashes when changing maps, meaning lights should be
on their own entities, not additional components on entities with
visible models.
Also, the vulkan renderer segfaults on dlights (fix incoming, along with
shadows for dlights).
The reversed depth buffer is very nice, but it also reversed the OIT
blending. Too much demo watching not enough walking around in the maps
(especially start near the episode 4 gate).
Other than the rather bad shadow acne, this is actually quake itself
working nicely. Still need to get directional lights working for
community maps, and all sorts of other little things (hide view model,
show player, fix brush backfaces, etc).
This takes care of the type punning issue by each pass using the correct
sampler type with the correct view types bound. Also, point light and
spot light shadow maps are now guaranteed to be separated (it was just
luck that they were before) and spot light maps may be significantly
smaller as their cone angle is taken into account. Lighting is quite
borked, but at least the engine is running again.
I guess it's kind of UB, but it's handy for images that will be
conditionally written by the GPU but need to be in shader-read-only for
draw calls and the validation layers can't tell that the layers won't be
used.
This gets everything but the actual shadow map bindings working: the
validation layers don't like my type punning (which may well be the
right thing) and specialization constants don't help (yet, anyway) but I
want to get things into git.
Directional lights don't get correct matrices yet as I need to study the
math involved for cascaded shadow maps (and id maps don't have
directional lights).
Getting spotlights working correctly was insanely frustrating: I just
couldn't get the entities into the view of the spotlight with any
sensible combination of inverses and the z_up matrix. It turned out it
was all due to an incorrect reference vector: it was +Z instead of +X.
It turns out bsp faces are still back-face culled despite the null point
being on the front of every possible plane... or really, because it's on
the front of every possible plane: sometimes the back face is the front
face, and this breaks the face selection code (a separate traversal
function will be needed for non-culling rendering).
Despite that, other than having to deal with different pipelines,
getting the model renderers working went better than expected.
This involved rewriting the descriptor update code too, but that now
feels cleaner.
The matrices are loaded into a storage buffer as it can get quite big at
6 matrices per light (and the current max lights is 768).
The parameter will be passed on to the pipeline tasks in their task
context, allowing for communication between the subsystem calling
QFV_RunRenderPass and the pipeline tasks (for the case of lighting,
passing the current matrix base index).
They're now qfv_* and shared within the vulkan renderer. qfv_z_up cannot
be shared across renderers as they have their own ideas for the world
frame. qfv_box_rotations currently can't be shared across renderers
because if the Y-axis flip and the way it's handled, but sharing should
be achievable by modifying the other renderers to handle the sides
correctly (glsl and gl need to do lookups for the side enums, sw just
needs to be shown which way is up).
Using set iterators can be quite a lot faster for sparse sets due to the
function call overhead in testing each element. Times for ad_tears
dropped from about 1200us to about 670us (hard to say due to there being
only 3 data points and a lot of noise in the time).
If a step has process tasks, any render or compute
pipelines/renderpasses are **not** run automatically: the idea is the
process tasks need to run the relevant pipelines in a custom manner but
needs the objects to be created.
The recent light changes highlighted that the renderer does not own the
efrags (segfault in qwaq when shutting down my test scene). After
digging through the history of efrag clearing, it turns out that the
renderer never owned them, I just didn't understand the concept of
scenes at the time that I moved efrags into the renderer.
This makes a pretty significant difference: 8-25% for demo1, demo2,
demo3 and bigass1. Many thanks to Nick Alger
(https://stackoverflow.com/questions/5122228/box-to-sphere-collision).
It took me a moment to recognize that it's the Minkowski difference (and
maybe GJK?). Of course, I adapted it for simd.
This eliminates the O(N^2) (N = map leaf count) operation of finding
visible lights and will later allow for finer culling of the lights as
they can be tested against the leaf volume (which they currently are
not as this was just getting things going). However, this has severely
hurt ad_tears' performance (I suspect due to the extreme number of
leafs), but the speed seems to be very steady. Hopefully, reconstructing
the vis clusters will help (I imagine it will help in many places, not
just lights).
I guess I wasn't sure how to find all the allocated entities from within
the registry, but it turned out to be trivial. This takes care of leaked
static entities (and, in a later commit, leaked light entities, which is
how I found the problem).
The grid calculations are modified from those of Inigo Quilez
(https://iquilezles.org/articles/filterableprocedurals/), but give very
nice results: when thin enough, the lines fade out nicely instead of
producing crazy moire patterns. Though currently disabled, the default
planes are the xy, yz and zx planes with colored axes.
Based on the article
(https://developer.nvidia.com/content/depth-precision-visualized), this
should give nice precision behavior, and removes the need to worry about
large maps getting clipped. If I'm doing my math correctly, despite
being reversed, near precision is still crazy high. And (thanks to the
reversed depth) about a quarter of a unit (for near clip of 4) out at 1M
unit distance.
This seems to be more for legacy X11 (ie, without fixes etc), but
fullscreen really shouldn't affect grabbing directly (rather, it should
be up to the client whether grabbing (and thus warping) is enabled at
all.
con_linewidt starts out as 0, which leads to bad results for the initial
widths of input lines and later calculations. However, really, they
probably shouldn't be using size_t for the width, but this is a nice
quick fix.
Due to doing most of my testing using the demos, I hadn't noticed the
double-draw until flying around with the debug camera (and it showed as
a weird shimmer behind the sky layers).
Panels can be anchored to a widget in another hierarchy, allowing for
things like cascading menus. They can also be extended via referencing
them by name, allowing for subsystems to add items to an already panel
(eg, extending a menu).
This prevent the layout system from repositioning the text view and thus
breaking text-shaping. Now Tengwar Telcontar looks much more balanced in
the widgets.
The lights debug is from the light splat experiment (this is why I kept
the code), and the bsp debug is based on that. Both currently disabled
for now until I get UI controls in.