While the progs engine itself implements the instructions correctly, the
opcode specs (and thus qfcc) treated the results as 32-bit (which was,
really, a hidden fixme, it seems).
Using swizzle for some of the geometric product operations was a bit
clunky and has problems with alignment. While I will still need it in
the future, it turns out that the extend instruction almost did what I
needed. However, I'm unsure that reversing components within an extended
vector is the way to go.
By default. Conversion of quake strings needs to be requested (which is
done by nq and qw clients and servers, as well as qfprogs via an
option). I got tired of seeing mangled source code in the disassembly.
This gets only some very basics working:
* Algebra (multi-vector) types: eg @algebra(float(3,0,1)).
* Algebra scopes (using either the above or @algebra(TYPE_NAME) where
the above was used in a typedef.
* Basis blades (eg, e12) done via procedural symbols that evaluate to
suitable constants based on the basis group for the blade.
* Addition and subtraction of multi-vectors (only partially tested).
* Assignment of sub-algebra multi-vectors to full-algebra multi-vectors
(missing elements zeroed).
There's still much work to be done, but I thought it time to get
something into git.
I realized recently that I had made a huge mistake making Ruamoko's
based addressing use unsigned offsets as it makes stack-relative
addressing more awkward when it comes to runtime-determined stack frames
(eg, using alloca). This does put a bit of an extra limit on directly
addressable globals, but that's what the based addressing is meant to
help with anyway.
I'm not sure what's up, but arm gcc thinks the array isn't properly
initialized even though x86_64 gcc does. Maybe something with padding.
At least c23 makes it easy to 0-initialize VLAs.
I'm actually surprised anything worked, though I guess it was just the
one entry getting corrupted (and not 32, but I figured allocate slots
for all of the dynamic lights just in case). Or none, really, since
larger scenes (ie, those with multiple lights that fit in the same image
size) would result in not all the maps getting used and thus one spare
for dynamic lights.
This seems excessive, but gmsp3v2 map has 1399 lights. Worse, it has a
lot of different light sizes that go up by small increments (generally
around 10) resulting in 33 shadow map images (1 too many). Quantizing
the sizes to 32 drops this nicely to 20, and reduces memory consumption
slightly too (image buffer overhead, I guess).
While the gl renderer does (or did) have it's attempt at shadows, the
others don't even try, thus the onlyshadows-marked player model doesn't
work so well (looks rather goofy seeing the arms like that).
Having more than one copy of ShadowMatrices went against my plans, and I
had trouble finding the attachments set (light_attach.h wasn't such a
good idea).
This covers only the rendering of the shadow maps (actual use still
needs to be implemented). Working with orthographic projection matrices
is surprisingly difficult, partly because creating one includes the
translations needed to get the scene into the view (and depth range),
which means care needs to be taken with the view (camera) matrix in
order to avoid double-translating depending on just how the orthographic
matrix is set up (if it's set up to focus on the origin, then the camera
matrix will need translation, otherwise the camera matrix needs to avoid
translation).
I found it rather confusing that the matrices were all backwards, and
the existing comments about being "horizontal" didn't really help all
that much. After spending some time with maxima, I was able to verify
that the comments were indeed correct, just transposed (horizontal),
with the final composition reversed to reflect that transposition.
Updating directional light CSM matrices made me realize I needed to be
able to send the contents of a packet to multiple locations in a buffer
(I may need to extend it to multiple buffers). Seems to work, but I have
only the one directional light with which to test.
This improves the projection API in that near clip is a parameter rather
than being taken directly from the cvar, and a far clip (ie, finite far
plane) version is available (necessary for cascaded shadow maps as it's
rather hard to fit a box to an infinite frustum).
Also, the orthographic projection matrix is now reversed as per the
perspective matrix (and the code tidied up a little), and a version that
takes min and max vectors is available.
gcc didn't like a couple of the changes (rightly so: one was actually
incorrect), and the fix for qfcc I didn't think to suggest while working
with Emily.
The general CFLAGS etc fixes mostly required just getting the order of
operations right: check for attributes after setting the warnings flags,
though those needed some care for gcc as it began warning about main
wanting the const attribute.
Fixing the imui link errors required moving the ui functions and setup
to vulkan_lighting.c, which is really the only place they're used.
Fixing a load of issues related to autoconf and some small source-level issues to re-add clang support.
autoconf feature detection probably needs some addressing - partially as -Werror is applied late.
Lines are drawn for a light's leaf, the leafs visible to it, or those in
its efrags chain. Still no idea why lights are drawing when they
shouldn't. Deek suggest holes in the map, but I think if that was the
case, there'd be something visible. My suspicion is I'm doing something
wrong in with efrags.
This has resulted in some rather interesting information: it seems the
surfaces (and thus, presumably bounding boxes) for leafs have little to
do with the actual leaf node's volume.
I really don't know what I was thinking when I wrote that code. Maybe I
was trying for a half angle. Now the rendered "cone" matches up with a
hard-clipped cone light (soft edges stick out a bit).
I spent way too long tracking down the easy teleporter disappearing only
to realize it might be the watervised map. After moving it out of the
way and using id's maps, it works just fine.
This takes care of rockets and lava balls casting shadows when they
shouldn't (rockets more because the shadow doesn't look that nice, lava
balls because they glow and thus shouldn't cast shadows). Same for
flames, though the small torches lost their cool sconce shadows (need to
split up the model into flame and sconce parts and mark each
separately).
This clears up the shadow acne, but does cause problems with lights
inside models. However, this can be fixed by setting the models to not
cast shadows.
The use of a static set makes Mod_LeafPVS not thread safe and also means
that the set is not usable with the set iterators after going to a
smaller map from a larger map.
Dynamic lights can't go directly on visible entities as one or the other
will fail to be queued. In addition, the number of lights on an entity
needs to be limited. For now, one general purpose light for various
effects (eg, quad damage etc) and one for the muzzle flash.
This also fixes the segfault in the previous commit.
Dynamic light shadow sizes are fixed, but can be controlled via the
dynlight_size cvar (defaults to 250).
While the insertion of dlights into the BSP might wind up being overly
expensive, the automatic management of the component pool cleans up the
various loops in the renderers.
Unfortunately, (current bug) lights on entities cause the entity to
disappear due to how the entity queue system works, and the doubled
efrag chain causes crashes when changing maps, meaning lights should be
on their own entities, not additional components on entities with
visible models.
Also, the vulkan renderer segfaults on dlights (fix incoming, along with
shadows for dlights).
The reversed depth buffer is very nice, but it also reversed the OIT
blending. Too much demo watching not enough walking around in the maps
(especially start near the episode 4 gate).
Other than the rather bad shadow acne, this is actually quake itself
working nicely. Still need to get directional lights working for
community maps, and all sorts of other little things (hide view model,
show player, fix brush backfaces, etc).
This takes care of the type punning issue by each pass using the correct
sampler type with the correct view types bound. Also, point light and
spot light shadow maps are now guaranteed to be separated (it was just
luck that they were before) and spot light maps may be significantly
smaller as their cone angle is taken into account. Lighting is quite
borked, but at least the engine is running again.
I guess it's kind of UB, but it's handy for images that will be
conditionally written by the GPU but need to be in shader-read-only for
draw calls and the validation layers can't tell that the layers won't be
used.
This gets everything but the actual shadow map bindings working: the
validation layers don't like my type punning (which may well be the
right thing) and specialization constants don't help (yet, anyway) but I
want to get things into git.
Directional lights don't get correct matrices yet as I need to study the
math involved for cascaded shadow maps (and id maps don't have
directional lights).
Getting spotlights working correctly was insanely frustrating: I just
couldn't get the entities into the view of the spotlight with any
sensible combination of inverses and the z_up matrix. It turned out it
was all due to an incorrect reference vector: it was +Z instead of +X.
It turns out bsp faces are still back-face culled despite the null point
being on the front of every possible plane... or really, because it's on
the front of every possible plane: sometimes the back face is the front
face, and this breaks the face selection code (a separate traversal
function will be needed for non-culling rendering).
Despite that, other than having to deal with different pipelines,
getting the model renderers working went better than expected.
This involved rewriting the descriptor update code too, but that now
feels cleaner.
The matrices are loaded into a storage buffer as it can get quite big at
6 matrices per light (and the current max lights is 768).
The parameter will be passed on to the pipeline tasks in their task
context, allowing for communication between the subsystem calling
QFV_RunRenderPass and the pipeline tasks (for the case of lighting,
passing the current matrix base index).
They're now qfv_* and shared within the vulkan renderer. qfv_z_up cannot
be shared across renderers as they have their own ideas for the world
frame. qfv_box_rotations currently can't be shared across renderers
because if the Y-axis flip and the way it's handled, but sharing should
be achievable by modifying the other renderers to handle the sides
correctly (glsl and gl need to do lookups for the side enums, sw just
needs to be shown which way is up).
Using set iterators can be quite a lot faster for sparse sets due to the
function call overhead in testing each element. Times for ad_tears
dropped from about 1200us to about 670us (hard to say due to there being
only 3 data points and a lot of noise in the time).
If a step has process tasks, any render or compute
pipelines/renderpasses are **not** run automatically: the idea is the
process tasks need to run the relevant pipelines in a custom manner but
needs the objects to be created.
The recent light changes highlighted that the renderer does not own the
efrags (segfault in qwaq when shutting down my test scene). After
digging through the history of efrag clearing, it turns out that the
renderer never owned them, I just didn't understand the concept of
scenes at the time that I moved efrags into the renderer.
This makes a pretty significant difference: 8-25% for demo1, demo2,
demo3 and bigass1. Many thanks to Nick Alger
(https://stackoverflow.com/questions/5122228/box-to-sphere-collision).
It took me a moment to recognize that it's the Minkowski difference (and
maybe GJK?). Of course, I adapted it for simd.
This eliminates the O(N^2) (N = map leaf count) operation of finding
visible lights and will later allow for finer culling of the lights as
they can be tested against the leaf volume (which they currently are
not as this was just getting things going). However, this has severely
hurt ad_tears' performance (I suspect due to the extreme number of
leafs), but the speed seems to be very steady. Hopefully, reconstructing
the vis clusters will help (I imagine it will help in many places, not
just lights).
I guess I wasn't sure how to find all the allocated entities from within
the registry, but it turned out to be trivial. This takes care of leaked
static entities (and, in a later commit, leaked light entities, which is
how I found the problem).
The grid calculations are modified from those of Inigo Quilez
(https://iquilezles.org/articles/filterableprocedurals/), but give very
nice results: when thin enough, the lines fade out nicely instead of
producing crazy moire patterns. Though currently disabled, the default
planes are the xy, yz and zx planes with colored axes.
Based on the article
(https://developer.nvidia.com/content/depth-precision-visualized), this
should give nice precision behavior, and removes the need to worry about
large maps getting clipped. If I'm doing my math correctly, despite
being reversed, near precision is still crazy high. And (thanks to the
reversed depth) about a quarter of a unit (for near clip of 4) out at 1M
unit distance.
This seems to be more for legacy X11 (ie, without fixes etc), but
fullscreen really shouldn't affect grabbing directly (rather, it should
be up to the client whether grabbing (and thus warping) is enabled at
all.
con_linewidt starts out as 0, which leads to bad results for the initial
widths of input lines and later calculations. However, really, they
probably shouldn't be using size_t for the width, but this is a nice
quick fix.
Due to doing most of my testing using the demos, I hadn't noticed the
double-draw until flying around with the debug camera (and it showed as
a weird shimmer behind the sky layers).
Panels can be anchored to a widget in another hierarchy, allowing for
things like cascading menus. They can also be extended via referencing
them by name, allowing for subsystems to add items to an already panel
(eg, extending a menu).
This prevent the layout system from repositioning the text view and thus
breaking text-shaping. Now Tengwar Telcontar looks much more balanced in
the widgets.
The lights debug is from the light splat experiment (this is why I kept
the code), and the bsp debug is based on that. Both currently disabled
for now until I get UI controls in.
SCR_UpdateScreen_Legacy now takes only the screen functions pointer (it
didn't need camera or realtime), and the camera sizzle code has been
moved into one place to make cleaning it up easier (when I get around to
auditing AngleVectors etc).
Skipping the root view (widget) sort of made sense before windows became
separate canvases as there was only the one hierarchy, but doing so
prevented windows (panels) from fitting themselves to their children.
However, now I need to think of a good way of specifying a minimum size
for panels.
WIdgets can't possibly be active when the entire UI is hidden, and
resetting the active widget when hiding the UI helps when the state gets
broken due to widget id conflicts.
The intent is to use them for menus, tooltips and anything else along
those lines, but windows was a good starting point (and puts a border
along the top of the window too).
It's not perfect as the first N expanding children get grown by 1 pixel
regarless of weight, but it's much better than leaving a (possibly quite
large) gap at the edge of the layout.
I'm not sure this is what I want, especially in the long run, but it
does make simple windows much easier to create (and not look broken due
to being specified too small).
Canvas draw order is sorted by group then order within the group. As a
fallback, the canvas entity id is used to keep the sort stable, but
that's only as stable as the ids themselves (if the canvases are
destroyed and recreated, the ids may switch around).
This has use when the order of components in the pool affects draw order
(or has other significance), especially at the subpool level. I plan to
use it for fixing overlapping windows in imui.
Shaped text is cached using all the shaping parameters as well as the
text itself as a key. This makes text shaping a non-issue for imui when
the text is stable, taking my simple test from 120fps to 1000fps
(optimized build).
As I had long suspected, building large hierarchies is fiendishly
expensive (at least O(N^2)). However, this is because the hierarchies
are structured such that adding high-level nodes results in a lot of
copying due to the flattened (breadth-first) layout (which does make for
excellent breadth-first performance when working with a hierarchy).
Using tree mode allows adding new nodes to be O(1) (I guess O(N) for the
size of the sub-tree being added, but that's not supported yet) and
costs only an additional 8 bytes per node. Switching from flat mode to
tree mode is very cheap as only the additional tree-related indices need
to be fixed up (they're almost entirely ignored in flat mode). Switching
from tree to flat mode is a little more expensive as the entire tree
needs to be copied, but it seems to be an O(N) (size of the tree).
With this, building the style editor window went from about 25% to about
5% (and most of that is realloc!), with a 1.3% cost for switching from
tree mode to flat mode.
There's still a lot of work to do (supporting removal and tree inserts).
There's still the problem with unused variables when building for
windows because of vulkan debug stuff, but this fixes the important
errors. It actually still works (at least under wine).
By default, horizontal and vertical layouts expand to fill their parent
in their on-axis direction (horizontally for horizontal layouts), but
fit to their child views in their off-axis.
Flexible space views take advantage of auto-expansion, pushing sibling
views such that the grandparent view is filled on the parent view's
on-axis, and the parent view is filled by the space in the parent view's
off-axis. Flexible views currently have a background fill, allowing them
to provide background filling of the overall view with minimal overdraw
(ancestor views don't need to have any fill at all).
Removing a hierarchy from an entity can result in a large number of
component removes in the same pool, thus changing index of the lest
element of the pool. This *seems* to fix the memory corruption I've been
experiencing with the debug UI.
The biggest change was splitting up the job resources into
per-render-pass resources, allowing individual render passes to
reallocate their resources without affecting any others. After that, it
was just getting translucency and capture working after a window resize.
I had forgotten that Hash_NewTable checked the hashctx parameter, so
calling Hash_NewTable in the struct initializer meant the hasctx was
uninitialized.
This takes care of element order stability. It did need reworking the
mouse tracking code (including adding an active flag to views), but now
buttons seem to work correctly.
It looks horrible due to the lack of lighting etc, but it's good enough
for basic testing, especially of my render job design (that passed with
flying colors).
It's there for a reason :P. Fixes most of the really bad behavior after
disabling some widgets (re-layout isn't working at all, though, and
adding the widgets back again puts them in the wrong place).
Using label + key_offset in both imui_state_getkey and the call to
Hash_Find greatly simplifies the logic of using the correct key. Fixes
an ever-growing set of buttons when using separators (hmm, I think this
means pruning isn't working correctly).
TextContent seems redundant at this stage since a text view is always
sized to its content, and PercentOfParent doesn't work yet. Pixels
definitely works and Null seems to work in that it does no sizing or
positioning. Vertical layout is supported but not yet tested, similar
for ChildrenSum, but I can have two buttons side by side.
Text_StringView sets the view bounds to the bounds of the glyphs in the
view. This has its advantages, but is not suitable for automatic UI
sizing. However, the view is positioned correctly, so nothing needs to
be done there. Now full height glyphs (eg, C) look balanced in the view,
and glyphs with descenders (g) brush the bottom of the view.
It does almost nothing (just puts a non-function button on the screen),
but it will help develop the IMUI code and, of course, come to help with
debugging in general.
Both passage and simple text are supported, but only simple text has
been tested at this stage. However, as passage text was taken directly
from rua_gui.c and formed the basis for simple text rendering, I expect
it's at least close to working.
The same underlying mechanism is used for both simple text strings and
passages, but without the intervening hierarchy of paragraphs etc.
Results in only the one view for a simple text string.
I'm not sure I like fontconfig (docs are...), but it is pretty standard,
and I was able to find some reasonable examples on stackexchange
(https://stackoverflow.com/questions/10542832/how-to-use-fontconfig-to-get-font-list-c-c).
Currently, only the one font is handled, no font sets for fall backs
etc. It's meant for the debug UI I'm working on, so that shouldn't be a
big deal.
It's usually desirable to hide the cursor when playing quake, but when
using the console, or in various other states, being able to see the
cursor can be quite important.
It's currently very simplistic (visible, not visible), but it gets
things started for making QF more usable in a windowed environment (not
having a visible cursor was fine in DOS, or when full screen, but not
when windowed (and not actively playing).
This let me keep clearValue's simple default rgba float interpretation,
but also have full control over access to the float32, int32 and uint32
fields.
This is necessary because fisheye rendering draws the scene up to 6
times per frame, which results in many of the limits being hit
prematurely, but updating r_framecount that often breaks dynamic lights.
Really? More to clean up before (vulkan) bsp rendering is thread-safe?
However, R_MarkLeaves was pretty close: just oldviewleaf and
visframecount, but that's still too much. Also, the reliance on
r_refdef.worldmodel irked me.
While there will be some GPU resources to sort out for multi-pass bsp
processing, I think this is the last piece required before shadow passes
can be implemented.
They were an interesting idea and might be useful in the future, but
they don't work as well as I had hoped for quake's maps due to the
overlapping light volumes causing contention while doing the additive
blends in the frame buffer. The cause was made obvious when testing in
the marcher map: most of its over 400 lights have infinite radius thus
require full screen passes: all those passes fighting for the frame
buffer did very nasty things to performance. However, light splats might be
useful for many small, non-overlapping light volumes, thus the code is
being kept (and I like the cleanups that came with it).
Move things around a bit so I can restore the previous behavior of doing
all lights in a single full screen pass but keep the code improvements
from trying to do splatted lighting.
The old system used just "views", but I had at some time decided that I
might want to support specifying buffers and buffer views, but forgot to
change the name in vkparse.c.
While hash tables are useful for large symbol tables, the bool "enum" is
too small to justify one and even bsearch is too expensive (also,
bsearch requires knowing the number of elements, which is a bit of a
hassle currently).
Samplers have no direct relation to render passes or pipelines, so
should not necessarily be in the same config file. This makes all the
old config files obsolete, and quite a bit of support code in vkparse.c.
This gets screenshots working again. As the implementation is now a
(trivial) state machine, the pause when grabbing a screenshot is
significantly reduced (it can be reduced even further by doing the png
compression in a separate thread).
The new system seems to work quite nicely with brush models, which was
the intent, but it's nice to see. Hopefully, it works well when it comes
to shadows. There's still water warp and screen shots to fix, and
fisheye to get working, as well.
Gotta be sure :)
With the new system mostly up and running (just bsp rendering and
descriptor sets/layout handling to go, and they're independent of the
old render pass system), the old system can finally be cleared out.
This fixes the insta-death of particles. Interestingly, other than
particles (due to the ring of buffers not being used correctly),
everything else worked nicely, so I guess 1-frame rendering got tested.
The particles die instantly due to curFrame not updating (next commit),
but otherwise work nicely, especially sync is better (many thanks to
Darian for his help with understanding sync scope).
bsp_draw_queue isn't the right place, but it's just place-holder code to
help get the rest of the renderers up and running before I tackle bsp
rendering. Fixes the segfault in demo1 when the zombies get gibbed,
resulting in zombie entities.
This was necessary to get the 2d elements drawn after the fence had been
fired (thus indicating descriptors could be updated) but before actual
rendering of the 2d elements (which is how it was done before the switch
to the new system).
It turns out there was a bug in the old iqm push constants spec (I still
need to figure out how to use layouts in the new system so I can
completely delete the old).
The output system's update_input takes a parameter specifying the render
step from which it is to get the output view of that step and updates
its descriptors as necessary.
With this, the full render job is working for alias models (minus a few
glitches).
When creating a new command buffer and appending it to a queue, the
active buffer count needs to be incremented too otherwise the new
command buffer will be accidentally reused prematurely. Not noticed
earlier because only one buffer was being created.
Many thanks to Peter and Darian for clearing up my misunderstanding of
how vkResetCommandPool works. The manager creates command buffers from
the command pool on an as-needed basis (when the queue of available
buffers is empty), and keeps track of those buffers in a queue. When the
pool is reset, the queues (one each for primary and secondary command
buffers) are reset such that the tracked buffers are available again.
Imageless framebuffers would probably be easier and cleaner, but this
takes care of the validation error attempting to present the second
frame (because rendering was being done to the first frame's swapchain
image instead of the second frame's).
Command buffer pools can't be reset until the commands have all been
executed. Having per-frame pools makes keeping track of pool lifetime
fairly easy.
Interleaving Vulkan objects with stucts containing vec4f_t results in
the vectors becoming unaligned when there is an odd number of objects in
a set, thus producing a segfault. Putting all the structs first prevents
any such issue.
The new render system now passes validation for the first frame (but
no drawing is done by the various subsystems yet). Something is wrong
with how swap chain semaphores are handled thus the second frame fails.
Frame buffer attachments can now be defined externally, with
"$swapchain" supported for now (in which case, the swap chain defines
the size of the frame buffer).
Also, render pass render areas and pipeline viewport and scissor rects
are updated when necessary.
I don't like the current name (update_framebuffer), but if the
referenced render pass doesn't have a framebuffer, one is created. The
renderpass is referenced via the active renderpass of the named render
step. Unfortunately, this has uncovered a bug in the setup of renderpass
objects: main.deferred has output's renderpass, and main.deferred_cube
and output have bogus renderpass objects.