I guess it's kind of UB, but it's handy for images that will be
conditionally written by the GPU but need to be in shader-read-only for
draw calls and the validation layers can't tell that the layers won't be
used.
This gets everything but the actual shadow map bindings working: the
validation layers don't like my type punning (which may well be the
right thing) and specialization constants don't help (yet, anyway) but I
want to get things into git.
Directional lights don't get correct matrices yet as I need to study the
math involved for cascaded shadow maps (and id maps don't have
directional lights).
Getting spotlights working correctly was insanely frustrating: I just
couldn't get the entities into the view of the spotlight with any
sensible combination of inverses and the z_up matrix. It turned out it
was all due to an incorrect reference vector: it was +Z instead of +X.
It turns out bsp faces are still back-face culled despite the null point
being on the front of every possible plane... or really, because it's on
the front of every possible plane: sometimes the back face is the front
face, and this breaks the face selection code (a separate traversal
function will be needed for non-culling rendering).
Despite that, other than having to deal with different pipelines,
getting the model renderers working went better than expected.
This involved rewriting the descriptor update code too, but that now
feels cleaner.
The matrices are loaded into a storage buffer as it can get quite big at
6 matrices per light (and the current max lights is 768).
The parameter will be passed on to the pipeline tasks in their task
context, allowing for communication between the subsystem calling
QFV_RunRenderPass and the pipeline tasks (for the case of lighting,
passing the current matrix base index).
They're now qfv_* and shared within the vulkan renderer. qfv_z_up cannot
be shared across renderers as they have their own ideas for the world
frame. qfv_box_rotations currently can't be shared across renderers
because if the Y-axis flip and the way it's handled, but sharing should
be achievable by modifying the other renderers to handle the sides
correctly (glsl and gl need to do lookups for the side enums, sw just
needs to be shown which way is up).
Using set iterators can be quite a lot faster for sparse sets due to the
function call overhead in testing each element. Times for ad_tears
dropped from about 1200us to about 670us (hard to say due to there being
only 3 data points and a lot of noise in the time).
If a step has process tasks, any render or compute
pipelines/renderpasses are **not** run automatically: the idea is the
process tasks need to run the relevant pipelines in a custom manner but
needs the objects to be created.
The recent light changes highlighted that the renderer does not own the
efrags (segfault in qwaq when shutting down my test scene). After
digging through the history of efrag clearing, it turns out that the
renderer never owned them, I just didn't understand the concept of
scenes at the time that I moved efrags into the renderer.
This makes a pretty significant difference: 8-25% for demo1, demo2,
demo3 and bigass1. Many thanks to Nick Alger
(https://stackoverflow.com/questions/5122228/box-to-sphere-collision).
It took me a moment to recognize that it's the Minkowski difference (and
maybe GJK?). Of course, I adapted it for simd.
This eliminates the O(N^2) (N = map leaf count) operation of finding
visible lights and will later allow for finer culling of the lights as
they can be tested against the leaf volume (which they currently are
not as this was just getting things going). However, this has severely
hurt ad_tears' performance (I suspect due to the extreme number of
leafs), but the speed seems to be very steady. Hopefully, reconstructing
the vis clusters will help (I imagine it will help in many places, not
just lights).
I guess I wasn't sure how to find all the allocated entities from within
the registry, but it turned out to be trivial. This takes care of leaked
static entities (and, in a later commit, leaked light entities, which is
how I found the problem).
The grid calculations are modified from those of Inigo Quilez
(https://iquilezles.org/articles/filterableprocedurals/), but give very
nice results: when thin enough, the lines fade out nicely instead of
producing crazy moire patterns. Though currently disabled, the default
planes are the xy, yz and zx planes with colored axes.
Based on the article
(https://developer.nvidia.com/content/depth-precision-visualized), this
should give nice precision behavior, and removes the need to worry about
large maps getting clipped. If I'm doing my math correctly, despite
being reversed, near precision is still crazy high. And (thanks to the
reversed depth) about a quarter of a unit (for near clip of 4) out at 1M
unit distance.
This seems to be more for legacy X11 (ie, without fixes etc), but
fullscreen really shouldn't affect grabbing directly (rather, it should
be up to the client whether grabbing (and thus warping) is enabled at
all.
con_linewidt starts out as 0, which leads to bad results for the initial
widths of input lines and later calculations. However, really, they
probably shouldn't be using size_t for the width, but this is a nice
quick fix.
Due to doing most of my testing using the demos, I hadn't noticed the
double-draw until flying around with the debug camera (and it showed as
a weird shimmer behind the sky layers).
Panels can be anchored to a widget in another hierarchy, allowing for
things like cascading menus. They can also be extended via referencing
them by name, allowing for subsystems to add items to an already panel
(eg, extending a menu).
This prevent the layout system from repositioning the text view and thus
breaking text-shaping. Now Tengwar Telcontar looks much more balanced in
the widgets.
The lights debug is from the light splat experiment (this is why I kept
the code), and the bsp debug is based on that. Both currently disabled
for now until I get UI controls in.
SCR_UpdateScreen_Legacy now takes only the screen functions pointer (it
didn't need camera or realtime), and the camera sizzle code has been
moved into one place to make cleaning it up easier (when I get around to
auditing AngleVectors etc).
Skipping the root view (widget) sort of made sense before windows became
separate canvases as there was only the one hierarchy, but doing so
prevented windows (panels) from fitting themselves to their children.
However, now I need to think of a good way of specifying a minimum size
for panels.
WIdgets can't possibly be active when the entire UI is hidden, and
resetting the active widget when hiding the UI helps when the state gets
broken due to widget id conflicts.
The intent is to use them for menus, tooltips and anything else along
those lines, but windows was a good starting point (and puts a border
along the top of the window too).
It's not perfect as the first N expanding children get grown by 1 pixel
regarless of weight, but it's much better than leaving a (possibly quite
large) gap at the edge of the layout.
I'm not sure this is what I want, especially in the long run, but it
does make simple windows much easier to create (and not look broken due
to being specified too small).
Canvas draw order is sorted by group then order within the group. As a
fallback, the canvas entity id is used to keep the sort stable, but
that's only as stable as the ids themselves (if the canvases are
destroyed and recreated, the ids may switch around).
This has use when the order of components in the pool affects draw order
(or has other significance), especially at the subpool level. I plan to
use it for fixing overlapping windows in imui.
Shaped text is cached using all the shaping parameters as well as the
text itself as a key. This makes text shaping a non-issue for imui when
the text is stable, taking my simple test from 120fps to 1000fps
(optimized build).
As I had long suspected, building large hierarchies is fiendishly
expensive (at least O(N^2)). However, this is because the hierarchies
are structured such that adding high-level nodes results in a lot of
copying due to the flattened (breadth-first) layout (which does make for
excellent breadth-first performance when working with a hierarchy).
Using tree mode allows adding new nodes to be O(1) (I guess O(N) for the
size of the sub-tree being added, but that's not supported yet) and
costs only an additional 8 bytes per node. Switching from flat mode to
tree mode is very cheap as only the additional tree-related indices need
to be fixed up (they're almost entirely ignored in flat mode). Switching
from tree to flat mode is a little more expensive as the entire tree
needs to be copied, but it seems to be an O(N) (size of the tree).
With this, building the style editor window went from about 25% to about
5% (and most of that is realloc!), with a 1.3% cost for switching from
tree mode to flat mode.
There's still a lot of work to do (supporting removal and tree inserts).
There's still the problem with unused variables when building for
windows because of vulkan debug stuff, but this fixes the important
errors. It actually still works (at least under wine).
By default, horizontal and vertical layouts expand to fill their parent
in their on-axis direction (horizontally for horizontal layouts), but
fit to their child views in their off-axis.
Flexible space views take advantage of auto-expansion, pushing sibling
views such that the grandparent view is filled on the parent view's
on-axis, and the parent view is filled by the space in the parent view's
off-axis. Flexible views currently have a background fill, allowing them
to provide background filling of the overall view with minimal overdraw
(ancestor views don't need to have any fill at all).
Removing a hierarchy from an entity can result in a large number of
component removes in the same pool, thus changing index of the lest
element of the pool. This *seems* to fix the memory corruption I've been
experiencing with the debug UI.