I think has been one of the biggest roadblocks to breaking free of
quake, so having dual render paths and thus the different new scene load
sequence has proven to be unexpected helpful. There's a lot more to be
done to make the render graph actually usable by anyone but me, but just
making scene load configurable frees up a lot. I think there needs to be
renderer startup/shutdown configuration too, but this seems to be enough
for now.
Other than decoupled lightmap support, I think that has the vulkan
forward renderer feature complete (though a little buggy with its
lightmap updates and fisheye gets validation errors).
The lightmaps aren't updated at all yet, so everything is static.
Figuring out how lightmap data gets to the gpu was a chore thanks to the
spaghetti in the bsp data, and then I'd forgotten that I was
pre-expanding the light data to rgb so wound up with weird lightmaps,
but without water or particles, demo1 is getting 5000fps at 800x450, and
it seems to be CPU limited.
Some of them were actual leaks, but tracking memory should be a lot
easier now. However, there's a lot of room for optimization of
allocations (eg, recylcling of hierarchies. There is now 1 active
allocation (according to tracy) when nq exits: Qgetline's string buffer
(I think an api change is in order).
The main goal of this change was to make it easier to tell when a
hierarchy has been deleted, but as a side benefit, it got rid of the use
of PR_RESMAP. Also, it's easy to track the number of hierarchies.
Unfortunately, it showed how brittle the component side of the ECS is
(scene and canvas registries assumed their components were the first (no
long the case), thus the sweeping changes).
Centerprint doesn't work (but it hasn't for a while).
Since switching to the 1.2 api as a requirement, might as well use the
relevant structs instead of extension struct (for multiview). Came up
when double-checking the max views property due to running into what
appears to be an nvidia bug where > 29 views (any bit pattern) cause a
segfault when creating the pipeline.
Tracy is a frame profiler: https://github.com/wolfpld/tracy
This uses Tracy's C API to instrument the code (already added in several
places). It turns out there is something very weird with the fence
behavior between the staging buffers and render commands as the
inter-frame delay occurs in a very strangle place (in the draw code's
packet acquisition rather than the fence waiter that's there for that
purpose). I suspect some tangled dependencies.
This has resulted in some rather interesting information: it seems the
surfaces (and thus, presumably bounding boxes) for leafs have little to
do with the actual leaf node's volume.
It turns out bsp faces are still back-face culled despite the null point
being on the front of every possible plane... or really, because it's on
the front of every possible plane: sometimes the back face is the front
face, and this breaks the face selection code (a separate traversal
function will be needed for non-culling rendering).
Despite that, other than having to deal with different pipelines,
getting the model renderers working went better than expected.
Really? More to clean up before (vulkan) bsp rendering is thread-safe?
However, R_MarkLeaves was pretty close: just oldviewleaf and
visframecount, but that's still too much. Also, the reliance on
r_refdef.worldmodel irked me.
While there will be some GPU resources to sort out for multi-pass bsp
processing, I think this is the last piece required before shadow passes
can be implemented.
Samplers have no direct relation to render passes or pipelines, so
should not necessarily be in the same config file. This makes all the
old config files obsolete, and quite a bit of support code in vkparse.c.
The new system seems to work quite nicely with brush models, which was
the intent, but it's nice to see. Hopefully, it works well when it comes
to shadows. There's still water warp and screen shots to fix, and
fisheye to get working, as well.
Gotta be sure :)
With the new system mostly up and running (just bsp rendering and
descriptor sets/layout handling to go, and they're independent of the
old render pass system), the old system can finally be cleared out.
bsp_draw_queue isn't the right place, but it's just place-holder code to
help get the rest of the renderers up and running before I tackle bsp
rendering. Fixes the segfault in demo1 when the zombies get gibbed,
resulting in zombie entities.
They're currently just stubs, but this gets the render info loading
working without any errors. The next step is to connect up pipelines and
create the image resources, then implementing the task functions will
have meaning.
I suspect this is a hold-over from before the bsp thread safety changes,
but with the nicely separated queues, it's easy to pass the sky surfaces
through the depth pass as well as the translucency pass (I think the
reason for that is lighting). This prevents bits of world being seen
through sky surfaces when the sky isn't fully opaque (like skysheet due
to the shortcuts in the shader).
It's a bit flaky for particles, especially at higher frame rates, but
that's due to supporting only 64 overlapping pixels. A reasonable
solution is probably switching to a priority heap for the "sort" and
upping the limit.
Now each (high level) render pass can have its own frame buffer. The
current goal is to get the final output render pass to just transfer the
composed output to the swap chain image, potentially with scaling (my
laptop might be able to cope).
While the HUD and status bar don't cut out a lot of screen (normally),
they might start to make a difference when I get transparency working
properly. The main thing is this is a step towards pulling the 2d
rendering into another render pass so the main deferred pass is
world-only.
While the libraries are probably getting a little out of hand, the
separation into its own directory is probably a good thing as an ECS
should not be tied to scenes. This should make the ECS more generally
useful.
Since entity_t has a pointer to the registry owning the entity, there's
no need to access a global to get at the registry. Also move component
getting closer to where it's used.
This puts the hierarchy (transform) reference, animation, visibility,
renderer, active, and old_origin data in separate components. There are
a few bugs (crashes on grenade explosions in gl/glsl/vulkan, immediately
in sw, reasons known, missing brush models in vulkan).
While quake doesn't really need an ECS, the direction I want to take QF
does, and it does seem to have improved memory bandwidth a little
(uncertain). However, there's a lot more work to go (especially fixing
the above bugs), but this seems to be a good start.
This did involve changing some field names and a little bit of cleanup,
but I've got a better handle on what's going on (I think I was in one of
those coding trances where I quickly forget how things work).
This makes bsp traversal even more re-entrant (needed for shadows).
Everything needed for a "pass" is taken from bsp_pass_t (or indirectly
through bspctx_t (though that too might need some revising)).
The software renderer uses Bresenham's line slice algorithm as presented
by Michael Abrash in his Graphics Programming Black Book Special Edition
with the serial numbers filed off (as such, more just so *I* can read
the code easily), along with the Chen-Sutherland line clipping
algorithm. The other renderers were more or less trivial in comparison.
Surfaces marked with SURF_DRAWALPHA but not SURF_DRAWTURB are put in a
separate queue for the water shader and run with a turb scale of 0.
Also, entities with colormod alpha < 1 are marked to go in the same
queue as SURF_DRAWALPHA surfaces (ie, no SURF_DRAWTURB unless the
model's texture indicated such).
The texture animation data is compacted into a small struct for each
texture, resulting in much less data access when animating the texture.
More importantly, no looping over the list of frames. I plan on
migrating this to at least the other hardware renderers.