This does mean that the gl and sw renderers can no longer call
S_ExtraUpdate, but really, they shouldn't be anyway. And I seem to
remember it not really helping (been way too long since quake ran that
slowly for me).
sfx_t is now private, and cd_file no longer accesses channel_t's
internals. This is necessary for hiding the code needed to make mixing
and channel management *properly* lock-free (I've been getting away with
murder thanks to x86's strong memory model and just plain luck with
gcc).
The tests fail as they exercise how the cache *SHOULD* work rather than
how it does now.
The tests do currently pass for the pending work I've done on the cache
system, but while working on it, I remembered why I reworked cache
allocation...
The essential problem is that sounds are loaded into the cache, which is
fine for synchronous output targets, but has proven to be a minefield
for asynchronous output targets (JACK, ALSA).
The reason for the minefield is the hunk takes priority over the cache,
and is free to move cache blocks around, and *even dispose of them
entirely* in order to satisfy memory allocations from either end of the
hunk. Doing this in an entirely single-threaded process (as DOS Quake
was) is perfectly safe, as the users of the cache just reload the
pointer each time, and bail if it's null (meaning the block has been
freed), or even cause the data to be reloaded if possible (I'm a little
fuzzy on the details for that as I didn't write that code). However, in
multi-threaded code, especially real-time (JACK, possibly ALSA), it's a
recipe for disaster. The 4cab5b90e6 commit was a (mostly) successful
attempt to mitigate the problem by allocating the cache blocks from the
high-hunk (thus minimizing any movement caused by low-hunk allocations),
it resulted in cache allocates and regular high-hunk allocations somehow
getting intertwined: while investigating just how much memory ad_tears
needs (somewhere between 192MB and 256MB), I got "trashed sentinel"
errors and upon investigation, I found what looks very suspiciously like
audio data written across a hunk control block.
I've decided that the cache allocation *algorithm* should be reverted to
how it was originally designed by Id (details will remain "modern"), but
while working on the tests, I remembered why I had done the changes in
the first place (above story). Thus the work on reverting the cache
allocation can't go in until I get sound memory management independent
of the cache. The tests are going in now so I have a constant reminder :)
And make Sys_MaskPrintf take the developer enum rather than just a raw
int.
It was actually getting some nasty hunk corruption errors when under
memory pressure that made it clear the sound system needs some work.
I had been trimming for the solid leaf, but not the empty leafs. I had
assumed the vis tool would trim the bits, but it seems to not be
reliable (though it could be a bug in qfvis, I think the map in question
is one of my test maps).
The texture animation data is compacted into a small struct for each
texture, resulting in much less data access when animating the texture.
More importantly, no looping over the list of frames. I plan on
migrating this to at least the other hardware renderers.
I found a test map with no texture data. Even after fixing the bsp
loader, vulkan didn't like it. Now vulkan is happy. The "Missing" text
is full-bright magenta on a dim grey background so it should be visible
in any lighting conditions.
Conflagrant Rodent has a sub-model with 0 faces (double bit error?)
causing simply counting faces to get out of sync with actual model
starts thus breaking *all* brush models that come after it (including
other maps). Thus be a little less lazy in figuring out model start
faces.
The models are broken up into N sub-(sub-)models, one for each texture,
but all faces using the same texture are drawn as an instance, making
for both reduced draw calls and reduced index buffer use (and thus,
hopefully, reduced bandwidth). While texture animations are broken, this
does mark a significant milestone towards implementing shadows as it
should now be possible to use multiple threads (with multiple index and
entid buffers) to render the depth buffers for all the lights.
This allows the use of an entity id to index into the entity data and
fetch the transform and colormod data in the vertex shader, thus making
instanced rendering possible. Non-world brush entities are still not
rendered, but the world entity is using both the entity data buffer and
entid buffer.
Sub-models and instance models need an instance data buffer, but this
gets the basics working (and the proof of concept). Using arrays like
this actually simplified a lot of the code, and will make it easy to get
transparency without turbulence (just another queue).
The gl water warp ones have been useless since very early on due to not
doing water warp in gl (vertex warping just didn't work well), and the
recent water warp implementation doesn't need those hacks. The rest of
the removed flags just aren't needed for anything. SURF_DRAWNOALPHA
might get renamed, but should be useful for translucent bsp surfaces
(eg, vines in ad_tears).
One more step towards BSP thread-safety. This one brought with it a very
noticeable speed boost (ie, not lost in the noise) thanks to the face
visframes being in tightly packed groups instead of 128 bytes apart,
though the sw render's boost is lost in the noise (but it's very
fill-rate limited).
This is next critical step to making BSP rendering thread-safe.
visframe was replaced with cluster (not used yet) in anticipation of BSP
cluster reconstruction (which will be necessary for dealing with large
maps like ad_tears).
The main goal was to get visframe out of mnode_t to make it thread-safe
(each thread can have its own visframe array), but moving the plane info
into mnode_t made for better data access patters when traversing the bsp
tree as the plane is right there with the child indices. Nicely, the
size of mnode_t is the same as before (64 bytes due to alignment), with
4 bytes wasted.
Performance-wise, there seems to be very little difference. Maybe
slightly slower.
The unfortunate thing about the change is the plane distance is negated,
possibly leading to some confusion, particularly since the box and
sphere culling functions were affected. However, this is so point-plane
distance calculations can be done with a single 4d dot product.
The map uses 41% of a 4k light map scrap, and 512 texture descriptors
wasn't enough for vulkan. Ouch. I do need to get cvars on these things,
but this will do for now (decades later...)
Sounds in Arcane Dimensions (at least those used by ad_tears) specify
start and end cue points. The code was using only the final point in the
list and thus breaking looped sounds. Now, the first cue point is used
as the loop start, and the second (if present), the sample length. Both
are bounds-checked against the wav's sample count. Fixes sound locking
up during the first seconds in ad_tears.
This one is ancient: the code was essentially unmodified since release
(just some formatting). Malformed vectors could sneak through due to map
bugs (eg, "angles -90" instead of "angle -90" as in ad_tears) and the
vector parsing code would continue past the end of the string and
writing into unowned memory, potentially messing up the libc allocation
records. Replacing with the obvious sscanf works nicely.
Sometimes, Quake code is brilliant. Other times, it's a real face-palm.
This fixes the annoying persistence of inputs when respawning and
changing levels. Axis input clearing is hooked up but does nothing as of
yet. Active device input clearing has always been hooked up, but also
does nothing in the evdev and x11 drivers.
It was added only because FitzQuake used it in its pre-bsp2 large-map
support. That support has been hidden in bspfile.c for some time now.
This doesn't gain much other than having one less type to worry about.
Well tested on Conflagrant Rodent (the map that caused the need for
mclipnode_t in the first place).
This was one of the biggest reasons I had trouble understanding the bsp
display list code, but it turns out it was for dealing with GLES's
16-bit limit on vertex indices. Since vulkan uses 32-bit indices,
there's no need for the extra layer of indirection. I'm pretty sure it
was that lack of understanding that prevented me from removing it when I
first converted the glsl bsp code to vulkan (ie, that 16-bit indices
were the only reason for elements_t).
It's hard to tell whether the change makes much difference to
performance, though it seems it might (noisy stats even over 50 timedemo
loops) and the better data localization indicate it should at least be
just as good if not better. However, the reason for the change is
simplifying the data structures so I can make bsp rendering thread-safe
in preparation for rendering shadow maps.
And maybe a nano-optimization. Switching from (~side + 1) to (-side)
seems to give glsl a very tiny speed boost, but certainly doesn't hurt.
Looking at some assembly output for the three cases, the two hacks seem
to generate the same code as each other, but 3 instructions vs 6 for ?:.
While ?: is more generically robust, the hacks are tuned for the
knowledge side is either 0 or 1. The later xor might alter things, but
at least I now know that the hack (either version) is worthwhile.
This is a particularly ancient bug, sort of introduced by rhamph when he
optimized temp entity model handling and later exacerbated by me.
However, I suspect the actual problem is limited to nq as qw's gamedir
handling would have caused the models to be reloaded, but nq doesn't
ever change game directories once running.
With experience, I have found that trying to continue after a validation
error tends to result in a segfault or some other nastiness, and
Sys_Shutdown (and the full shutdown sequence) is triggered for any error
signal (segfault, abort, etc) so just exit(1).
Although the skin pointer was being advanced after recording the
information in for the batch array, it was being reset the next time
around the loop (due to a mistranslation of the previous code). This
fixes the segfault while loading (gl, glsl, vulkan) or rendering (sw)
the sphere model from Rogue.
Some very much needed comments :P Still, nicely, I now have a much
better understanding of how the display lists are created (10 years
is a long time to remember how intricate code works (I do remember
fighting to get it working back then))
This makes it much easier to see just what is being done to build a
polygon to be passed to the GPU, and it served as a test for the
lightmap st changes since Vulkan currently never used them.
Many modders use negative lights for interesting effects, but vulkan
doesn't like the result of a negative int treated as unsigned when it
comes to texture sizes.
However, this time it doesn't modify the light array when it sorts the
lights by size since the lights are now located before the renderer gets
to see them, and having the fix up the light leafs array would be too
painful (and probably the completely wrong thing to do anyway: the light
array should be treated as constant by the renderer). 1.6GB of memory
for gmsp3v2's lights (a little better than marcher: more smaller lights?).
For reference:
gmsp3v2: shadow maps: 8330 layers in 29 images: 1647706112
marcher: shadow maps: 2440 layers in 11 images: 2358575104
While it wasn't the root cause of the disappearing lights (even after
sorting out the light limit issue), because the cause of that was
everything working as designed, I suspect sunlight wasn't reaching as
far as it should. Even it it was, this should be slightly faster
(especially for larger maps) as leafs can be tested 32 or 64 at a time
rather than individually.
For now, at least (I have some ideas to possibly reduce the numbers and
also to avoid the need for actual limits). I've seen gmsp3v2 use over
500 lights at once (it has over 1300), and I spent too long figuring out
that weird light behavior was due to the limit being hit and lights
getting dropped (and even longer figuring out that more weird behavior
was due to the lack of shadows and the world being too bright in the
first place).
Moving the negation into the calculation of the sun angle prevents -0
getting into the vector (not that it makes much difference other than
minor confusion when reading the light data).
Since the staging buffer allocates the command buffers it uses, it
needs to free them when it is freed. I think I was confused by the
validation layers not complaining about unfreed buffers when shutting
down, but that's because destroying the pool (during program shutdown,
when the validation layers would complain) frees all the buffers. Thus,
due to staging buffers being created and destroyed during the level load
process, (rather large) command buffers were piling up like imps in a
Doom level.
In the process, it was necessary to rearrange some of the shutdown code
because vulkan_vid_render_shutdown destroys the shared command pool, but
the pool is required for freeing the command buffers, but there was a
minor mess of long-lived staging buffers being freed afterwards. That
didn't end particularly well.
This ensures that the plugin's shutdown function won't get called twice
in the event of an error in the plugin's unload sequence triggering a
second Sys_Shutdown, especially if the plugin is being unloaded as a
part of another sub-system's shutdown sequence (which is probably in
itself a design mistake, need to look into that).
While gcc was quite correct in its warning, all I needed was to
explicitly truncate the string. I don't remember why I didn't do that
back when I made the changes in 4f58429137, but it works now, and the
surrounding code does expect the string to be no more than 15 chars
long. This fixes yet another memory leak (but timedemo over multiple
runs still leaks like a sieve).
This is meant for a "permanent" tear-down before freeing the memory
holding the VM state or at program shutdown. As a consequence, builtin
sub-systems registering resources are now required to pass a "destroy"
function pointer that will be called just before the memory holding
those resources is freed by the VM resource manager (ie, the manager
owns the resource memory block, but each subsystem is responsible for
cleaning up any resources held within that block).
This even enhances thread-safety in rua_obj (there are some problems
with cmd, cvar, and gib).
This gives a rather significant speed boost to timedemo demo1: from
about 2300-2360fps up to 2520-2600fps, at least when using
multi-texture.
Since it was necessary for testing the scrap, gl got the ability to set
the console background texture, too.
While it takes one extra step to grab the marksurface pointer,
R_MarkLeaves and R_MarkLights (the two actual users) seem to be either
the same speed or fractionally faster (by a few microseconds). I imagine
the loss gone to the extra fetch is made up for by better bandwidth
while traversing the leafs array (mleaf_t now fits in a single cache
line, so leafs are cache-aligned since hunk allocations are aligned).
Unfortunately, the animations are pre-baked (by the loader) blocking
run-time determined animations (IK etc). However, this at least gets
everything working so the basics can be verified (the shader posed some
issue resulting in horror movies ;).
It copies an entire hierarchy (minus actual entities, but I'm as yet
unsure how to proceed with them), even across scenes as the source scene
is irrelevant and the destination scene is used for creating the new
transforms.
Brush models looked a little too tricky due to the very different style
of command queue, so that's left for now, but alias, iqm and sprite
entities are now labeled. The labels are made up of the lower 5 hex
digits of the entity address, the position, and colored by the
normalized position vector. Not sure that's the best choice as it does
mean the color changes as the entity moves, and can be quite subtle
between nearby entities, but it still helps identify the entities in the
command buffer.
And, as I suspected, I've got multiple draw calls for the one ogre. Now
to find out why.
The bones aren't animated yet (and I realized I made the mistake of
thinking the bone buffer was per-model when it's really per-instance (I
think this mistake is in the rest of QF, too)), skin rendering is a
mess, need to default vertex attributes that aren't in the model...
Still, it's quite satisfying seeing Mr Fixit on screen again :)
I wound up moving the pipeline spec in with the rest of the pipelines as
the system isn't really ready for separating them.
The plists can now be accessed by name and the forward render pass
config is available (but not used, or tested beyond syntax). I was going
to have the IQM pipeline spec separate but ran into limitations in the
system (which needs a lot of polish, really).
That @inherit is pretty useful :) This makes it much easier to see how
different pipelines differ or how they are the similar. It also makes it
much clearer which sub-pass they're for.
I was wondering why scaled-down quake-guy was dimmer than full-size
quake-guy. And the per-fragment normalization gives the illusion of
smoothness if you don't look at his legs (and even then...).
Maps specify sunlight as shining in a specific direction, but the
lighting system wants the direction to the sun as it's used directly in
shading calculations. Direction correctness confirmed by disabling other
lights and checking marcher's outside scene (ensuring the flat ground
was lit). As a bonus, I've finally confirmed I actually have the skybox
in the correct orientation (sunlight vector more or less matched the
position of the sun in marcher's sky).
I'm not sure what's up with the weird lighting that results from dynamic
lights being directional (sunlight works nicely in marcher, but it has a
unit vector for position).
Abyss of Pandemonium uses global ambient light a lot, but doesn't
specify it in every map (nothing extracting entities and adding a
reasonable value can't fix). I imagine some further tweaking will be
needed.