This is meant for a "permanent" tear-down before freeing the memory
holding the VM state or at program shutdown. As a consequence, builtin
sub-systems registering resources are now required to pass a "destroy"
function pointer that will be called just before the memory holding
those resources is freed by the VM resource manager (ie, the manager
owns the resource memory block, but each subsystem is responsible for
cleaning up any resources held within that block).
This even enhances thread-safety in rua_obj (there are some problems
with cmd, cvar, and gib).
This gives a rather significant speed boost to timedemo demo1: from
about 2300-2360fps up to 2520-2600fps, at least when using
multi-texture.
Since it was necessary for testing the scrap, gl got the ability to set
the console background texture, too.
While it takes one extra step to grab the marksurface pointer,
R_MarkLeaves and R_MarkLights (the two actual users) seem to be either
the same speed or fractionally faster (by a few microseconds). I imagine
the loss gone to the extra fetch is made up for by better bandwidth
while traversing the leafs array (mleaf_t now fits in a single cache
line, so leafs are cache-aligned since hunk allocations are aligned).
Unfortunately, the animations are pre-baked (by the loader) blocking
run-time determined animations (IK etc). However, this at least gets
everything working so the basics can be verified (the shader posed some
issue resulting in horror movies ;).
It copies an entire hierarchy (minus actual entities, but I'm as yet
unsure how to proceed with them), even across scenes as the source scene
is irrelevant and the destination scene is used for creating the new
transforms.
Brush models looked a little too tricky due to the very different style
of command queue, so that's left for now, but alias, iqm and sprite
entities are now labeled. The labels are made up of the lower 5 hex
digits of the entity address, the position, and colored by the
normalized position vector. Not sure that's the best choice as it does
mean the color changes as the entity moves, and can be quite subtle
between nearby entities, but it still helps identify the entities in the
command buffer.
And, as I suspected, I've got multiple draw calls for the one ogre. Now
to find out why.
The bones aren't animated yet (and I realized I made the mistake of
thinking the bone buffer was per-model when it's really per-instance (I
think this mistake is in the rest of QF, too)), skin rendering is a
mess, need to default vertex attributes that aren't in the model...
Still, it's quite satisfying seeing Mr Fixit on screen again :)
I wound up moving the pipeline spec in with the rest of the pipelines as
the system isn't really ready for separating them.
The plists can now be accessed by name and the forward render pass
config is available (but not used, or tested beyond syntax). I was going
to have the IQM pipeline spec separate but ran into limitations in the
system (which needs a lot of polish, really).
That @inherit is pretty useful :) This makes it much easier to see how
different pipelines differ or how they are the similar. It also makes it
much clearer which sub-pass they're for.
I was wondering why scaled-down quake-guy was dimmer than full-size
quake-guy. And the per-fragment normalization gives the illusion of
smoothness if you don't look at his legs (and even then...).
Maps specify sunlight as shining in a specific direction, but the
lighting system wants the direction to the sun as it's used directly in
shading calculations. Direction correctness confirmed by disabling other
lights and checking marcher's outside scene (ensuring the flat ground
was lit). As a bonus, I've finally confirmed I actually have the skybox
in the correct orientation (sunlight vector more or less matched the
position of the sun in marcher's sky).
I'm not sure what's up with the weird lighting that results from dynamic
lights being directional (sunlight works nicely in marcher, but it has a
unit vector for position).
Abyss of Pandemonium uses global ambient light a lot, but doesn't
specify it in every map (nothing extracting entities and adding a
reasonable value can't fix). I imagine some further tweaking will be
needed.
The parsing of light data from maps is now in the client library, and
basic light management is in scene. Putting the light loading code into
the Vulkan renderer was a mistake I've wanted to correct for a while.
The client code still needs a bit of cleanup, but the basics are working
nicely.
This replaces *_NewMap with *_NewScene and adds SCR_NewScene to handle
loading a new map (for quake) in the renderer, and will eventually be
how any new scene is loaded.
This leaves only the one conditional in the shader code, that being the
distance check. It doesn't seem to make any noticeable difference to
performance, but other than explosion sprites being blue, lighting
quality seems to have improved. However, I really need to get shadows
working: marcher is just silly-bright without them, and light levels
changing as I move around is a bit disconcerting (but reasonable as
those lights' leaf nodes go in and out of visibility).
Id Software had pretty much nothing to do with the vulkan renderer (they
still get credit for code that's heavily based on the original quake
code, of course).
It's not used yet, and thus may have some incorrect settings, but I
decided that I will probably want it at some stage for qwaq. It's
essentially was was in the original spec, but updated for some of the
niceties added to parsing since I removed it back then. It's also in its
own file.
Just "loading" and "unloading" (both really just hints due to the
caching system), and an internal function for converting a handle to a
model pointer, but it let me test IQM loading and unloading in Vulkan.
The model system is rather clunky as it is focused around caching, so
unloading is more of a suggestion than anything, but it was good enough
for testing loading and unloading of IQM models in Vulkan.
Despite the base IQM specification not supporting blend-shapes, I think
IQM will become the basis for QF's generic model representation (at
least for the more advanced renderers). After my experience with .mu
models (KSP) and unity mesh objects (both normal and skinned), and
reviewing the IQM spec, it looks like with the addition of support for
blend-shapes, IQM is actually pretty good.
This is just the preliminary work to get standard IQM models loading in
vulkan (seems to work, along with unloading), and they very basics into
the renderer (most likely not working: not tested yet). The rest of the
renderer seems to be unaffected, though, which is good.
The resource subsystem creates buffers, images, buffer views and image
views in a single batch operation, using a single memory object to back
all the buffers and images. I had been doing this by hand for a while,
but got tired of jumping through all those vulkan hoops. While it's
still a little tedious to set up the arrays for QFV_CreateResource (and
they need to be kept around for QFV_DestroyResource), it really eases
calculation of memory object size and sub-resource offsets. And
destroying all the objects is just one call to QFV_DestroyResource.
I might need to do similar for other formats, but i ran into the problem
of the texture type being tex_palette instead of the expected tex_rgba
when pre-(no-)loading a tga image resulting in Vulkan not liking my
attempt at generating mipmaps.
This allows the fuzzy bsearch used to find a def by address to work
properly (ie, find the actual def instead of giving some other def +
offset). Makes for a much more readable instruction stream.
The scene id is in the lower 32-bits for all objects (upper 32-bits are
0 for actual scene objects) and entity/transform ids are in the upper
32-bits. Saves having to pass around a second parameter in progs code.
pr_type_t now contains only the one "value" field, and all the access
macros now use their PACKED variant for base access, making access to
larger types more consistent with the smaller types.
Vulkan doesn't appreciate the empty buffers that result from the model
not having any textures or surfaces that can be rendered (rightfully so,
for such a bare-metal api).
I doubt the calls were ever actually made in a normal map due to the
node actually being a node when breaking out of the loop, but when I
experimented with an empty world model (no nodes, one infinite empty
leaf) I found that visit_leaf was getting called twice instead of once.
Since it is updated every frame, it needs to be as fast as possible for
the cpu code. This seems to make a difference of about 10us (~130 ->
~120) when testing in marcher. Not a huge change, but the timing
calculation was wrapped around the entire base world pass, so there was
a fair bit of overhead from bsp traversal etc.
It makes a significant difference to level load times (approximately
halves them for demo1 and demo2). Nicely, it turns out I had implemented
the rest of the staging buffer code (in particular, flushing) correctly
in that it seems there's no corruption any of the data.
They're really redundant, and removing the next pointer makes for a
slightly smaller cvar struct. Cvar_Select was added to allow finding
lists of matching cvars.
The tab-completion and config saving code was reworked to use the hash
table DO functions. Comments removed since the code was completely
rewritten, but still many thanks to EvilTypeGuy and Fett.
Hash_Select returns a list of elements that match a given criterion
(select callback returning non-0).
Hash_ForEach simply calls a function for every element.
And use it for hud_scoreboard_gravity. Putting the enum def in view made
the most sense as view does own the base type and the enum is likely to
be by useful for other settings.
I think I'd gotten distracted while making the changes to the server,
then simply copied the partial changes to the client. It didn't blow up
thanks to the backing store bing char * and the type sized for int, so
safe on any platform, but useless as it wasn't connected properly.
It's actually pretty neat being able to directly, but safely, control a
function pointer via a cvar :)
The misinterpretations were due to either the cvar not being accessed
directly by the engine, but via only the callback, or the cvars were
accesssed only by progs (in which case, they should be float). The
remainder are a potential enum (hud gravity) and a "too hard basket"
(rcon password: need to figure out how I want to handle secret strings).
Other parts of quakefs treat an empty path as an error, so fs_sharepath
and fs_userpath must never be empty or they will effectively be
rejected. While the user explicitly setting them to empty strings is one
way for them to become empty, another is QFS_CompressPath compressing
'.' to an empty path, which makes it rather difficult to set up the
traditional quake directory tree (ie, operate from the current
directory).
My script didn't know what type to make the cvars since they're not used
directly by the code, so they got treated as strings instead of ints or
floats.
This is an extremely extensive patch as it hits every cvar, and every
usage of the cvars. Cvars no longer store the value they control,
instead, they use a cexpr value object to reference the value and
specify the value's type (currently, a null type is used for strings).
Non-string cvars are passed through cexpr, allowing expressions in the
cvars' settings. Also, cvars have returned to an enhanced version of the
original (id quake) registration scheme.
As a minor benefit, relevant code having direct access to the
cvar-controlled variables is probably a slight optimization as it
removed a pointer dereference, and the variables can be located for data
locality.
The static cvar descriptors are made private as an additional safety
layer, though there's nothing stopping external modification via
Cvar_FindVar (which is needed for adding listeners).
While not used yet (partly due to working out the design), cvars can
have a validation function.
Registering a cvar allows a primary listener (and its data) to be
specified: it will always be called first when the cvar is modified. The
combination of proper listeners and direct access to the controlled
variable greatly simplifies the more complex cvar interactions as much
less null checking is required, and there's no need for one cvar's
callback to call another's.
nq-x11 is known to work at least well enough for the demos. More testing
will come.
The prefix gives more context to the error messages, making the system a
lot easier to use (it was especially helpful when getting my cvar revamp
into shape).
Based on the flags type used in vkparse (difference is the lack of
support for plists). Having this will make supporting named flags in
cvars much easier (though setting up the enum type is a bit of a chore).
This allows for easy (and safe) printing of cexpr values where the type
supports it. Types that don't support printing would be due to being too
complex or possibly write-only (eg, password strings, when strings are
supported directly).
Surprisingly, only two, but they were caught by the different value
fields being used, thus the cvar was checked in multiple places. I
imagine that's not really all that common, so there may be some
inconsistencies between default value and use.
This is progress towards #23. There are still some references to
host_time and host_client (via nq's server.h), and a lot of references
to sv and svs, but this is definitely a step in the right direction.
This allows a single render pass description to be used for both
on-screen and off-screen targets. While Vulkan does allow a VkRenderPass
to be used with any compatible frame buffer, and vkparse caches a
VkRenderPass created from the same description, this allows the same
description to be used for a compatible off-screen target without any
dependence on the swapchain. However, there is a problem in the caching
when it comes to targeting outputs with different formats.
As I had suspected, it's due to a synchronization problem between the
scrap and drawing. There's actually a double problem in that data
uploaded to the scrap isn't flushed until the first frame is rendered
causing a quick init-shutdown sequence to take at least five seconds due
to the staging buffer waiting (and timing out) on a stuck fence.
Rendering just one frame "fixes" the problem (draw was one of the
earliest subsystems to get going in vulkan).
Surprisingly, only two, but they were caught by the different value
fields being used, thus the cvar was checked in multiple places. I
imagine that's not really all that common, so there may be some
inconsistencies between default value and use.
This is progress towards #23. There are still some references to
host_time and host_client (via nq's server.h), and a lot of references
to sv and svs, but this is definitely a step in the right direction.
Since it is updated every frame, it needs to be as fast as possible for
the cpu code. This seems to make a difference of about 10us (~130 ->
~120) when testing in marcher. Not a huge change, but the timing
calculation was wrapped around the entire base world pass, so there was
a fair bit of overhead from bsp traversal etc.
The improved allocation overheads have been implemented for gl and sw,
and glsl no longer uses malloc. Using array textures will have to wait
as the current texture loading code doesn't support them.
Really, this won't make all that much difference because alias models
with more than one skin are quite rare, and those with animated skin
groups are even rarer. However, for those models that do have more than
one skin, it will allow for reduced allocation overheads, and when
supported (glsl, vulkan, maybe gl), loading all the skins into an array
texture (since all skins are the same size, though external skins may
vary), but that's not implemented yet, this just wraps the old one skin
at a time code.
While looking at the deferred attachment images with using a template in
mind, I noticed that the opaque attachment was using 8-bit color. The
problem is, it's meant to be HDRI with the compose pass crunching it
down to LDRI. Switching to 16-bit float does seem to have made a subtle
difference (hey, it's still quake data, not much HDRI in there).
That certainly makes it nicer to work with large sets, and shows one way
to be careful with allocated resources: don't allocate them in the
inherited data and use a template that needs a few things filled in to
be valid. Also, it seems that overriding values in sub-structures "just
works" :)
It simply parses the referenced plist dictionary (via @inherit =
plist.path;) into the current data block, then allows the data to be
overwritten by the current plist dictionary. This may be a bit iffy for
any allocated resources, so some care must be taken, but it seems to
work nicely.
This allows a single render pass description to be used for both
on-screen and off-screen targets. While Vulkan does allow a VkRenderPass
to be used with any compatible frame buffer, and vkparse caches a
VkRenderPass created from the same description, this allows the same
description to be used for a compatible off-screen target without any
dependence on the swapchain. However, there is a problem in the caching
when it comes to targeting outputs with different formats.
This makes much more sense as they are intimately tied to the frame
buffer on which a render pass is working. Now, just the window width
and height are stored in vulkan_ctx_t. As a side benefit,
QFV_CreateSwapchain no long references viddef (now just palette and
conview in vulkan_draw.c to go).
While I have trouble imagining it making that much performance
difference going from 4 verts to 3 for a whopping 2 polygons, or even
from 2 triangles to 1 for each poly, using only indices for the vertices
does remove a lot of code, and better yet, some memory and buffer
allocations... always a good thing.
That said, I guess freeing up a GPU thread for something else could make
a difference.
I think I had gotten lucky with captures not being corrupt due to them
being much bigger than all but the L3 cache (and then they're over 1/2
the size), so the memory was being automatically invalidated by other
activity. Don't want to trust such luck, though.
It makes a significant difference to level load times (approximately
halves them for demo1 and demo2). Nicely, it turns out I had implemented
the rest of the staging buffer code (in particular, flushing) correctly
in that it seems there's no corruption any of the data.
This means that a tex_t object is passed in instead of just raw bytes
and width and height, but it means the texture can specify whether it's
flipped or uses BGR instead of RGB. This fixes the upside down
screenshots for vulkan.
This fixes (*ahem*) the vulkan renderer segfaulting when attempting to
take a screenshot. However, the image is upside down. Also, remote
snapshots and demo capture are broken for the moment.
QFS_NextFilename was renamed to QFS_NextFile to reflect the fact it now
returns a QFile pointer for the newly created file (as well as the
name). This necessitated updating WritePNG to take a file pointer
instead of a file name, with the advantage that WritePNGqfs is no longer
necessary and callers have much more control over the creation of the
file.
This makes QFS_NextFile much more secure against file system race
conditions and attacks (at least in theory). If nothing else, it will
make it more robust in a multi-threaded environment.
It's not there yet as it promptly closes the file and returns only the
filename (and then only the portion within the user's directory tree).
However, this worked nicely as a test for Sys_UniqueFile.
QF currently uses unique file names for screenshots and server-side
demos (and remote snapshots), but they're generally useful.
QFS_NextFilename has been filling this role, but it is highly insecure
in its current implementation. This is the first step to taking care of
that.
The tests fail due to differences in how clang and gcc treat floating
point to unsigned integral type conversions when the values overflow. It
wouldn't be so bad if clang was consistent with conversions to 32-bit
unsigned integers, like it seems to be with conversion to 64-bit
unsigned integers.
With this, the "get QF building with clang" mini-project is done and I
won't have to panic when someone comes to me and asks if it will work.
At worst, there'll be a little bit-rot.
Only edicts themselves get a smaller alignment (4, 8 or 32 bytes,
depending on hardware and progs version). I didn't want to waste too
much memory on edict alignment for progs that don't need any better than
void *, but the zone really wants 64 bytes, and the stack might as well
have 64 bytes as well. Fixes a segfault when running v6 progs in a clang
build (clang got really agressive with simd in zone.c).
gcc and clang have rather different swizzle builtins, but both do a nice
job of optimizing the intuitive initializer swizzle (I think gcc 8(?)
didn't do such a good job thus my use of __builtin_shuffle).
clang doesn't like anything but a bare 0 as null (and in some of the
cases, it was quite right: '\0' should not be treated as a null
pointer). And the crashers were just for paranoia and probably aren't
needed any more (kept for now, though).
It seems clang defaults to unsigned for enums. Interestingly, gcc was ok
with the checks being either way. I guess gcc treats enums that *can* be
unsigned as DWIM.
Still work with gcc, of course, and I still need to fix them properly,
but now they're actually slightly easier to find as they all have vec_t
and FIXME on the same line.