Brush models looked a little too tricky due to the very different style
of command queue, so that's left for now, but alias, iqm and sprite
entities are now labeled. The labels are made up of the lower 5 hex
digits of the entity address, the position, and colored by the
normalized position vector. Not sure that's the best choice as it does
mean the color changes as the entity moves, and can be quite subtle
between nearby entities, but it still helps identify the entities in the
command buffer.
And, as I suspected, I've got multiple draw calls for the one ogre. Now
to find out why.
The bones aren't animated yet (and I realized I made the mistake of
thinking the bone buffer was per-model when it's really per-instance (I
think this mistake is in the rest of QF, too)), skin rendering is a
mess, need to default vertex attributes that aren't in the model...
Still, it's quite satisfying seeing Mr Fixit on screen again :)
I wound up moving the pipeline spec in with the rest of the pipelines as
the system isn't really ready for separating them.
The plists can now be accessed by name and the forward render pass
config is available (but not used, or tested beyond syntax). I was going
to have the IQM pipeline spec separate but ran into limitations in the
system (which needs a lot of polish, really).
That @inherit is pretty useful :) This makes it much easier to see how
different pipelines differ or how they are the similar. It also makes it
much clearer which sub-pass they're for.
I was wondering why scaled-down quake-guy was dimmer than full-size
quake-guy. And the per-fragment normalization gives the illusion of
smoothness if you don't look at his legs (and even then...).
I'm not sure what's up with the weird lighting that results from dynamic
lights being directional (sunlight works nicely in marcher, but it has a
unit vector for position).
The parsing of light data from maps is now in the client library, and
basic light management is in scene. Putting the light loading code into
the Vulkan renderer was a mistake I've wanted to correct for a while.
The client code still needs a bit of cleanup, but the basics are working
nicely.
This replaces *_NewMap with *_NewScene and adds SCR_NewScene to handle
loading a new map (for quake) in the renderer, and will eventually be
how any new scene is loaded.
This leaves only the one conditional in the shader code, that being the
distance check. It doesn't seem to make any noticeable difference to
performance, but other than explosion sprites being blue, lighting
quality seems to have improved. However, I really need to get shadows
working: marcher is just silly-bright without them, and light levels
changing as I move around is a bit disconcerting (but reasonable as
those lights' leaf nodes go in and out of visibility).
Id Software had pretty much nothing to do with the vulkan renderer (they
still get credit for code that's heavily based on the original quake
code, of course).
It's not used yet, and thus may have some incorrect settings, but I
decided that I will probably want it at some stage for qwaq. It's
essentially was was in the original spec, but updated for some of the
niceties added to parsing since I removed it back then. It's also in its
own file.
Despite the base IQM specification not supporting blend-shapes, I think
IQM will become the basis for QF's generic model representation (at
least for the more advanced renderers). After my experience with .mu
models (KSP) and unity mesh objects (both normal and skinned), and
reviewing the IQM spec, it looks like with the addition of support for
blend-shapes, IQM is actually pretty good.
This is just the preliminary work to get standard IQM models loading in
vulkan (seems to work, along with unloading), and they very basics into
the renderer (most likely not working: not tested yet). The rest of the
renderer seems to be unaffected, though, which is good.
The resource subsystem creates buffers, images, buffer views and image
views in a single batch operation, using a single memory object to back
all the buffers and images. I had been doing this by hand for a while,
but got tired of jumping through all those vulkan hoops. While it's
still a little tedious to set up the arrays for QFV_CreateResource (and
they need to be kept around for QFV_DestroyResource), it really eases
calculation of memory object size and sub-resource offsets. And
destroying all the objects is just one call to QFV_DestroyResource.
Vulkan doesn't appreciate the empty buffers that result from the model
not having any textures or surfaces that can be rendered (rightfully so,
for such a bare-metal api).
I doubt the calls were ever actually made in a normal map due to the
node actually being a node when breaking out of the loop, but when I
experimented with an empty world model (no nodes, one infinite empty
leaf) I found that visit_leaf was getting called twice instead of once.
Since it is updated every frame, it needs to be as fast as possible for
the cpu code. This seems to make a difference of about 10us (~130 ->
~120) when testing in marcher. Not a huge change, but the timing
calculation was wrapped around the entire base world pass, so there was
a fair bit of overhead from bsp traversal etc.
It makes a significant difference to level load times (approximately
halves them for demo1 and demo2). Nicely, it turns out I had implemented
the rest of the staging buffer code (in particular, flushing) correctly
in that it seems there's no corruption any of the data.
This is an extremely extensive patch as it hits every cvar, and every
usage of the cvars. Cvars no longer store the value they control,
instead, they use a cexpr value object to reference the value and
specify the value's type (currently, a null type is used for strings).
Non-string cvars are passed through cexpr, allowing expressions in the
cvars' settings. Also, cvars have returned to an enhanced version of the
original (id quake) registration scheme.
As a minor benefit, relevant code having direct access to the
cvar-controlled variables is probably a slight optimization as it
removed a pointer dereference, and the variables can be located for data
locality.
The static cvar descriptors are made private as an additional safety
layer, though there's nothing stopping external modification via
Cvar_FindVar (which is needed for adding listeners).
While not used yet (partly due to working out the design), cvars can
have a validation function.
Registering a cvar allows a primary listener (and its data) to be
specified: it will always be called first when the cvar is modified. The
combination of proper listeners and direct access to the controlled
variable greatly simplifies the more complex cvar interactions as much
less null checking is required, and there's no need for one cvar's
callback to call another's.
nq-x11 is known to work at least well enough for the demos. More testing
will come.
This allows for easy (and safe) printing of cexpr values where the type
supports it. Types that don't support printing would be due to being too
complex or possibly write-only (eg, password strings, when strings are
supported directly).
This allows a single render pass description to be used for both
on-screen and off-screen targets. While Vulkan does allow a VkRenderPass
to be used with any compatible frame buffer, and vkparse caches a
VkRenderPass created from the same description, this allows the same
description to be used for a compatible off-screen target without any
dependence on the swapchain. However, there is a problem in the caching
when it comes to targeting outputs with different formats.
As I had suspected, it's due to a synchronization problem between the
scrap and drawing. There's actually a double problem in that data
uploaded to the scrap isn't flushed until the first frame is rendered
causing a quick init-shutdown sequence to take at least five seconds due
to the staging buffer waiting (and timing out) on a stuck fence.
Rendering just one frame "fixes" the problem (draw was one of the
earliest subsystems to get going in vulkan).
Since it is updated every frame, it needs to be as fast as possible for
the cpu code. This seems to make a difference of about 10us (~130 ->
~120) when testing in marcher. Not a huge change, but the timing
calculation was wrapped around the entire base world pass, so there was
a fair bit of overhead from bsp traversal etc.
While looking at the deferred attachment images with using a template in
mind, I noticed that the opaque attachment was using 8-bit color. The
problem is, it's meant to be HDRI with the compose pass crunching it
down to LDRI. Switching to 16-bit float does seem to have made a subtle
difference (hey, it's still quake data, not much HDRI in there).
That certainly makes it nicer to work with large sets, and shows one way
to be careful with allocated resources: don't allocate them in the
inherited data and use a template that needs a few things filled in to
be valid. Also, it seems that overriding values in sub-structures "just
works" :)
It simply parses the referenced plist dictionary (via @inherit =
plist.path;) into the current data block, then allows the data to be
overwritten by the current plist dictionary. This may be a bit iffy for
any allocated resources, so some care must be taken, but it seems to
work nicely.
This allows a single render pass description to be used for both
on-screen and off-screen targets. While Vulkan does allow a VkRenderPass
to be used with any compatible frame buffer, and vkparse caches a
VkRenderPass created from the same description, this allows the same
description to be used for a compatible off-screen target without any
dependence on the swapchain. However, there is a problem in the caching
when it comes to targeting outputs with different formats.
This makes much more sense as they are intimately tied to the frame
buffer on which a render pass is working. Now, just the window width
and height are stored in vulkan_ctx_t. As a side benefit,
QFV_CreateSwapchain no long references viddef (now just palette and
conview in vulkan_draw.c to go).
While I have trouble imagining it making that much performance
difference going from 4 verts to 3 for a whopping 2 polygons, or even
from 2 triangles to 1 for each poly, using only indices for the vertices
does remove a lot of code, and better yet, some memory and buffer
allocations... always a good thing.
That said, I guess freeing up a GPU thread for something else could make
a difference.
I think I had gotten lucky with captures not being corrupt due to them
being much bigger than all but the L3 cache (and then they're over 1/2
the size), so the memory was being automatically invalidated by other
activity. Don't want to trust such luck, though.
It makes a significant difference to level load times (approximately
halves them for demo1 and demo2). Nicely, it turns out I had implemented
the rest of the staging buffer code (in particular, flushing) correctly
in that it seems there's no corruption any of the data.
This means that a tex_t object is passed in instead of just raw bytes
and width and height, but it means the texture can specify whether it's
flipped or uses BGR instead of RGB. This fixes the upside down
screenshots for vulkan.
Still work with gcc, of course, and I still need to fix them properly,
but now they're actually slightly easier to find as they all have vec_t
and FIXME on the same line.
Viewport and FOV updates are now separate so updating one doesn't cause
recalculations of the other. Also, perspective setup is now done
directly from the tangents of the half angles for fov_x and fov_y making
the renderers independent of fov/aspect mode. I imagine things are a bit
of a mess with view size changes, and especially screen size changes
(not supported yet anyway), and vulkan winds up updating its projection
matrices every frame, but everything that's expected to work does
(vulkan errors out for fisheye or warp due to frame buffer creation not
being supported yet).
r_screen isn't really the right place, but it gets the scene rendering
out of the low-level renderers and will make it easier to sort out
later, and hopefully easier to figure out a good design for vulkan.
The code is really part of scene (not a typo wrt r_screen: that is
misnamed as such, or at least SCR_UpdateScreen needs to be split into
screen (2d overlay, really) and scene updates).
This breaks fisheye rendering as the fisheye code calls the actual scene
render code multiple times, but the fisheye code is called by said scene
render code via a diversion. The fisheye needs to be moved out to the
high level scene render, but that will takes some extra work for frame
buffer setup.
This moves the common camera setup code out of the individual drivers,
and completely removes vup/vright/vpn from the non-software renderers.
This has highlighted the craziness around AngleVectors with it putting
+X forward, -Y right and +Z up. The main issue with this is it requires
a 90 degree pre-rotation about the Z axis to get the camera pointing in
the right direction, and that's for the native sw renderer (vulkan needs
a 90 degree pre-rotation about X, and gl and glsl need to invert an
axis, too), though at least it's just a matrix swizzle and vector
negation. However, it does mean the camera matrices can't be used
directly.
Also rename vpn to vfwd (still abbreviated, but fwd is much clearer in
meaning (to me, at least) than pn (plane normal, I guess, but which
way?)).
This is a step towards high-level unification of the renderers, as far
as possible keeping only actual low-level implementation details in the
individual renderers (some higher level stuff, eg shadows, is expected
to be per-renderer as some things are just not feasible to implement in
all renderers). However, the idea is to move the high-level
functionality into scene rendering.
While there's currently only the one still, this will allow the entities
to be multiply queued for multi-pass rendering (eg, shadows). As the
avoidance of putting an entity in the same queue more than once relies
on the entity id, all entities now come from the scene (which is stored
in cl_world in the client code for nq and qw), thus the extensive
changes in the clients.
While I doubt the difference is all that significant, this should speed
up entity rendering because it cuts out a lot of branching, and
eliminates scanning the same list multiple times only to not do anything
for large chunks of the list.
While both matrices had positive determinants in the first place, I find
the projection matrix easier to understand without all the negatives,
and having quake-x/vulkan-z positively parallel in the z-up matrix makes
that a lot easier to think about.
Regardless of whether the sky is spinning or not, the matrix needs to be
updated with the current origin in order to get the direction vector
right in the shader. Also, it's in the update that the required x-y
plane rotation gets in so the skies move in the correct direction.
long is ignored for double, and v6p progs are stuck with 32 bits for
longs (don't feel like extending v6p any further), but the basics are
there for Ruamoko.
short is ignored for ints because the minimum size is 32, and signed is
just noise for ints anyway (and no chars, so...).
unsigned, however, is finally implemented properly (or at least seems to
be working correctly: tests pass after getting things compiling again,
and lt.u is used where it should be :)
And other related fields so integer is now int (and uinteger is uint). I
really don't know why I went with integer in the first place, but this
will make using macros easier for dealing with types.
Forgetting to invoke [super dealloc] in a derived class's -dealloc
method has caused me to waste far too much time chasing down the
resulting memory leaks and crashes. This is actually the main focus of
issue #24, but I want to take care of multiple paths before I consider
the issue to be done.
However, as a bonus, four cases were found :)
This takes care of the global variables to a point (there is still the
global struct shared between the non-vulkan renderers), but it also
takes care of glsl's points-only rendering.
After yesterday's crazy marathon editing all the particles files, and
starting to do another big change to them today, I realized that I
really do need to merge them down. All the actual spawning is now in the
client library (though particle insertion will need to be moved). GLSL
particle rendering is semi-broken in that it now does only points (until
I come up with a way to select between points and quads (probably a
context object, which I need anyway for Vulkan)).
This has the advantage of getting entity_t out of the particle system,
and much easier to read math. Also, it served as a nice test for my
particle physics shaders (implemented the ideas in C). There's a lot of
code that needs merging down: all but the actual drawing can be merged.
There's some weirdness with color ramps, but I'll look into that later.
They should increment by one for each pic, not 4 (I think some fluff
remaining from copying glsl's draw code).
I noticed the problem when I saw large gaps of 0s in the vertex data in
renderdoc.
This means color, emission, and translucent. Fixes the HOM issues on my
VersaPro (but halves the frame-rate... definitely need to bring back the
forward renderer as an option).
This gets the pipelines loaded (and unloaded on shutdown). Probably the
easy part :P. Still need to sort out the command buffers,
synchronization, and particle generation (and probably a bunch else
that's not coming to mind).
This needed changing Vulkan_CreatePipeline to
Vulkan_CreateGraphicsPipeline for consistency (and parsing the
difference from a plist seemed... not worth thinking about).
It turned out the bindless approach wouldn't work too well for my design
of the sprite objects, but I don't think that's a big issue at this
stage (and it seems bindless is causing problems for brush/alias
rendering via renderdoc and on my versa pro). However, I have figured
out how to make effective use of descriptor sets (finally :P).
The actual normal still needs checking, but the sprites are currently
unlit so not an issue at this stage.
I'm not at all sure what I was thinking when I designed it, but I
certainly designed it wrong (to the point of being fairly useless). It
turns out memory requirements are already aligned in size (so just
multiplying is fine), and what I really wanted was to get the next
offset aligned to the given requirements.
This adds the shaders and the pipeline specs. I'm not sure that the
deferred rendering side of the render pass is appropriate, but I thought
I'd give it a go, since quake sprites are really cutoff rather than
translucent.
With the switch to multi-layer textures for brush models, the bsp and
alias texture descriptor sets became identical and thus the definitions
shareable. However, due to complications I don't want to address yet,
they're still separately identified, but I should be able to use the
texture set for most, if not all, pipelines.
I'd forgotten (when doing the original brush texture loader) that
turbulent surfaces were unlit and thus always full-bright, then never
wrote the turb shader to take care of it. The best solution seems to be
to just mix the two colors in the shader as it will allow turb surfaces
to be lit in the future (probably with severely limited light counts due
to being a forward renderer).
This gets the alias pipeline in line with the bsp pipeline, and thus
everything is about as functional as it was before the rework (minus
dealing with large texture sets).
I guess it's not quite bindless as the texture index is a push constant,
but it seems to work well (and I may have fixed some full-bright issues
by accident, though I suspect that's just my imagination, but they do
look good).
This should fix the horrid frame rate dependent behavior of the view
model.
They are also in their own descriptor set so they can be easily shared
between pipelines. This has been verified to work for Draw.
BSP textures are now two-layered with the albedo and emission in the two
layers rather than two separate images. While this does increase memory
usage for the textures themselves (most do not have fullbright pixels),
it cuts down on image and image view handles (and shader resources).
Smashing everything in the process :P (need to work on the C side).
However, while bindless is supposedly good for performance, the biggest
gain this will bring is portability: the texture counts are
automatically limited to what the hardware can handle, and the reliance
on push descriptors is removed (though they were nice and did help get
things up and running).
I had forgotten that the parameters are in reverse order, and even if I
had remembered, I forgot to reset offset before the second loop.
Pre-decrementing offset takes care of both issues at once.
My VersaPro doesn't support more than 32 per-stage samplers (lavapipe).
This is a small part of getting Vulkan to run on lavapipe and even in
itself is rather incomplete.
Fixes the warning about parse_fixed_array not being used (oops, the
problem with partial commits), but more importantly, gives access to
things like maxDescriptorSetSamplers.
This will make property list expressions easier to work with. The
library is rather limited right now (trig, dot, min/max/bound) but even
just min adds a lot of functionality.
I want to support reading VkPhysicalDeviceLimits but it has some arrays.
While I don't need to parse them (VkPhysicalDeviceLimits should be
treated as read-only), I do need to be able to access them in property
list expressions, and vkgen generates the cexpr type descriptors too.
However, I will probably want to parse arrays some time in the future.
This ensures that unused parser blocks do not get emitted. In the
testing of the upcoming support for fixed arrays, the blend color
constants were being double emitted (both as custom and normal parser)
due to being an array. gcc did not like that (what with all those
warning flags).
Multiple render passes are needed for supporting shadow mapping, and
this is a huge step towards breaking the Vulkan render free of Quake,
and hopefully will lead the way for breaking the GL renderers free as
well.
This is actually a better solution to the renderer directly accessing
client code than provided by 7e078c7f9c.
Essentially, V_RenderView should not have been calling R_RenderView, and
CL_UpdateScreen should have been calling V_RenderView directly. The
issue was that the renderers expected the world entity model to be valid
at all times. Now, R_RenderView checks the world entity model's validity
and immediately bails if it is not, and R_ClearState (which is called
whenever the client disconnects and thus no longer has a world to
render) clears the world entity model. Thus R_RenderView can (and is)
now called unconditionally from within the renderer, simplifying
renderer-specific variants.
While using binary data objects for specialization data works for bools
(as they can be 0 or -1), they don't work so well for numeric values due
to having to get the byte order correct and thus are not portable, and
difficult to get right.
Binary data is still supported, but the data can be written as a string
with an array(...) "constructor" expression taking any number of
parameters, with each parameter itself being an expression (though
values are limited at this stage).
Due to the plist format, quotes are required around the expression
("array(...)")
Sets never shrink, so assigning a dynamically created set to a
statically created set after the working size has reduced (going from
demo2 to demo3) causes the set code to attempt to resize the statically
created set, which leads to libc having a bad time.
Why nvidia's drivers accepted double-destroyed framebuffers is beyond
me, but this fixes the Intel drivers complaining about such (and the
subsequent segfault).
When I changed the matrices from an array of floats to an array of
vec4f_t, I forgot to update the flush offsets. Yay for having a
Vulkan-capable Intel device with its different alignment requirements.
When allocating memory for multiple objects that have alignment
requirements, it gets tedious keeping track of the offset and the
alignment. This is a simple function for walking the offset respecting
size and alignment requirements, and doubles as a size calculator.
The stack is arbitrary strings that the validation layer debug callback
prints in reverse order after each message. This makes it easy to work
out what nodes in a pipeline/render pass plist are causing validation
errors. Still have to narrow down the actual line, but the messages seem
to help with that.
Putting qfvPushDebug/qfvPopDebug around other calls to vulkan should
help out a lot, tool.
As a bonus, the stack is printed before debug_breakpoint is called, so
it's immediately visible in gdb.
Rather than just 0/1, it now acts as flags to control what messages are
printed. In addition to the Vulkan enum names (long and short), none and
all are supported (as well as raw numbers, but they're not checked for
validity). This makes vulkan_use_validation a bit easier to use and less
verbose by default.
Now, if only it was easier to remember the name :P
Well... it could be done better, but this works for now assuming it's in
/usr/include (and it's correct for mxe builts). Does need proper
autoconfiscation, though.
The fact that numleafs did not include leaf 0 actually caused in many
places due to never being sure whether to add 1. Hopefully this fixes
some of the confusion. (and that comment in sv_init didn't last long :P)
Modern maps can have many more leafs (eg, ad_tears has 98983 leafs).
Using set_t makes dynamic leaf counts easy to support and the code much
easier to read (though set_is_member and the iterators are a little
slower). The main thing to watch out for is the novis set and the set
returned by Mod_LeafPVS never shrink, and may have excess elements (ie,
indicate that nonexistent leafs are visible).
-999999 seems to be a hold-over from the software renderer passed
through both gl renderers. I guess it didn't matter in the gl renderers
due to various draw hacks, but it made quite a difference in vulkan.
Fixes the view model covering the hud.
Quake just looked wrong without the view model. I can't say I like the
way the depth range is hacked, but it was necessary because the view
model needs to be processed along with the rest of the alias models
(didn't feel like adding more command buffers, which I imagine would be
expensive with the pipeline switching).
Without shadows, this is quite the cheat, but noclip is a cheat anyway,
so probably not that big a deal. It does, however, make noclip usable
for debugging.
Since vulkan supports 32-bit indexes, there's no need for the
shenanigans the EGL-based glsl renderer had to go through to render bsp
models (maps often had quite a bit more than 65536 vertices), though the
reduced GPU memory requirements of 16-bit indices does have its
advantages.
Any sun (a directional light) is in the outside node, which due to not
having its own PVS data is visible to all nodes, but that's a tad
excessive. However, any leaf node with sky surfaces will potentially see
any suns, and leaf nodes with no sky surfaces will see the sun only if
they can see a leaf that does have sky surfaces. This can be quite
expensive to calculate (already known to be moderately expensive for
just the camera leaf node (singular!) when checking for in-map lights)