I suspect this is a hold-over from before the bsp thread safety changes,
but with the nicely separated queues, it's easy to pass the sky surfaces
through the depth pass as well as the translucency pass (I think the
reason for that is lighting). This prevents bits of world being seen
through sky surfaces when the sky isn't fully opaque (like skysheet due
to the shortcuts in the shader).
Partial because frame buffer creation isn't handled yet (using six
layers), but using layer a layer capable view and shaders doesn't cause
problems (other than maybe slightly slower code).
It turns out that my laptop doesn't do multiview properly (or I've
misconfigured something, later), but the biggest issue I had on my
desktop seems to be that I had the push constants wrong: fov in aspect,
time in fov, and I had degrees instead of radians (half angle) anyway.
There are some missing parts from this commit as these are the fairly
clean changes. Missing is building a separate set of pipelines for the
new render pass (might be able to get away from that), OIT heads texture
is flat rather than an array, view matrices aren't set up, and the
fisheye renderer isn't hooked up to the output pass (code exists but is
messy). However, with the missing parts included, testing shows things
mostly working: the cube map is rendered correctly even though it's not
displayed correctly (incorrect view). This has definitely proven to be a
good test for Vulkan's multiview feature (very nice).
Some of the queues start don't get fully initialized, but rather than go
through everything making sure they do, it's just easier to zero the
whole lot at the beginning.
The flashing pink around the Q menu cursor was caused by vulkan command
buffer writes and draw queue population being out of phase, which was
fixed by the recent screen update changes (specifically,
42441e87d4).
Rather important for debugging 2d stuff (draw's lines are 2d-only).
Other than translucent console, this gets the vulkan draw api back to
full operation.
This needed either more font ids to be supported, or small lump pics (up
to 32 x 32) to be loaded into the atlas. I went with both. The menus
don't use Draw_TextBox, but quakeworld's netgraph does.
This makes use of slice rendering to achieve the effective scaling, but
the slice data is created only when needed so pics that never use slices
don't waste 16 vertices.
When a pic needs dynamic vertices (eg, for sub-pics), a descriptor set
is allocated and updated if one has not been created for the pic. This
is done each frame: the descriptor sets are recycled (there currently is
rarely a need for more than a small handful of dynamic descriptors, so
64 should be plenty for now).
Unfortunately, due to the order of operations issue between draw items
getting queued and submitting commands to vulkan (the cause of the pics
not rendering correctly per 8fff71ed4b),
the validation layers complain (correctly) about the command buffers
being executed with updated descriptor sets. Getting the canvas system
up and running will fix that.
The pic is scaled to fill the specified rect (then clipped to the
screen (effectively)). Done just for the console background for now, but
it will be used for slice-pics as well.
Not implemented for vulkan yet as I'm still thinking about the
descriptor management needed for the instanced rendering.
Making the conback rendering conditional gave an approximately 3% speed
boost to glsl with the GL stub (~12200fps to ~12550fps), for either
conback render method.
They are usually larger images (eg, the main menu graphic) and thus make
a mess of the atlas (thus, making them separate means a smaller atlas
can be used). All sorts of things are in a mess (descriptor management,
subpic rendering not supported, wrong alpha value for the transparent
pixel), but this gets the basic loader going.
This just takes advantage of the dynamic verts for doing subpics. It's
not really the most optimal code as it has to write both the vertices
(64 bytes per quad) and the instances (24 bytes per quad), but that's
still better than the old 128 bytes per quad (and having a single
pipeline is nice).
The problem was that I had mixed up the purpose of the per-frame vertex
buffers and used them for the core quad data when they were meant for
subpic and the like, and forgotten about the static vertex buffer.
This gets at least conchars working (pic in general not tested yet).
Any performance gains will be utterly swamped by the deferred renderer,
but it will allow better control of quad render order by any client
code (and should be slightly better for simpler renderers when I get
support for them working).
Right now, plenty is broken (much of the higher level draw functions are
disabled, and pics don't render correctly), but this gets at least the
basics in so I'm not bouncing diffs around as much.
It turns out the slice pipeline is compatible with the glyph pipeline in
that its vertex attribute data is a superset (just the addition of the
offset attributes). While the queues have yet to be merged, this will
eventually get glyphs, sliced sprites, and general (static) quads into
the one pipeline. Although this is slightly slower for glyph rendering
(due to the need to pass an extra 8 bytes per glyph), this should be
faster for quad rendering (when done) as it will be 24 bytes per quad
instead of 32 bytes per vertex (ie, 128 bytes per quad), but this does
serve as a proof of concept for doing quads, glyphs and sprites in the
one pipeline.
The main reason I had created in the first place was I hadn't thought of
using image view swizzles to handle coverage-alpha textures (for
monochrome glyphs), and for whatever reason also had the texture in a
different binding slot to the twod fragment shader. With both issues out
of the way, there's no reason to have an almost identical (just some
naming) shader just for glyphs.
With an eye towards merging the 2d pipelines as much as possible, I
found that the glyph and basic 2d quad texture descriptors were in
different slots for no reason I can think of. Having them in the same
slot would mean I could use the same fragment shader for all 2d
pipelines (though the plan is to get it down to two: (sliced) quads and
lines).
I hadn't noticed the problem until playing with early fragment tests for
the sprite fragment shaders, but passing data that expects triangle
strips to a pipeline that expects triangle lists doesn't work too well
when drawing quads.
While Draw_Glyph does draw only one glyph at a time, it doesn't shape
the text every time, so is a major win for performance (especially
coupled with pre-shaped text).
And add a function to process a passage into a set of views with glyphs.
The views can be flowed: they have flow gravity and their sizes set to
contain all the glyphs within each view (nominally, words). Nothing is
tested yet, and font rendering is currently broken completely.
Font and text handling is very much part of user interface and at least
partially independent of rendering, but does fit it better with GUI than
genera UI (ie, both graphics and text mode), thus libQFgui as well as
libQFui are built in the ui directory.
The existing font related builtins have been moved into the ruamoko
client library.
While it doesn't really make any difference to the texture upload (8-bit
is 8-bit), and the sampler is in control of the interpretation, this
makes vulkan more consistent with the specification of the glyph
texture.
Thanks to the 3d frame buffer output being separate from the swap chain,
it's possible to have a different frame buffer size from the window
size, allowing for a smaller buffer and thus my laptop can cope (mostly)
with the vulkan renderer.
The escape was actually harmless as the buffers would not be read due to
the particle count being 0 (thus why the buffers were at the end of the
staging buffer: no space was allocated for them, only for the system
buffer, but their offsets were just past the system buffer). However,
the validation layers quite rightly did not like that. Thus, the two
buffers are pointed to the system buffer so all three descriptors are
always valid.
I had debated putting the blending in the compose subpass or a separate
pass but went with the separate pass originally, but it turns out that
removing the separate pass gains 1-3% (5-15/545 fps in a timedemo of
demo1).
It's a bit flaky for particles, especially at higher frame rates, but
that's due to supporting only 64 overlapping pixels. A reasonable
solution is probably switching to a priority heap for the "sort" and
upping the limit.
This required making the texture set accessible to the vertex shader
(instead of using a dedicated palette set), which I don't particularly
like, but I don't feel like dealing with the texture code's hard-coded
use of the texture set. QF style particles need something mostly for the
smoke puffs as they expect a texture.
It doesn't want to work on my nvidia (or more recent sid?) and doesn't
seem to be necessary. The problem may be multiple event sets before the
first wait, but investigation can wait for now.
This is probably the biggest reason I had problems with particles not
updating correctly: the descriptors were generally point pointing to
where the data actually was in the staging buffer.
I don't yet know whether they actually work (not rendering yet), but the
system isn't locking up, and shutdown is clean, so at least resources
are handled correctly.
Although it works as intended (tested via hacking), it's not hooked up
as the current frame buffer handling in r_screen is not readily
compatible with how vulkan output is handled. This will need some
thought to get working.
This splits up render pass creation so that the creation of the various
resources can be tailored to the needs of the actual render pass
sub-system. In addition, it gets window resizing mostly working (just
some problems with incorrect rendering).
It turns out the semaphore used for vkAcquireNextImageKHR may be left in
a signaled state for VK_ERROR_OUT_OF_DATE_KHR. While it seems to be
possible to clear the semaphore using an empty queue submission,
destroying and recreating the semaphore works well.
Still have problems with the frame buffer after window resize, though.
Swap chain acquisition is part of final output handling. However, as the
correct frame buffers are required for the render passes, the
acquisition needs to be performed during the preoutput render pass.
Window resize is still broken, but this is a big step towards fixing it.
This is the minimum maximum count for sampled images, and with layered
shadow maps (with a minimum of 2048 layers supported), that's really way
more than enough.
I guess nvidia gives a non-srgb format as the first in the list, but my
laptop gives an srgb format first, thus the unexpected difference in
rendering brightness. Hard-coding BGRA isn't any better, but it will do
for now.
Things are a bit of a mess with interdependence between sub-module
initialization and render pass initialization, and window resizing is
broken, but the main render pass rendering to an image that is then
post-processed (currently just blitted) is working. This will make it
possible to implement fisheye and water warp (and other effects, of
course).
When working, this will handle the output to the swap-chain images and
any final post-processing effects (gamma correction, screen scaling,
etc). However, currently the screen is just black because the image
for getting the main render pass output isn't hooked up yet.
Now each (high level) render pass can have its own frame buffer. The
current goal is to get the final output render pass to just transfer the
composed output to the swap chain image, potentially with scaling (my
laptop might be able to cope).
While the HUD and status bar don't cut out a lot of screen (normally),
they might start to make a difference when I get transparency working
properly. The main thing is this is a step towards pulling the 2d
rendering into another render pass so the main deferred pass is
world-only.
Using swizzles in an image view allows the same shader to be used with
different image "types" (eg, color vs coverage).
Of course, this needed to abandon QFV_CreateImageView, but that is
likely for the best.
It turns out that nearest filtering doesn't need any offsets to avoid
texel leaks so long as the screen isn't also offset. With this, the 2d
rendering looks good at any scale (minus the inherent blockiness).
There's no API yet as I need to look into the handling of qpic_t before
I can get any of this into the other renderers (or even vulkan, for that
matter).
However, the current design for slice rendering is based on glyphs (ie,
using instances and vertex pulling), with 3 strips of 3 quads, 16 verts,
and 26 indices (2 reset). Hacky testing seems to work, but real tests
need the API.
It's currently used only by the vulkan renderer, as it's the only
renderer that can make good use of it for alias models, but now vulkan
show shirt/pants colors (finally).
This cuts down on the memory requirements for skins by 25%, and
simplifies the shader a bit more, too. While at it, I made alias skins
nominally compatible with bsp textures: layer 0 is color, 1 is emissive,
and 2 is the color map (emissive was on 3).
As the RGB curves for many of the color rows are not linearly related,
my idea of scaling the brightest color in the row just didn't work.
Using a masked palette lookup works much better as it allows any curves.
Also, because the palette is uploaded as a grid and the coordinates are
calculated on the CPU, the system is extendable beyond 8-bit palettes.
This isn't quite complete as the top and bottom colors are still in
separate layers but their indices and masks can fit in just one, but
this requires reworking the texture setup (for another commit).
For whatever reason, I had added an extra 4 bytes to the fragment
shader's push-constants. It took me a while to figure out why renderdoc
wouldn't stop complaining about me not writing enough data.
It turns out my approach to alias skin coloring just doesn't work for
the quake data due to the color curves not having a linear relationship,
especially the bottom colors.
It works on only one layer and one mip, and assumes the provided texture
data is compatible with the image, but does support sub-image updates
(x, y location as parameters, width and height in the texture data).
The bright end of the color map is actually twice the palette value, but
I didn't understand this when I came up with the shirt/pants color
scheme for vulkan. However, the skin texture can store only 0..1, so the
mapping to 0..2 needs to be done in the shader. It looks like it works
at least better: the gold key at the end of demo1 doesn't look as bleh,
though I do get some weird colors still on ogres etc.
Currently only for gl/glsl/vulkan. However, rather than futzing with
con_width and con_height (and trying to guess good values), con_scale
(currently an integer) gives consistent pixel scaling regardless of
window size.
The resource functions assume the requested layers is correct (really,
the lighting code assumes that the resource functions assume such), but
QFV_CreateImage multiplies the layer count by 6 for cube maps (really,
the issue is in QFV_CreateImage, but I want to move away from it
anyway).
The check for the entity being the view model was checking only the
view model id, which is not sufficient when the view model is invalid by
never being set to other than 0s. A better system for dealing with the
view model is needed.
Another step towards moving all resource creation into the one place.
The motivation for doing the change was getting my test scene to work
with only ambient lights or no lights at all.
While the libraries are probably getting a little out of hand, the
separation into its own directory is probably a good thing as an ECS
should not be tied to scenes. This should make the ECS more generally
useful.
Since entity_t has a pointer to the registry owning the entity, there's
no need to access a global to get at the registry. Also move component
getting closer to where it's used.
It was being set to -1 unconditionally due to forgetting to use id.
However, I decided I didn't like reusing the id var and did some
renaming while I was at it.
This puts the hierarchy (transform) reference, animation, visibility,
renderer, active, and old_origin data in separate components. There are
a few bugs (crashes on grenade explosions in gl/glsl/vulkan, immediately
in sw, reasons known, missing brush models in vulkan).
While quake doesn't really need an ECS, the direction I want to take QF
does, and it does seem to have improved memory bandwidth a little
(uncertain). However, there's a lot more work to go (especially fixing
the above bugs), but this seems to be a good start.
Not that anything is actually rendered yet, but the validation layers
don't like the null render pass. Came up now because ctf1 seems to make
the first light an ambient light.
It didn't really add anything of value as the glyph bitmap rects and the
bearings were never used together, and the rest of the fields were
entirely redundant. A small step towards using a component system for
text.
The inconsistencies in clang's handling of casts was bad enough, and its
silliness with certain extensions, but discovering that it doesn't
support raw strings was just too much. Yes, it gives a 3s boost to qfvis
on gmsp3v2.bsp, but it's not worth the hassle.
While the base of a memory object was aligned when calculating the
memory block size, the top end was not, which could result in the memory
block not getting enough bytes allocated to satisfy alignment
requirements (eg, for flushing).
While fixing that, I noticed the offsets of objects were not being
aligned when binding, so that is fixed as well.
Fixes Mr Fixit on my VersaPro.
Currently, only 16 fonts can be loaded (I need to sort out descriptor
set pools), but glyphs are grouped into batches of the same font. While
not quite optimal as it can result in bouncing between descriptor sets a
lot, it's still reasonably efficient.
Line rendering now has its own pipeline (removing the texture issue).
Glyph rendering (for fonts) has been reworked to use instanced quad
rendering, with the geometry (position and texture coords) in a static
buffer (uniform texture buffer), and each instance has a glyph index,
color, and 2d base position.
Multiple fonts can be loaded, but aren't used yet: still just the one
(work needs to be done on the queues to support multiple
textures/fonts).
Quads haven't changed much, but buffer creation and destruction has been
cleaned up to use the resource functions.
Its value on input is ignored. QFV_CreateResource writes the resource
object's offset relative to the beginning of the shared memory block.
Needed for the Draw overhaul.
I got tired of writing the same 13 or so lines of code over and over (it
actually put me off experimenting with Vulkan). Thus...
QFV_PacketCopyBuffer does the work of handling barriers and a (full
packet) copy from the staging buffer to a GPU buffer.
QFV_PacketCopyImage does a similar job, but for images. However, it
still needs a lot of work, but it does make getting a basic texture onto
the GPU much less of a hassle.
Both functions should make staging data much less error-prone.
This moves the qfv_resobj_t image initialization code from the IQM
loader into the resource management code. This will allow me to reuse
the code for setting up glyph data. As a bonus, it cleans up the IQM
code nicely.
UVs being 0 meant that lines were picking up the upper left pixel of
char 0 of conchars. With quake data, this meant a transparent pixel.
Fixes invisible debug lines :P.
It turns out that using the swapchain image for the size requirements is
unreliable: when running under renderdoc, vkGetImageMemoryRequirements
sets the memory requirements fields to 0, leading eventually to a null
memory object being passed to vkMapMemory, which does not end well.
I had missed that vkCmdCopyImage requires the source and destination
images to have exactly the same size, and I guess assumed that the
swapchain images would always be the size they said they were, but this
is not the case for tiled-optimal images. However,
vkCmdCopyImageToBuffer does the right thing regardless of the source
image size.
This fixes the skewed screenshots when the window size is not a multiple
of 8 (for me, might differ for others).
There's a problem with screenshot capture in that the image is sheared
after window resize, but the screen view looks good, and vulkan is happy
with the state changes.
As gbuf_base derives from the base pipeline, it inherits base's dynamic
setting, and thus doesn't need its own. I had a FIXME there as I wasn't
sure why I had a redundant setting, but I really can't see why I'd want
it different from any of the other main renderpass pipelines.
I've found and mostly isolated the parts of the code that will be
affected by window resizing, minus pipelines but they use dynamic
viewport and scissor settings and thus shouldn't be affected so long as
the swapchain format doesn't change (how does that happen?)
As of a recent nvidia driver update, it became necessary to enable the
feature. I guess older drivers (or vulkan validation layers?) were a bit
slack in their checking (or perhaps I didn't actually get those lighting
changes working at the time despite having committed them).
This did involve changing some field names and a little bit of cleanup,
but I've got a better handle on what's going on (I think I was in one of
those coding trances where I quickly forget how things work).
This makes bsp traversal even more re-entrant (needed for shadows).
Everything needed for a "pass" is taken from bsp_pass_t (or indirectly
through bspctx_t (though that too might need some revising)).
Ambient lights are represented by a point at infinity and a zero
direction vector (spherical lights have a non-zero direction vector but
the cone angle is 360 degrees). This fixes what appeared to be mangled
light renderers (it was actually just an ambient light being treated as
a directional light (point at infinity, but non-zero direction vector).
There are some issues with the light renderers getting mangled, and only
the first light is even touched (just begin and end of render pass), but
this gets a lot of the framework into place.
Sounds odd, but it's part of the problem with calling two different
things with essentially the same name. The "high level" render pass in
question may be a compute pass, or a complex series of (Vulkan) render
passes and so won't create a Vulkan render pass for the "high level"
render pass (I do need to come up with a better name for it).
I really don't remember why I made it separate, though it may have been
to do with r_ent_queue. However, putting it together with the rest is
needed for the "render pass" rework.
It now lives in vulkan_renderpass.c and takes most of its parameters
from plist configs (just the name (which is used to find the config),
output spec, and draw function from C). Even the debug colors and names
are taken from the config.
QFV_CreateRenderPass is no longer used, and QFV_CreateFramebuffer hasn't
been used for a long time. The C file is still there for now but is
basically empty.
The real reason for the delay in implementing support for pNext is I
didn't know how to approach it at the time, but with the experience I've
gained using and modifying vkparse, the solution turned out to be fairly
simple. This allows for the use of various extensions (eg, multiview,
which was used for testing, though none of the hookup is in this
commit). No checking is done on the struct type being valid other than
it must be of a chainable type (ie, have its own pNext).
The software renderer uses Bresenham's line slice algorithm as presented
by Michael Abrash in his Graphics Programming Black Book Special Edition
with the serial numbers filed off (as such, more just so *I* can read
the code easily), along with the Chen-Sutherland line clipping
algorithm. The other renderers were more or less trivial in comparison.
Enabled by 'developer lighting'. It was good for confirming that the
lights in ad_e1m1 (Doom Hangar 16) were actually being output (over 600
of them sometimes, ouch). Turned out to be the color scale ambiguity.
Surfaces marked with SURF_DRAWALPHA but not SURF_DRAWTURB are put in a
separate queue for the water shader and run with a turb scale of 0.
Also, entities with colormod alpha < 1 are marked to go in the same
queue as SURF_DRAWALPHA surfaces (ie, no SURF_DRAWTURB unless the
model's texture indicated such).
This breaks console scaling for now (con_width and con_height are gone),
but is a major step towards window resize support as console stuff
should never have been in viddef_t in the first place.
The client screen init code now sets up a screen view (actually the
renderer's scr_view) that is passed to the client console so it can know
the size of the screen. The same view is used by the status bar code.
Also, the ram/cache/paused icon drawing is moved into the client screen
update code. A bit of duplication, but I do plan on merging that
eventually.
This is intended for the built-in 8x8 bitmap characters and quake's
"conchars", but could potentially be used for any simple (non-composed
characters) mono-spaced font. Currently, the buffers can be created,
destroyed, cleared, scrolled vertically in either direction, and
rendered to the screen in a single blast.
One of the reasons for creating the buffer is to make it so scaling can
be supported in the sw renderer.
While this does pull the grovelling for the subpic out to the callers,
the real problem is the excessive use of qpic_t in the internal code:
qpic_t is really just the image format in wad files, and shouldn't be
used as a generic image handle.
Cleans up more of the icky code in the font drawing functions.
This makes working with quads, implied alpha quads, and lines much
cleaner (and gets rid of the bulk of the "eww" fixme), and will probably
make it easier to support multiple scraps and fonts, and potentially
more flexible ordering between pipelines.
This means that QF should support more exotic fonts without any issue
(once the rest of the text handling system is up to snuff) as HarfBuzz
does all the hard work of handling OpenType, Graphite, etc text shaping,
including kerning (when enabled).
Also, font loading now loads all the glyphs into the atlas (preload is
gone).
It is currently an ugly hack for dealing with the separate quad queue,
and the pipeline handling code needs a lot of cleanup, but it works
quite well, though I do plan on moving to HarfBuzz for text shaping. One
nice development is I got updating of descriptor sets working (just need
to ensure the set is no longer in use by the command queue, which the
multiple frames in flight makes easy).
It's implemented only in the Vulkan renderer, partly because there's a
lot of experimenting going on with it, but the glyphs do get transferred
to the GPU (checked in render doc). No rendering is done yet: still
thinking about whether to do a quick-and-dirty test, or to add HarfBuzz
immediately, and the design surrounding that.
R8G8B8A8 was hard-coded by accident when creating Vulkan_LoadTexArray
(or probably even the original Vulkan_LoadTex). This wasn't a problem
while everything was loaded in that format, but attempting to load an R8
texture didn't go so well. The same format as the image itself is used
now (correctly so).
I have recently learned that pre-multiplied alpha is the correct way to
do compositing, which is pretty much what the 2d pass does (actually,
all passes, but...). This required ensuring the color factor passed to
the fragment shader is pre-multiplied (a little silly for cshifts as
they used to be pre-multiplied but were un-pre-multiplied early in QF's
history and I don't feel like fixing that right now as it affects all
renderers), and also pre-multiplying alpha when converting from 8-bit
palette to rgba as the palette entry for transparent has that funky pink
(which is used in full-brights).
The software renderer uses Bresenham's line slice algorithm as presented
by Michael Abrash in his Graphics Programming Black Book Special Edition
with the serial numbers filed off (as such, more just so *I* can read
the code easily), along with the Chen-Sutherland line clipping
algorithm. The other renderers were more or less trivial in comparison.
Most were pretty easy and fairly logical, but gib's regex was a bit of a
pain until I figured out the real problem was the conditional
assignments.
However, libs/gamecode/test/test-conv4 fails when optimizing due to gcc
using vcvttps2dq (which is nice, actually) for vector forms, but not the
single equivalent other times. I haven't decided what to do with the
test (I might abandon it as it does seem to be UD).
The texture animation data is compacted into a small struct for each
texture, resulting in much less data access when animating the texture.
More importantly, no looping over the list of frames. I plan on
migrating this to at least the other hardware renderers.
I found a test map with no texture data. Even after fixing the bsp
loader, vulkan didn't like it. Now vulkan is happy. The "Missing" text
is full-bright magenta on a dim grey background so it should be visible
in any lighting conditions.
Conflagrant Rodent has a sub-model with 0 faces (double bit error?)
causing simply counting faces to get out of sync with actual model
starts thus breaking *all* brush models that come after it (including
other maps). Thus be a little less lazy in figuring out model start
faces.
The models are broken up into N sub-(sub-)models, one for each texture,
but all faces using the same texture are drawn as an instance, making
for both reduced draw calls and reduced index buffer use (and thus,
hopefully, reduced bandwidth). While texture animations are broken, this
does mark a significant milestone towards implementing shadows as it
should now be possible to use multiple threads (with multiple index and
entid buffers) to render the depth buffers for all the lights.
This allows the use of an entity id to index into the entity data and
fetch the transform and colormod data in the vertex shader, thus making
instanced rendering possible. Non-world brush entities are still not
rendered, but the world entity is using both the entity data buffer and
entid buffer.
Sub-models and instance models need an instance data buffer, but this
gets the basics working (and the proof of concept). Using arrays like
this actually simplified a lot of the code, and will make it easy to get
transparency without turbulence (just another queue).
One more step towards BSP thread-safety. This one brought with it a very
noticeable speed boost (ie, not lost in the noise) thanks to the face
visframes being in tightly packed groups instead of 128 bytes apart,
though the sw render's boost is lost in the noise (but it's very
fill-rate limited).
This is next critical step to making BSP rendering thread-safe.
visframe was replaced with cluster (not used yet) in anticipation of BSP
cluster reconstruction (which will be necessary for dealing with large
maps like ad_tears).
The main goal was to get visframe out of mnode_t to make it thread-safe
(each thread can have its own visframe array), but moving the plane info
into mnode_t made for better data access patters when traversing the bsp
tree as the plane is right there with the child indices. Nicely, the
size of mnode_t is the same as before (64 bytes due to alignment), with
4 bytes wasted.
Performance-wise, there seems to be very little difference. Maybe
slightly slower.
The unfortunate thing about the change is the plane distance is negated,
possibly leading to some confusion, particularly since the box and
sphere culling functions were affected. However, this is so point-plane
distance calculations can be done with a single 4d dot product.
The map uses 41% of a 4k light map scrap, and 512 texture descriptors
wasn't enough for vulkan. Ouch. I do need to get cvars on these things,
but this will do for now (decades later...)
This was one of the biggest reasons I had trouble understanding the bsp
display list code, but it turns out it was for dealing with GLES's
16-bit limit on vertex indices. Since vulkan uses 32-bit indices,
there's no need for the extra layer of indirection. I'm pretty sure it
was that lack of understanding that prevented me from removing it when I
first converted the glsl bsp code to vulkan (ie, that 16-bit indices
were the only reason for elements_t).
It's hard to tell whether the change makes much difference to
performance, though it seems it might (noisy stats even over 50 timedemo
loops) and the better data localization indicate it should at least be
just as good if not better. However, the reason for the change is
simplifying the data structures so I can make bsp rendering thread-safe
in preparation for rendering shadow maps.
And maybe a nano-optimization. Switching from (~side + 1) to (-side)
seems to give glsl a very tiny speed boost, but certainly doesn't hurt.
Looking at some assembly output for the three cases, the two hacks seem
to generate the same code as each other, but 3 instructions vs 6 for ?:.
While ?: is more generically robust, the hacks are tuned for the
knowledge side is either 0 or 1. The later xor might alter things, but
at least I now know that the hack (either version) is worthwhile.
With experience, I have found that trying to continue after a validation
error tends to result in a segfault or some other nastiness, and
Sys_Shutdown (and the full shutdown sequence) is triggered for any error
signal (segfault, abort, etc) so just exit(1).
Some very much needed comments :P Still, nicely, I now have a much
better understanding of how the display lists are created (10 years
is a long time to remember how intricate code works (I do remember
fighting to get it working back then))
Many modders use negative lights for interesting effects, but vulkan
doesn't like the result of a negative int treated as unsigned when it
comes to texture sizes.
However, this time it doesn't modify the light array when it sorts the
lights by size since the lights are now located before the renderer gets
to see them, and having the fix up the light leafs array would be too
painful (and probably the completely wrong thing to do anyway: the light
array should be treated as constant by the renderer). 1.6GB of memory
for gmsp3v2's lights (a little better than marcher: more smaller lights?).
For reference:
gmsp3v2: shadow maps: 8330 layers in 29 images: 1647706112
marcher: shadow maps: 2440 layers in 11 images: 2358575104
For now, at least (I have some ideas to possibly reduce the numbers and
also to avoid the need for actual limits). I've seen gmsp3v2 use over
500 lights at once (it has over 1300), and I spent too long figuring out
that weird light behavior was due to the limit being hit and lights
getting dropped (and even longer figuring out that more weird behavior
was due to the lack of shadows and the world being too bright in the
first place).
Since the staging buffer allocates the command buffers it uses, it
needs to free them when it is freed. I think I was confused by the
validation layers not complaining about unfreed buffers when shutting
down, but that's because destroying the pool (during program shutdown,
when the validation layers would complain) frees all the buffers. Thus,
due to staging buffers being created and destroyed during the level load
process, (rather large) command buffers were piling up like imps in a
Doom level.
In the process, it was necessary to rearrange some of the shutdown code
because vulkan_vid_render_shutdown destroys the shared command pool, but
the pool is required for freeing the command buffers, but there was a
minor mess of long-lived staging buffers being freed afterwards. That
didn't end particularly well.
Unfortunately, the animations are pre-baked (by the loader) blocking
run-time determined animations (IK etc). However, this at least gets
everything working so the basics can be verified (the shader posed some
issue resulting in horror movies ;).
Brush models looked a little too tricky due to the very different style
of command queue, so that's left for now, but alias, iqm and sprite
entities are now labeled. The labels are made up of the lower 5 hex
digits of the entity address, the position, and colored by the
normalized position vector. Not sure that's the best choice as it does
mean the color changes as the entity moves, and can be quite subtle
between nearby entities, but it still helps identify the entities in the
command buffer.
And, as I suspected, I've got multiple draw calls for the one ogre. Now
to find out why.
The bones aren't animated yet (and I realized I made the mistake of
thinking the bone buffer was per-model when it's really per-instance (I
think this mistake is in the rest of QF, too)), skin rendering is a
mess, need to default vertex attributes that aren't in the model...
Still, it's quite satisfying seeing Mr Fixit on screen again :)
I wound up moving the pipeline spec in with the rest of the pipelines as
the system isn't really ready for separating them.
The plists can now be accessed by name and the forward render pass
config is available (but not used, or tested beyond syntax). I was going
to have the IQM pipeline spec separate but ran into limitations in the
system (which needs a lot of polish, really).
That @inherit is pretty useful :) This makes it much easier to see how
different pipelines differ or how they are the similar. It also makes it
much clearer which sub-pass they're for.
I was wondering why scaled-down quake-guy was dimmer than full-size
quake-guy. And the per-fragment normalization gives the illusion of
smoothness if you don't look at his legs (and even then...).