The inconsistencies in clang's handling of casts was bad enough, and its
silliness with certain extensions, but discovering that it doesn't
support raw strings was just too much. Yes, it gives a 3s boost to qfvis
on gmsp3v2.bsp, but it's not worth the hassle.
While the base of a memory object was aligned when calculating the
memory block size, the top end was not, which could result in the memory
block not getting enough bytes allocated to satisfy alignment
requirements (eg, for flushing).
While fixing that, I noticed the offsets of objects were not being
aligned when binding, so that is fixed as well.
Fixes Mr Fixit on my VersaPro.
With the improved atlas allocation, 2x is no longer needed and 1.2x
seems to be sufficient. Most importantly, it reduced the texture for
amiri-regular.ttf at 72 pix height from 8x to 4x (the staging buffer
isn't big enough for 8k textures).
Currently, only 16 fonts can be loaded (I need to sort out descriptor
set pools), but glyphs are grouped into batches of the same font. While
not quite optimal as it can result in bouncing between descriptor sets a
lot, it's still reasonably efficient.
Line rendering now has its own pipeline (removing the texture issue).
Glyph rendering (for fonts) has been reworked to use instanced quad
rendering, with the geometry (position and texture coords) in a static
buffer (uniform texture buffer), and each instance has a glyph index,
color, and 2d base position.
Multiple fonts can be loaded, but aren't used yet: still just the one
(work needs to be done on the queues to support multiple
textures/fonts).
Quads haven't changed much, but buffer creation and destruction has been
cleaned up to use the resource functions.
Its value on input is ignored. QFV_CreateResource writes the resource
object's offset relative to the beginning of the shared memory block.
Needed for the Draw overhaul.
I got tired of writing the same 13 or so lines of code over and over (it
actually put me off experimenting with Vulkan). Thus...
QFV_PacketCopyBuffer does the work of handling barriers and a (full
packet) copy from the staging buffer to a GPU buffer.
QFV_PacketCopyImage does a similar job, but for images. However, it
still needs a lot of work, but it does make getting a basic texture onto
the GPU much less of a hassle.
Both functions should make staging data much less error-prone.
This moves the qfv_resobj_t image initialization code from the IQM
loader into the resource management code. This will allow me to reuse
the code for setting up glyph data. As a bonus, it cleans up the IQM
code nicely.
UVs being 0 meant that lines were picking up the upper left pixel of
char 0 of conchars. With quake data, this meant a transparent pixel.
Fixes invisible debug lines :P.
It turns out that using the swapchain image for the size requirements is
unreliable: when running under renderdoc, vkGetImageMemoryRequirements
sets the memory requirements fields to 0, leading eventually to a null
memory object being passed to vkMapMemory, which does not end well.
I had missed that vkCmdCopyImage requires the source and destination
images to have exactly the same size, and I guess assumed that the
swapchain images would always be the size they said they were, but this
is not the case for tiled-optimal images. However,
vkCmdCopyImageToBuffer does the right thing regardless of the source
image size.
This fixes the skewed screenshots when the window size is not a multiple
of 8 (for me, might differ for others).
There's a problem with screenshot capture in that the image is sheared
after window resize, but the screen view looks good, and vulkan is happy
with the state changes.
As gbuf_base derives from the base pipeline, it inherits base's dynamic
setting, and thus doesn't need its own. I had a FIXME there as I wasn't
sure why I had a redundant setting, but I really can't see why I'd want
it different from any of the other main renderpass pipelines.
I've found and mostly isolated the parts of the code that will be
affected by window resizing, minus pipelines but they use dynamic
viewport and scissor settings and thus shouldn't be affected so long as
the swapchain format doesn't change (how does that happen?)
Finally, the model_funcs and render_funcs struts use designated
initializers. Not only are they good for ensuring correct
initialization, they're great for the programmer finding the right
initializer.
With the addition of dependencies on freetype and harfbuzz, it became
clear that the renderer plugins need to be explicitly linked against
external dependencies (and that I need to do more installed testing,
rather than just my static local builds). This fixes the unresolved
symbols when attempting to load any of the plugins.
qwaq doesn't supply a backtile pic, so Draw_TileClear in the gl and glsl
renderers would segfault when qwaq's window width changed due to some
back-tile being drawn.
As of a recent nvidia driver update, it became necessary to enable the
feature. I guess older drivers (or vulkan validation layers?) were a bit
slack in their checking (or perhaps I didn't actually get those lighting
changes working at the time despite having committed them).
This did involve changing some field names and a little bit of cleanup,
but I've got a better handle on what's going on (I think I was in one of
those coding trances where I quickly forget how things work).
This makes bsp traversal even more re-entrant (needed for shadows).
Everything needed for a "pass" is taken from bsp_pass_t (or indirectly
through bspctx_t (though that too might need some revising)).
Ambient lights are represented by a point at infinity and a zero
direction vector (spherical lights have a non-zero direction vector but
the cone angle is 360 degrees). This fixes what appeared to be mangled
light renderers (it was actually just an ambient light being treated as
a directional light (point at infinity, but non-zero direction vector).
There are some issues with the light renderers getting mangled, and only
the first light is even touched (just begin and end of render pass), but
this gets a lot of the framework into place.
Sounds odd, but it's part of the problem with calling two different
things with essentially the same name. The "high level" render pass in
question may be a compute pass, or a complex series of (Vulkan) render
passes and so won't create a Vulkan render pass for the "high level"
render pass (I do need to come up with a better name for it).
I really don't remember why I made it separate, though it may have been
to do with r_ent_queue. However, putting it together with the rest is
needed for the "render pass" rework.
It now lives in vulkan_renderpass.c and takes most of its parameters
from plist configs (just the name (which is used to find the config),
output spec, and draw function from C). Even the debug colors and names
are taken from the config.
QFV_CreateRenderPass is no longer used, and QFV_CreateFramebuffer hasn't
been used for a long time. The C file is still there for now but is
basically empty.
The real reason for the delay in implementing support for pNext is I
didn't know how to approach it at the time, but with the experience I've
gained using and modifying vkparse, the solution turned out to be fairly
simple. This allows for the use of various extensions (eg, multiview,
which was used for testing, though none of the hookup is in this
commit). No checking is done on the struct type being valid other than
it must be of a chainable type (ie, have its own pNext).
The software renderer uses Bresenham's line slice algorithm as presented
by Michael Abrash in his Graphics Programming Black Book Special Edition
with the serial numbers filed off (as such, more just so *I* can read
the code easily), along with the Chen-Sutherland line clipping
algorithm. The other renderers were more or less trivial in comparison.
Enabled by 'developer lighting'. It was good for confirming that the
lights in ad_e1m1 (Doom Hangar 16) were actually being output (over 600
of them sometimes, ouch). Turned out to be the color scale ambiguity.
Surfaces marked with SURF_DRAWALPHA but not SURF_DRAWTURB are put in a
separate queue for the water shader and run with a turb scale of 0.
Also, entities with colormod alpha < 1 are marked to go in the same
queue as SURF_DRAWALPHA surfaces (ie, no SURF_DRAWTURB unless the
model's texture indicated such).
A listener is used instead of (really, as well as) ie_app_window events
because systems that need to know about windows sizes may not have
anything to do with input and the event system.
This breaks console scaling for now (con_width and con_height are gone),
but is a major step towards window resize support as console stuff
should never have been in viddef_t in the first place.
The client screen init code now sets up a screen view (actually the
renderer's scr_view) that is passed to the client console so it can know
the size of the screen. The same view is used by the status bar code.
Also, the ram/cache/paused icon drawing is moved into the client screen
update code. A bit of duplication, but I do plan on merging that
eventually.
More tuning is needed on the actual splits as it falls over when the
lower rect gets too low for the subrects being allocated. However, the
scrap allocator itself will prefer exact width/height fits with larger
cutoff over inexact cuts with smaller cutoff. Many thanks to tdb for the
suggestions.
Fixes the fps dropping from ~3700fps down to ~450fps (cumulative due to
loss of POT rounding and very poor splitting layout), with a bonus boost
to about 4900fps (all speeds at 800x450). The 2d sprites were mostly ok,
but the lightmaps forming a capital gamma shape in a 4k texture really
hurt. Now the lightmaps are a nice dense bar at the top of the texture,
and 2d sprites are pretty good (slight improvement coming next).
It handles basic cursor motion respecting \r \n \f and \t (might be a
problem for id chars), wraps at the right edge, and automatically
scrolls when the cursor tries to pass the bottom of the screen.
Clearing the buffer resets its cursor to the upper left.
QFS_LoadFile closes its file argument (this is a design error resulting
from changing QFS_LoadFile to take a file instead of a path and not
completing the update), resulting in the call to Qfilesize accessing
freed memory.
This is intended for the built-in 8x8 bitmap characters and quake's
"conchars", but could potentially be used for any simple (non-composed
characters) mono-spaced font. Currently, the buffers can be created,
destroyed, cleared, scrolled vertically in either direction, and
rendered to the screen in a single blast.
One of the reasons for creating the buffer is to make it so scaling can
be supported in the sw renderer.
While this does pull the grovelling for the subpic out to the callers,
the real problem is the excessive use of qpic_t in the internal code:
qpic_t is really just the image format in wad files, and shouldn't be
used as a generic image handle.
Cleans up more of the icky code in the font drawing functions.
This makes working with quads, implied alpha quads, and lines much
cleaner (and gets rid of the bulk of the "eww" fixme), and will probably
make it easier to support multiple scraps and fonts, and potentially
more flexible ordering between pipelines.
This means that QF should support more exotic fonts without any issue
(once the rest of the text handling system is up to snuff) as HarfBuzz
does all the hard work of handling OpenType, Graphite, etc text shaping,
including kerning (when enabled).
Also, font loading now loads all the glyphs into the atlas (preload is
gone).
While the results are a little surprising (tends to alternate between
left side and top for allocations), there is much less wasted space in
the partially allocated regions, and the main free region seems to
always be quite big.
It is currently an ugly hack for dealing with the separate quad queue,
and the pipeline handling code needs a lot of cleanup, but it works
quite well, though I do plan on moving to HarfBuzz for text shaping. One
nice development is I got updating of descriptor sets working (just need
to ensure the set is no longer in use by the command queue, which the
multiple frames in flight makes easy).
It's implemented only in the Vulkan renderer, partly because there's a
lot of experimenting going on with it, but the glyphs do get transferred
to the GPU (checked in render doc). No rendering is done yet: still
thinking about whether to do a quick-and-dirty test, or to add HarfBuzz
immediately, and the design surrounding that.
R8G8B8A8 was hard-coded by accident when creating Vulkan_LoadTexArray
(or probably even the original Vulkan_LoadTex). This wasn't a problem
while everything was loaded in that format, but attempting to load an R8
texture didn't go so well. The same format as the image itself is used
now (correctly so).
I have recently learned that pre-multiplied alpha is the correct way to
do compositing, which is pretty much what the 2d pass does (actually,
all passes, but...). This required ensuring the color factor passed to
the fragment shader is pre-multiplied (a little silly for cshifts as
they used to be pre-multiplied but were un-pre-multiplied early in QF's
history and I don't feel like fixing that right now as it affects all
renderers), and also pre-multiplying alpha when converting from 8-bit
palette to rgba as the palette entry for transparent has that funky pink
(which is used in full-brights).
I will need to do more work to improve the 2d allocation, but rounding
up the requested sizes to the next power of two proved to be excessively
wasteful: I was able to allocate spots for only half of the sub-pics I
needed (though I did still need to double the number of pixels in the
end).
The software renderer uses Bresenham's line slice algorithm as presented
by Michael Abrash in his Graphics Programming Black Book Special Edition
with the serial numbers filed off (as such, more just so *I* can read
the code easily), along with the Chen-Sutherland line clipping
algorithm. The other renderers were more or less trivial in comparison.
Most were pretty easy and fairly logical, but gib's regex was a bit of a
pain until I figured out the real problem was the conditional
assignments.
However, libs/gamecode/test/test-conv4 fails when optimizing due to gcc
using vcvttps2dq (which is nice, actually) for vector forms, but not the
single equivalent other times. I haven't decided what to do with the
test (I might abandon it as it does seem to be UD).
This does mean that the gl and sw renderers can no longer call
S_ExtraUpdate, but really, they shouldn't be anyway. And I seem to
remember it not really helping (been way too long since quake ran that
slowly for me).
The texture animation data is compacted into a small struct for each
texture, resulting in much less data access when animating the texture.
More importantly, no looping over the list of frames. I plan on
migrating this to at least the other hardware renderers.
I found a test map with no texture data. Even after fixing the bsp
loader, vulkan didn't like it. Now vulkan is happy. The "Missing" text
is full-bright magenta on a dim grey background so it should be visible
in any lighting conditions.
Conflagrant Rodent has a sub-model with 0 faces (double bit error?)
causing simply counting faces to get out of sync with actual model
starts thus breaking *all* brush models that come after it (including
other maps). Thus be a little less lazy in figuring out model start
faces.
The models are broken up into N sub-(sub-)models, one for each texture,
but all faces using the same texture are drawn as an instance, making
for both reduced draw calls and reduced index buffer use (and thus,
hopefully, reduced bandwidth). While texture animations are broken, this
does mark a significant milestone towards implementing shadows as it
should now be possible to use multiple threads (with multiple index and
entid buffers) to render the depth buffers for all the lights.
This allows the use of an entity id to index into the entity data and
fetch the transform and colormod data in the vertex shader, thus making
instanced rendering possible. Non-world brush entities are still not
rendered, but the world entity is using both the entity data buffer and
entid buffer.
Sub-models and instance models need an instance data buffer, but this
gets the basics working (and the proof of concept). Using arrays like
this actually simplified a lot of the code, and will make it easy to get
transparency without turbulence (just another queue).
The gl water warp ones have been useless since very early on due to not
doing water warp in gl (vertex warping just didn't work well), and the
recent water warp implementation doesn't need those hacks. The rest of
the removed flags just aren't needed for anything. SURF_DRAWNOALPHA
might get renamed, but should be useful for translucent bsp surfaces
(eg, vines in ad_tears).
One more step towards BSP thread-safety. This one brought with it a very
noticeable speed boost (ie, not lost in the noise) thanks to the face
visframes being in tightly packed groups instead of 128 bytes apart,
though the sw render's boost is lost in the noise (but it's very
fill-rate limited).
This is next critical step to making BSP rendering thread-safe.
visframe was replaced with cluster (not used yet) in anticipation of BSP
cluster reconstruction (which will be necessary for dealing with large
maps like ad_tears).
The main goal was to get visframe out of mnode_t to make it thread-safe
(each thread can have its own visframe array), but moving the plane info
into mnode_t made for better data access patters when traversing the bsp
tree as the plane is right there with the child indices. Nicely, the
size of mnode_t is the same as before (64 bytes due to alignment), with
4 bytes wasted.
Performance-wise, there seems to be very little difference. Maybe
slightly slower.
The unfortunate thing about the change is the plane distance is negated,
possibly leading to some confusion, particularly since the box and
sphere culling functions were affected. However, this is so point-plane
distance calculations can be done with a single 4d dot product.
The map uses 41% of a 4k light map scrap, and 512 texture descriptors
wasn't enough for vulkan. Ouch. I do need to get cvars on these things,
but this will do for now (decades later...)
This was one of the biggest reasons I had trouble understanding the bsp
display list code, but it turns out it was for dealing with GLES's
16-bit limit on vertex indices. Since vulkan uses 32-bit indices,
there's no need for the extra layer of indirection. I'm pretty sure it
was that lack of understanding that prevented me from removing it when I
first converted the glsl bsp code to vulkan (ie, that 16-bit indices
were the only reason for elements_t).
It's hard to tell whether the change makes much difference to
performance, though it seems it might (noisy stats even over 50 timedemo
loops) and the better data localization indicate it should at least be
just as good if not better. However, the reason for the change is
simplifying the data structures so I can make bsp rendering thread-safe
in preparation for rendering shadow maps.
And maybe a nano-optimization. Switching from (~side + 1) to (-side)
seems to give glsl a very tiny speed boost, but certainly doesn't hurt.
Looking at some assembly output for the three cases, the two hacks seem
to generate the same code as each other, but 3 instructions vs 6 for ?:.
While ?: is more generically robust, the hacks are tuned for the
knowledge side is either 0 or 1. The later xor might alter things, but
at least I now know that the hack (either version) is worthwhile.
With experience, I have found that trying to continue after a validation
error tends to result in a segfault or some other nastiness, and
Sys_Shutdown (and the full shutdown sequence) is triggered for any error
signal (segfault, abort, etc) so just exit(1).