The dynamic array macros made this much easier than last time I looked
at it, especially when it came to figuring out the bad memory accesses
that I seem to remember from my last attempt 9 years ago.
This makes tex_t more generally useable and probably more portable. The
goal was to be able to use tex_t with data that is in a separate chunk
of memory.
The sky texture is loaded with black's alpha set to 0. While this does
hit both layers, the screen is cleared to black so it shouldn't be a
problem (and will allow having a skybox behind the sheets).
Glow map and sky sheet and cube need to wait until I can get some
default textures going, but the world is rendering correctly otherwise
(though a tad dark: need to do a gamma setting).
It now uses the ring buffer code I wrote for qwaq (and forgot about,
oops) to handle the packets themselves, and the logic for allocating and
freeing space from the buffer is a bit simpler and seems to be more
reliable. The automated test is a bit of a joke now, though, but coming
up with good tests for it... However, nq now cycles through the demos
without obvious issue under the same conditions that caused the light
map update code to segfault.
Needed to use an rgba format to use floats (and optimal layout), but
having to set the alpha to 1 even for full-dark luxels is not very
efficient. Better to just ignore the alpha in the shader. Fixes the
occasional transparent surface in shadowed areas.
Many surfaces are missing (I suspect it's due to transform stage
management in the index emitter), and currently only the light maps are
rendered (still not binding the correct textures), but the basics are
working.
Vulkan validation (quite rightly) doesn't like it when the flush range
goes past the end of the buffer, but also doesn't like it when the flush
range isn't cache-line aligned, so align the size of the buffer, too.
Copying data from the wrong buffer was the cause of the corrupted brush
model vertices, and then lots of little errors (mostly forgetting to
multiply by bpp) for textures.
I had originally planned on mixing the stage management with general
texture support code like I did in glsl, but I think that was a mistake
and I did keep looking for scrap.[ch] when I wanted to edit something to
do with the scrap...
There's still a problem with the vertex data itself not getting sent to
the GPU properly, but vulkan is now happy with my tiny test map (which
required disabling skies entirely until I get null textures working).
This fixes a nine year old bug that I discovered only today thanks to
the vulkan renderer. The problem was that when a model had a clear
callback, it was not getting marked as needing to be reloaded, and thus
the model would be "reused" after being trampled on by another model
loading over it.
Also, plug a potential string buffer overflow (strcpy just will not
die!).
This cleans up texture_t and possibly even improves locality of
reference when running through texture chains (not profiled, and not
actually the goal).
It optionally generates mipmaps, and supports the main texture types
(especially for texture packs), including palettes, but is otherwise
rather unsophisticated code. Needs a lot of work, but testing first.
This is more correct as the environment (X11 etc) might provide more
swapchain images than we want: 3 frames in flight is generally
considered a good balance between saturating the hardware and latency.
Cleans up global space and makes it usable in multiple contexts. Also,
max quads dropped to 32k as each frame now has its own vertex buffer to
avoid issues with vertex overwrites (which I have seen). However, all
vertex buffers are in the one memory/buffer object (using offsets) and
the index buffer has been moved into a device-local memory object.
I think I did two as a bit of a ring buffer, but the new ring buffer
system used inside a staging buffer makes it less necessary. Also, the
staging buffer is now a fair bit bigger (4M is probably not really
enough)
This allows the array in which the command buffers are allocated to be
allocated on the stack using alloca and thus remove the need to
malloc/free of relatively small chunks.
The console background is missing, and scaled vs unscaled (currently
always scaled) 2d, but otherwise everything seems to work. Lots of
places to clean up, though.
Draw now has its own staging buffer to use with its scrap. Also, a few
fixes were needed for the staging buffer and scrap flush routines.
Other than some synchronization issues with draw scrap flushing
(currently worked around with a fence-wait) things seem to be working
nicely.
The scrap texture did very good things for the glsl renderer and the
better control over data copying might help it do even better things for
vulkan, especially with lots of little icons.
It's never actually used (the texture can be fetched using
GLSL_ScrapTexture) and gets in the way of sharing the scrap system with
the vulkan renderer.
r_screen because of SCR_UpdateScreen, and r_cvar because the cvars
really should never have been in a plugin in the first place (and
r_screen needed access).
First pixels! This was a nightmare of little issues that the validation
layers couldn't help with: incorrect input assembly, incorrect vertex
attribute specs. Though the layers did help with getting the queues
working. Still, lots of work to go but this is a major breakthrough as
I now have access to visual debugging for textures and the like.
Short wrappers for Draw functins are in vid_render_vulkan.c so the
vulkan context can be passed on to the actual functions. The 2D shaders
are set up similar to those in glsl, but with full 32-bit color (rgba)
support instead of paletted. However, the textures are not loaded yet,
nor is anything bound.
This necessitated hand-writing qfv_swapchain_t's descriptors as I don't
feel like getting that complicated with vkgen at this stage and it's not
really appropriate anyway? qfv_swapchain_t is meant to be read-only and
not parsed from a plist.
The prototypes for handle parsers needed to be changes because it turned
out "single" was inappropriate for handles as "single" allocates memory
for the parsed object, but handles must be written directly.
The way I wound up using the field meant that exprctx should not "own"
the hashlinks chain, but rather just point to it. This fixes the nasty
access errors I had.
Dependencies on vkparse.hinc were spreading through the code which I
didn't want as that removes a lot of the automation from the automake
files. This keeps all parser code internal to vkparse.c's scope, and any
accesses required for enum and struct (not yet) definitions can be
fetched by name.
Array and single type overrides now allow the parsing of the items
themselves to be customized. This makes it easy to handle arrays and
pointers to single items while also using custom specifications, rather
than relying entirely on the custom override.
I want to be able to use name references, but that requires string
items, so anything that would normally be dictionary or array (or
binary, even) would also need to accept string. This seemed to be the
cleanest solution. Any custom parser would then need to check the type
and act appropriately, but any inappropriate types have already been
pre-filtered by the standard parsers.
Care needs to be taken to ensure the right function is used with the
right arguments, but with these, the need to use qconj(d|f) for a
one-off inverse rotation is removed.
I forgot to right-shift the value so offsets were becoming 0 or 8
instead of 0-15. This fixes the management of small objects. It turns
out that after this fix, qfvis's problems were caused by fragmentation
in the windings. Need to revisit line allocation and use POT-specialized
pools.
I think the sub-line allocator falling over is the final source of
qfvis's leaks. It certainly causes a mess of the sub-lines. But having
some tests to get working sure beats scratching my head over qfvis :)
They're binned by powers of two (with in between sizes going to the
smaller bin should I make cache-line allocations NPOT (which I think
might be worthwhile). However, there seems to still be a bug somewhere
causing a nasty leak as now my hacked qfvis consumes 40G in less than a
minute.
The idea is to not search through blocks for an available allocation.
While the goal was to speed up allocation of cache lines of varying
cluster sizes, it's not enough due to fragmentation.
They take advantage of gcc's vector_size attribute and so only cross,
dot, qmul, qvmul and qrot (create rotation quaternion from two vectors)
are needed at this stage as basic (per-component) math is supported
natively by gcc.
The provided functions work on horizontal (array-of-structs) data, ie a
vec4d_t or vec4f_t represents a single vector, or traditional vector
layout. Vertical layout (struct-of-arrays) does not need any special
functions as the regular math can be used to operate on four vectors at
a time.
Functions are provided for loading a vec4 from a vec3 (4th element set
to 0) and storing a vec4 into a vec3 (discarding the 4th element).
With this, QF will require AVX2 support (needed for vec4d_t). Without
support for doubles, SSE is possible, but may not be worthwhile for
horizontal data.
Fused-multiply-add is NOT used because it alters the results between
unoptimized and optimized code, resulting in -mfma really meaning
-mfast-math-anyway. I really do not want to have to debug issues that
occur only in optimized code.
QC's int type is named "integer" (didn't feel like changing that right
now), so special case it to be "int".
Output the parse func name (instead of "fix me").
Output a parse func for enums (needed for arrays of enums
(VkDynamicState)).
The static variable meant that Fog_GetColor was not thread-safe (though
multiple calls in the one thread look to be ok for now). However, this
change takes it one step closer to being more generally usable.
Patch found in an old stash.
I had missed the array declaration and thus initialized the pointer to
the offset array incorrectly. Didn't show up until I tried using
multiple offsets.
Shaders can be built as spv files and installed into
$libdir/quakeforge/shaders or as spvc files and compiled into the
engine. Loading supports $builtin/name to access builtin shaders,
$shader/path to access external standard shaders and quake filesystem
access for all other paths.
I had forgotten that msaa samples was governed by the driver (as a max)
and the renderpass setup code simply took the max. Thus why 1 vs 8
caused the display to render incorrectly.
It turned out the msaa setting defaulting to 1 instead of 8 was the
problem no idea why at this stage (need to read up on just how that
setting works). Once I understand just how it works, I'll rework the
msaa handling.