Copying data from the wrong buffer was the cause of the corrupted brush
model vertices, and then there were lots of little errors for the
textures (mostly forgetting to multiply by bpp).
I had originally planned on mixing the stage management with the
general texture support code like I did in glsl, but I think that was a
mistake: I kept looking for scrap.[ch] whenever I wanted to edit
something to do with the scrap...
There's still a problem with the vertex data itself not getting sent to
the GPU properly, but vulkan is now happy with my tiny test map (which
required disabling skies entirely until I get null textures working).
This fixes a nine-year-old bug that I discovered only today thanks to
the vulkan renderer. The problem was that when a model had a clear
callback, it was not getting marked as needing to be reloaded, and thus
the model would be "reused" after being trampled on by another model
loading over it.
Also, plug a potential string buffer overflow (strcpy just will not
die!).
This cleans up texture_t and possibly even improves locality of
reference when running through texture chains (not profiled, and not
actually the goal).
It optionally generates mipmaps, and supports the main texture types
(especially for texture packs), including palettes, but is otherwise
rather unsophisticated code. Needs a lot of work, but testing first.
This is more correct as the environment (X11 etc) might provide more
swapchain images than we want: 3 frames in flight is generally
considered a good balance between saturating the hardware and latency.
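A minimal sketch of the clamping involved (illustrative only; physDev
and surface stand in for the already-created handles):

    VkSurfaceCapabilitiesKHR caps;
    vkGetPhysicalDeviceSurfaceCapabilitiesKHR (physDev, surface, &caps);

    uint32_t    numImages = 3;      // desired frames in flight
    if (numImages < caps.minImageCount)
        numImages = caps.minImageCount;
    if (caps.maxImageCount && numImages > caps.maxImageCount)
        numImages = caps.maxImageCount; // maxImageCount == 0 means no limit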
Cleans up global space and makes it usable in multiple contexts. Also,
max quads dropped to 32k as each frame now has its own vertex buffer to
avoid issues with vertex overwrites (which I have seen). However, all
vertex buffers are in the one memory/buffer object (using offsets) and
the index buffer has been moved into a device-local memory object.
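Roughly, each frame binds the one shared buffer at its own offset; a
sketch (cmd, vertexBuffer, frame_size and curFrame are illustrative
names, not the actual ones):

    // all per-frame vertex data lives in the one VkBuffer object
    VkDeviceSize offset = curFrame * frame_size;
    vkCmdBindVertexBuffers (cmd, 0, 1, &vertexBuffer, &offset);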
I think I did two as a bit of a ring buffer, but the new ring buffer
system used inside a staging buffer makes it less necessary. Also, the
staging buffer is now a fair bit bigger (4M is probably not really
enough).
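The ring buffer idea, in rough outline (hypothetical names; the real
code also has to track and wait on in-flight copies before wrapping):

    typedef struct {
        size_t      size;   // total size of the mapped staging buffer
        size_t      head;   // next free offset
    } stagingring_t;

    static size_t
    staging_alloc (stagingring_t *ring, size_t size)
    {
        if (ring->head + size > ring->size)
            ring->head = 0;     // wrap; must not overrun pending copies
        size_t      offset = ring->head;
        ring->head += size;
        return offset;
    }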
This allows the array in which the command buffers are allocated to be
allocated on the stack using alloca, and thus removes the need to
malloc/free relatively small chunks.
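The pattern, more or less (cmdPool and count are stand-ins):

    // alloca needs <alloca.h>; a VLA would also work
    VkCommandBuffer *buffers = alloca (count * sizeof (VkCommandBuffer));
    VkCommandBufferAllocateInfo allocInfo = {
        .sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO,
        .commandPool = cmdPool,
        .level = VK_COMMAND_BUFFER_LEVEL_PRIMARY,
        .commandBufferCount = count,
    };
    vkAllocateCommandBuffers (device, &allocInfo, buffers);
    // no free needed: the array vanishes when the function returns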
The console background is still missing, as is scaled vs unscaled 2d
(currently always scaled), but otherwise everything seems to work. Lots
of places to clean up, though.
Draw now has its own staging buffer to use with its scrap. Also, a few
fixes were needed for the staging buffer and scrap flush routines.
Other than some synchronization issues with draw scrap flushing
(currently worked around with a fence-wait) things seem to be working
nicely.
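The workaround amounts to a full CPU stall on the transfer (hence why
it's only a workaround); roughly:

    vkQueueSubmit (queue, 1, &submitInfo, fence);
    vkWaitForFences (device, 1, &fence, VK_TRUE, ~0ull);    // CPU stall
    vkResetFences (device, 1, &fence);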
The scrap texture did very good things for the glsl renderer and the
better control over data copying might help it do even better things for
vulkan, especially with lots of little icons.
It's never actually used (the texture can be fetched using
GLSL_ScrapTexture) and gets in the way of sharing the scrap system with
the vulkan renderer.
r_screen because of SCR_UpdateScreen, and r_cvar because the cvars
really should never have been in a plugin in the first place (and
r_screen needed access).
First pixels! This was a nightmare of little issues that the validation
layers couldn't help with: incorrect input assembly, incorrect vertex
attribute specs. Though the layers did help with getting the queues
working. Still, lots of work to go but this is a major breakthrough as
I now have access to visual debugging for textures and the like.
Short wrappers for the Draw functions are in vid_render_vulkan.c so the
vulkan context can be passed on to the actual functions. The 2D shaders
are set up similarly to those in glsl, but with full 32-bit color (rgba)
support instead of paletted. However, the textures are not loaded yet,
nor is anything bound.
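The wrappers are just thin shims so the plugin's function table stays
free of the context parameter; roughly (exact names may differ):

    static void
    vulkan_Draw_Pic (int x, int y, qpic_t *pic)
    {
        Vulkan_Draw_Pic (x, y, pic, vulkan_ctx);
    }

    static void
    vulkan_Draw_Character (int x, int y, unsigned int chr)
    {
        Vulkan_Draw_Character (x, y, chr, vulkan_ctx);
    }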
This necessitated hand-writing qfv_swapchain_t's descriptors as I don't
feel like getting that complicated with vkgen at this stage, and it's
not really appropriate anyway: qfv_swapchain_t is meant to be read-only
and not parsed from a plist.
The prototypes for handle parsers needed to be changed because it
turned out "single" was inappropriate for handles: "single" allocates
memory for the parsed object, but handles must be written directly.
The way I wound up using the field meant that exprctx should not "own"
the hashlinks chain, but rather just point to it. This fixes the nasty
access errors I had.
Dependencies on vkparse.hinc were spreading through the code which I
didn't want as that removes a lot of the automation from the automake
files. This keeps all parser code internal to vkparse.c's scope, and any
accesses required for enum and (not yet) struct definitions can be
fetched by name.
Array and single type overrides now allow the parsing of the items
themselves to be customized. This makes it easy to handle arrays and
pointers to single items while also using custom specifications, rather
than relying entirely on the custom override.
I want to be able to use name references, but that requires string
items, so anything that would normally be dictionary or array (or
binary, even) would also need to accept string. This seemed to be the
cleanest solution. Any custom parser would then need to check the type
and act appropriately, but any inappropriate types have already been
pre-filtered by the standard parsers.
Care needs to be taken to ensure the right function is used with the
right arguments, but with these, the need to use qconj(d|f) for a
one-off inverse rotation is removed.
I forgot to right-shift the value so offsets were becoming 0 or 8
instead of 0-15. This fixes the management of small objects. It turns
out that after this fix, qfvis's problems were caused by fragmentation
in the windings. Need to revisit line allocation and use POT-specialized
pools.
I think the sub-line allocator falling over is the final source of
qfvis's leaks. It certainly causes a mess of the sub-lines. But having
some tests to get working sure beats scratching my head over qfvis :)
They're binned by powers of two, with in-between sizes going to the
smaller bin (should I make cache-line allocations NPOT, which I think
might be worthwhile). However, there still seems to be a bug somewhere
causing a nasty leak, as my hacked qfvis now consumes 40G in less than a
minute.
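The binning itself amounts to a floor-log2 of the requested size; a
sketch (not the actual allocator code):

    static int
    size_to_bin (size_t size)
    {
        int         bin = 0;
        while ((size >>= 1))
            bin++;
        return bin;     // 4->2, 6->2, 8->3, 12->3, 16->4, ...
    }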
The idea is to not search through blocks for an available allocation.
While the goal was to speed up allocation of cache lines of varying
cluster sizes, it's not enough due to fragmentation.
They take advantage of gcc's vector_size attribute and so only cross,
dot, qmul, qvmul and qrot (create rotation quaternion from two vectors)
are needed at this stage as basic (per-component) math is supported
natively by gcc.
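The gist of the types (simplified; the real headers differ in detail):

    typedef float vec4f_t __attribute__ ((vector_size (16)));
    typedef int   vec4i_t __attribute__ ((vector_size (16)));

    // per-component math (a + b, a * b, ...) needs no helpers at all;
    // only operations like cross need explicit shuffles
    static inline vec4f_t
    cross4f (vec4f_t a, vec4f_t b)
    {
        return __builtin_shuffle (a, (vec4i_t) { 1, 2, 0, 3 })
             * __builtin_shuffle (b, (vec4i_t) { 2, 0, 1, 3 })
             - __builtin_shuffle (a, (vec4i_t) { 2, 0, 1, 3 })
             * __builtin_shuffle (b, (vec4i_t) { 1, 2, 0, 3 });
    }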
The provided functions work on horizontal (array-of-structs) data, ie,
a vec4d_t or vec4f_t represents a single vector (traditional vector
layout). Vertical layout (struct-of-arrays) does not need any special
functions as the regular math can be used to operate on four vectors at
a time.
Functions are provided for loading a vec4 from a vec3 (4th element set
to 0) and storing a vec4 into a vec3 (discarding the 4th element).
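Illustrative versions of those helpers, using the vec4f_t typedef from
the sketch above (not the actual implementations):

    static inline vec4f_t
    loadvec3f (const float v3[3])
    {
        return (vec4f_t) { v3[0], v3[1], v3[2], 0 };
    }

    static inline void
    storevec3f (float v3[3], vec4f_t v4)
    {
        v3[0] = v4[0];
        v3[1] = v4[1];
        v3[2] = v4[2];
    }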
With this, QF will require AVX2 support (needed for vec4d_t). Without
support for doubles, SSE is possible, but may not be worthwhile for
horizontal data.
Fused-multiply-add is NOT used because it alters the results between
unoptimized and optimized code, resulting in -mfma really meaning
-mfast-math-anyway. I really do not want to have to debug issues that
occur only in optimized code.
QC's int type is named "integer" (didn't feel like changing that right
now), so special case it to be "int".
Output the parse func name (instead of "fix me").
Output a parse func for enums (needed for arrays of enums
(VkDynamicState)).
The static variable meant that Fog_GetColor was not thread-safe (though
multiple calls in the one thread look to be ok for now). However, this
change takes it one step closer to being more generally usable.
Patch found in an old stash.
I had missed the array declaration and thus initialized the pointer to
the offset array incorrectly. Didn't show up until I tried using
multiple offsets.
Shaders can be built as spv files and installed into
$libdir/quakeforge/shaders or as spvc files and compiled into the
engine. Loading supports $builtin/name to access builtin shaders,
$shader/path to access external standard shaders and quake filesystem
access for all other paths.
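The lookup boils down to a prefix check; rough shape only (the three
helper names here are made up for illustration):

    if (strncmp (name, "$builtin/", 9) == 0)
        data = builtin_shader (name + 9);   // table compiled into the engine
    else if (strncmp (name, "$shader/", 8) == 0)
        data = system_shader (name + 8);    // $libdir/quakeforge/shaders/...
    else
        data = quakefs_shader (name);       // regular quake filesystem path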
I had forgotten that the msaa sample count is governed by the driver
(as a max) and the renderpass setup code simply took the max. That's
why 1 vs 8 caused the display to render incorrectly.
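For reference, the driver-side limit comes from the device limits; a
sketch of clamping a requested count against them (illustrative only,
not the renderpass code in question):

    VkPhysicalDeviceProperties props;
    vkGetPhysicalDeviceProperties (physDev, &props);
    VkSampleCountFlags supported = props.limits.framebufferColorSampleCounts
                                 & props.limits.framebufferDepthSampleCounts;

    VkSampleCountFlags samples = VK_SAMPLE_COUNT_8_BIT;    // requested
    while (samples > VK_SAMPLE_COUNT_1_BIT && !(samples & supported))
        samples >>= 1;      // fall back to the best supported count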
It turned out the msaa setting defaulting to 1 instead of 8 was the
problem; no idea why at this stage (need to read up on just how that
setting works). Once I understand just how it works, I'll rework the
msaa handling.
The problem is that I needed to support dynamic types on operators (for
bit-field enums), had things working, but a bad edit messed things up
and I had to rebuild that bit of code. Missed one bit :P
It is capable of parsing single expressions with fairly simple
operations. It currently supports ints, enums, cvars and (external)
data structs. It is also thread-safe (in theory; needs proper testing)
and the memory it uses can be mass-freed.
This was inspired by "Hoard: A Scalable Memory Allocator for
Multithreaded Applications" by Emery D. Berger, Kathryn S. McKinley,
Robert D. Blumofe, and Paul R. Wilson.
It's not anywhere near the same implementation, but it did take a few
basic concepts. The idea is twofold:
1) A pool of memory from which blocks can be allocated and then freed
en masse, and which is fairly efficient for small (4-16 byte) blocks
2) Thread safety for use with the Vulkan renderer (and any other
multi-threaded tasks).
However, based on the Hoard paper, small allocations are cache-line
aligned. On top of that, larger allocations are page aligned.
I suspect it would help qfvis somewhat if I ever get around to tweaking
qfvis to use cmem.
The calculation fails (produces NaN) if the vectors are anti-parallel,
but works for all other combinations. I came up with this
implementation when I discovered that Unity's Quaternion.FromToRotation
did not work with very small angles. This implementation will produce a
usable quaternion below 0.00255 degrees (though it will be slightly
larger than unit). Unity's failed such that I could see KSP's skybox
snap while it rotated around my test vessel.
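A standalone double-precision sketch of the standard construction this
description matches, assuming unit-length inputs (the types and names
here are not QF's):

    #include <math.h>

    typedef struct { double x, y, z, w; } quatd;

    // shortest-arc rotation taking unit vector a to unit vector b;
    // degenerates (0/0 -> NaN) when a and b are anti-parallel
    static quatd
    from_to_rotation (const double a[3], const double b[3])
    {
        quatd       q = {
            .x = a[1] * b[2] - a[2] * b[1],
            .y = a[2] * b[0] - a[0] * b[2],
            .z = a[0] * b[1] - a[1] * b[0],
            .w = 1 + a[0] * b[0] + a[1] * b[1] + a[2] * b[2],
        };
        double      m = sqrt (q.x * q.x + q.y * q.y + q.z * q.z + q.w * q.w);
        return (quatd) { q.x / m, q.y / m, q.z / m, q.w / m };
    }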
The problem was caused by passing the index into the dtables array to
dtable_get, which expects a handle. A handle is the ones' complement
negative of the index, which means that handle 0 is invalid (but 0 was
being passed... oops). Fixes the segfault when qw-client-x11 connects
to a server.
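The encoding, purely for illustration (not the actual macros):

    // handles are the ones' complement of the index: index 0 becomes
    // handle -1, and handle 0 decodes to index -1, so 0 can never be a
    // valid handle
    #define index_to_handle(i)  (~(i))
    #define handle_to_index(h)  (~(h))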
This gets renderpass parsing almost working (not hooked up, though). The
missing bits are support for expressions for flags (namely support for
the | operator) and references (eg $swapchain.format). However, this
shows that the basic concept for the parser is working.
The array has to be allocated using byte elements and thus the size of
the array is the number of bytes, but it needs to be the actual number
of elements in the array. Problem caused by not knowing the actual type
(and C not having type variables anyway).
Nothing is actually done yet other than parsing the built-in property
list to property list items (the actual parser is just a skeleton), but
everything compiles.
The property list specifies the base structures for which parser code
will be generated (along with any structures and enums upon which those
structures depend). It also defines optional specialized parsers for
better control.
It worked as a proof of concept, but as the code itself needs to be a
bit smarter, it makes more sense to break it up and make the individual
parts easier to work on.
PL_ParseDictionary itself does only one level, but it takes care of the
key-field mappings and property list item type checking leaving the
actual parsing to a helper specified by the field. That helper is free
to call PL_ParseDictionary recursively.
The first line of the parsed item is stored and can be retrieved using
PL_Line. Line numbers are not stored for dictionary keys yet, and will
be 0 for any items generated by code rather than parsed from a file or
string.
The tables are generated from the enums pulled out of the vulkan headers
using a ruamoko program (thanks to its reflection capabilities). They
will be used for parsing property lists used to create render passes and
pipelines.
There's still some cleanup to do, but everything seems to be working
nicely: `make -j` works, `make distcheck` passes. There is probably
plenty of bitrot in the package directories (RPM, debian), though.
The vc project files have been removed since those versions are way out
of date and quakeforge is pretty much dependent on gcc now anyway.
Most of the old Makefile.am files are now Makemodule.am. This should
allow for new Makefile.am files that allow local building (to be added
on an as-needed basis). The current remaining Makefile.am files are for
standalone sub-projects.
The installable bins are currently built in the top-level build
directory. This may change if the clutter gets to be too much.
While this does make a noticeable difference in build times, the main
reason for the switch was to take care of the growing dependency issues:
now it's possible to build tools for code generation (eg, using qfcc and
ruamoko programs for code-gen).
This fixes the segfault when loading the menu progs. I had forgotten
that the menu code doesn't use PR_LoadProgs (I don't remember why.
Obsolete reason?).
When I ported SEB to python, I discovered that I apparently didn't
really understand the paper's description of the end condition and the
usage of the affine and convex sets for center testing. This cleans up
the test and makes SEB more correct for the cases that have fewer than
4 supporting points (especially when there are fewer than 4 points
total).
Returning a string was a bad idea as it makes str_str difficult to use
with str_mid. (actually, iirc, it was the only reason I moved all
strings into progs memory... hmm).
This allows a debugger to do any symbol lookups and other preparations
between loading progs and the first code execution. .ctors are called as
per normal if debug_handler is not set.
In testing variable fw/precision in PR_Sprintf, I got a nasty reminder
of the limitations of the current progs ABI: passing @args to another QC
function does not work because the args list gets trampled by the
called function's locals. Thus, the need for a va_copy. It's not quite
the same as C's as it returns the destination args instead of copying
like memcpy, but it does copy the list from the source args to a
temporary buffer that is freed when the calling function returns.
This is the first step in reworking PR_Sprintf to use a state machine.
The goal is to make it more robust against errors and easier to extend
(eg, * width and precision).
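A skeleton of what such a state machine looks like (illustrative only,
not the actual PR_Sprintf code):

    typedef enum {
        FMT_LITERAL,        // copying plain characters
        FMT_FLAGS,          // after '%': -, 0, #, space, +
        FMT_WIDTH,          // field width: digits or '*'
        FMT_PRECISION,      // after '.': digits or '*'
        FMT_CONVERSION,     // conversion char: d, i, x, f, s, ...
    } fmt_state_t;

    fmt_state_t state = FMT_LITERAL;
    for (const char *c = fmt; *c; c++) {
        switch (state) {
            case FMT_LITERAL:
                state = (*c == '%') ? FMT_FLAGS : FMT_LITERAL;  // emit literals
                break;
            case FMT_FLAGS:
                // record flags; digits/'*' -> FMT_WIDTH, '.' -> FMT_PRECISION,
                // anything else -> FMT_CONVERSION
                break;
            case FMT_WIDTH:
            case FMT_PRECISION:
                // accumulate the number, or note '*' for a variable value
                break;
            case FMT_CONVERSION:
                // format the next argument, then back to FMT_LITERAL
                state = FMT_LITERAL;
                break;
        }
    }

Errors (bad conversion characters, too few arguments) then get caught
in one place per state rather than being scattered through ad-hoc
string handling.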