Modern maps can have many more leafs (eg, ad_tears has 98983 leafs).
Using set_t makes dynamic leaf counts easy to support and the code much
easier to read (though set_is_member and the iterators are a little
slower). The main thing to watch out for is the novis set and the set
returned by Mod_LeafPVS never shrink, and may have excess elements (ie,
indicate that nonexistent leafs are visible).
Having set_expand exposed is useful for loading data into a set.
However, it turns out there was a bug in its size calculation in that
when the requested set size was a multiple of SET_BITS (and greater than
the current set size), the new set size one be SET_BITS larger than
requested. There's now some tests for this :)
Quake just looked wrong without the view model. I can't say I like the
way the depth range is hacked, but it was necessary because the view
model needs to be processed along with the rest of the alias models
(didn't feel like adding more command buffers, which I imagine would be
expensive with the pipeline switching).
The recent changes to key handling broke using escape to get out of the
console (escape would toggle between console and menu). Thus take care
of the menu (escape) part of the coupling FIXME by implementing a
callback for the escape key (and removing key_togglemenu) and sorting
out the escape key handling in console. Seems to work nicely
Since vulkan supports 32-bit indexes, there's no need for the
shenanigans the EGL-based glsl renderer had to go through to render bsp
models (maps often had quite a bit more than 65536 vertices), though the
reduced GPU memory requirements of 16-bit indices does have its
advantages.
Any sun (a directional light) is in the outside node, which due to not
having its own PVS data is visible to all nodes, but that's a tad
excessive. However, any leaf node with sky surfaces will potentially see
any suns, and leaf nodes with no sky surfaces will see the sun only if
they can see a leaf that does have sky surfaces. This can be quite
expensive to calculate (already known to be moderately expensive for
just the camera leaf node (singular!) when checking for in-map lights)
Getting close to understanding (again) how it all works. I only just
barely understood when I got vulkan's renderer running, but I really
need to understand for when I modify things for shadows. The main thing
hurdle was tinst, but that was dealt with in the previous commit, and
now it's just sorting out the mess of elechains and elementss.
Its sole purpose was to pass the newly allocated instsurf when chaining
an instance model (ammo box, etc) surface, but using expresion
statements removes the need for such shenanigans, and even makes
msurface_t that little bit smaller (though a separate array would be
much better for cache coherence).
More importantly, the relevant code is actually easier to understand: I
spent way too long working out what tinst was for and why it was never
cleared.
This reduces the overhead needed to manage the memory blocks as the
blocks are guaranteed to be page-aligned. Also, the superblock is now
alllocated from within one of the memory blocks it manages. While this
does slightly reduce the available cachelines within the first block (by
one or two depending on 32 vs 64 bit pointers), it removes the need for
an extra memory allocation (probably via malloc) for the superblock.
The renderer's LineGraph now takes a height parameter, and netgraph now
uses cl_* cvars instead of r_* (which never really made sense),
including it's own height cvar (the render graphs still use
r_graphheight).
The uptime display had not been updated for the offset Sys_DoubleTime,
so add Sys_DoubleTimeBase to make it easy to use Sys_DoubleTime as
uptime.
Line up the layout of the client list was not consistent for drop and
qport.
conwidth and conheight have been moved into vid.conview (probably change
the name at some time), and scr_vrect has been replaced by a view as
well. This makes it much easier to create 2d elements that follow the
screen size (taking advantage of a view's gravity) which will, in the
end, make changing the window size easier.
One moves and resizes the view in one operation as a bit of an
optimization as moving and resizing both update any child views, and
this does only one update.
The other sets the gravity and updates any child views as their
absolute positions would change as well as the updated view's absolute
position.
This refactors (as such) keys.c so that it no longer depends on console
or gib, and pulls keys out of video targets. The eventual plan is to
move all high-level general input handling into libQFinput, and probably
low-level (eg, /dev/input handling for joysticks etc on Linux).
Fixes#8
on_update is for pull-model outpput targets to do periodic synchronous
checks (eg, checking that the connection to the actual output device is
still alive and reviving it if necessary)
Output plugins can use either a push model (synchronous) or a pull
model (asynchronous). The ALSA plugin now uses the pull model. This
paves the way for making jack output a simple output plugin rather than
the combined render/output plugin it currently is (for #16) as now
snd_dma works with both models.
This brings the alsa driver in line with the jack render (progress
towards #16), but breaks most of the other drivers (for now: one step at
a time). The idea is that once the pull model is working for at least
one other target, the jack renderer can become just another target like
it should have been in the first place (but I needed to get the pull
model working first, then forgot about it).
Correct state checking is not done yet, but testsound does produce what
seems to be fairly good sound when it starts up correctly (part of the
state checking (or lack thereof), I imagine).
This failed with errors such as:
from ./include/QF/simd/vec4d.h:32,
from libs/util/simd.c:37:
./include/QF/simd/vec4d.h: In function ‘qmuld’:
/usr/lib/gcc/x86_64-pc-linux-gnu/10.3.0/include/avx2intrin.h:1049:1: error: inlining failed in call to ‘always_inline’ ‘_mm256_permute4x64_pd’: target specific option mismatch
1049 | _mm256_permute4x64_pd (__m256d __X, const int __M)
Support for finding the first address associated with a source line was
added to the engine, returning 0 if not found.
A temporary breakpoint is set and the progs allowed to run free.
However, better handling of temporary breakpoitns is needed as currently
a "permanent" breakpoint will be cleared without clearing the temporary
breakpoing if the permanent breakpoing is hit while execut-to-cursor is
running.
Fuzzy bsearch is useful for finding an entry in a prefix sum array
(value is >= ele[0], < ele[1]), and the reentrant version is good when
data needs to be passed to the compare function. Adapted from the code
used in pr_resolve.
GCC does a fairly nice job of producing code for vector types when the
hardware doesn't support SIMD, but it seems to break certain math
optimization rules due to excess precision (?). Still, it works well
enough for the core engine, but may not be well suited to the tools.
However, so far, only qfvis uses vector types (and it's not tested yet),
and tools should probably be used on suitable machines anyway (not
forces, of course).
I don't know that the cache line size is 64 bytes on 32 bit systems, but
it should be ok to assume that 64-byte alignment behaves well on systems
with smaller cache lines so long as they are powers of two. This does
mean there is some waste on 32-bit systems, but it should be fairly
minimal (32 bytes per memblock, which manages page sized regions).
The Blend macro supports any non-integral type supporting * and +
(float, double, vec4f_t, etc), so it is essentially a scalar VectorBlend
or QuatBlend.
Standard quake has just linear, but the modding community added inverse,
inverse-square (raw and offset (1/(r^2+1)), infinite (sun), and
ambient (minlight). Other than the lack of shadows, marcher now looks
really good.
Mostly, this gets the stage flags in with the barrier, but also adds a
couple more barrier templates. It should make for slightly less verbose
code, and one less opportunity for error (mismatched barrier/stages).
This gets the shaders needed for creating shadow maps, and the changes
to the lighting pipeline for binding the shadow maps, but no generation
or reading is done yet. It feels like parts of various systems are
getting a little big for their britches and I need to do an audit of
various things.
QF now uses its own configuration file (quakeforge.cfg for now) rather
than overwriting config.cfg so that people trying out QF in their normal
quake installs don't trash their config.cfg for other quake clients. If
quakeforge.cfg is present, all other config files are ignored except
that quake.rc is scanned for a startdemos command and that is executed.
And improve the generated code for MSG_ReadShort
I suspect gcc didn't like all the excess pointer dereferences and so
couldn't assume that the bytes were being read sequentially.
And improve the generated code as well (ie, use a code sequence that gcc
recognizes and optimizes to a single 32-bit read and a byte-swap).
nq uses big-endian for its packet headers (arg, though it is consistent
with IP, it's not with the rest of quake).
I'm not sure that the mismatch between refdef_t and the assembly defines
was a problem (many fields unused), but the main problem was due to
execute permission on the pages: one chunk of asm was in the data
section, and the patched code was not marked as being executable (due to
such a thing not existing when quake was written).
vid.aspect is removed (for now) as it was not really the right idea (I
really didn't know what I was doing at the time). Nicely, this *almost*
fixes the fov bug on fresh installs: the view is now properly
upside-down rather than just flipped vertically (ie, it's now rotated
180 degrees).
While the main bulk of the improvement (36s down from 42s for
gmsp3v2.bsp on my i7-6850K) comes from using a high-tide allocator for
the windings (which necessitated using a fixed size), it is ever so
slightly faster than using malloc as the back-end.
This is for the conversion /to/ paletted textures. The conversion is
necessary for csqc support. In the process, the conversion has been sped up
by implementing a color cache for the conversion process. I haven't
measured the difference yet, but Mr Fixit does seem to load much faster for
the sw renderer than it did before the change (many months old memory).
This separate the FOV calculations from other refdef calcs, cleaning up the
renderer proper and making it easier for other parts of the engine (eg,
csqc) to update the fov.
The server edict arrays are now stored outside of progs memory, only the
entity data itself (ie data accessible to progs via ent.fld) is stored in
progs memory. Many of the changes were due to code accessing edicts and
entity fields directly rather than through the provided macros.
Loading is broken for multi-file image sets due to the way images are
loaded (this needs some thought for making it effecient), but the
Blender environment map loading works.
They're unlit (fullbright, but that's nothing new for quake), but
working nicely. As a bonus, sort out the sky pass (forced to due to the
way command buffers are used).
There were actually several problems: translucency wasn't using or
depending on the depth buffer, and the depth buffer wasn't marked as
read-only in the g-buffer pass. Getting that correct seems to have given
bigass1 a 0.5% boost (hard to say, could be the usual noise).
That was... easier than expected. A little more tedious that I would
have liked, but my scripting system isn't perfect (I suspect it's best
suited as the output of a code generator), and the C side could do with
a little more automation.
Other than dealing with shader data alignment issues, that went well :).
Nicely, the implementation gets the explicit scaling out of the shader,
and allows for a directional flag.
I never liked that some of the macros needed the type as a parameter
(yay typeof and __auto_type) or those that returned a value hid the
return statement so they couldn't be used in assignments.
Still "some" more to go: a pile to do with transforms and temporary
entities, and a nasty one with host_cbuf. There's also all the static
block-alloc lists :/
Light styles and shadows aren't implemented yet.
The map's entities are used to create the lights, and the PVS used to
determine which lights might be visible (ie, the surfaces they light).
That could do with some more improvements (eg, checking if a leaf is
outside a spotlight's cone), but the concept seems to work.
Double benefit, actually: faster when building a fat PVS (don't need to
copy as much) and can be used in multiple threads. Also, default visiblity
can be set, and the buffer size has its own macro.
Useful for avoiding a pile of wrapper functions that merely pass on
command-specific data to the actual implementation. Used to clean up the
wrappers in nq and qw cl_input.c
This is the first step towards component-based entities.
There's still some transform-related stuff in the struct that needs to
be moved, but it's all entirely client related (rather than renderer)
and will probably go into a "client" component. Also, the current
components are directly included structs rather than references as I
didn't want to deal with the object management at this stage.
As part of the process (because transforms use simd) this also starts
the process of moving QF to using simd for vectors and matrices. There's
now a mess of simd and sisd code mixed together, but it works
surprisingly well together.
The plan is to have a fully component based entity system. This adds
hierarchical transforms. Not particularly useful for quake itself at
this stage, but it will allow for much more flexibility later on,
especially when QuakeForge becomes more general-purpose.
This seems to be pretty close to as fast as it gets (might be able to do
better with some shuffles of the negation constants instead of loading
separate constants).
It's not used yet as work needs to be done to better support generic
entities, but this is the next step to real-time lighting (though, to be
honest, I expect it will be too slow to be usable).
The main purpose is to allow fluent-style:
const char *targetname = PL_String (PL_ObjectForKey (entity, "targetname"));
if (targetname && !PL_ObjectForKey (targets, targetname)) {
PL_D_AddObject (targets, targetname, entity);
}
[note: the above is iffy due to ownership of entity, but the code from
which the above comes works around the issue]
Static lights are yet to come (so the screen is black most of the time),
but dynamic lights work very nicely (and look very good) despite the
falloff being incorrect.
While I could reconstruct the position from the screen coords and depth,
this is easier and good enough for now. Reconstruction is an
optimization thing.
Lighting doesn't actually do lights yet, but it's producing pixels.
Translucent seems to be working (2d draw uses it), and compose seems to
be working.
After getting lights even vaguely working for alias models, I realized
that it just wasn't going to be feasible to do nice lighting with
forward rendering. This gets the bulk of the work done for deferred
rendering, but still need to sort out the shaders before any real
testing can be done.
The order in which keys are added to the dictionary object is
maintained. Adding a key after removing an old key adds the new key to
the end of the list rather than reusing the old key's spot.
PL_ParseLabeledArray works the same way as PL_ParseArray, but instead
takes a dictionary object. The keys of the items are ignored, and the
order is not preserved (at this stage), but this is a cleaner solution
to getting an array of objects when the definitions of those objects
need to be accessible by name as well.
It's not entirely there yet, but the basics are working. Work is still
needed for avoiding duplication of objects (different threads will have
different contexts and thus different tables, so necessary per-thread
duplication should not become a problem) and general access to arbitrary
fields (mostly just parsing the strings)
This allows plist objects to be accessed directly from cexpr expressions
using struct.field syntax for dictionary objects and array[index] syntax
for array objects.
The node struct was 72 bytes thus two cache line. Moving the pointer
into the brush model data block allows nodes to fit in a single cache
line (not that they're aligned yet, but that's next). It doesn't seem to
have made any difference to performance (at least in the vulkan
renderer), but it hasn't hurt, either, as the only place that needed the
parent pointer was R_MarkLeaves.
It's not quite as expected, but that may be due to one of msaa, the 0-15
range in the palette not being all the way to white, the color gradients
being not quite linear (haven't checked yet) or some combination of the
above. However, it's that what should be yellow is more green. At least
the zombies are no longer white and the ogres don't look like they're
wearing skeleton suits.
Doesn't seem to make much difference performance-wise, but speed does
seem to be fill-rate limited due to the 8x msaa. Still, it does mean
fewer bindings to worry about.
This is a big step towards a cleaner api. The struct reference in
model_t really should be a pointer, but bsp submodel(?) loading messed
that up, though that's just a matter of taking more care in the loading
code. It seems sensible to make that a separate step.
I've decided that alias model skins should be in a single four-level
array texture rather than spread over four textures, but there's no way
I want to write that code again: getting it right was hard enough the
first time :P
It now takes a context pointer (opaque data) that holds the buffers it
uses for the temporary strings. If the context pointer is null, a static
context is used (making those uses of va NOT thread-safe). Most calls to
va use the static context, but all such calls have been formatted
consistently so they are easy to find when it comes time to do a full
audit.
It's a tad bogus as it's the lights close to the camera, but it should
at least be a good start once things are working. There's currently
something very wrong with the state of things.
The dynamic array macros made this much easier than last time I looked
at it, especially when it came to figuring out the bad memory accesses
that I seem to remember from my last attempt 9 years ago.
This makes tex_t more generally useable and probably more portable. The
goal was to be able to use tex_t with data that is in a separate chunk
of memory.
The sky texture is loaded with black's alpha set to 0. While this does
hit both layers, the screen is cleared to black so it shouldn't be a
problem (and will allow having a skybox behind the sheets).
Glow map and sky sheet and cube need to wait until I can get some
default textures going, but the world is rendering correctly otherwise
(though a tad dark: need to do a gamma setting).
It now uses the ring buffer code I wrote for qwaq (and forgot about,
oops) to handle the packets themselves, and the logic for allocating and
freeing space from the buffer is a bit simpler and seems to be more
reliable. The automated test is a bit of a joke now, though, but coming
up with good tests for it... However, nq now cycles through the demos
without obvious issue under the same conditions that caused the light
map update code to segfault.
Vulkan validation (quite rightly) doesn't like it when the flush range
goes past the end of the buffer, but also doesn't like it when the flush
range isn't cache-line aligned, so align the size of the buffer, too.
I had originally planned on mixing the stage management with general
texture support code like I did in glsl, but I think that was a mistake
and I did keep looking for scrap.[ch] when I wanted to edit something to
do with the scrap...
This cleans up texture_t and possibly even improves locality of
reference when running through texture chains (not profiled, and not
actually the goal).
It optionally generates mipmaps, and supports the main texture types
(especially for texture packs), including palettes, but is otherwise
rather unsophisticated code. Needs a lot of work, but testing first.
This allows the array in which the command buffers are allocated to be
allocated on the stack using alloca and thus remove the need to
malloc/free of relatively small chunks.
The scrap texture did very good things for the glsl renderer and the
better control over data copying might help it do even better things for
vulkan, especially with lots of little icons.
It's never actually used (the texture can be fetched using
GLSL_ScrapTexture) and gets in the way of sharing the scrap system with
the vulkan renderer.
r_screen because of SCR_UpdateScreen, and r_cvar because the cvars
really should never have been in a plugin in the first place (and
r_screen needed access).
First pixels! This was a nightmare of little issues that the validation
layers couldn't help with: incorrect input assembly, incorrect vertex
attribute specs. Though the layers did help with getting the queues
working. Still, lots of work to go but this is a major breakthrough as
I now have access to visual debugging for textures and the like.
Short wrappers for Draw functins are in vid_render_vulkan.c so the
vulkan context can be passed on to the actual functions. The 2D shaders
are set up similar to those in glsl, but with full 32-bit color (rgba)
support instead of paletted. However, the textures are not loaded yet,
nor is anything bound.
The way I wound up using the field meant that exprctx should not "own"
the hashlinks chain, but rather just point to it. This fixes the nasty
access errors I had.
Dependencies on vkparse.hinc were spreading through the code which I
didn't want as that removes a lot of the automation from the automake
files. This keeps all parser code internal to vkparse.c's scope, and any
accesses required for enum and struct (not yet) definitions can be
fetched by name.
I want to be able to use name references, but that requires string
items, so anything that would normally be dictionary or array (or
binary, even) would also need to accept string. This seemed to be the
cleanest solution. Any custom parser would then need to check the type
and act appropriately, but any inappropriate types have already been
pre-filtered by the standard parsers.
Care needs to be taken to ensure the right function is used with the
right arguments, but with these, the need to use qconj(d|f) for a
one-off inverse rotation is removed.
They're binned by powers of two (with in between sizes going to the
smaller bin should I make cache-line allocations NPOT (which I think
might be worthwhile). However, there seems to still be a bug somewhere
causing a nasty leak as now my hacked qfvis consumes 40G in less than a
minute.
The idea is to not search through blocks for an available allocation.
While the goal was to speed up allocation of cache lines of varying
cluster sizes, it's not enough due to fragmentation.
They take advantage of gcc's vector_size attribute and so only cross,
dot, qmul, qvmul and qrot (create rotation quaternion from two vectors)
are needed at this stage as basic (per-component) math is supported
natively by gcc.
The provided functions work on horizontal (array-of-structs) data, ie a
vec4d_t or vec4f_t represents a single vector, or traditional vector
layout. Vertical layout (struct-of-arrays) does not need any special
functions as the regular math can be used to operate on four vectors at
a time.
Functions are provided for loading a vec4 from a vec3 (4th element set
to 0) and storing a vec4 into a vec3 (discarding the 4th element).
With this, QF will require AVX2 support (needed for vec4d_t). Without
support for doubles, SSE is possible, but may not be worthwhile for
horizontal data.
Fused-multiply-add is NOT used because it alters the results between
unoptimized and optimized code, resulting in -mfma really meaning
-mfast-math-anyway. I really do not want to have to debug issues that
occur only in optimized code.
Shaders can be built as spv files and installed into
$libdir/quakeforge/shaders or as spvc files and compiled into the
engine. Loading supports $builtin/name to access builtin shaders,
$shader/path to access external standard shaders and quake filesystem
access for all other paths.
It is capable of parsing single expressions with fairly simple
operations. It current supports ints, enums, cvars and (external) data
structs. It is also thread-safe (in theory, needs proper testing) and
the memory it uses can be mass-freed.
This was inspired by
Hoard: A Scalable Memory Allocator
for Multithreaded Applications
Emery D. Berger, Kathryn S. McKinley, Robert D. Blumofe, Paul R.
Wilson,
It's not anywhere near the same implementation, but it did take a few
basic concepts. The idea is twofold:
1) A pool of memory from which blocks can be allocated and then freed
en-mass and is fairly efficient for small (4-16 byte) blocks
2) Tread safety for use with the Vulkan renderer (and any other
multi-threaded tasks).
However, based on the Hoard paper, small allocations are cache-line
aligned. On top of that, larger allocations are page aligned.
I suspect it would help qfvis somewhat if I ever get around to tweaking
qfvis to use cmem.
The calculation fails (produces NaN) if the vectors are anti-parallel,
but works for all other combinations. I came up with this implementation
when I discovered Unity's Quaternion.FromToRotation could did not work
with very small angles. This implementation will produce a usable
quaternion below 0.00255 degrees (though it will be slightly larger than
unit). Unity's failed such that I could see KSP's skybox snap while it
rotated around my test vessel.
PL_ParseDictionary itself does only one level, but it takes care of the
key-field mappings and property list item type checking leaving the
actual parsing to a helper specified by the field. That helper is free
to call PL_ParseDictionary recursively.
The first line of the parsed item is stored and can be retrieved using
PL_Line. Line numbers not stored for dictionary keys yet. Will be 0 for
any items generated by code rather than parsed from a file or string.
There's still some cleanup to do, but everything seems to be working
nicely: `make -j` works, `make distcheck` passes. There is probably
plenty of bitrot in the package directories (RPM, debian), though.
The vc project files have been removed since those versions are way out
of date and quakeforge is pretty much dependent on gcc now anyway.
Most of the old Makefile.am files are now Makemodule.am. This should
allow for new Makefile.am files that allow local building (to be added
on an as-needed bases). The current remaining Makefile.am files are for
standalone sub-projects.a
The installable bins are currently built in the top-level build
directory. This may change if the clutter gets to be too much.
While this does make a noticeable difference in build times, the main
reason for the switch was to take care of the growing dependency issues:
now it's possible to build tools for code generation (eg, using qfcc and
ruamoko programs for code-gen).
This should keep things nicely extensible, since additional data can be
done in the data space and found using defs. This gets the compilation
units into the sym file.
The compilation unit stores the directory from which qfcc was run and
any source files mentioned. This is similar to dwarf's compilation unit.
Right now, this is the only data in the new debug space, but more might
come in the future so it seems best to treat the debug space separately
in the object files.
I decided that stopping in between function calls that are on the same
line is a good thing as it gives a chance to skip over the first but
step into the second.
This allows a debugger to do any symbol lookups and other preparations
between loading progs and the first code execution. .ctors are called as
per normal if debug_handler is not set.