This fixes a bug when loading bsp29 files that resulted in leaf nodes
having bogus bounding boxes if any coordinates were negative (and thus
dynamic lights, and probably all sorts of other things) being broken.
And it took me only 9 years to notice :P
Without shadows, this is quite the cheat, but noclip is a cheat anyway,
so probably not that big a deal. It does, however, make noclip usable
for debugging.
Since vulkan supports 32-bit indexes, there's no need for the
shenanigans the EGL-based glsl renderer had to go through to render bsp
models (maps often had quite a bit more than 65536 vertices), though the
reduced GPU memory requirements of 16-bit indices does have its
advantages.
Any sun (a directional light) is in the outside node, which due to not
having its own PVS data is visible to all nodes, but that's a tad
excessive. However, any leaf node with sky surfaces will potentially see
any suns, and leaf nodes with no sky surfaces will see the sun only if
they can see a leaf that does have sky surfaces. This can be quite
expensive to calculate (already known to be moderately expensive for
just the camera leaf node (singular!) when checking for in-map lights)
Getting close to understanding (again) how it all works. I only just
barely understood when I got vulkan's renderer running, but I really
need to understand for when I modify things for shadows. The main thing
hurdle was tinst, but that was dealt with in the previous commit, and
now it's just sorting out the mess of elechains and elementss.
Its sole purpose was to pass the newly allocated instsurf when chaining
an instance model (ammo box, etc) surface, but using expresion
statements removes the need for such shenanigans, and even makes
msurface_t that little bit smaller (though a separate array would be
much better for cache coherence).
More importantly, the relevant code is actually easier to understand: I
spent way too long working out what tinst was for and why it was never
cleared.
This reduces the overhead needed to manage the memory blocks as the
blocks are guaranteed to be page-aligned. Also, the superblock is now
alllocated from within one of the memory blocks it manages. While this
does slightly reduce the available cachelines within the first block (by
one or two depending on 32 vs 64 bit pointers), it removes the need for
an extra memory allocation (probably via malloc) for the superblock.
The renderer's LineGraph now takes a height parameter, and netgraph now
uses cl_* cvars instead of r_* (which never really made sense),
including it's own height cvar (the render graphs still use
r_graphheight).
The uptime display had not been updated for the offset Sys_DoubleTime,
so add Sys_DoubleTimeBase to make it easy to use Sys_DoubleTime as
uptime.
Line up the layout of the client list was not consistent for drop and
qport.
The render plugins have made a bit of a mess of getting at the data and
thus it's a tad confusing how to get at it in different places. Really
needs a proper cleanup :(
conwidth and conheight have been moved into vid.conview (probably change
the name at some time), and scr_vrect has been replaced by a view as
well. This makes it much easier to create 2d elements that follow the
screen size (taking advantage of a view's gravity) which will, in the
end, make changing the window size easier.
One moves and resizes the view in one operation as a bit of an
optimization as moving and resizing both update any child views, and
this does only one update.
The other sets the gravity and updates any child views as their
absolute positions would change as well as the updated view's absolute
position.
It now processes 4 pixels at a time and uses a bit mask instead of a
conditional to set 3 of the 4 pixels to black. On top of the 4:1 pixel
processing and avoiding inner-loop conditional jumps, gcc unrolls the
loop, so Draw_FadeScreen itself is more than 4x as fast as it was. The
end result is about 5% (3fps) speedup to timedemo demo1 on my 900MHz
EEE Pc when nq has been hacked to always draw the fade-screen.
qwaq-curses has its place, but its use for running vkgen was really a
placeholder because I didn't feel like sorting out the different
initialization requirements at the time. qwaq-cmd has the (currently
unnecessary) threading power of qwaq-curses, but doesn't include any UI
stuff and thus doesn't need curses. The work also paves the way for
qwaq-x11 to become a proper engine (though sorting out its init will be
taken care of later).
Fixes#15.
This refactors (as such) keys.c so that it no longer depends on console
or gib, and pulls keys out of video targets. The eventual plan is to
move all high-level general input handling into libQFinput, and probably
low-level (eg, /dev/input handling for joysticks etc on Linux).
Fixes#8
I had forgotten to test with shared libs and it turns out jack and alsa
were directly accessing symbols in the renderer (and in jack's case,
linking in a duplicate of the renderer).
Fixes#16.
The JACK Audio Connection Kit support is now just an output target
rather than a full duplicate of the renderer (in pull mode). This is
what I wanted to to back when I first added jack support, but I needed
to get the renderer working asynchronously without affecting any of the
other outputs.
Fixes#16.
on_update is for pull-model outpput targets to do periodic synchronous
checks (eg, checking that the connection to the actual output device is
still alive and reviving it if necessary)
Output plugins can use either a push model (synchronous) or a pull
model (asynchronous). The ALSA plugin now uses the pull model. This
paves the way for making jack output a simple output plugin rather than
the combined render/output plugin it currently is (for #16) as now
snd_dma works with both models.
This gets the alsa target working nicely for mmapped outout. I'm not
certain, but I think it will even deal with NPOT buffer sizes (I copied
the code from libasound's sample pcm.c, thus the uncertainty).
Non-mmapped output isn't supported yet, but the alsa target now works
nicely for pull rendering.
However, some work still needs to be done for recovery failure: either
disable the sound system, or restart the driver entirely (preferable).
This brings the alsa driver in line with the jack render (progress
towards #16), but breaks most of the other drivers (for now: one step at
a time). The idea is that once the pull model is working for at least
one other target, the jack renderer can become just another target like
it should have been in the first place (but I needed to get the pull
model working first, then forgot about it).
Correct state checking is not done yet, but testsound does produce what
seems to be fairly good sound when it starts up correctly (part of the
state checking (or lack thereof), I imagine).
This failed with errors such as:
from ./include/QF/simd/vec4d.h:32,
from libs/util/simd.c:37:
./include/QF/simd/vec4d.h: In function ‘qmuld’:
/usr/lib/gcc/x86_64-pc-linux-gnu/10.3.0/include/avx2intrin.h:1049:1: error: inlining failed in call to ‘always_inline’ ‘_mm256_permute4x64_pd’: target specific option mismatch
1049 | _mm256_permute4x64_pd (__m256d __X, const int __M)
and rename the variable since it's not the size of the frame (may be
from the very early days of ALSA development, and I suspect the
terminology changed a bit).
The calculation was including the bits per sample, which makes no sense
as the period size determines the number of samples in a submission
chunk (and thus latency). For now, set it to around 5.5ms (will probably
need a cvar).
If Sys_Shutdown gets called twice, particularly if a shutdown callback
hangs and the program is killed with INT or QUIT, shutdown_list would be
in an invalid state. Thus, get the required data (function pointer and
data pointer) from the list element, then unlink the element before
calling the function. This ensures that a reinvocation of Sys_Shutdown
continues from the next callback or ends cleanly. Fixes a segfault when
killing testsound while using the oss output (it hangs on shutdown).
Fixes#12
However, this is a bit of a band-aid in that the code for global defs
seems redundant (there is very similar code a little above that is
always executed) and the code for field defs should probably be executed
unconditionally: I suspect the problem fixed by
d5454faeb7 still shows with game coded
compiled with recent versions of the compiler, I just haven't tested
any.
Support for finding the first address associated with a source line was
added to the engine, returning 0 if not found.
A temporary breakpoint is set and the progs allowed to run free.
However, better handling of temporary breakpoitns is needed as currently
a "permanent" breakpoint will be cleared without clearing the temporary
breakpoing if the permanent breakpoing is hit while execut-to-cursor is
running.
For now, just bsearch (normal and fuzzy), qsort, and prefixsum (not in
C's stdlib that I know of, but I think having native implementations of
float and int prefix sums will be useful.
Fuzzy bsearch is useful for finding an entry in a prefix sum array
(value is >= ele[0], < ele[1]), and the reentrant version is good when
data needs to be passed to the compare function. Adapted from the code
used in pr_resolve.
A bit of a mess for optimized vs unoptimized, but the tests acknowledge
the differences in precision while checking that the code produces the
right results allowing for that precision.
It seems that i686 code generation is all over the place reguarding sse2
vs fp, with the resulting differences in carried precision. I'm not sure
I'm happy with the situation, but at least it's being tested to a
certain extent. Not sure if this broke basic (no sse) i686 tests.
GCC does a fairly nice job of producing code for vector types when the
hardware doesn't support SIMD, but it seems to break certain math
optimization rules due to excess precision (?). Still, it works well
enough for the core engine, but may not be well suited to the tools.
However, so far, only qfvis uses vector types (and it's not tested yet),
and tools should probably be used on suitable machines anyway (not
forces, of course).
I don't know that the cache line size is 64 bytes on 32 bit systems, but
it should be ok to assume that 64-byte alignment behaves well on systems
with smaller cache lines so long as they are powers of two. This does
mean there is some waste on 32-bit systems, but it should be fairly
minimal (32 bytes per memblock, which manages page sized regions).
Legacy progs do not have the extended defs data (and usually won't have
anything more complicated than a vector), so use the basic type size for
the def size. Fixes broken edict prints.
Standard quake has just linear, but the modding community added inverse,
inverse-square (raw and offset (1/(r^2+1)), infinite (sun), and
ambient (minlight). Other than the lack of shadows, marcher now looks
really good.
Because LoadImage uses Hunk_TempAlloc, the face images need to be copied
individually. Really, what's neeeded is to be able to load the image
data into a pre-allocated buffer (ideally, the staging buffer for
vulkan, but that's for later).
Mostly, this gets the stage flags in with the barrier, but also adds a
couple more barrier templates. It should make for slightly less verbose
code, and one less opportunity for error (mismatched barrier/stages).
This gets the shaders needed for creating shadow maps, and the changes
to the lighting pipeline for binding the shadow maps, but no generation
or reading is done yet. It feels like parts of various systems are
getting a little big for their britches and I need to do an audit of
various things.
The built up "path" name of the handle resource was not always surviving
the intervening call to cexpr_eval_string (in particular, when other
handles were created in the process of creating a handle). Rather than
simply increase the number of va buffers (where would it end?), just
regenerate the path when adding the new handle. It's probably quick
enough, and the code is not usually not on a critical path.
I was reading about multi-pass rendering on mobile devices
(https://developer.oculus.com/blog/loads-stores-passes-and-advanced-gpu-pipelines/)
and discovered that I had used the wrong flags (but then, I think Graham
Sellers had, too, since used his Vulkan Programming Guide as a
reference). Doesn't seem to make any difference on desktop, but as
there's no loss there, but potential gains on mobile, I'd say it's a
win.
QF now uses its own configuration file (quakeforge.cfg for now) rather
than overwriting config.cfg so that people trying out QF in their normal
quake installs don't trash their config.cfg for other quake clients. If
quakeforge.cfg is present, all other config files are ignored except
that quake.rc is scanned for a startdemos command and that is executed.
And improve the generated code for MSG_ReadShort
I suspect gcc didn't like all the excess pointer dereferences and so
couldn't assume that the bytes were being read sequentially.
And improve the generated code as well (ie, use a code sequence that gcc
recognizes and optimizes to a single 32-bit read and a byte-swap).
nq uses big-endian for its packet headers (arg, though it is consistent
with IP, it's not with the rest of quake).
This fixes the textures (and presumably mesh data) being deleted while
still in use. Oddly, the wait was needed in both brush and alias models
(I expected brush to always come first).
I'm not sure that the mismatch between refdef_t and the assembly defines
was a problem (many fields unused), but the main problem was due to
execute permission on the pages: one chunk of asm was in the data
section, and the patched code was not marked as being executable (due to
such a thing not existing when quake was written).
This is a bit of a hack for now (need to look into maybe using cmem),
but it gets 32-bit windows working for all but the software renderer
(probably just refdef (and maybe viddef) getting out of sync with the
assembly code.
This ensures that fov_y is not calculated until after the render view
size is known and thus doesn't become some crazy angle (that happens to
result in a negative tan). Fixes upside-down-quake :)
vid.aspect is removed (for now) as it was not really the right idea (I
really didn't know what I was doing at the time). Nicely, this *almost*
fixes the fov bug on fresh installs: the view is now properly
upside-down rather than just flipped vertically (ie, it's now rotated
180 degrees).
Not only does it makes sense to centralize the setting of viewport and
scissor, but it's actually necessary in order to fix the upside-down
rendering on windows.
This gets the GL and GLSL renderers working for the -win targets... sort
of: they are upside down and GLSL's bsp surfaces are black (same as
Vulkan). However, with this, all 5 renderers at least limp along for
-win, 4/5 work for -sdl.
It turns out the dd and dib "driver" code is very specific to the
software renderer. This does not fix the segfault on changing video
mode, but I do know where the problem lies: the window is being
destroyed and recreated without recreating the buffers. I suspect a
clean solution to this will allow for window resizing in X as well.
Only 64-bit windows is tested, and there are still various failures, but
QF is limping along in windows again.
nq-sdl works for sw, and sw32, gl and glsl are mostly black (but not
entirely for gl?), vulkan is not supported with sdl.
nq-win works for sw and sw32, and sort of for vulkan (very dark and
upside-down?). gl and glsl complain about vid mode,
qw-client-[sdl,win] seem to be the same, but something is wrong with the
console (reading keyboard input).
The merge with the improvements I made while hacking on csqc (still
undecided as to whether to continue that project) resulted in the size
of the progs string area getting mangled when no heap was allocated for
the progs due to a null zone pointer being used in some pointer
arithmetic. Fixes random(!!!) invalid string error in qfprogs.
While this caused some trouble for pr_strings and configurable strftime
(evil hacks abound), it's the result of discovering an ancient (from
maybe as early as 2004, definitely before 2012) bug in qwaq's printing
that somehow got past months of trial-by-fire testing (origin understood
thanks to the warning finding it).
It looks like choosing a visual is not necessary (at least for normal
apps, VR might be another matter). Still no idea if anything works (for
-win support in general, let alone vulkan).
This is for the conversion /to/ paletted textures. The conversion is
necessary for csqc support. In the process, the conversion has been sped up
by implementing a color cache for the conversion process. I haven't
measured the difference yet, but Mr Fixit does seem to load much faster for
the sw renderer than it did before the change (many months old memory).
This separate the FOV calculations from other refdef calcs, cleaning up the
renderer proper and making it easier for other parts of the engine (eg,
csqc) to update the fov.
The server edict arrays are now stored outside of progs memory, only the
entity data itself (ie data accessible to progs via ent.fld) is stored in
progs memory. Many of the changes were due to code accessing edicts and
entity fields directly rather than through the provided macros.
Loading is broken for multi-file image sets due to the way images are
loaded (this needs some thought for making it effecient), but the
Blender environment map loading works.
They're unlit (fullbright, but that's nothing new for quake), but
working nicely. As a bonus, sort out the sky pass (forced to due to the
way command buffers are used).
There were actually several problems: translucency wasn't using or
depending on the depth buffer, and the depth buffer wasn't marked as
read-only in the g-buffer pass. Getting that correct seems to have given
bigass1 a 0.5% boost (hard to say, could be the usual noise).
While being able to write pipeline specs like this was the end goal of
the parsing sub-project, I didn't realize it was already usable. This
sure makes going through the pipeline specs much easier.
That was... easier than expected. A little more tedious that I would
have liked, but my scripting system isn't perfect (I suspect it's best
suited as the output of a code generator), and the C side could do with
a little more automation.
Other than dealing with shader data alignment issues, that went well :).
Nicely, the implementation gets the explicit scaling out of the shader,
and allows for a directional flag.
The transforms aren't actually freed at the end (more work), but at
least they aren't lost any more, though one is still lost for the
viewent (weapon). The obvious fix didn't work.
I never liked that some of the macros needed the type as a parameter
(yay typeof and __auto_type) or those that returned a value hid the
return statement so they couldn't be used in assignments.
Still "some" more to go: a pile to do with transforms and temporary
entities, and a nasty one with host_cbuf. There's also all the static
block-alloc lists :/
Light styles and shadows aren't implemented yet.
The map's entities are used to create the lights, and the PVS used to
determine which lights might be visible (ie, the surfaces they light).
That could do with some more improvements (eg, checking if a leaf is
outside a spotlight's cone), but the concept seems to work.
Double benefit, actually: faster when building a fat PVS (don't need to
copy as much) and can be used in multiple threads. Also, default visiblity
can be set, and the buffer size has its own macro.
Useful for avoiding a pile of wrapper functions that merely pass on
command-specific data to the actual implementation. Used to clean up the
wrappers in nq and qw cl_input.c
This is the first step towards component-based entities.
There's still some transform-related stuff in the struct that needs to
be moved, but it's all entirely client related (rather than renderer)
and will probably go into a "client" component. Also, the current
components are directly included structs rather than references as I
didn't want to deal with the object management at this stage.
As part of the process (because transforms use simd) this also starts
the process of moving QF to using simd for vectors and matrices. There's
now a mess of simd and sisd code mixed together, but it works
surprisingly well together.
The plan is to have a fully component based entity system. This adds
hierarchical transforms. Not particularly useful for quake itself at
this stage, but it will allow for much more flexibility later on,
especially when QuakeForge becomes more general-purpose.
This seems to be pretty close to as fast as it gets (might be able to do
better with some shuffles of the negation constants instead of loading
separate constants).
It's not used yet as work needs to be done to better support generic
entities, but this is the next step to real-time lighting (though, to be
honest, I expect it will be too slow to be usable).
The main purpose is to allow fluent-style:
const char *targetname = PL_String (PL_ObjectForKey (entity, "targetname"));
if (targetname && !PL_ObjectForKey (targets, targetname)) {
PL_D_AddObject (targets, targetname, entity);
}
[note: the above is iffy due to ownership of entity, but the code from
which the above comes works around the issue]
There's still the memory management itself to clean up, but the main
code no longer uses any static/global variables (holdover from when the
function was recursive rather).
The static libs are used to build the plugins, but make it easy to use
only those modules needed for tests. Fixes the link error when running
"make check" with non-static plugins.
Static lights are yet to come (so the screen is black most of the time),
but dynamic lights work very nicely (and look very good) despite the
falloff being incorrect.
While I could reconstruct the position from the screen coords and depth,
this is easier and good enough for now. Reconstruction is an
optimization thing.
Lighting doesn't actually do lights yet, but it's producing pixels.
Translucent seems to be working (2d draw uses it), and compose seems to
be working.
This gets the alias model render pass and pipeline passing validation.
I don't know why I didn't add the subpass field to the
VkGraphicsPipelineCreateInfo parser def, though it could be I simply
missed it, or I thought I wouldn't need it at the time.
Due to wanting to access array sizes when parsing uint32_t type values,
parse_uint32_t needs to handle size_t values even though it throws out
any excess bits.
After getting lights even vaguely working for alias models, I realized
that it just wasn't going to be feasible to do nice lighting with
forward rendering. This gets the bulk of the work done for deferred
rendering, but still need to sort out the shaders before any real
testing can be done.
Setting the result type cexpr_exprval tells cexpr to simply return whoe
exprval object rather than the referenced value, thus allowing the
caller to check the type when the expression is context sensitive.
The order in which keys are added to the dictionary object is
maintained. Adding a key after removing an old key adds the new key to
the end of the list rather than reusing the old key's spot.
PL_ParseLabeledArray works the same way as PL_ParseArray, but instead
takes a dictionary object. The keys of the items are ignored, and the
order is not preserved (at this stage), but this is a cleaner solution
to getting an array of objects when the definitions of those objects
need to be accessible by name as well.
Never really wanted in the first place (back when I did the plugin
renderers), but I didn't feel like doing the required work to avoid it
at the time. At least with Vulkan being a fresh start in an environment
that's already plugin-friendly, there was no real work involved. I'll
get to the other renderers eventually (especially now that I know gdb
does the right thing when there are multiple functions with the same
name).
It turns out I had conflated frame buffers with frames and wound up
making a minor mess when separating the number of frames the renderer
could have in flight from the number of swap-chain images. This is the
first step towards correcting that mistake.
It's not entirely there yet, but the basics are working. Work is still
needed for avoiding duplication of objects (different threads will have
different contexts and thus different tables, so necessary per-thread
duplication should not become a problem) and general access to arbitrary
fields (mostly just parsing the strings)
This allows plist objects to be accessed directly from cexpr expressions
using struct.field syntax for dictionary objects and array[index] syntax
for array objects.
The node struct was 72 bytes thus two cache line. Moving the pointer
into the brush model data block allows nodes to fit in a single cache
line (not that they're aligned yet, but that's next). It doesn't seem to
have made any difference to performance (at least in the vulkan
renderer), but it hasn't hurt, either, as the only place that needed the
parent pointer was R_MarkLeaves.
It's not quite as expected, but that may be due to one of msaa, the 0-15
range in the palette not being all the way to white, the color gradients
being not quite linear (haven't checked yet) or some combination of the
above. However, it's that what should be yellow is more green. At least
the zombies are no longer white and the ogres don't look like they're
wearing skeleton suits.
Doesn't seem to make much difference performance-wise, but speed does
seem to be fill-rate limited due to the 8x msaa. Still, it does mean
fewer bindings to worry about.
This is a big step towards a cleaner api. The struct reference in
model_t really should be a pointer, but bsp submodel(?) loading messed
that up, though that's just a matter of taking more care in the loading
code. It seems sensible to make that a separate step.
Probably not really necessary, but I think I found a small opportunity
for a buffer overflow in there while I was modifying the code, so this
is probably better anyway.