This needed changing Vulkan_CreatePipeline to
Vulkan_CreateGraphicsPipeline for consistency (and parsing the
difference from a plist seemed... not worth thinking about).
It turned out the bindless approach wouldn't work too well for my design
of the sprite objects, but I don't think that's a big issue at this
stage (and it seems bindless is causing problems for brush/alias
rendering via renderdoc and on my versa pro). However, I have figured
out how to make effective use of descriptor sets (finally :P).
The actual normal still needs checking, but the sprites are currently
unlit so not an issue at this stage.
I'm not at all sure what I was thinking when I designed it, but I
certainly designed it wrong (to the point of being fairly useless). It
turns out memory requirements are already aligned in size (so just
multiplying is fine), and what I really wanted was to get the next
offset aligned to the given requirements.
The vertices and frame images are loaded into the one memory object,
with the vertices first followed by the images.
The vertices are 2D xy+uv sets meant to be applied to the model
transform frame, and are pre-computed for the sprite size (this part
does support sprites with varying frame image sizes).
The frame images are loaded into one image with each frame on its own
layer. This will cause some problems if any sprites with varying frame
image sizes are found, but the three sprites in quake are all uniform
size.
As much as it can be since the texture data is interleaved with the
model data in the files (I guess not that bad a design for 25 years ago
with the tight memory constraints), but this paves the way for
supporting sprites in Vulkan.
This is needed for cleaning up excess memsets when loading files because
Hunk_RawAllocName has nonnull on its hunk pointer (as the rest of the
hunk functions really should, but not just yet).
In trying to reduce unnecessary memsets when loading files, I found that
Hunk_RawAllocName already had nonnull on it, so quakefs needed to know
the hunk it was to use. It seemed much better to to go this way (first
step in what is likely to be a lengthy process) than backtracking a
little and removing the nonnull attribute.
As the sw renderer's implementation was the closest to id's, it was used
as the model (thus a fair bit of cleanup is still needed). This fixes
some incorrect implementations in glsl and gl.
This gets the alias pipeline in line with the bsp pipeline, and thus
everything is about as functional as it was before the rework (minus
dealing with large texture sets).
I guess it's not quite bindless as the texture index is a push constant,
but it seems to work well (and I may have fixed some full-bright issues
by accident, though I suspect that's just my imagination, but they do
look good).
This should fix the horrid frame rate dependent behavior of the view
model.
They are also in their own descriptor set so they can be easily shared
between pipelines. This has been verified to work for Draw.
BSP textures are now two-layered with the albedo and emission in the two
layers rather than two separate images. While this does increase memory
usage for the textures themselves (most do not have fullbright pixels),
it cuts down on image and image view handles (and shader resources).
For now, just dot product, trig, and min/max/bound, but it works well as
a proof of concept. The main goal was actually min. Only the list of
symbols is provided, it is the user's responsibility to set up the
symbol table and context.
cexpr's symbol tables currently aren't readily extended, and dynamic
scoping is usually a good thing anyway. The chain of contexts is walked
when a symbol is not found in the current context's symtab, but minor
efforts are made to avoid checking the same symtab twice (usually caused
by cloning a context but not updating the symtab).
Multiple render passes are needed for supporting shadow mapping, and
this is a huge step towards breaking the Vulkan render free of Quake,
and hopefully will lead the way for breaking the GL renderers free as
well.
This is actually a better solution to the renderer directly accessing
client code than provided by 7e078c7f9c.
Essentially, V_RenderView should not have been calling R_RenderView, and
CL_UpdateScreen should have been calling V_RenderView directly. The
issue was that the renderers expected the world entity model to be valid
at all times. Now, R_RenderView checks the world entity model's validity
and immediately bails if it is not, and R_ClearState (which is called
whenever the client disconnects and thus no longer has a world to
render) clears the world entity model. Thus R_RenderView can (and is)
now called unconditionally from within the renderer, simplifying
renderer-specific variants.
When allocating memory for multiple objects that have alignment
requirements, it gets tedious keeping track of the offset and the
alignment. This is a simple function for walking the offset respecting
size and alignment requirements, and doubles as a size calculator.
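A minimal sketch of the idea (the name and signature here are made up, not the actual function):

#include <stddef.h>

// Walk a running offset: align it for the current object, then advance
// past it. alignment must be a power of two. Returns the object's offset;
// after the last call, *offset is the total size needed.
static size_t
walk_offset (size_t *offset, size_t size, size_t alignment)
{
    size_t aligned = (*offset + alignment - 1) & ~(alignment - 1);
    *offset = aligned + size;
    return aligned;
}

eg, walking the vertex data and then the frame images through the one block gives each object's offset and, at the end, the size to allocate.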
I'm not sure what I was thinking when I made PL_RemoveObjectForKey take
a const plitem. One of those times where C could do with being a little
more strict.
The stack is arbitrary strings that the validation layer debug callback
prints in reverse order after each message. This makes it easy to work
out what nodes in a pipeline/render pass plist are causing validation
errors. Still have to narrow down the actual line, but the messages seem
to help with that.
Putting qfvPushDebug/qfvPopDebug around other calls to vulkan should
help out a lot, too.
As a bonus, the stack is printed before debug_breakpoint is called, so
it's immediately visible in gdb.
I'm not at all happy with con_message and con_menu, but fixing them
properly will take a rework of the menus (planned, though). Also, the
Menu_ console command implementations are a bit iffy and could also do
with a rewrite (probably part of the rest of the menu rework) or just
nuking (they were part of Johnny on Flame's work, so I suspect had
something to do with joystick bindings).
It seems X11 does not like creating barriers entirely off the screen,
though the error seems to be a little unreliable (however, off the left
edge was definitely bad).
An imt switcher automatically changes the context's active imt based on
a user specified list of binary inputs. The inputs may be either buttons
(indicated as +button) or cvars (bare name). For buttons, the
pressed/not pressed state is used, and cvars are interpreted as ints
being 0 or not 0. The order of the inputs determines the bit number of
the input, with the first input being bit 0, second bit 1, third bit 2
etc. A default imt is given so large switchers do not need to be fully
configured (the default imt is written to all states).
A context can have any number of switchers attached. The switchers can
wind up fighting over the active imt, but this seems to be something for
the "user" (eg, configuration system) to sort out rather than the
switcher code enforcing anything.
As a result of the inputs being treated as bits, a switcher with N
inputs will have 2**N states, thus there's a maximum of 16 inputs for
now as 65536 states is a lot of configuration.
Using a switcher, setting up a standard strafe/mouse look configuration
is fairly easy.
imt_create key_game imt_mod
imt_create key_game imt_mod_strafe imt_mod
imt_create key_game imt_mod_freelook
imt_create key_game imt_mod_lookstrafe imt_mod_freelook
imt_switcher_create mouse key_game imt_mod_strafe +strafe lookstrafe +mlook freelook
imt_switcher 0 imt_mod 2 imt_mod 4 imt_mod_freelook 8 imt_mod_freelook 12 imt_mod_freelook
imt_switcher 6 imt_mod_lookstrafe 10 imt_mod_lookstrafe 14 imt_mod_lookstrafe
in_bind imt_mod mouse axis 0 move.yaw
in_bind imt_mod mouse axis 1 move.forward
in_bind imt_mod_strafe mouse axis 0 move.side
in_bind imt_mod_lookstrafe mouse axis 0 move.side
in_bind imt_mod_freelook mouse axis 1 move.pitch
This takes advantage of imt chaining and the default imt for the
switcher (there are 8 states that use imt_mod_strafe).
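For reference, the state numbers in the imt_switcher lines are just the bit patterns of the four inputs, something like this (illustrative, the names are made up):

// first input is bit 0 (+strafe), then lookstrafe, +mlook, freelook
int state = 0;
for (int i = 0; i < num_inputs; i++) {
    if (input_active (inputs[i]))       // button pressed, or cvar != 0
        state |= 1 << i;
}
imt = switcher->state_imt[state];       // becomes the context's active imt

So state 6 is lookstrafe plus +mlook (bits 1 and 2), hence imt_mod_lookstrafe.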
The switcher name must be unique across all contexts, and every imt used
in a switcher must be in the switcher's context.
The listener is invoked when the axis value changes due to IN_UpdateAxis
or IN_ClampAxis updating the axis. This does mean the listener
invocation may be somewhat delayed. I am a tad uncertain about this
design, hence it being a separate commit.
Listeners are separate to the main callback as listeners have only
read-only access to the objects, but the main callback is free to modify
the cvar and thus can act as a parser and validator. The listeners are
invoked after the main callback if the cvar is modified. There does not
need to be a main callback for the listeners to be invoked.
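Roughly the shape of it (illustrative types and names, not the actual cvar API):

typedef struct cvar_s cvar_t;
typedef void (*cvar_listener_t) (void *data, const cvar_t *cvar);

typedef struct listener_s {
    struct listener_s *next;
    cvar_listener_t    func;
    void              *data;
} listener_t;

struct cvar_s {
    const char *name;
    const char *value;
    void      (*callback) (cvar_t *cvar);   // may parse/validate/modify
    listener_t *listeners;                  // read-only observers
};

static void
cvar_notify (cvar_t *cvar)
{
    if (cvar->callback)                 // optional
        cvar->callback (cvar);
    for (listener_t *l = cvar->listeners; l; l = l->next)
        l->func (l->data, cvar);        // always run, after the callback
}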
This allows id1/qw config files, and to a certain extent scripts, to
work with the new binding system. It does highlight just how limited the
original system was (many keys could not be bound).
Mouse axis input does not work yet as that needs a little more work to
support +strafe and +mlook.
I decided cvars and input buttons/axes need listeners so any changes to
them can be propagated. This will make using cvars in bindings feasible
and I have an idea for automatic imt switching that would benefit from
listeners attached to buttons and cvars.
Combining absolute and relative inputs at the binding does not work well
because absolute inputs generally update only when the physical input
updates, so clearing the axis input each frame results in a brief pulse
from the physical input. Relative inputs, on the other hand, must be
cleared each frame (where frame here means each time the axis is read)
but must accumulate the relative updates between frames.
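A sketch of the distinction (names are illustrative):

typedef struct {
    int   relative;     // mouse deltas vs, say, joystick position
    float value;
} axis_sketch_t;

static void
axis_update (axis_sketch_t *axis, float input)
{
    if (axis->relative)
        axis->value += input;   // accumulate deltas between reads
    else
        axis->value = input;    // latch the last reported position
}

static float
axis_read (axis_sketch_t *axis)
{
    float value = axis->value;
    if (axis->relative)
        axis->value = 0;        // cleared only when read ("per frame")
    return value;
}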
Other than the axis mode being incorrect, this seems to work quite
nicely.
This should be a much friendlier way of "grabbing" input, though I
suspect that using raw keyboard events will result in a keyboard grab,
which is part of the reason for wanting a friendly grab.
There does seem to be a problem with the mouse sneaking out of the
top-right and bottom-left corners. I currently suspect a bug in the X
server, but further investigation is needed.
This is the first step in the long-sought goal of allowing the window
size to change, but is required for passing on getting window position
and size information (though size is in viddef, it makes sense to pass
both together).
There's now IN_X11_Preinit, IN_X11_Postinit (both for want of better
names), and in_x11_init. The first two are for taking care of
initialization that needs to be done before window creation and between
window creation and mapping (ie, are very specific to X11 stuff) while
in_x11_init takes care of the setup for the input system. This proved
necessary in my XInput experimentation: a passive enter grab takes
effect only when the pointer enters the window, thus setting up the grab
with the pointer already in the window has no effect until the pointer
leaves the window and returns.
Input driver can now have an optional init_cvars function. This allows
them to create all their cvars before the actual init pass thus avoiding
some initialization order interdependency issues (in this case, fixing a
segfault when starting x11 clients fullscreen due to the in_dga cvar not
existing yet).
keyhelp provides the input name if it is known, and in_bind tries to use
the provided input name if not a number. Case sensitivity for name
lookups is dependent on the input driver.
There's now an internal event handler for taking care of device addition
and removal, and a public event handler for dealing with device input
events in various contexts. In particular, this lets the clients check for
the escape key.
While the console command line is quite good for setting everything up,
the devices being bound do need to be present when the commands are
executed (due to needing extra data provided by the devices). Thus
property lists that store the extra data (button and axis counts, device
names/ids, connection names, etc) seem to be the best solution.
The mouse bound to movement axes works (though signs are all over the
place, so movement direction is a little off), and binding F10 (key 68)
to quit works :)
Each axis binding has its own recipe (meaning the same input axis can be
interpreted differently for each binding).
Recipes are specified with field=value pairs after the axis name.
Valid fields are minzone, maxzone, deadzone, curve and scale, with
deadzone doubling as a balanced/unbalanced flag.
The default recipe has no zones, is balanced, and curve and scale are 1.
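One plausible reading of a recipe, just to show the shape of it (the real semantics may well differ, and minzone/maxzone handling is omitted):

#include <math.h>

static float
apply_recipe (float value, float deadzone, float curve, float scale)
{
    // value assumed normalized: -1..1 for balanced, 0..1 for unbalanced
    if (deadzone > 0) {     // balanced: cut a dead band around the centre
        float mag = fabsf (value);
        mag = mag < deadzone ? 0 : (mag - deadzone) / (1 - deadzone);
        value = copysignf (mag, value);
    }
    value = copysignf (powf (fabsf (value), curve), value);
    return value * scale;
}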
Hot-plug support is done via "connections" (not sure I'm happy with the
name) that provide a user specifiable name to input devices. The
connections record the device name (eg, "6d spacemouse") and id (usually
usb path for evdev devices, but may be the device unique id if
available) and whether automatic reconnection should match just the
device name or both device name and id (prevents problems with changing
the device connected to the one usb port).
Unnecessary enum removed, and the imt block struct moved to imt.c
(doesn't need to be public). Also, remove device name from the imt block
(and thus the parameter to the functions) as it turns out not to be
needed.
in_bind is only partially implemented (waiting on imt), but device
listing, device naming, and input identification are working. The event
handling system made for a fairly clean implementation for input
identification thanks to the focused event handling.
This has smashed the keydest handling for many things, and bindings, but
seems to be a good start with the new input system: the console in
qw-client-x11 is usable (keyboard-only).
The button and axis values have been removed from the knum_t enum as
mouse events are separate from key events, and other button and axis
inputs will be handled separately.
keys.c has been disabled in the build as it is obsolete (thus much of
the breakage).
For the mouse in x11, I'm not sure which is more cooked: deltas or
window-relative coordinates, but I don't imagine that really matters too
much. However, keyboard and mouse events suitable for 2D user interfaces
are sent at the same time as the more game oriented button and axis events.
Input Mapping Tables are still at the core as they are a good concept,
however they include both axis and button mappings, and the size is not
hard-coded, but dependent on the known devices. Not much actually works
yet (nq segfaults when a key is pressed).
kbutton_t is now in_button_t and has been moved to input.h. Also, a
button registration function has been added to take care of +button and
-button command creation and, eventually, direct binding of "physical"
buttons to logical buttons. "Physical" buttons are those coming in from
the OS (keyboard, mouse, joystick...), logical buttons are what the code
looks at for button state.
Additionally, the button edge detection code has been cleaned up such
that it no longer uses magic numbers, and the conversion to a float is
cleaner. Interestingly, I found that the handling is extremely
frame-rate dependent (eg, +forward will accelerate the player to full
speed much faster at 72fps than it does at 20fps). This may be a factor
in why gamers are frame rate obsessed: other games doing the same thing
would certainly feel different under varying frame rates.
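For reference, the magic-number version being replaced is roughly the original Quake logic: the float is an estimate of the fraction of the frame the button was held, based on the down state plus the press/release edges seen during the frame (simplified sketch):

static float
key_state (int down, int pressed, int released)
{
    if (pressed && released)
        return down ? 0.75 : 0.25;  // re-pressed / tapped within the frame
    if (pressed)
        return down ? 0.5 : 0.0;    // pressed part way through the frame
    if (released)
        return 0.0;                 // released this frame
    return down ? 1.0 : 0.0;        // held (or up) the entire frame
}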
For drivers that support it. Polling is still supported and forces the
select timeout to 0 if any driver requires polling. For now, the default
timeout when all drivers use select is 10ms.
Removing the device from the devices list after closing the device
could cause the device to be double-freed if something went wrong in the
device removal callback, resulting in a system shutdown that would then
close all open devices.
The device is removed from the list before the callback is called.
There's still a small opportunity for such in a multi-threaded
environment, but that would take device removal occurring at the same
time as the input system is shut down. Probably the responsibility of
the threaded environment rather than inputlib.
I had forgotten that _size was the number of rows in the map, not the
number of objects (1024 objects per row). This fixes the missed device
removal messages. And probably a slew of other bugs I'd yet to encounter
:P
This includes device add and remove events, and axis and buttons for
evdev. Will need to sort out X11 input later, but next is getting qwaq
responding.
The common input code (input outer loop and event handling) has been
moved into libQFinput, and modified to have the concept of input drivers
that are registered by the appropriate system-level code (x11, win,
etc).
As well, my evdev input library code (with hotplug support) has been
added, but is not yet fully functional. However, the idea is that it
will be available on all systems that support evdev (Linux, and from
what I've read, FreeBSD).
At the low level, only unions can cause a set to grow. Of course, things
get interesting at the higher level when infinite (inverted) sets are
mixed in.
Instead of printing every representable member of an infinite set (ie,
up to element 63 in a set that can hold 64 elements), only those
elements up to one after the last non-member are listed. For example,
{...} - {2 3} -> {0 1 4 ...}
This makes reading (and testing!) infinite sets much easier.
Most of the set ops were always endian-agnostic since they were simply
operating on multiple bits in parallel, but individual element
add/remove/test was very endian-dependent. For the most part, this
didn't matter, but it does matter very much when loading external data
into a set or writing the data out (eg, for PVS).
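The core of the issue, sketched (not the actual set_t code): addressing bits through words ties the byte layout to the host, while addressing them through bytes does not, and the byte layout is what matters when set memory is read from or written to a file.

#include <stdint.h>

// endian-dependent: which *byte* holds bit 0 depends on the host
static void
word_add (uint32_t *map, unsigned x)
{
    map[x >> 5] |= 1u << (x & 31);
}

// endian-agnostic: bit 0 always lands in byte 0, matching on-disk data
static void
byte_add (uint8_t *map, unsigned x)
{
    map[x >> 3] |= 1u << (x & 7);
}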
For now, the functions check for a null hunk pointer and use the global
hunk (initialized via Memory_Init) if necessary. However, Hunk_Init is
available (and used by Memory_Init) to create a hunk from any arbitrary
memory block. So long as that block is 64-byte aligned, allocations
within the hunk will remain 64-byte aligned.
The output fat-pvs data is the *difference* between the base pvs and fat
pvs. This currently makes for about 64kB savings for marcher.bsp, and
about 233MB savings for ad_tears.bsp (or about 50% (470.7MB->237.1MB)).
I expect using utf-8 encoding for the run lengths to make for even
bigger savings (the second output fat-pvs leaf of marcher.bsp is all 0s,
or 6 bytes in the file, which would reduce to 3 bytes using utf-8).
The fact that numleafs did not include leaf 0 actually caused problems in
many places due to never being sure whether to add 1. Hopefully this fixes
some of the confusion. (and that comment in sv_init didn't last long :P)
After seeing set_size and thinking it redundant (thought it returned the
capacity of the set until I checked), I realized set_count would be a
much better name (set_count (node->successors) in qfcc does make much
more sense).
Modern maps can have many more leafs (eg, ad_tears has 98983 leafs).
Using set_t makes dynamic leaf counts easy to support and the code much
easier to read (though set_is_member and the iterators are a little
slower). The main thing to watch out for is the novis set and the set
returned by Mod_LeafPVS never shrink, and may have excess elements (ie,
indicate that nonexistent leafs are visible).
Having set_expand exposed is useful for loading data into a set.
However, it turns out there was a bug in its size calculation in that
when the requested set size was a multiple of SET_BITS (and greater than
the current set size), the new set size would be SET_BITS larger than
requested. There's now some tests for this :)
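For illustration, the usual shape of that kind of off-by-one (not the actual code):

// buggy: a requested size that is already a multiple of SET_BITS
// (eg, 64 with SET_BITS 64) gains an extra SET_BITS -> 128
size = (requested + SET_BITS) & ~(SET_BITS - 1);
// fixed: rounds up only when requested is not already a multiple -> 64
size = (requested + SET_BITS - 1) & ~(SET_BITS - 1);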
Quake just looked wrong without the view model. I can't say I like the
way the depth range is hacked, but it was necessary because the view
model needs to be processed along with the rest of the alias models
(didn't feel like adding more command buffers, which I imagine would be
expensive with the pipeline switching).
The recent changes to key handling broke using escape to get out of the
console (escape would toggle between console and menu). Thus take care
of the menu (escape) part of the coupling FIXME by implementing a
callback for the escape key (and removing key_togglemenu) and sorting
out the escape key handling in console. Seems to work nicely.
This sorts out the unwanted use of R_EnqueueEntity, which will help with
removing another global (r_ent_queue), which is necessary for threaded
multi-pass rendering (ie, shadows).
Since vulkan supports 32-bit indexes, there's no need for the
shenanigans the EGL-based glsl renderer had to go through to render bsp
models (maps often had quite a bit more than 65536 vertices), though the
reduced GPU memory requirements of 16-bit indices does have its
advantages.
Any sun (a directional light) is in the outside node, which due to not
having its own PVS data is visible to all nodes, but that's a tad
excessive. However, any leaf node with sky surfaces will potentially see
any suns, and leaf nodes with no sky surfaces will see the sun only if
they can see a leaf that does have sky surfaces. This can be quite
expensive to calculate (already known to be moderately expensive for
just the camera leaf node (singular!) when checking for in-map lights).
Getting close to understanding (again) how it all works. I only just
barely understood when I got vulkan's renderer running, but I really
need to understand for when I modify things for shadows. The main hurdle
was tinst, but that was dealt with in the previous commit, and
now it's just sorting out the mess of elechains and elementss.
Its sole purpose was to pass the newly allocated instsurf when chaining
an instance model (ammo box, etc) surface, but using expression
statements removes the need for such shenanigans, and even makes
msurface_t that little bit smaller (though a separate array would be
much better for cache coherence).
More importantly, the relevant code is actually easier to understand: I
spent way too long working out what tinst was for and why it was never
cleared.
This reduces the overhead needed to manage the memory blocks as the
blocks are guaranteed to be page-aligned. Also, the superblock is now
allocated from within one of the memory blocks it manages. While this
does slightly reduce the available cachelines within the first block (by
one or two depending on 32 vs 64 bit pointers), it removes the need for
an extra memory allocation (probably via malloc) for the superblock.
The renderer's LineGraph now takes a height parameter, and netgraph now
uses cl_* cvars instead of r_* (which never really made sense),
including its own height cvar (the render graphs still use
r_graphheight).
The uptime display had not been updated for the offset Sys_DoubleTime,
so add Sys_DoubleTimeBase to make it easy to use Sys_DoubleTime as
uptime.
The layout of the client list was not lined up consistently for drop and
qport.
conwidth and conheight have been moved into vid.conview (probably change
the name at some time), and scr_vrect has been replaced by a view as
well. This makes it much easier to create 2d elements that follow the
screen size (taking advantage of a view's gravity) which will, in the
end, make changing the window size easier.
One moves and resizes the view in a single operation as a bit of an
optimization: moving and resizing both update any child views, so doing
both at once means only one update.
The other sets the gravity and updates any child views as their
absolute positions would change as well as the updated view's absolute
position.
This refactors (as such) keys.c so that it no longer depends on console
or gib, and pulls keys out of video targets. The eventual plan is to
move all high-level general input handling into libQFinput, and probably
low-level (eg, /dev/input handling for joysticks etc on Linux).
Fixes #8
I had forgotten to test with shared libs and it turns out jack and alsa
were directly accessing symbols in the renderer (and in jack's case,
linking in a duplicate of the renderer).
Fixes#16.
on_update is for pull-model output targets to do periodic synchronous
checks (eg, checking that the connection to the actual output device is
still alive and reviving it if necessary).
Output plugins can use either a push model (synchronous) or a pull
model (asynchronous). The ALSA plugin now uses the pull model. This
paves the way for making jack output a simple output plugin rather than
the combined render/output plugin it currently is (for #16) as now
snd_dma works with both models.
This gets the alsa target working nicely for mmapped output. I'm not
certain, but I think it will even deal with NPOT buffer sizes (I copied
the code from libasound's sample pcm.c, thus the uncertainty).
Non-mmapped output isn't supported yet, but the alsa target now works
nicely for pull rendering.
However, some work still needs to be done for recovery failure: either
disable the sound system, or restart the driver entirely (preferable).
This brings the alsa driver in line with the jack renderer (progress
towards #16), but breaks most of the other drivers (for now: one step at
a time). The idea is that once the pull model is working for at least
one other target, the jack renderer can become just another target like
it should have been in the first place (but I needed to get the pull
model working first, then forgot about it).
Correct state checking is not done yet, but testsound does produce what
seems to be fairly good sound when it starts up correctly (part of the
state checking (or lack thereof), I imagine).
This failed with errors such as:
from ./include/QF/simd/vec4d.h:32,
from libs/util/simd.c:37:
./include/QF/simd/vec4d.h: In function ‘qmuld’:
/usr/lib/gcc/x86_64-pc-linux-gnu/10.3.0/include/avx2intrin.h:1049:1: error: inlining failed in call to ‘always_inline’ ‘_mm256_permute4x64_pd’: target specific option mismatch
1049 | _mm256_permute4x64_pd (__m256d __X, const int __M)
Support for finding the first address associated with a source line was
added to the engine, returning 0 if not found.
A temporary breakpoint is set and the progs allowed to run free.
However, better handling of temporary breakpoints is needed as currently
a "permanent" breakpoint will be cleared without clearing the temporary
breakpoint if the permanent breakpoint is hit while execute-to-cursor is
running.
For now, just bsearch (normal and fuzzy), qsort, and prefixsum (not in
C's stdlib that I know of, but I think having native implementations of
float and int prefix sums will be useful).
Fuzzy bsearch is useful for finding an entry in a prefix sum array
(value is >= ele[0], < ele[1]), and the reentrant version is good when
data needs to be passed to the compare function. Adapted from the code
used in pr_resolve.
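The prefix-sum case it is aimed at, sketched (illustrative, not the actual API): find the bucket whose cumulative start is <= the value while the next bucket's start is greater.

// sums[i] is the cumulative start of bucket i; returns i such that
// sums[i] <= value < sums[i + 1] (value assumed to be in range)
static int
prefix_find (const int *sums, int count, int value)
{
    int lo = 0, hi = count - 1;
    while (lo < hi) {
        int mid = (lo + hi + 1) / 2;
        if (value >= sums[mid])
            lo = mid;
        else
            hi = mid - 1;
    }
    return lo;
}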
GCC does a fairly nice job of producing code for vector types when the
hardware doesn't support SIMD, but it seems to break certain math
optimization rules due to excess precision (?). Still, it works well
enough for the core engine, but may not be well suited to the tools.
However, so far, only qfvis uses vector types (and it's not tested yet),
and tools should probably be used on suitable machines anyway (not
forced, of course).
I don't know that the cache line size is 64 bytes on 32 bit systems, but
it should be ok to assume that 64-byte alignment behaves well on systems
with smaller cache lines so long as they are powers of two. This does
mean there is some waste on 32-bit systems, but it should be fairly
minimal (32 bytes per memblock, which manages page sized regions).
The Blend macro supports any non-integral type supporting * and +
(float, double, vec4f_t, etc), so it is essentially a scalar VectorBlend
or QuatBlend.
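Roughly (illustrative, not necessarily the exact macro):

#define Blend(a, b, blend) ((a) * (1 - (blend)) + (b) * (blend))

// the same macro works for scalars and vector types:
// float   f = Blend (x, y, 0.25f);
// vec4f_t v = Blend (p0, p1, 0.5f);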
Standard quake has just linear, but the modding community added inverse,
inverse-square (raw and offset: 1/(r^2+1)), infinite (sun), and
ambient (minlight). Other than the lack of shadows, marcher now looks
really good.
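The rough shapes of the models, ignoring the engine's actual scaling and cutoffs (illustrative, the enum names are made up):

typedef enum { LIGHT_LINEAR, LIGHT_INVERSE, LIGHT_INV_SQ, LIGHT_INV_SQ1,
               LIGHT_INFINITE, LIGHT_AMBIENT } light_model_t;

// r: distance to the light, radius: the light's nominal radius
static float
attenuate (light_model_t model, float r, float radius)
{
    switch (model) {
        case LIGHT_LINEAR:   return 1 - r / radius;  // standard quake
        case LIGHT_INVERSE:  return 1 / r;
        case LIGHT_INV_SQ:   return 1 / (r * r);     // raw
        case LIGHT_INV_SQ1:  return 1 / (r * r + 1); // offset form
        case LIGHT_INFINITE:                         // sun: direction only
        case LIGHT_AMBIENT:  return 1;               // minlight everywhere
    }
    return 1;
}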
Mostly, this gets the stage flags in with the barrier, but also adds a
couple more barrier templates. It should make for slightly less verbose
code, and one less opportunity for error (mismatched barrier/stages).
This gets the shaders needed for creating shadow maps, and the changes
to the lighting pipeline for binding the shadow maps, but no generation
or reading is done yet. It feels like parts of various systems are
getting a little big for their britches and I need to do an audit of
various things.
QF now uses its own configuration file (quakeforge.cfg for now) rather
than overwriting config.cfg so that people trying out QF in their normal
quake installs don't trash their config.cfg for other quake clients. If
quakeforge.cfg is present, all other config files are ignored except
that quake.rc is scanned for a startdemos command and that is executed.
And improve the generated code for MSG_ReadShort
I suspect gcc didn't like all the excess pointer dereferences and so
couldn't assume that the bytes were being read sequentially.
And improve the generated code as well (ie, use a code sequence that gcc
recognizes and optimizes to a single 32-bit read and a byte-swap).
nq uses big-endian for its packet headers (arg, though it is consistent
with IP, it's not with the rest of quake).
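The kind of sequence gcc recognizes and collapses to a single load plus a bswap (sketch; the big-endian read is what the nq packet headers need on little-endian hosts):

#include <stdint.h>

static int32_t
read_long_be (const uint8_t *buf)
{
    return ((uint32_t) buf[0] << 24) | ((uint32_t) buf[1] << 16)
         | ((uint32_t) buf[2] << 8)  |  (uint32_t) buf[3];
}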
I'm not sure that the mismatch between refdef_t and the assembly defines
was a problem (many fields unused), but the main problem was due to
execute permission on the pages: one chunk of asm was in the data
section, and the patched code was not marked as being executable (due to
such a thing not existing when quake was written).
vid.aspect is removed (for now) as it was not really the right idea (I
really didn't know what I was doing at the time). Nicely, this *almost*
fixes the fov bug on fresh installs: the view is now properly
upside-down rather than just flipped vertically (ie, it's now rotated
180 degrees).
Not only does it make sense to centralize the setting of viewport and
scissor, but it's actually necessary in order to fix the upside-down
rendering on windows.
It turns out the dd and dib "driver" code is very specific to the
software renderer. This does not fix the segfault on changing video
mode, but I do know where the problem lies: the window is being
destroyed and recreated without recreating the buffers. I suspect a
clean solution to this will allow for window resizing in X as well.
While the main bulk of the improvement (36s down from 42s for
gmsp3v2.bsp on my i7-6850K) comes from using a high-tide allocator for
the windings (which necessitated using a fixed size), it is ever so
slightly faster than using malloc as the back-end.
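For reference, the general shape of a high-tide allocator for fixed-size objects (a simplified sketch with made-up sizes, not the qfvis code): allocations just bump a counter within big chunks, frees go on a trivial free list, and malloc is only hit once per chunk.

#include <stdlib.h>

#define OBJ_SIZE   64           // fixed winding size (made up)
#define CHUNK_OBJS 4096

typedef struct chunk_s {
    struct chunk_s *next;
    int             used;
    char            data[CHUNK_OBJS][OBJ_SIZE];
} chunk_t;

static chunk_t *chunks;
static void    *free_list;

static void *
tide_alloc (void)
{
    if (free_list) {            // reuse a freed object (all the same size)
        void *obj = free_list;
        free_list = *(void **) obj;
        return obj;
    }
    if (!chunks || chunks->used == CHUNK_OBJS) {
        chunk_t *chunk = malloc (sizeof (chunk_t));
        chunk->next = chunks;
        chunk->used = 0;
        chunks = chunk;
    }
    return chunks->data[chunks->used++];
}

static void
tide_free (void *obj)
{
    *(void **) obj = free_list;
    free_list = obj;
}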