This gets the alias pipeline in line with the bsp pipeline, and thus
everything is about as functional as it was before the rework (minus
dealing with large texture sets).
I guess it's not quite bindless as the texture index is a push constant,
but it seems to work well (and I may have fixed some full-bright issues
by accident, though I suspect that's just my imagination, but they do
look good).
This should fix the horrid frame rate dependent behavior of the view
model.
They are also in their own descriptor set so they can be easily shared
between pipelines. This has been verified to work for Draw.
BSP textures are now two-layered with the albedo and emission in the two
layers rather than two separate images. While this does increase memory
usage for the textures themselves (most do not have fullbright pixels),
it cuts down on image and image view handles (and shader resources).
For now, just dot product, trig, and min/max/bound, but it works well as
a proof of concept. The main goal was actually min. Only the list of
symbols is provided, it is the user's responsibility to set up the
symbol table and context.
cexpr's symbol tables currently aren't readily extended, and dynamic
scoping is usually a good thing anyway. The chain of contexts is walked
when a symbol is not found in the current context's symtab, but minor
efforts are made to avoid checking the same symtab twice (usually cased
by cloning a context but not updating the symtab).
Multiple render passes are needed for supporting shadow mapping, and
this is a huge step towards breaking the Vulkan render free of Quake,
and hopefully will lead the way for breaking the GL renderers free as
well.
This is actually a better solution to the renderer directly accessing
client code than provided by 7e078c7f9c.
Essentially, V_RenderView should not have been calling R_RenderView, and
CL_UpdateScreen should have been calling V_RenderView directly. The
issue was that the renderers expected the world entity model to be valid
at all times. Now, R_RenderView checks the world entity model's validity
and immediately bails if it is not, and R_ClearState (which is called
whenever the client disconnects and thus no longer has a world to
render) clears the world entity model. Thus R_RenderView can (and is)
now called unconditionally from within the renderer, simplifying
renderer-specific variants.
When allocating memory for multiple objects that have alignment
requirements, it gets tedious keeping track of the offset and the
alignment. This is a simple function for walking the offset respecting
size and alignment requirements, and doubles as a size calculator.
I'm not sure what I was thinking when I made PL_RemoveObjectForKey take
a const plitem. One of those times where C could do with being a little
more strict.
The stack is arbitrary strings that the validation layer debug callback
prints in reverse order after each message. This makes it easy to work
out what nodes in a pipeline/render pass plist are causing validation
errors. Still have to narrow down the actual line, but the messages seem
to help with that.
Putting qfvPushDebug/qfvPopDebug around other calls to vulkan should
help out a lot, tool.
As a bonus, the stack is printed before debug_breakpoint is called, so
it's immediately visible in gdb.
I'm not at all happy with con_message and con_menu, but fixing them
properly will take a rework of the menus (planned, though). Also, the
Menu_ console command implementations are a bit iffy and could also do
with a rewrite (probably part of the rest of the menu rework) or just
nuking (they were part of Johnny on Flame's work, so I suspect had
something to do with joystick bindings).
An imt switcher automatically changes the context's active imt based on
a user specified list of binary inputs. The inputs may be either buttons
(indicated as +button) or cvars (bare name). For buttons, the
pressed/not pressed state is used, and cvars are interpreted as ints
being 0 or not 0. The order of the inputs determines the bit number of
the input, with the first input being bit 0, second bit 1, third bit 2
etc. A default imt is given so large switchers do not need to be fully
configured (the default imt is written to all states).
A context can have any number of switchers attached. The switchers can
wind up fighting over the active imt, but this seems to be something for
the "user" (eg, configuration system) to sort out rather than the
switcher code enforcing anything.
As a result of the inputs being treated as bits, a switcher with N
inputs will have 2**N states, thus there's a maximum of 16 inputs for
now as 65536 states is a lot of configuration.
Using a switcher, setting up a standard strafe/mouse look configuration
is fairly easy.
imt_create key_game imt_mod
imt_create key_game imt_mod_strafe imt_mod
imt_create key_game imt_mod_freelook
imt_create key_game imt_mod_lookstrafe imt_mod_freelook
imt_switcher_create mouse key_game imt_mod_strafe +strafe lookstrafe +mlook freelook
imt_switcher 0 imt_mod 2 imt_mod 4 imt_mod_freelook 8 imt_mod_freelook 12 imt_mod_freelook
imt_switcher 6 imt_mod_lookstrafe 10 imt_mod_lookstrafe 14 imt_mod_lookstrafe
in_bind imt_mod mouse axis 0 move.yaw
in_bind imt_mod mouse axis 1 move.forward
in_bind imt_mod_strafe mouse axis 0 move.side
in_bind imt_mod_lookstrafe mouse axis 0 move.side
in_bind imt_mod_freelook mouse axis 1 move.pitch
This takes advantage of imt chaining and the default imt for the
switcher (there are 8 states that use imt_mod_strafe).
The switcher name must be unique across all contexts, and every imt used
in a switcher must be in the switcher's context.
The listener is invoked when the axis value changes due to IN_UpdateAxis
or IN_ClampAxis updating the axis. This does mean the listener
invocation make be somewhat delayed. I am a tad uncertain about this
design thus it being a separate commit.
Listeners are separate to the main callback as listeners have only
read-only access to the objects, but the main callback is free to modify
the cvar and thus can act as a parser and validator. The listeners are
invoked after the main callback if the cvar is modified. There does not
need to be a main callback for the listeners to be invoked.
I decided cvars and input buttons/axes need listeners so any changes to
them can be propagated. This will make using cvars in bindings feasible
and I have an idea for automatic imt switching that would benefit from
listeners attached to buttons and cvars.
Combining absolute and relative inputs at the binding does not work well
because absolute inputs generally update only when the physical input
updates, so clearing the axis input each frame results in a brief pulse
from the physical input, but relative inputs must be cleared each frame
(where frame here is each time the axis is read) but must accumulate the
relative updates between frames.
Other than the axis mode being incorrect, this seems to work quite
nicely.
This is the first step in the long-sought goal of allowing the window
size to change, but is required for passing on getting window position
and size information (though size is in viddef, it makes sense to pass
both together).
Input driver can now have an optional init_cvars function. This allows
them to create all their cvars before the actual init pass thus avoiding
some initialization order interdependency issues (in this case, fixing a
segfault when starting x11 clients fullscreen due to the in_dga cvar not
existing yet).
keyhelp provides the input name if it is known, and in_bind tries to use
the provided input name if not a number. Case sensitivity for name
lookups is dependent on the input driver.
There's now an internal event handler for taking care of device addition
and removal, and a public event handler for dealing with device input
events in various contexts In particular, so the clients can check for
the escape key.
While the console command line is quite good for setting everything up,
the devices being bound do need to be present when the commands are
executed (due to needing extra data provided by the devices). Thus
property lists that store the extra data (button and axis counts, device
names/ids, connection names, etc) seems to be the best solution.
The mouse bound to movement axes works (though signs are all over the
place, so movement direction is a little off), and binding F10 (key 68)
to quit works :)
Each axis binding has its own recipe (meaning the same input axis can be
interpreted differently for each binding)
Recipes are specified with field=value pairs after the axis name.
Valid fields are minzone, maxzone, deadzone, curve and scale, with
deadzone doubling as a balanced/unbalanced flag.
The default recipe has no zones, is balanced, and curve and scale are 1.
Hot-plug support is done via "connections" (not sure I'm happy with the
name) that provide a user specifiable name to input devices. The
connections record the device name (eg, "6d spacemouse") and id (usually
usb path for evdev devices, but may be the device unique id if
available) and whether automatic reconnection should match just the
device name or both device name and id (prevents problems with changing
the device connected to the one usb port).
Unnecessary enum removed, and the imt block struct moved to imt.c
(doesn't need to be public). Also, remove device name from the imt block
(and thus the parameter to the functions) as it turns out not to be
needed.
in_bind is only partially implemented (waiting on imt), but device
listing, device naming, and input identification are working. The event
handling system made for a fairly clean implementation for input
identification thanks to the focused event handling.
This has smashed the keydest handling for many things, and bindings, but
seems to be a good start with the new input system: the console in
qw-client-x11 is usable (keyboard-only).
The button and axis values have been removed from the knum_t enum as
mouse events are separate from key events, and other button and axis
inputs will be handled separately.
keys.c has been disabled in the build as it is obsolute (thus much of
the breakage).
For the mouse in x11, I'm not sure which is more cooked: deltas or
window-relative coordinates, but I don't imagine that really matters too
much. However, keyboard and mouse events suitable for 2D user interfaces
are sent at the same time as the more game oriented button and axis events.
Input Mapping Tables are still at the core as they are a good concept,
however they include both axis and button mappings, and the size is not
hard-coded, but dependent on the known devices. Not much actually works
yet (nq segfaults when a key is pressed).
kbutton_t is now in_button_t and has been moved to input.h. Also, a
button registration function has been added to take care of +button and
-button command creation and, eventually, direct binding of "physical"
buttons to logical buttons. "Physical" buttons are those coming in from
the OS (keyboard, mouse, joystick...), logical buttons are what the code
looks at for button state.
Additionally, the button edge detection code has been cleaned up such
that it no longer uses magic numbers, and the conversion to a float is
cleaner. Interestingly, I found that the handling is extremely
frame-rate dependent (eg, +forward will accelerate the player to full
speed much faster at 72fps than it does at 20fps). This may be a factor
in why gamers are frame rate obsessed: other games doing the same thing
would certainly feel different under varying frame rates.
For drivers that support it. Polling is still supported and forces the
select timeout to 0 if any driver requires polling. For now, the default
timeout when all drivers use select is 10ms.
I had forgotten that _size was the number of rows in the map, not the
number of objects (1024 objects per row). This fixes the missed device
removal messages. And probably a slew of other bugs I'd yet to encounter
:P
This includes device add and remove events, and axis and buttons for
evdev. Will need to sort out X11 input later, but next is getting qwaq
responding.
The common input code (input outer loop and event handling) has been
moved into libQFinput, and modified to have the concept of input drivers
that are registered by the appropriate system-level code (x11, win,
etc).
As well, my evdev input library code (with hotplug support) has been
added, but is not yet fully functional. However, the idea is that it
will be available on all systems that support evdev (Linux, and from
what I've read, FreeBSD).
At the low level, only unions can cause a set to grow. Of course, things
get interesting at the higher level when infinite (inverted) sets are
mixed in.
Instead of printing every representable member of an infinite set (ie,
up to element 63 in a set that can hold 64 elements), only those
elements up to one after the last non-member are listed. For example,
{...} - {2 3} -> {0 1 4 ...}
This makes reading (and testing!) infinite sets much easier.
Most of the set ops were always endian-agnostic since they were simply
operating on multiple bits in parallel, but individual element
add/remove/test was very endian-dependent. For the most part, this
didn't matter, but it does matter very much when loading external data
into a set or writing the data out (eg, for PVS).
For now, the functions check for a null hunk pointer and use the global
hunk (initialized via Memory_Init) if necessary. However, Hunk_Init is
available (and used by Memory_Init) to create a hunk from any arbitrary
memory block. So long as that block is 64-byte aligned, allocations
within the hunk will remain 64-byte aligned.
The output fat-pvs data is the *difference* between the base pvs and fat
pvs. This currently makes for about 64kB savings for marcher.bsp, and
about 233MB savings for ad_tears.bsp (or about 50% (470.7MB->237.1MB)).
I expect using utf-8 encoding for the run lengths to make for even
bigger savings (the second output fat-pvs leaf of marcher.bsp is all 0s,
or 6 bytes in the file, which would reduce to 3 bytes using utf-8).
The fact that numleafs did not include leaf 0 actually caused in many
places due to never being sure whether to add 1. Hopefully this fixes
some of the confusion. (and that comment in sv_init didn't last long :P)
After seeing set_size and thinking it redundant (thought it returned the
capacity of the set until I checked), I realized set_count would be a
much better name (set_count (node->successors) in qfcc does make much
more sense).
Modern maps can have many more leafs (eg, ad_tears has 98983 leafs).
Using set_t makes dynamic leaf counts easy to support and the code much
easier to read (though set_is_member and the iterators are a little
slower). The main thing to watch out for is the novis set and the set
returned by Mod_LeafPVS never shrink, and may have excess elements (ie,
indicate that nonexistent leafs are visible).
Having set_expand exposed is useful for loading data into a set.
However, it turns out there was a bug in its size calculation in that
when the requested set size was a multiple of SET_BITS (and greater than
the current set size), the new set size one be SET_BITS larger than
requested. There's now some tests for this :)
Quake just looked wrong without the view model. I can't say I like the
way the depth range is hacked, but it was necessary because the view
model needs to be processed along with the rest of the alias models
(didn't feel like adding more command buffers, which I imagine would be
expensive with the pipeline switching).
The recent changes to key handling broke using escape to get out of the
console (escape would toggle between console and menu). Thus take care
of the menu (escape) part of the coupling FIXME by implementing a
callback for the escape key (and removing key_togglemenu) and sorting
out the escape key handling in console. Seems to work nicely
Since vulkan supports 32-bit indexes, there's no need for the
shenanigans the EGL-based glsl renderer had to go through to render bsp
models (maps often had quite a bit more than 65536 vertices), though the
reduced GPU memory requirements of 16-bit indices does have its
advantages.
Any sun (a directional light) is in the outside node, which due to not
having its own PVS data is visible to all nodes, but that's a tad
excessive. However, any leaf node with sky surfaces will potentially see
any suns, and leaf nodes with no sky surfaces will see the sun only if
they can see a leaf that does have sky surfaces. This can be quite
expensive to calculate (already known to be moderately expensive for
just the camera leaf node (singular!) when checking for in-map lights)
Getting close to understanding (again) how it all works. I only just
barely understood when I got vulkan's renderer running, but I really
need to understand for when I modify things for shadows. The main thing
hurdle was tinst, but that was dealt with in the previous commit, and
now it's just sorting out the mess of elechains and elementss.
Its sole purpose was to pass the newly allocated instsurf when chaining
an instance model (ammo box, etc) surface, but using expresion
statements removes the need for such shenanigans, and even makes
msurface_t that little bit smaller (though a separate array would be
much better for cache coherence).
More importantly, the relevant code is actually easier to understand: I
spent way too long working out what tinst was for and why it was never
cleared.
This reduces the overhead needed to manage the memory blocks as the
blocks are guaranteed to be page-aligned. Also, the superblock is now
alllocated from within one of the memory blocks it manages. While this
does slightly reduce the available cachelines within the first block (by
one or two depending on 32 vs 64 bit pointers), it removes the need for
an extra memory allocation (probably via malloc) for the superblock.
The renderer's LineGraph now takes a height parameter, and netgraph now
uses cl_* cvars instead of r_* (which never really made sense),
including it's own height cvar (the render graphs still use
r_graphheight).
The uptime display had not been updated for the offset Sys_DoubleTime,
so add Sys_DoubleTimeBase to make it easy to use Sys_DoubleTime as
uptime.
Line up the layout of the client list was not consistent for drop and
qport.
conwidth and conheight have been moved into vid.conview (probably change
the name at some time), and scr_vrect has been replaced by a view as
well. This makes it much easier to create 2d elements that follow the
screen size (taking advantage of a view's gravity) which will, in the
end, make changing the window size easier.
One moves and resizes the view in one operation as a bit of an
optimization as moving and resizing both update any child views, and
this does only one update.
The other sets the gravity and updates any child views as their
absolute positions would change as well as the updated view's absolute
position.
This refactors (as such) keys.c so that it no longer depends on console
or gib, and pulls keys out of video targets. The eventual plan is to
move all high-level general input handling into libQFinput, and probably
low-level (eg, /dev/input handling for joysticks etc on Linux).
Fixes#8
on_update is for pull-model outpput targets to do periodic synchronous
checks (eg, checking that the connection to the actual output device is
still alive and reviving it if necessary)
Output plugins can use either a push model (synchronous) or a pull
model (asynchronous). The ALSA plugin now uses the pull model. This
paves the way for making jack output a simple output plugin rather than
the combined render/output plugin it currently is (for #16) as now
snd_dma works with both models.
This brings the alsa driver in line with the jack render (progress
towards #16), but breaks most of the other drivers (for now: one step at
a time). The idea is that once the pull model is working for at least
one other target, the jack renderer can become just another target like
it should have been in the first place (but I needed to get the pull
model working first, then forgot about it).
Correct state checking is not done yet, but testsound does produce what
seems to be fairly good sound when it starts up correctly (part of the
state checking (or lack thereof), I imagine).
This failed with errors such as:
from ./include/QF/simd/vec4d.h:32,
from libs/util/simd.c:37:
./include/QF/simd/vec4d.h: In function ‘qmuld’:
/usr/lib/gcc/x86_64-pc-linux-gnu/10.3.0/include/avx2intrin.h:1049:1: error: inlining failed in call to ‘always_inline’ ‘_mm256_permute4x64_pd’: target specific option mismatch
1049 | _mm256_permute4x64_pd (__m256d __X, const int __M)
Support for finding the first address associated with a source line was
added to the engine, returning 0 if not found.
A temporary breakpoint is set and the progs allowed to run free.
However, better handling of temporary breakpoitns is needed as currently
a "permanent" breakpoint will be cleared without clearing the temporary
breakpoing if the permanent breakpoing is hit while execut-to-cursor is
running.
Fuzzy bsearch is useful for finding an entry in a prefix sum array
(value is >= ele[0], < ele[1]), and the reentrant version is good when
data needs to be passed to the compare function. Adapted from the code
used in pr_resolve.
GCC does a fairly nice job of producing code for vector types when the
hardware doesn't support SIMD, but it seems to break certain math
optimization rules due to excess precision (?). Still, it works well
enough for the core engine, but may not be well suited to the tools.
However, so far, only qfvis uses vector types (and it's not tested yet),
and tools should probably be used on suitable machines anyway (not
forces, of course).
I don't know that the cache line size is 64 bytes on 32 bit systems, but
it should be ok to assume that 64-byte alignment behaves well on systems
with smaller cache lines so long as they are powers of two. This does
mean there is some waste on 32-bit systems, but it should be fairly
minimal (32 bytes per memblock, which manages page sized regions).
The Blend macro supports any non-integral type supporting * and +
(float, double, vec4f_t, etc), so it is essentially a scalar VectorBlend
or QuatBlend.
Standard quake has just linear, but the modding community added inverse,
inverse-square (raw and offset (1/(r^2+1)), infinite (sun), and
ambient (minlight). Other than the lack of shadows, marcher now looks
really good.
Mostly, this gets the stage flags in with the barrier, but also adds a
couple more barrier templates. It should make for slightly less verbose
code, and one less opportunity for error (mismatched barrier/stages).
This gets the shaders needed for creating shadow maps, and the changes
to the lighting pipeline for binding the shadow maps, but no generation
or reading is done yet. It feels like parts of various systems are
getting a little big for their britches and I need to do an audit of
various things.
QF now uses its own configuration file (quakeforge.cfg for now) rather
than overwriting config.cfg so that people trying out QF in their normal
quake installs don't trash their config.cfg for other quake clients. If
quakeforge.cfg is present, all other config files are ignored except
that quake.rc is scanned for a startdemos command and that is executed.
And improve the generated code for MSG_ReadShort
I suspect gcc didn't like all the excess pointer dereferences and so
couldn't assume that the bytes were being read sequentially.
And improve the generated code as well (ie, use a code sequence that gcc
recognizes and optimizes to a single 32-bit read and a byte-swap).
nq uses big-endian for its packet headers (arg, though it is consistent
with IP, it's not with the rest of quake).
I'm not sure that the mismatch between refdef_t and the assembly defines
was a problem (many fields unused), but the main problem was due to
execute permission on the pages: one chunk of asm was in the data
section, and the patched code was not marked as being executable (due to
such a thing not existing when quake was written).
vid.aspect is removed (for now) as it was not really the right idea (I
really didn't know what I was doing at the time). Nicely, this *almost*
fixes the fov bug on fresh installs: the view is now properly
upside-down rather than just flipped vertically (ie, it's now rotated
180 degrees).
While the main bulk of the improvement (36s down from 42s for
gmsp3v2.bsp on my i7-6850K) comes from using a high-tide allocator for
the windings (which necessitated using a fixed size), it is ever so
slightly faster than using malloc as the back-end.
This is for the conversion /to/ paletted textures. The conversion is
necessary for csqc support. In the process, the conversion has been sped up
by implementing a color cache for the conversion process. I haven't
measured the difference yet, but Mr Fixit does seem to load much faster for
the sw renderer than it did before the change (many months old memory).
This separate the FOV calculations from other refdef calcs, cleaning up the
renderer proper and making it easier for other parts of the engine (eg,
csqc) to update the fov.
The server edict arrays are now stored outside of progs memory, only the
entity data itself (ie data accessible to progs via ent.fld) is stored in
progs memory. Many of the changes were due to code accessing edicts and
entity fields directly rather than through the provided macros.
Loading is broken for multi-file image sets due to the way images are
loaded (this needs some thought for making it effecient), but the
Blender environment map loading works.
They're unlit (fullbright, but that's nothing new for quake), but
working nicely. As a bonus, sort out the sky pass (forced to due to the
way command buffers are used).
There were actually several problems: translucency wasn't using or
depending on the depth buffer, and the depth buffer wasn't marked as
read-only in the g-buffer pass. Getting that correct seems to have given
bigass1 a 0.5% boost (hard to say, could be the usual noise).
That was... easier than expected. A little more tedious that I would
have liked, but my scripting system isn't perfect (I suspect it's best
suited as the output of a code generator), and the C side could do with
a little more automation.
Other than dealing with shader data alignment issues, that went well :).
Nicely, the implementation gets the explicit scaling out of the shader,
and allows for a directional flag.
I never liked that some of the macros needed the type as a parameter
(yay typeof and __auto_type) or those that returned a value hid the
return statement so they couldn't be used in assignments.
Still "some" more to go: a pile to do with transforms and temporary
entities, and a nasty one with host_cbuf. There's also all the static
block-alloc lists :/
Light styles and shadows aren't implemented yet.
The map's entities are used to create the lights, and the PVS used to
determine which lights might be visible (ie, the surfaces they light).
That could do with some more improvements (eg, checking if a leaf is
outside a spotlight's cone), but the concept seems to work.
Double benefit, actually: faster when building a fat PVS (don't need to
copy as much) and can be used in multiple threads. Also, default visiblity
can be set, and the buffer size has its own macro.
Useful for avoiding a pile of wrapper functions that merely pass on
command-specific data to the actual implementation. Used to clean up the
wrappers in nq and qw cl_input.c
This is the first step towards component-based entities.
There's still some transform-related stuff in the struct that needs to
be moved, but it's all entirely client related (rather than renderer)
and will probably go into a "client" component. Also, the current
components are directly included structs rather than references as I
didn't want to deal with the object management at this stage.
As part of the process (because transforms use simd) this also starts
the process of moving QF to using simd for vectors and matrices. There's
now a mess of simd and sisd code mixed together, but it works
surprisingly well together.
The plan is to have a fully component based entity system. This adds
hierarchical transforms. Not particularly useful for quake itself at
this stage, but it will allow for much more flexibility later on,
especially when QuakeForge becomes more general-purpose.
This seems to be pretty close to as fast as it gets (might be able to do
better with some shuffles of the negation constants instead of loading
separate constants).
It's not used yet as work needs to be done to better support generic
entities, but this is the next step to real-time lighting (though, to be
honest, I expect it will be too slow to be usable).
The main purpose is to allow fluent-style:
const char *targetname = PL_String (PL_ObjectForKey (entity, "targetname"));
if (targetname && !PL_ObjectForKey (targets, targetname)) {
PL_D_AddObject (targets, targetname, entity);
}
[note: the above is iffy due to ownership of entity, but the code from
which the above comes works around the issue]
Static lights are yet to come (so the screen is black most of the time),
but dynamic lights work very nicely (and look very good) despite the
falloff being incorrect.
While I could reconstruct the position from the screen coords and depth,
this is easier and good enough for now. Reconstruction is an
optimization thing.
Lighting doesn't actually do lights yet, but it's producing pixels.
Translucent seems to be working (2d draw uses it), and compose seems to
be working.
After getting lights even vaguely working for alias models, I realized
that it just wasn't going to be feasible to do nice lighting with
forward rendering. This gets the bulk of the work done for deferred
rendering, but still need to sort out the shaders before any real
testing can be done.
The order in which keys are added to the dictionary object is
maintained. Adding a key after removing an old key adds the new key to
the end of the list rather than reusing the old key's spot.
PL_ParseLabeledArray works the same way as PL_ParseArray, but instead
takes a dictionary object. The keys of the items are ignored, and the
order is not preserved (at this stage), but this is a cleaner solution
to getting an array of objects when the definitions of those objects
need to be accessible by name as well.
It's not entirely there yet, but the basics are working. Work is still
needed for avoiding duplication of objects (different threads will have
different contexts and thus different tables, so necessary per-thread
duplication should not become a problem) and general access to arbitrary
fields (mostly just parsing the strings)
This allows plist objects to be accessed directly from cexpr expressions
using struct.field syntax for dictionary objects and array[index] syntax
for array objects.
The node struct was 72 bytes thus two cache line. Moving the pointer
into the brush model data block allows nodes to fit in a single cache
line (not that they're aligned yet, but that's next). It doesn't seem to
have made any difference to performance (at least in the vulkan
renderer), but it hasn't hurt, either, as the only place that needed the
parent pointer was R_MarkLeaves.
It's not quite as expected, but that may be due to one of msaa, the 0-15
range in the palette not being all the way to white, the color gradients
being not quite linear (haven't checked yet) or some combination of the
above. However, it's that what should be yellow is more green. At least
the zombies are no longer white and the ogres don't look like they're
wearing skeleton suits.
Doesn't seem to make much difference performance-wise, but speed does
seem to be fill-rate limited due to the 8x msaa. Still, it does mean
fewer bindings to worry about.
This is a big step towards a cleaner api. The struct reference in
model_t really should be a pointer, but bsp submodel(?) loading messed
that up, though that's just a matter of taking more care in the loading
code. It seems sensible to make that a separate step.
I've decided that alias model skins should be in a single four-level
array texture rather than spread over four textures, but there's no way
I want to write that code again: getting it right was hard enough the
first time :P
It now takes a context pointer (opaque data) that holds the buffers it
uses for the temporary strings. If the context pointer is null, a static
context is used (making those uses of va NOT thread-safe). Most calls to
va use the static context, but all such calls have been formatted
consistently so they are easy to find when it comes time to do a full
audit.
It's a tad bogus as it's the lights close to the camera, but it should
at least be a good start once things are working. There's currently
something very wrong with the state of things.
The dynamic array macros made this much easier than last time I looked
at it, especially when it came to figuring out the bad memory accesses
that I seem to remember from my last attempt 9 years ago.
This makes tex_t more generally useable and probably more portable. The
goal was to be able to use tex_t with data that is in a separate chunk
of memory.
The sky texture is loaded with black's alpha set to 0. While this does
hit both layers, the screen is cleared to black so it shouldn't be a
problem (and will allow having a skybox behind the sheets).
Glow map and sky sheet and cube need to wait until I can get some
default textures going, but the world is rendering correctly otherwise
(though a tad dark: need to do a gamma setting).
It now uses the ring buffer code I wrote for qwaq (and forgot about,
oops) to handle the packets themselves, and the logic for allocating and
freeing space from the buffer is a bit simpler and seems to be more
reliable. The automated test is a bit of a joke now, though, but coming
up with good tests for it... However, nq now cycles through the demos
without obvious issue under the same conditions that caused the light
map update code to segfault.
Vulkan validation (quite rightly) doesn't like it when the flush range
goes past the end of the buffer, but also doesn't like it when the flush
range isn't cache-line aligned, so align the size of the buffer, too.
I had originally planned on mixing the stage management with general
texture support code like I did in glsl, but I think that was a mistake
and I did keep looking for scrap.[ch] when I wanted to edit something to
do with the scrap...
This cleans up texture_t and possibly even improves locality of
reference when running through texture chains (not profiled, and not
actually the goal).
It optionally generates mipmaps, and supports the main texture types
(especially for texture packs), including palettes, but is otherwise
rather unsophisticated code. Needs a lot of work, but testing first.
This allows the array in which the command buffers are allocated to be
allocated on the stack using alloca and thus remove the need to
malloc/free of relatively small chunks.
The scrap texture did very good things for the glsl renderer and the
better control over data copying might help it do even better things for
vulkan, especially with lots of little icons.
It's never actually used (the texture can be fetched using
GLSL_ScrapTexture) and gets in the way of sharing the scrap system with
the vulkan renderer.
r_screen because of SCR_UpdateScreen, and r_cvar because the cvars
really should never have been in a plugin in the first place (and
r_screen needed access).
First pixels! This was a nightmare of little issues that the validation
layers couldn't help with: incorrect input assembly, incorrect vertex
attribute specs. Though the layers did help with getting the queues
working. Still, lots of work to go but this is a major breakthrough as
I now have access to visual debugging for textures and the like.
Short wrappers for Draw functins are in vid_render_vulkan.c so the
vulkan context can be passed on to the actual functions. The 2D shaders
are set up similar to those in glsl, but with full 32-bit color (rgba)
support instead of paletted. However, the textures are not loaded yet,
nor is anything bound.
The way I wound up using the field meant that exprctx should not "own"
the hashlinks chain, but rather just point to it. This fixes the nasty
access errors I had.
Dependencies on vkparse.hinc were spreading through the code which I
didn't want as that removes a lot of the automation from the automake
files. This keeps all parser code internal to vkparse.c's scope, and any
accesses required for enum and struct (not yet) definitions can be
fetched by name.
I want to be able to use name references, but that requires string
items, so anything that would normally be dictionary or array (or
binary, even) would also need to accept string. This seemed to be the
cleanest solution. Any custom parser would then need to check the type
and act appropriately, but any inappropriate types have already been
pre-filtered by the standard parsers.
Care needs to be taken to ensure the right function is used with the
right arguments, but with these, the need to use qconj(d|f) for a
one-off inverse rotation is removed.
They're binned by powers of two (with in between sizes going to the
smaller bin should I make cache-line allocations NPOT (which I think
might be worthwhile). However, there seems to still be a bug somewhere
causing a nasty leak as now my hacked qfvis consumes 40G in less than a
minute.