Getting close to understanding (again) how it all works. I only just
barely understood when I got vulkan's renderer running, but I really
need to understand for when I modify things for shadows. The main thing
hurdle was tinst, but that was dealt with in the previous commit, and
now it's just sorting out the mess of elechains and elementss.
Its sole purpose was to pass the newly allocated instsurf when chaining
an instance model (ammo box, etc) surface, but using expresion
statements removes the need for such shenanigans, and even makes
msurface_t that little bit smaller (though a separate array would be
much better for cache coherence).
More importantly, the relevant code is actually easier to understand: I
spent way too long working out what tinst was for and why it was never
cleared.
This reduces the overhead needed to manage the memory blocks as the
blocks are guaranteed to be page-aligned. Also, the superblock is now
alllocated from within one of the memory blocks it manages. While this
does slightly reduce the available cachelines within the first block (by
one or two depending on 32 vs 64 bit pointers), it removes the need for
an extra memory allocation (probably via malloc) for the superblock.
The renderer's LineGraph now takes a height parameter, and netgraph now
uses cl_* cvars instead of r_* (which never really made sense),
including it's own height cvar (the render graphs still use
r_graphheight).
Ping ack (A2A_ACK) is always a 1-byte packet, so check that the A2A_ACK
is the only byte in the packet before passing the packet to the server
list code. Fixes the mysterious dropped packet every 256 packets as the
low eight bits of the packet's sequence number equalled A2A_ACK.
The uptime display had not been updated for the offset Sys_DoubleTime,
so add Sys_DoubleTimeBase to make it easy to use Sys_DoubleTime as
uptime.
Line up the layout of the client list was not consistent for drop and
qport.
The render plugins have made a bit of a mess of getting at the data and
thus it's a tad confusing how to get at it in different places. Really
needs a proper cleanup :(
conwidth and conheight have been moved into vid.conview (probably change
the name at some time), and scr_vrect has been replaced by a view as
well. This makes it much easier to create 2d elements that follow the
screen size (taking advantage of a view's gravity) which will, in the
end, make changing the window size easier.
One moves and resizes the view in one operation as a bit of an
optimization as moving and resizing both update any child views, and
this does only one update.
The other sets the gravity and updates any child views as their
absolute positions would change as well as the updated view's absolute
position.
It now processes 4 pixels at a time and uses a bit mask instead of a
conditional to set 3 of the 4 pixels to black. On top of the 4:1 pixel
processing and avoiding inner-loop conditional jumps, gcc unrolls the
loop, so Draw_FadeScreen itself is more than 4x as fast as it was. The
end result is about 5% (3fps) speedup to timedemo demo1 on my 900MHz
EEE Pc when nq has been hacked to always draw the fade-screen.
qwaq-curses has its place, but its use for running vkgen was really a
placeholder because I didn't feel like sorting out the different
initialization requirements at the time. qwaq-cmd has the (currently
unnecessary) threading power of qwaq-curses, but doesn't include any UI
stuff and thus doesn't need curses. The work also paves the way for
qwaq-x11 to become a proper engine (though sorting out its init will be
taken care of later).
Fixes#15.
This refactors (as such) keys.c so that it no longer depends on console
or gib, and pulls keys out of video targets. The eventual plan is to
move all high-level general input handling into libQFinput, and probably
low-level (eg, /dev/input handling for joysticks etc on Linux).
Fixes#8
The default is to enable (and autodetect based on lscpu) simd support,
or the mode can be specified via --enable-simd=mode. --disable-simd
disables the support but ensures gcc will still compile the float vector
types.
When moving an identifier label from one node to another, the first node
must be evaluated before the second node, which the edge guarantees.
However, code for swapping two variables
t = a; a = b; b = t;
creates a dependency cycle. The solution is to create a new leaf node
for the source operand of the assignment. This fixes the swap.r test
without pessimizing postop code.
This takes care of the core problem in #3, but there is still room for
improvement in that the load/store can be combined into a move.
This reverts commit 2fcda44ab0.
Killing the node is not the correcgt answer as it blocks many
optimization opportunities. The correct answer is adding edges to
describe the temporal dependencies. Of course, this breaks the swap.r
test.
In order to correctly handle swap-style code
{ t = a; a = b; b = t; }
edges need to be created for each of the assignments moving an
identifier lable, but the dag must remain acyclic (the above example
wants to create a cycle). Having the reachable nodes recorded makes
checking for potential loops a quick operation.
Identifiers can be constants. I don't remember quite what it fixed other
than some bogus kill relations in the dags (which might have caused
issues later).
I had forgotten to test with shared libs and it turns out jack and alsa
were directly accessing symbols in the renderer (and in jack's case,
linking in a duplicate of the renderer).
Fixes#16.
The JACK Audio Connection Kit support is now just an output target
rather than a full duplicate of the renderer (in pull mode). This is
what I wanted to to back when I first added jack support, but I needed
to get the renderer working asynchronously without affecting any of the
other outputs.
Fixes#16.
on_update is for pull-model outpput targets to do periodic synchronous
checks (eg, checking that the connection to the actual output device is
still alive and reviving it if necessary)