I'm not sure what the author of that code was thinking (maybe trying to
do 4 pixels at a time?), but the resulting code still did only one.
Better to remove all the casts, use the right pointer type, and keep the
code clear.
Drawing sky chains first ensures that sky surfaces correctly block parts
of the map that should not be visible (by writing the correct depth to
the depth buffer when doing box or dome skies). Writing brush models
first means that the models (ammo boxes etc) could be visible when they
should not be.
While there's currently only the one still, this will allow the entities
to be multiply queued for multi-pass rendering (eg, shadows). As the
avoidance of putting an entity in the same queue more than once relies
on the entity id, all entities now come from the scene (which is stored
in cl_world in the client code for nq and qw), thus the extensive
changes in the clients.
The root transform of each hierarchy can be extracted from the first
transform of the list in the hierarchy, so no information is lost. The
main reason for the change is I discovered (obvious in hindsight) that
deleting root transforms was O(n) due to keeping them in an array, thus
the use of a linked list (I don't expect a hierarchy to be in more than
one such list), and I didn't want the transforms to be in a linked list.
GL and GLSL were drawing the view model after particles instead of
before. For GL, this is likely due to avoiding fog affecting the view
model (which I think is not the right thing to do), and GLSL due to
copying GL (because I had no idea at the time). This makes the two
renderers consistent with the software renderers, and might even speed
things up a little as that's one less set of blends to do when the
particles are covered by the view model (I don't expect much
difference).
While I doubt the difference is all that significant, this should speed
up entity rendering because it cuts out a lot of branching, and
eliminates scanning the same list multiple times only to not do anything
for large chunks of the list.
Since transforms now know the scene to which they belong, and they know
when they are root and when not, getting the transform code to manage
the scene roots is the best way to keep the list of root transforms
consistent.
It turns out cam_controls is for pointing the player model in the
direction of movement rather than controlling the camera (I should add
proper camera controls).
I finally spent the time to work out what it was trying to say. Still
not sure it's clear, but what is clear is that there was probably some
disagreement at Id about the orientation of the world.
They no longer spin like crazy. I don't know how, but I must have broken
something over the years as I'm sure Seth had the code working (and I
seem to remember seeing it working). In the process, clean up a lot of
the angle mess.
It's a lot easier to read (and see the difference between modes 2 and 3)
with all the ifs removed, and the state is properly is chasestate_t now
(though not handled properly on level reset etc).
The more advanced modes are rather broken (continuous spinning), but
they may have been for a while. The bulk of the various changes were due
to renaming viewstate's origin and angles to make their meaning more
explicit.
They've been near-identical for years, now they're only one. It proved
necessary to start merging the HUD code which for now is just a few cvar
declarations (not even init), but that should be a separate set of
commits.
The actual view and projection matrices are now consistent with vulkan,
with the vulkan-gl disparity moved into adjustment matrices. The goal is
to allow the same camera data and code to be used in all renderers. The
extra matrix multiplication shouldn't be too expensive as it occurs only
when the field of view (not often, under user control) or near and far
clip distances (very rarely) change.
It holds the data for a basic 3d camera (transform, fov, near and far
clip). Not used yet as there is much work to be done in cleaning up the
client code.
Handling of view angles is a little hacky at the moment, but this gets
the chase camera code and most of the common input code into one place,
which will make cleaning up the camera code much easier.
While both matrices had positive determinants in the first place, I find
the projection matrix easier to understand without all the negatives,
and having quake-x/vulkan-z positively parallel in the z-up matrix makes
that a lot easier to think about.
Regardless of whether the sky is spinning or not, the matrix needs to be
updated with the current origin in order to get the direction vector
right in the shader. Also, it's in the update that the required x-y
plane rotation gets in so the skies move in the correct direction.
This actually has at least two benefits: the transform id is managed by
the scene and thus does not need separate management by the Ruamoko
wrapper functions, and better memory handling of the transform objects.
Another benefit that isn't realized yet is that this is a step towards
breaking the renderers free of quake and quakeworld: although the
clients don't actually use the scene yet, it will be a good place to
store the rendering information (functions to run, etc).
I've run into a bit of an issue with transform management (really, just
need to make them owned by the scene, but that means creating a scene
for quake and quakeworld).
This is the bulk of the work for recording the resource pointer with
with builtin data. I don't know how much of a difference it makes for
most things, but it's probably pretty big for qwaq-curses due to the
very high number of calls to the curses builtins.
Closes#26
The zone memory block header is 64 bytes, so allocating a single 8 byte
selector is rather wasteful. Instead, allocate selectors in large chunks
(currently 64) and divvy them out as needed. Significantly reduces
memory pressure in large Ruamoko progs.
These add legacy support for basic float bitops (& | ^ ~). Avoiding the
instructions would require tot only the source to be converted, but also
the servers (as they do access those fields), and this seemed to be too
much.
It's not enforced a this stage, and it would be easy enough to handle,
but it turns out all the standard quake and quakeworld progs never used
... for the print functions: the behavior of PF_VarString was
undocumented and so... tough :P.
I had forgotten that unsigned division was different from signed
division (rather silly of me). However, with some testing and analysis,
unsigned true modulo is not needed as it's not possible to have
negative inputs and thus it's the same as remainder.
It now takes the function name to print in error message (passed on to
PR_Sprintf) and the argument number of the format string. The variable
arguments (in ...) are assumed to be immediately after the format
argument.
This loads the current return pointer into the specified register. No
offset is used (should make that an error, but for now any offset is
simply ignored). This is part of the fix for getting obj_msg_sendv to
work with return values.
With the return buffer in progs_t, it could not be addressed by the
progs on 64-bit machines (this was intentional, actually), but in order
to get obj_msg_sendv working properly, I needed a way to "bounce" the
return address of a calling function to the called function. The
cleanest solution I could think of was to add a mode to the with
instruction allowing the return pointer to be loaded into a register and
then calling the function with a 0 offset for the return value but using
the relevant register (next few commits). Testing promptly segfaulted
due to the 64-bit offset not fitting into a 32-bit value.
This gets message forwarding apparently working, though something isn't
quite right as qwaq-app doesn't update properly when I try to step
through the program, but that could be an error elsewhere.
The plan is to use the types to extract the number of parameters for a
selector when it is necessary to know the count. However, it'll probably
become useful for something else alter (these things seem to always do
so).
This takes care of the problems with PR_RESET_PARAMS (which has recently
become just a wrapper for PR_SetupParams) changing the stack and causing
PR_CallFunction to save the wrong stack pointer. Message forwarding is
currently broken for Ruamoko ISA progs, but that is due to not having a
valid pr_argc. However, I do have a plan involving extracting the
parameter count from the selector, but that's something for a later
commit. Everything else seems to be ok (my little game is working
nicely).
rua_obj was skipped because that looks to be a bit more work and should
be a separate commit.
This is to avoid the stack getting mangled when calling progs functions
with parameters.
I suppose having one builtin call another was a neat idea at the time,
and really could have been fixed by simply wrapping the calls with
push/pop frame, but this is probably faster.
obj_msg_sendv needs to push the parameters onto the stack for Ruamoko
progs, but this causes problems because PR_CallFunction winds up
recording the wrong stack pointer for progs functions, and nothing
restores the stack for builtins. The handling is basically the same as
for the return pointer.
It's a bit disconcerting seeing a builtin in the top 10 when builtins
are counted by call while progs functions are counted by instruction.
Also, show the total profile after the function top-10 list.
pr_argc cannot be used in Ruamoko progs because nothing sets it. This
fixes the parse errors and resulting segfault when trying to parse the
Vulkan pipeline config.
It's currently only 4 (or even 3 for v6) words, but this fixes false
positives when checking for null pointers in Ruamoko progs due to
pr_return pointing to the return buffer and thus outside the progs
memory map resulting in an impossible to exceed value.
Since Z_Malloc uses Z_TagMalloc to do the work, this ensures the check
is always run.
Also, add the check to Z_Realloc when it needs to adjust an existing
block.
Builtins that call progs with parameters now must always wrap the call
to PR_ExecuteProgram so that the data stack is properly preserved across
the call.
I need to do an audit of all the calls to PR_ExecuteProgram.
It turns out the return pointer still needs to be saved even when a
builtin sets up a chain call to progs, but rather than the pointer being
simply restored, it needs to be saved in the call stack exactly as if
the function was called directly by progs. This fixes the invalid self
issue quite thoroughly: parameter state seems to be correct across all
calls now.
I should set up an automated test now that I know and understand the
situation.
In Ruamoko ISA progs, the param pointers point to the stack and
generally must most be manipulated by builtins, and there is no need
anyway as Ruamoko doesn't have RCALL. Fixes the mangling of .super.
When calling a builtin, normally the return pointer needs to be
restored, but if the builtin changes the call depth (usually by
effecting "return foo()" as in support for objects, but possibly
setjmp/longjmp when they are implemented), then the return pointer must
not be restored. This gets vkgen past object allocation, but it dies
when trying to send messages to super. This appears to be a compiler
bug.
Since the operand types sort out the difference between asr and shr, no
need to give them different opnames. Means qfcc doesn't need to worry
about which one it's searching for.
Yet another redundant addressing mode (since ptr + 0 can be used), so
replace it with a variable-indexed array (same as in v6p). Was forced
into noticing the problem when trying to compile Machine.r.
I abandoned the reason for doing it (adding a pile of vector types), but
I liked the cleanup. All the implementations are hand-written still, but
at least the boilerplate stuff is automated.
Of course, only in Ruamoko progs, but it works quite nicely.
global_string is now passed the absolute address of the referenced
operand. With a little groveling through the progs stack, it should be
possible to resolve pointers to locals in functions further up the
stack.
This fixes Ruamoko's return format string. It looks like it's producing
the correct address (but doesn't show all the information it should),
but the rest of the debug code needs work locals.
It turned out I need locals count and params_start for debugging, so use
the progs version instead to bail early from PR_EnterFunction and
PR_LeaveFunction (which I had forgotten anyway, oops).
They now include base register index and effective address of the
operands (though it may be wrong for instructions that don't use a base
register for that operand).
This cleans up dprograms_t, making it easier to read and see what chunks
are in it (I was surprised to see only 6, the explicit pairs made it
seem to have more).
Intel hardware requires 32-byte alignment for lvec4 and dvec4.
Unfortunately, it turns out that my attempts to align progs data in qfcc
went awry do to the order block sizes are calculated when writing the
progs.
This makes return consistent with load, store, etc, though its
addressing mode is encoded in bits 5 and 6 of c rather than the opcode.
It turns out I had no tests for any of return's addressing modes other
than basic def references, so no tests needed changing.
The parameter defs are allocated from the parameter space using a
minimum alignment of 4, and varargs functions get a va_list struct in
place of the ...
An "args" expression is unconditionally injected into the call arguments
list at the place where ... is in the list, with arguments passed
through ... coming after the ...
Arguments get through to functions now, but there's problems with taking
the address of local variables: currently done using constant pointer
defs, which can't work for the base register addressing used in Ruamoko
progs.
With the update to test-bi's printf (and a hack to qfcc for lea),
triangle.r actually works, printing the expected results (but -1 instead
of 1 for equality, though that too is actually expected). qfcc will take
a bit longer because it seems there are some design issues in address
expressions (ambiguity, and a few other things) that have pretty much
always been there.
PR_SetupParams is new and sets up the parameter pointers so older code
that expects only up to 8 parameter will work with both v6p and Ruamoko
progs without having to check what progs are running. PR_SetupParams is
useful even when Ruamoko progs are expected as it reserves the required
space (respecting alignment) on the stack and returns a pointer to the
top (bottom? confusing) of the stack. PR_PushFrame and PR_PopFrame
need to be used around PR_SetupParams, regardless of using temp strings,
to avoid a stack leak (need to do an audit).
This is part of the work for #26 (Record resource pointer with builtin
function data). Currently, the data pointer gets as far as the
per-instance VM function table (I don't feel like tackling the job of
converting all the builtin functions tonight). All the builtin modules
that register a resources data block pass that block on to
PR_RegisterBuiltins.
The builtin and progs function data is overlaid so the extra data
doesn't cause too much memory to be used (it's actually 8 bytes smaller
now). The plan is to pre-compute the offsets based on the parameter
size and alignment data.
This will make it possible for the engine to set up their parameter
pointers when running Ruamoko progs. At this stage, it doesn't matter
*too* much, except for varargs functions, because no builtin yet takes
anything larger than a float quaternion, but it will be critical when
double or long vec3 and vec4 values are passed.
Just 32-bit rounding to next higher power of two, and base 2 logarithm.
Most importantly, they are suitable for use in initializers as they are
constant in, constant out.
As even the simplest v6p functions that take parameters but have no
local or temporary variables still have locals for the local copy of the
parameters, this is a both a good check for for the Ruamoko ISA as its
functions never have locals (everything's on the progs data stack), and
an optimization for v6p functions that have no params or locals (simple
getters (very rare?), most .ctor, etc).
And fix an incorrect definition for RETURN_QUAT.
Prefixed MAX_STACK_DEPTH and LOCALSTACK_SIZE (and LOCALSTACK_SIZE got an
extra _).
The rest is just edits to documentation comments.
ldconst isn't implemented yet but the plan is to load various constants
(eg, 0, 1, 2, pi, e, ...).
Stack adjust is useful for adding an offset to the stack pointer without
having to worry about finding it (and it checks for alignment).
nop is just that :)
Due to how OP_RETURN works, a destination is required for any function
returning data, but the caller may not have allocated any space for the
value. Thus the VM maintains a buffer into which the data can be put and
ignored. It also makes a good place for return values when the engine
calls Ruamoko code as trusting progs code with return sizes seems like a
recipe for disaster, especially if the return location is on the C
stack.
It turned out that address mode B was redundant as C with 0 offset
(immediate) was the same (except for the underlying C code of course,
but adding st->b is very cheap). This allowed B to be used for
entity.field for all transfer operations. Thus instructions 0-3 are now
free as load E became load B, and other than the specifics of format
codes for statement printing, transfers+lea are unified.
This makes the v6p instruction table consistent with the ruamoko
instruction table, and clears up some of the ugliness with the load,
store, and assign instructions (. .= and = are now spelled out). I think
I'd still prefer an enum code (faster) but at least this is more
readable.
long is ignored for double, and v6p progs are stuck with 32 bits for
longs (don't feel like extending v6p any further), but the basics are
there for Ruamoko.
short is ignored for ints because the minimum size is 32, and signed is
just noise for ints anyway (and no chars, so...).
unsigned, however, is finally implemented properly (or at least seems to
be working correctly: tests pass after getting things compiling again,
and lt.u is used where it should be :)
And provide a table for such for qfcc and the like. With this, using
pr_double_t (for example) in C will cause the double value to always be
8-byte aligned and thus structures shared between gcc and qfcc will be
consistent (with a little fuss to take care of the warts).
And other related fields so integer is now int (and uinteger is uint). I
really don't know why I went with integer in the first place, but this
will make using macros easier for dealing with types.
They are both gone, and pr_pointer_t is now pr_ptr_t (pointer may be a
little clearer than ptr, but ptr is consistent with things like intptr,
and keeps the type name short).
This required delaying the setting of the return pointer by call until
after the current pointer had been saved, and thus passing the desired
pointer into PR_CallFunction (which does have some advantages for C
functions calling progs functions, but some dangers too (should ensure a
128 byte (32 word) buffer when calling untrusted code (which is any,
really)).
This fixes the issue of the data stack not being restored properly
because the returning function needs to return a value from its local
variables (stored on the stack) and accessing stack data below the stack
pointer is a bad idea (sure, no interrupts yet, but who knows...).
Call's operand c is used to specify where the return value of the
function is to be stored. This gets both the correct function being
called, and the value being returned correctly. Test still fails due to
the stack restoration issue.
It currently fails for two reasons:
- call's mode selection is incorrect (never updated from when there was
only the one call instruction and the mode was encoded in operand c)
- return should automatically restore the stack pointer to the value it
had on entry to the function, thus allowing local values stored on
the stack to be safely returned.
I don't know why they were ever signed (oversight at id and just
propagated?). Anyway, this resulted in "unsigned" spreading a bit, but
all to reasonable places.
This has been a long-held wishlist item, really, and I thought I might
as well take the opportunity to add the instructions. The double
versions of STATE require both the nextthink field and time global to be
double (but they're not resolved properly yet: marked with
"FIXME double time" comments).
Also, the frame number for double time state is integer rather than
float.
In the end, I decided any/all/none should be separate from the other
horizontal ops, if I even do them (can be implemented by first
converting to bool, then using the appropriate horizontal operation (& |
etc).
ANY/ALL/NONE have been temporarily removed until I implement the HOPS
(horizontal operations) sub-instructions, which will all both 32-bit and
64-bit operands and several other operations (eg, horizontal add).
All the fancy addressing modes for the conditional branch instructions
have been permanently removed: I decided the gain was too little for the
cost (24 instructions vs 6). JUMP and CALL retain their addressing
modes, though.
Other instructions have been shuffled around a little to fill most of
the holes in the upper block of 256 instructions: just a single small
7-instruction hole.
Rearrangements in the actual engine are mostly just to keep the code
organized. The only real changes were the various IF statements and
dealing with the resulting changes in their addressing.
When creating the tests for lea, I noticed that B was yet another simple
assign, so I decided it was best to drop it and move E into its place
(freeing up another instruction).
Most useful for 64-bit values as only one instruction is needed to move
the data around rather than two, but could be slightly faster for 32-bit
as the addressing is simpler (needs profiling).
The compare/ne operator returns "random" -ve, 0, +ve values (really,
just the numerical difference between the chars of the strings), but all
the rest return -1 for true and 0 for false, as with the rest of the
comparison operators.
Does not include string concatenation because I don't feel like messing
with zone init, but all the other operators are tested (currently
failing due to bool convention)
It calculating only the size of the array (which was often 4 or 8
globals per element) proved to be a pain when I forgot to alter the size
for the new scale tests. Fixing the size calculation even found a bug in
the shiftop tests.
It seems casting from float/double to [unsigned] int/long when the value
doesn't fit is undefined (which would explain the inconsistent results).
Mentioning the possibility seems like a good idea should the results for
such casts change and cause the tests to fail.
Bools turned out to be a problem to due to me wanting any non-zero value
to be treated as true thus had to expand them out as well as the
floating point <-> integral conversions.
They currently fail because for vector values, gcc casts the view, not
the value, so vec4 cast to ivec4 simply views the bits as int rather
than doing the actual conversion.
Rather than specifying that the conversion should be skipped, it now
specifies the mode of the conversions (with 0 being no conversion). This
is in preparation for boolean conversion.
I realized that being able to do bit-wise operations with 64-bit values
(and 256-bit vectors) is far more important than some convenient boolean
logic operators. The logic ops can be handled via the bit-wise ops so
long as the values are all properly boolean, and I plan on adding some
boolean conversion ope, so no real loss.
Both float 2,3,4 vectors and double 2,3,4 vectors (1 would be just a
copy of the mul instructions).
This completes the currently planned instructions. Now for testing.
Not all possibilities are supported because converting between int and
uint, and long and ulong is essentially a no-op. However, thanks to
Deek's suggestion, not only are all reasonable conversions available,
conversions for all widths are available, so vector conversions are
supported.
The code for the conversions is generated.
Thanks to Deek for the suggestion: the mode (ie, src and dst types) are
encoded in st->b. Actual code not written yet, but this frees up 13
instructions: now have 74 available for really interesting stuff :)
The call1-8 instructions have been removed as they are really not needed
(they were put in when I had plans of simple translation of v6p progs to
ruamoko, but they joined the dinosaurs).
The call instruction lost mode A (that is now return) and its mode B is
just the regular function access. The important thing is op_c (with
support for with-bases) specifies the location of the return def.
The return instruction packs both its addressing mode and return value
size into st->c as a 3.5 value: 3 bits for the mode (it supports all
five addressing modes with entity.field being mode 4) and 5 for the
size, limiting return sizes to 32 words, which is enough for one 4x4
double matrix.
This, especially with the following convert patch, frees up a lot of
instructions.
Now they're in a much more consistent arrangement, in particular with
the comparison opcodes if the conditional branch instructions are
considered to be fast comparisons with zero (ifnot -> ifeq, if -> ifne,
etc). Unconditional jump and call fill in the gaps. The goal was to get
them all in an arrangement that would work as a small enum for qfcc: it
can use the enum directly for the ruamoko IS, and a small map array for
v6p (except for call).
Both pr_type_size and pr_type_name. I want to macroize the enum, but
need to sort out the clutter of headers first, just need to decide on
naming. This at least sorts out the missed values for now.
The bug (alignment issues with AVX on windows) seems to have in gcc from
the 4.x days, and is still present in 11.2: it does not ensure stack
parameters that need 32 byte alignment are aligned. Telling gcc to use
the sysv abi (safe on a static function) lets gcc do what it does for
linux (usually pass the parameters in registers, which it seems to have
done).
And partial implementations in qfcc (most places will generate an
internal error (not implemented) or segfault, but some low-hanging fruit
has already been implemented).
As I expect to be tweaking things for a while, it's part of the build
process. This will make it a lot easier to adjust mnemonics and argument
formats (tweaking the old table was a pain when conventions changed).
It's not quite done as it still needs arg widths and types.
While working on the new opcode table, I decided a lot of the names were
not to my liking. Part of the problem was the earlier clash with the
v6p opcode names, but that has been resolved via the v6p tag.
Use the new "1" versions of loadvec3 to get a 1 in w to avoid
divide-by-zero errors, and use the correct type for longs (forgot to
change i to l on the vector types).
It turned out I had no way of using a pointer or field as the value to
load, so all 4 modes are duplicated with loads from where operand b
points, but the loaded value interpreted the same way. Also, fixed an
error in the calculation of op-b offsets.
Statements can be bounds checked in the one place (jump calculation),
but memory accesses cannot as they can be used in lea instructions which
should never cause an exception (unless one of lea's operands is OOB).
* / % %% + -
As a bonus, includes partial tests for a few extra operators. Several
things are broken at this stage, but uncommitted code is already
working.
Float bit-ops as well.
Also, add q*v4 and v4*q instructions. There are currently 48 free
opcodes, and I might remove the scale instructions, but they could be
useful as expanding a single float to a vector would take 3 instructions
(copy to temp, swizzle-expand temp, multiply, vs just scale).
The swizzle instruction is very powerful in that in can do any of the
256 permutations of xyzw, optionally negate any combination of the
resulting components, and zero any combination of the result components
(even all). This means the one instruction can take care of any actual
swizzles, conjugation for complex and quaternion values, zeroing vectors
(not that it's the only way), and probably other weird things.
The python file was used to generate the jump table and actual swizzle
code.
They even found a bug in the addressing mode functions :) (I'd forgotten
that I wanted signed offsets from the pointer and thus forgot to cast
st->b to short in order to get the sign extension)
This allows the VM to select the right execution loop and qfcc currently
still produces only the old IS (it doesn't know how to deal with the new
IS yet)
When it's finalized (most of the conversion operations will go, probably
the float bit ops, maybe (very undecided) the 3-component vector ops,
and likely the CALLN ops), this will be the actual instruction set for
Ruamoko.
Main features:
- Significant reduction in redundant instructions: no more multiple
opcodes to move the one operand size.
- load, store, push, and pop share unified addressing mode encoding
(with the exception of mode 0 for load as that is redundant with mode
0 for store, thus load mode 0 gives quick access to entity.field).
- Full support for both 32 and 64 bit signed integer, unsigned integer,
and floating point values.
- SIMD for 1, 2, (currently) 3, and 4 components. Transfers support up
to 128-bit wide operations (need two operations to transfer a full
4-component double/long vector), but all math operations support both
128-bit (32-bit components) and 256-bit (64-bit components) vectors.
- "Interpreted" operations for the various vector sizes: complex dot
and multiplication, 3d vector dot and cross product, quaternion dot
and multiplication, along with qv and vq shortcuts.
- 4-component swizzles for both sizes (not yet implemented, but the
instructions are allocated), with the option to zero or negate (thus
conjugates for complex and quaternion values) individual components.
- "Based offsets": all relevant instructions include base register
indices for all three operands allowing for direct access to any of
four areas (eg, current entity, current stack frame, Objective-QC
self, ...) instructions to set a register and push/pop the four
registers to/from the stack.
Remaining work:
- Implement swizzle operations and a few other stragglers.
= Make a decision about conversion operations (if any instructions
remain, they'll be just single-component (at 14 meaningful pairs,
that's a lot of instructions to waste on SIMD versions).
- Decide whether to keep CALL1-CALL8: probably little point in
supporting two different calling conventions, and it would free up
another eight instructions.
- Unit tests for the instructions.
- Teach qfcc to generate code for the new instruction set (hah, biggest
job, I'm sure, though hopefully not as crazy as the rewrite eleven
years ago).
I wish I'd done it this way years ago (but maybe gcc 2.95 couldn't hack
the casts, I do know there were aliasing problems in the past). Anyway,
this makes operand access much more consistent for variable sized
operands (eg float vs double vs vec4), and is a big part of the new
instruction set implementation.
There is no reasonable way (due to hardware-enforced alignment issues)
to simply convert old bytecode to new (probably best done with an
off-line tool, preferably just recompiling when I get qfcc up to the
job), so both loops will need to be present. This just moves the
original loop into its own function in order to make it easy to bring in
the new (and iron out integration issues).
And add a unary op macro. Having VectorCompOp makes it easy to write
macros that work for multiple data widths, which is why it and its users
now use (dst, ...) instead of (..., dst) as in the past. I'll sort out
the other macros later now that I know the compiler handily gives
messages about the switched order (uninitialized vars etc).
For int, long, float and double. I've been meaning to add them for a
while, and they're part of the new Ruamoko instructions set (which is
progressing nicely).
The opcode table is a nightmare to maintain, but this does clean it up
and speed up opcode lookups since they can now be indexed. Of course, it
turns out I had missed adding several instructions, so had to fix that,
and qfcc needed a bit of a re-jigger to get the opcode out of the table.
The list of all allocated dispatch tables is used to free all the tables
when the progs are reloaded. Not clearing the list meant that the next
instance (second map change) corrupted the list.
Forgetting to unhook the functions (Sys_Printf and the client console's
input event handler) was not a problem for static builds because the
functions were always present, but in builds with dynamic plugins, the
client console's code got ripped away and thus Sys_Printf and the event
hander were being sent into invalid memory. Too much work, not enough
play (with a fully installed client).
The switch from using pr_functions (dfunction_t) to function_table
(bfunction_t) for keeping track of the current function (and thus
profiling data) broke PR_Profile as it never saw anything but 0.
Even NUM_FOR_BAD_EDICT will have a bad day if the edict pointer is
invalid, so make sure that the entity pointer is valid (within the edict
area AND a multiple of edict size).
PR_LoadDebug now does only the initial version and crc checks, and the
byte-swapping of the loaded symbols file. PR_DebugSetSym sets up all the
pointers.
The homogeneous coord was not being initialized and thus was picking up
rubbish from the stack. This is why the test would succeed in some
circumstances but fail in others.
Forgetting to invoke [super dealloc] in a derived class's -dealloc
method has caused me to waste far too much time chasing down the
resulting memory leaks and crashes. This is actually the main focus of
issue #24, but I want to take care of multiple paths before I consider
the issue to be done.
However, as a bonus, four cases were found :)
Fixes axis inputs being half what they should be. Can't quite get +1,
though (need to figure something out for the positive axis range being
slightly smaller than the negative range).
With some hacks that are not included (plan on handling events and
contexts properly), button inputs, including using listeners, are
working nicely: my little game is working again. While the trampoline
code was a bit repetitive (and I do want to clean that up), connecting
button listeners directly to Ruamoko instance methods proved to be quite
nice.
mtwist_rand_0_1 produces numbers in the range [0, 1) and
mtwist_rand_m1_1 produces numbers in the range (-1, 1). The numbers will
not be denormal, so the distribution should be fairly uniform (as much
as Mersenne Twister itself is), but this needs proper testing.
0 is included for the mtwist_rand_0_1 as it seems useful, but -1 is not
included in mtwist_rand_m1_1 in order to keep the extremes of the
distribution balanced around 0.
And create rua_game to coordinate other game builtins.
Menus are broken for key handling, but have been since the input rewrite
anyway. rua_input adds the ability to create buttons and axes (but not
destroy them). More work needs to be done to flesh things out, though.
This takes care of the global variables to a point (there is still the
global struct shared between the non-vulkan renderers), but it also
takes care of glsl's points-only rendering.
After yesterday's crazy marathon editing all the particles files, and
starting to do another big change to them today, I realized that I
really do need to merge them down. All the actual spawning is now in the
client library (though particle insertion will need to be moved). GLSL
particle rendering is semi-broken in that it now does only points (until
I come up with a way to select between points and quads (probably a
context object, which I need anyway for Vulkan)).
This may seem a little contradictory, but it's due to the difference
between a high level (engine) render pass and a Vulkan render pass
object (and quite likely a poor choice in names for the high level
object). This is necessary for supporting compute shader dispatches as
they cannot be submitted inside a Vulkan render pass.
This has the advantage of getting entity_t out of the particle system,
and much easier to read math. Also, it served as a nice test for my
particle physics shaders (implemented the ideas in C). There's a lot of
code that needs merging down: all but the actual drawing can be merged.
There's some weirdness with color ramps, but I'll look into that later.
They should increment by one for each pic, not 4 (I think some fluff
remaining from copying glsl's draw code).
I noticed the problem when I saw large gaps of 0s in the vertex data in
renderdoc.
This gets the crosshair working in Vulkan (next commit) and fixes issues
with changing the palette (though I've never seen a different palette
for quate, there's still the change from "all black" to an actual
palette).
This was needed to get crosshaircolor working correctly, but is likely
another step towards resizable windows (the listener set types are
generic for any viddef event, not just palette changes).
Holding onto the pointer is not a good idea, and it is read-only as
direct manipulation of the world matrix is not supported. However, this
is useful for passing the matrix to the GPU.
This means color, emission, and translucent. Fixes the HOM issues on my
VersaPro (but halves the frame-rate... definitely need to bring back the
forward renderer as an option).
This gets the pipelines loaded (and unloaded on shutdown). Probably the
easy part :P. Still need to sort out the command buffers,
synchronization, and particle generation (and probably a bunch else
that's not coming to mind).
This needed changing Vulkan_CreatePipeline to
Vulkan_CreateGraphicsPipeline for consistency (and parsing the
difference from a plist seemed... not worth thinking about).
It turned out the bindless approach wouldn't work too well for my design
of the sprite objects, but I don't think that's a big issue at this
stage (and it seems bindless is causing problems for brush/alias
rendering via renderdoc and on my versa pro). However, I have figured
out how to make effective use of descriptor sets (finally :P).
The actual normal still needs checking, but the sprites are currently
unlit so not an issue at this stage.
I'm not at all sure what I was thinking when I designed it, but I
certainly designed it wrong (to the point of being fairly useless). It
turns out memory requirements are already aligned in size (so just
multiplying is fine), and what I really wanted was to get the next
offset aligned to the given requirements.
This adds the shaders and the pipeline specs. I'm not sure that the
deferred rendering side of the render pass is appropriate, but I thought
I'd give it a go, since quake sprites are really cutoff rather than
translucent.
With the switch to multi-layer textures for brush models, the bsp and
alias texture descriptor sets became identical and thus the definitions
shareable. However, due to complications I don't want to address yet,
they're still separately identified, but I should be able to use the
texture set for most, if not all, pipelines.
The vertices and frame images are loaded into the one memory object,
with the vertices first followed by the images.
The vertices are 2D xy+uv sets meant to be applied to the model
transform frame, and are pre-computed for the sprite size (this part
does support sprites with varying frame image sizes).
The frame images are loaded into one image with each frame on its own
layer. This will cause some problems if any sprites with varying frame
image sizes are found, but the three sprites in quake are all uniform
size.
As much as it can be since the texture data is interleaved with the
model data in the files (I guess not that bad a design for 25 years ago
with the tight memory constraints), but this paves the way for
supporting sprites in Vulkan.
The cache system pointers are now indices into an array of
cache_system_t blocks, allowing them to be 32 bits instead of 64, thus
allowing cache_system_t to fit into a single CPU cache line. This still
gives and effective 38 bits (256GB) of addressing for cache/hunk. This
does mean that the cache functions cannot work with more than 256GB, but
should that become a problem, cache and working hunking hunk can be
separate, and it should be possible to have multiple cache systems.
There's no point in zeroing out memory that is only going to be
overwritten by the loaded file (excess bytes beyond the end of a
massaged text file shouldn't be accessed anyway, and the terminating
null is still written).
This is needed for cleaning up excess memsets when loading files because
Hunk_RawAllocName has nonnull on its hunk pointer (as the rest of the
hunk functions really should, but not just yet).
In trying to reduce unnecessary memsets when loading files, I found that
Hunk_RawAllocName already had nonnull on it, so quakefs needed to know
the hunk it was to use. It seemed much better to to go this way (first
step in what is likely to be a lengthy process) than backtracking a
little and removing the nonnull attribute.
As the sw renderer's implementation was the closest to id's, it was used
as the model (thus a fair bit of cleanup is still needed). This fixes
some incorrect implementations in glsl and gl.
I'd forgotten (when doing the original brush texture loader) that
turbulent surfaces were unlit and thus always full-bright, then never
wrote the turb shader to take care of it. The best solution seems to be
to just mix the two colors in the shader as it will allow turb surfaces
to be lit in the future (probably with severely limited light counts due
to being a forward renderer).
This gets the alias pipeline in line with the bsp pipeline, and thus
everything is about as functional as it was before the rework (minus
dealing with large texture sets).
I guess it's not quite bindless as the texture index is a push constant,
but it seems to work well (and I may have fixed some full-bright issues
by accident, though I suspect that's just my imagination, but they do
look good).
This should fix the horrid frame rate dependent behavior of the view
model.
They are also in their own descriptor set so they can be easily shared
between pipelines. This has been verified to work for Draw.
BSP textures are now two-layered with the albedo and emission in the two
layers rather than two separate images. While this does increase memory
usage for the textures themselves (most do not have fullbright pixels),
it cuts down on image and image view handles (and shader resources).
Smashing everything in the process :P (need to work on the C side).
However, while bindless is supposedly good for performance, the biggest
gain this will bring is portability: the texture counts are
automatically limited to what the hardware can handle, and the reliance
on push descriptors is removed (though they were nice and did help get
things up and running).
I had forgotten that the parameters are in reverse order, and even if I
had remembered, I forgot to reset offset before the second loop.
Pre-decrementing offset takes care of both issues at once.
My VersaPro doesn't support more than 32 per-stage samplers (lavapipe).
This is a small part of getting Vulkan to run on lavapipe and even in
itself is rather incomplete.
This allows using references in expressions, eg:
$frames.size * size_t($properties.limits.maxSamplers)
As references remain property list items until actually evaluated.
Fixes the warning about parse_fixed_array not being used (oops, the
problem with partial commits), but more importantly, gives access to
things like maxDescriptorSetSamplers.
This will make property list expressions easier to work with. The
library is rather limited right now (trig, dot, min/max/bound) but even
just min adds a lot of functionality.
For now, just dot product, trig, and min/max/bound, but it works well as
a proof of concept. The main goal was actually min. Only the list of
symbols is provided, it is the user's responsibility to set up the
symbol table and context.
cexpr's symbol tables currently aren't readily extended, and dynamic
scoping is usually a good thing anyway. The chain of contexts is walked
when a symbol is not found in the current context's symtab, but minor
efforts are made to avoid checking the same symtab twice (usually cased
by cloning a context but not updating the symtab).
I want to support reading VkPhysicalDeviceLimits but it has some arrays.
While I don't need to parse them (VkPhysicalDeviceLimits should be
treated as read-only), I do need to be able to access them in property
list expressions, and vkgen generates the cexpr type descriptors too.
However, I will probably want to parse arrays some time in the future.
This ensures that unused parser blocks do not get emitted. In the
testing of the upcoming support for fixed arrays, the blend color
constants were being double emitted (both as custom and normal parser)
due to being an array. gcc did not like that (what with all those
warning flags).
Multiple render passes are needed for supporting shadow mapping, and
this is a huge step towards breaking the Vulkan render free of Quake,
and hopefully will lead the way for breaking the GL renderers free as
well.
This is actually a better solution to the renderer directly accessing
client code than provided by 7e078c7f9c.
Essentially, V_RenderView should not have been calling R_RenderView, and
CL_UpdateScreen should have been calling V_RenderView directly. The
issue was that the renderers expected the world entity model to be valid
at all times. Now, R_RenderView checks the world entity model's validity
and immediately bails if it is not, and R_ClearState (which is called
whenever the client disconnects and thus no longer has a world to
render) clears the world entity model. Thus R_RenderView can (and is)
now called unconditionally from within the renderer, simplifying
renderer-specific variants.
The generated short names for a lot of Vulkan enums start with a number
(eg VK_IMAGE_TYPE_2D -> 2d). Having to prefix the short name with ` is a
tiny cost for the convenience.
While using binary data objects for specialization data works for bools
(as they can be 0 or -1), they don't work so well for numeric values due
to having to get the byte order correct and thus are not portable, and
difficult to get right.
Binary data is still supported, but the data can be written as a string
with an array(...) "constructor" expression taking any number of
parameters, with each parameter itself being an expression (though
values are limited at this stage).
Due to the plist format, quotes are required around the expression
("array(...)")
While there may be better solutions, I needed a varargs function for
building Vulkan specialization data. Like progs functions, negative
parameter counts indicate ellipsis with the number of fixed parameters
being equal to -param_count - 1.
Sets never shrink, so assigning a dynamically created set to a
statically created set after the working size has reduced (going from
demo2 to demo3) causes the set code to attempt to resize the statically
created set, which leads to libc having a bad time.
Why nvidia's drivers accepted double-destroyed framebuffers is beyond
me, but this fixes the Intel drivers complaining about such (and the
subsequent segfault).
When I changed the matrices from an array of floats to an array of
vec4f_t, I forgot to update the flush offsets. Yay for having a
Vulkan-capable Intel device with its different alignment requirements.
When allocating memory for multiple objects that have alignment
requirements, it gets tedious keeping track of the offset and the
alignment. This is a simple function for walking the offset respecting
size and alignment requirements, and doubles as a size calculator.
IN_ButtonAction treats id 0 as not pressed in its internal processing,
and the previous input implementation treated 0 as "no key", so this is
both the simplest and most correct fix.
Fixes mouse left button not working every second time the game is run
(due to keyboard and mouse bindings swapping places in the config file
(separate issue, if it really is one)).
I'm not sure what I was thinking when I made PL_RemoveObjectForKey take
a const plitem. One of those times where C could do with being a little
more strict.
While using barriers is a zillion times better than actually grabbing
the mouse and keyboard, they're still a pain when debugging as qf is not
able to respond to the barrier-hit events. All the other logic is still
there so even when "grabbing", the mouse will not be blocked if the
window doesn't have focus.
The stack is arbitrary strings that the validation layer debug callback
prints in reverse order after each message. This makes it easy to work
out what nodes in a pipeline/render pass plist are causing validation
errors. Still have to narrow down the actual line, but the messages seem
to help with that.
Putting qfvPushDebug/qfvPopDebug around other calls to vulkan should
help out a lot, tool.
As a bonus, the stack is printed before debug_breakpoint is called, so
it's immediately visible in gdb.
Rather than just 0/1, it now acts as flags to control what messages are
printed. In addition to the Vulkan enum names (long and short), none and
all are supported (as well as raw numbers, but they're not checked for
validity). This makes vulkan_use_validation a bit easier to use and less
verbose by default.
Now, if only it was easier to remember the name :P
Id's binding of escape to togglemenu interfered with the hard-coding
(want escape to togglemenu (or console as a fallback) no matter what).
This idea was part of mercury's original design, too.
I'm not at all happy with con_message and con_menu, but fixing them
properly will take a rework of the menus (planned, though). Also, the
Menu_ console command implementations are a bit iffy and could also do
with a rewrite (probably part of the rest of the menu rework) or just
nuking (they were part of Johnny on Flame's work, so I suspect had
something to do with joystick bindings).
It seems X11 does not like creating barriers entirely off the screen,
though the error seems to be a little unreliable (however, off the left
edge was definitely bad).
An imt switcher automatically changes the context's active imt based on
a user specified list of binary inputs. The inputs may be either buttons
(indicated as +button) or cvars (bare name). For buttons, the
pressed/not pressed state is used, and cvars are interpreted as ints
being 0 or not 0. The order of the inputs determines the bit number of
the input, with the first input being bit 0, second bit 1, third bit 2
etc. A default imt is given so large switchers do not need to be fully
configured (the default imt is written to all states).
A context can have any number of switchers attached. The switchers can
wind up fighting over the active imt, but this seems to be something for
the "user" (eg, configuration system) to sort out rather than the
switcher code enforcing anything.
As a result of the inputs being treated as bits, a switcher with N
inputs will have 2**N states, thus there's a maximum of 16 inputs for
now as 65536 states is a lot of configuration.
Using a switcher, setting up a standard strafe/mouse look configuration
is fairly easy.
imt_create key_game imt_mod
imt_create key_game imt_mod_strafe imt_mod
imt_create key_game imt_mod_freelook
imt_create key_game imt_mod_lookstrafe imt_mod_freelook
imt_switcher_create mouse key_game imt_mod_strafe +strafe lookstrafe +mlook freelook
imt_switcher 0 imt_mod 2 imt_mod 4 imt_mod_freelook 8 imt_mod_freelook 12 imt_mod_freelook
imt_switcher 6 imt_mod_lookstrafe 10 imt_mod_lookstrafe 14 imt_mod_lookstrafe
in_bind imt_mod mouse axis 0 move.yaw
in_bind imt_mod mouse axis 1 move.forward
in_bind imt_mod_strafe mouse axis 0 move.side
in_bind imt_mod_lookstrafe mouse axis 0 move.side
in_bind imt_mod_freelook mouse axis 1 move.pitch
This takes advantage of imt chaining and the default imt for the
switcher (there are 8 states that use imt_mod_strafe).
The switcher name must be unique across all contexts, and every imt used
in a switcher must be in the switcher's context.
The listener is invoked when the axis value changes due to IN_UpdateAxis
or IN_ClampAxis updating the axis. This does mean the listener
invocation make be somewhat delayed. I am a tad uncertain about this
design thus it being a separate commit.
Listeners are separate to the main callback as listeners have only
read-only access to the objects, but the main callback is free to modify
the cvar and thus can act as a parser and validator. The listeners are
invoked after the main callback if the cvar is modified. There does not
need to be a main callback for the listeners to be invoked.
This allows id1/qw config files, and to a certain extent scripts, to
work with the new binding system. It does highlight just how limited the
original system was (many keys could not bound).
Mouse axis input does not work yet as that needs a little more work to
support +strafe and +mlook.
I had forgotten that Cmd_Args() preserves quotes, which resulted in
button bindings having excess quotes when used to bind complex commands
(eg, the default quicksave and quickload bindings).
I decided cvars and input buttons/axes need listeners so any changes to
them can be propagated. This will make using cvars in bindings feasible
and I have an idea for automatic imt switching that would benefit from
listeners attached to buttons and cvars.
For now, only the first two axis (mouse X and Y) are supported (XInput
treats the scroll wheel events as axes too, so mice have up to 4!), but
most importantly, this prevents the scroll wheel from being seen as the
X axis. Oops.
Combining absolute and relative inputs at the binding does not work well
because absolute inputs generally update only when the physical input
updates, so clearing the axis input each frame results in a brief pulse
from the physical input, but relative inputs must be cleared each frame
(where frame here is each time the axis is read) but must accumulate the
relative updates between frames.
Other than the axis mode being incorrect, this seems to work quite
nicely.
With the old headers removed, X11_SetGamma became a stub and gcc
complained about it wanting the const attribute. On investigation, it
turned out the X_XF86VidModeSetGamma was a holdover from the initial
implementation of hardware gamma support.
UI key presses are still handled by regular X events, but in-game
"button" presses arrive via raw keyboard events. This gives transparent
handling of keyboard repeat (UI keys see repeat, game keys do not),
without messing with the server's settings (yay, that was most annoying
when it came to debugging), and the keyboard is never grabbed, so this
is a fairly user-friendly setup.
At first, I wasn't too keen on capturing them from the root window
(thinking about the user's security), but after a lot of investigation,
I found a post by Peter Hutterer
(http://who-t.blogspot.com/2011/09/whats-new-in-xi-21-raw-events.html)
commenting that root window events were added to XInput2 specifically
for games. Since application focus is tracked and unfocused key events
are dropped very early on, there's no way for code further down the
food-chain to know there even was an event, abusing the access would
require modifying the x11 input code, in which case all bets are off
anyway and any attempt at security anywhere in the code will fail,
meaning that nefarious progs code and the like shouldn't be a problem.
After a lot of thought, it really doesn't make sense to have an option
to block mouse input in x11 (not grabbing or similar does make sense, of
course). Not initializing mouse input made perfect sense in DOS and even
console Linux (SVGA) what with the low level access.
It turns out that if the barriers are set on the app window, and the app
grabs the pointer (even passively), barrier events will no longer be
sent to the app. However, creating the barriers on the root window and
the events are selected on the root window, the barrier events are sent
regardless of the grab state.
The kernel knows nothing about X11 application focus, so we need to take
care of it ourselves.
Device add/remove events are unaffected: the are always passed on.
Other subsystems, especially low-level input drivers, need to know when
the app has input focus. eg, as the evdev driver uses the raw stream
from the kernel, which has no idea about X application focus (in fact,
it seems the events are shared across multiple apps without any issue),
the evdev driver sees all the events thus needs to know when to drop
them.
It turns out to be possible to get a barrier event at the same time as a
configure notify event (which rebuilds the barriers), and trying to
release the pointer at such a time results in a bad barrier error and
program crash. Thus check the event barrier against the currently
existing barriers before attempting to release the pointer.
This does mean that a better mechanism for sequencing window
repositioning and barrier creation may be required.
This should be a much friendlier way of "grabbing" input, though I
suspect that using raw keyboard events will result in a keyboard grab,
which is part of the reason for wanting a friendly grab.
There does seem to be a problem with the mouse sneaking out of the
top-right and bottom-left corners. I currently suspect a bug in the X
server, but further investigation is needed.
This is needed for getting window position info into in_x11 without
exposing more globals, and is likely to be useful for other things,
especially as it doubles as a resize event when that's eventually
supported.
This is necessary in focus-follows-mouse environments (at least for
openbox, but it wouldn't surprise me if most other WMs behave the same
way) because the WMs don't set focus when the pointer is grabbed (which
XInput does before the WM sees the enter event). This is especially
important when the window is fullscreen on a multi-monitor setup as
there is no border to *maybe* catch the mouse before it enters the
window.
Right now, only raw pointer motion and button events are handled, and
the mouse escapes the window, and there are some issues with focus in
focus-follows-mouse environments. However, this should be a much nicer
setup than DGA.
The current limit is still 32. Dealing with it properly will take some
rather advanced messing with XInput, and will be necessary assuming
non-XInput support is continued.
There's now IN_X11_Preinit, IN_X11_Postinit (both for want of better
names), and in_x11_init. The first two are for taking care of
initialization that needs to be done before window creation and between
window creation and mapping (ie, are very specific to X11 stuff) while
in_x11_init takes care of the setup for the input system. This proved
necessary in my XInput experimentation: a passive enter grab takes
effect only when the pointer enters the window, thus setting up the grab
with the pointer already in the window has no effect until the pointer
leaves the window and returns.
This was always a horrible hack just to get the screen centered on the
window back when we were doing fullscreen badly. With my experiments
with XInput, it has proven to be a liability (I'd forgotten it was even
there until it started imposing a 2s delay to QF's startup).
Input driver can now have an optional init_cvars function. This allows
them to create all their cvars before the actual init pass thus avoiding
some initialization order interdependency issues (in this case, fixing a
segfault when starting x11 clients fullscreen due to the in_dga cvar not
existing yet).
Well... it could be done better, but this works for now assuming it's in
/usr/include (and it's correct for mxe builts). Does need proper
autoconfiscation, though.
Seems to work nicely for keyboard (though key bindings are not
cross-platform). Mouse not tested yet, and I expect there are problems
with it for absolute inputs (yay mouse warp :P).
I didn't notice that uint is defined somewhere on Linux... until I tried
compiling for windows (not defined). Use a define to keep the cast
function naming nice.
Mouse axis and button names are handled internally (and thus
case-insensitive).
Key names are handled by X11. Case-sensitivity is currently determined
by Xlib.
keyhelp provides the input name if it is known, and in_bind tries to use
the provided input name if not a number. Case sensitivity for name
lookups is dependent on the input driver.
Reset the blocks completely when loading configs and fix a leftover from
when I thought I'd expose the block numbers to bindings but then changed
my mind to simply track the base binding.
The cooked inputs (ie_key, ie_mouse) are intended for UI interaction, so
generally should have priority over the raw events, which are intended
for game interaction.
There's now an internal event handler for taking care of device addition
and removal, and a public event handler for dealing with device input
events in various contexts In particular, so the clients can check for
the escape key.
While the console command line is quite good for setting everything up,
the devices being bound do need to be present when the commands are
executed (due to needing extra data provided by the devices). Thus
property lists that store the extra data (button and axis counts, device
names/ids, connection names, etc) seems to be the best solution.
Recipes themselves still use float, but using double in the cexpr values
allows bare floating point numbers (which parse as double) to be used,
making the bind command line a little more user-friendly.
The mouse bound to movement axes works (though signs are all over the
place, so movement direction is a little off), and binding F10 (key 68)
to quit works :)
Each axis binding has its own recipe (meaning the same input axis can be
interpreted differently for each binding)
Recipes are specified with field=value pairs after the axis name.
Valid fields are minzone, maxzone, deadzone, curve and scale, with
deadzone doubling as a balanced/unbalanced flag.
The default recipe has no zones, is balanced, and curve and scale are 1.
Hot-plug support is done via "connections" (not sure I'm happy with the
name) that provide a user specifiable name to input devices. The
connections record the device name (eg, "6d spacemouse") and id (usually
usb path for evdev devices, but may be the device unique id if
available) and whether automatic reconnection should match just the
device name or both device name and id (prevents problems with changing
the device connected to the one usb port).
Unnecessary enum removed, and the imt block struct moved to imt.c
(doesn't need to be public). Also, remove device name from the imt block
(and thus the parameter to the functions) as it turns out not to be
needed.
in_bind is only partially implemented (waiting on imt), but device
listing, device naming, and input identification are working. The event
handling system made for a fairly clean implementation for input
identification thanks to the focused event handling.
This has smashed the keydest handling for many things, and bindings, but
seems to be a good start with the new input system: the console in
qw-client-x11 is usable (keyboard-only).
The button and axis values have been removed from the knum_t enum as
mouse events are separate from key events, and other button and axis
inputs will be handled separately.
keys.c has been disabled in the build as it is obsolute (thus much of
the breakage).
I'm undecided on how to handle application focus (probably gain/lose
events), and the destination handler has been a stub for a while. One less
dependency on the "old" key handling code.
I'm undecided if the pasted text should be sent as a string rather than
individual key events, but this will do the job for now as it gets me
closer to being able to test everything.
It seems that under certain circumstances (window managers?), select is not
reliable for getting key events, so use of select has been disabled until I
figure out what's going on and how to fix it.
For the mouse in x11, I'm not sure which is more cooked: deltas or
window-relative coordinates, but I don't imagine that really matters too
much. However, keyboard and mouse events suitable for 2D user interfaces
are sent at the same time as the more game oriented button and axis events.
The x11 keyboard and mouse devices are really core input devices rather
than x11 input devices in that keyboard and mouse will be present on most
systems and thus not specific to the main user interface (x11, windows,
etc).
It turns out that calling Sys_Shutdown in the signal handler can cause
lockups due to the signal occurring at unsafe times. Fortunately, this is
just the IO related signals (INT, HUP, TERM, QUIT) as the others are
usually caused by actual errors and should not occur in system code thus
timing should not be an issue. However, care will need to be taken when it
comes to handling SIGINT or similar for breaking runaway progs code when
that time comes.
Now nothing works at all ;) However, that's only because the binding
system is incomplete: the X11 input events are getting through to the
binding system, so now it's just a matter of getting that to work.
Input Mapping Tables are still at the core as they are a good concept,
however they include both axis and button mappings, and the size is not
hard-coded, but dependent on the known devices. Not much actually works
yet (nq segfaults when a key is pressed).
kbutton_t is now in_button_t and has been moved to input.h. Also, a
button registration function has been added to take care of +button and
-button command creation and, eventually, direct binding of "physical"
buttons to logical buttons. "Physical" buttons are those coming in from
the OS (keyboard, mouse, joystick...), logical buttons are what the code
looks at for button state.
Additionally, the button edge detection code has been cleaned up such
that it no longer uses magic numbers, and the conversion to a float is
cleaner. Interestingly, I found that the handling is extremely
frame-rate dependent (eg, +forward will accelerate the player to full
speed much faster at 72fps than it does at 20fps). This may be a factor
in why gamers are frame rate obsessed: other games doing the same thing
would certainly feel different under varying frame rates.
For drivers that support it. Polling is still supported and forces the
select timeout to 0 if any driver requires polling. For now, the default
timeout when all drivers use select is 10ms.
Removing the device from the devices list after closing the device
could cause the device to be double-freed if something went wrong in the
device removal callback resulting in system shutdown which would then
close all open devices.
The device is removed from the list before the callback is called.
There's still a small opportunity for such in a multi-threaded
environment, but that would take device removal occurring at the same
time as the input system is shut down. Probably the responsibility of
the threaded environment rather than inputlib.
I had forgotten that _size was the number of rows in the map, not the
number of objects (1024 objects per row). This fixes the missed device
removal messages. And probably a slew of other bugs I'd yet to encounter
:P
This includes device add and remove events, and axis and buttons for
evdev. Will need to sort out X11 input later, but next is getting qwaq
responding.
While QF doesn't currently use nanoseconds, having access to a clock
that is not affected by setting system time is nice, and as a bonus, can
handle suspends should the need arise.
The common input code (input outer loop and event handling) has been
moved into libQFinput, and modified to have the concept of input drivers
that are registered by the appropriate system-level code (x11, win,
etc).
As well, my evdev input library code (with hotplug support) has been
added, but is not yet fully functional. However, the idea is that it
will be available on all systems that support evdev (Linux, and from
what I've read, FreeBSD).
At the low level, only unions can cause a set to grow. Of course, things
get interesting at the higher level when infinite (inverted) sets are
mixed in.
Instead of printing every representable member of an infinite set (ie,
up to element 63 in a set that can hold 64 elements), only those
elements up to one after the last non-member are listed. For example,
{...} - {2 3} -> {0 1 4 ...}
This makes reading (and testing!) infinite sets much easier.
Most of the set ops were always endian-agnostic since they were simply
operating on multiple bits in parallel, but individual element
add/remove/test was very endian-dependent. For the most part, this
didn't matter, but it does matter very much when loading external data
into a set or writing the data out (eg, for PVS).
Attempting to vis ad_tears drags a few lurking bugs out of
SmallestEnclosingBall_vf: poor calculation of 2-point affine space, poor
handling of duplicate points and dropped support points, poor
calculation of the new center (related to duplicate points), and
insufficient iterations for large point sets. qfvis (modified for
cluster spheres) now loads ad_tears.
As per usual, fp math finds a way to confound any epsilon test. So
rather than relying entirely on test_support_points, check the distance
from the sphere center to the affine point and break out of the loop if
the distance is small enough (< 1% of the current radius). This allows
qfvis to load ad_tears without hacks.
Scaling the checks by 1e-6 was a little too tight for very small
triangles, but 1e-5 seems to work well. This fixes SEB getting stuck for
a ridiculously small (for quake) triangle in ad_tears (probably resulted
from some bad math in qfbsp when generating the portal file from the
bsp).
For now, the functions check for a null hunk pointer and use the global
hunk (initialized via Memory_Init) if necessary. However, Hunk_Init is
available (and used by Memory_Init) to create a hunk from any arbitrary
memory block. So long as that block is 64-byte aligned, allocations
within the hunk will remain 64-byte aligned.
I need to write some automated tests for this, and reading of course,
but 1 and two byte outputs look correct. Kind of sad it took sixteen
years to get around to attempting to use the code :(
Mod_DecompressVis_set (via Mod_LeafPVS_set) can be used to recycle pvs
sets, but the set may have been set to everything at some stage, which
is implemented by inverting the set (making the set infinite) and having
1-bits remove elements from the set. This is most definitely not wanted
for pvs :)
Currently undecided what to do about Mod_DecompressVis_mix, thus the
fixme.
Fixes the flickering lights in any map where the camera is out of the
map for a single frame (eg, start.bsp, The Catacombs (hipnotic, hip2m3)).
I knew counting bits individually was slow, but it never really mattered
until now. However, I didn't expect such a dramatic boost just by going
to mapping bytes to bit counts. 16-bit words would be faster still, but
the 64kB lookup table would probably start hurting cache performance,
and 32-bit words (4GB table) definitely would ruin the cache. The
universe isn't big enough for 64-bits :)
The fact that numleafs did not include leaf 0 actually caused in many
places due to never being sure whether to add 1. Hopefully this fixes
some of the confusion. (and that comment in sv_init didn't last long :P)
After seeing set_size and thinking it redundant (thought it returned the
capacity of the set until I checked), I realized set_count would be a
much better name (set_count (node->successors) in qfcc does make much
more sense).
Modern maps can have many more leafs (eg, ad_tears has 98983 leafs).
Using set_t makes dynamic leaf counts easy to support and the code much
easier to read (though set_is_member and the iterators are a little
slower). The main thing to watch out for is the novis set and the set
returned by Mod_LeafPVS never shrink, and may have excess elements (ie,
indicate that nonexistent leafs are visible).
Having set_expand exposed is useful for loading data into a set.
However, it turns out there was a bug in its size calculation in that
when the requested set size was a multiple of SET_BITS (and greater than
the current set size), the new set size one be SET_BITS larger than
requested. There's now some tests for this :)
-999999 seems to be a hold-over from the software renderer passed
through both gl renderers. I guess it didn't matter in the gl renderers
due to various draw hacks, but it made quite a difference in vulkan.
Fixes the view model covering the hud.
Quake just looked wrong without the view model. I can't say I like the
way the depth range is hacked, but it was necessary because the view
model needs to be processed along with the rest of the alias models
(didn't feel like adding more command buffers, which I imagine would be
expensive with the pipeline switching).
When setting local rotation/scale/transform, need to cache the rotation and
scale, otherwise they can't be fetched easily later on (position is easy as
it's just the fourth column of the matrix).
The recent changes to key handling broke using escape to get out of the
console (escape would toggle between console and menu). Thus take care
of the menu (escape) part of the coupling FIXME by implementing a
callback for the escape key (and removing key_togglemenu) and sorting
out the escape key handling in console. Seems to work nicely
This fixes a bug when loading bsp29 files that resulted in leaf nodes
having bogus bounding boxes if any coordinates were negative (and thus
dynamic lights, and probably all sorts of other things) being broken.
And it took me only 9 years to notice :P
Without shadows, this is quite the cheat, but noclip is a cheat anyway,
so probably not that big a deal. It does, however, make noclip usable
for debugging.
Since vulkan supports 32-bit indexes, there's no need for the
shenanigans the EGL-based glsl renderer had to go through to render bsp
models (maps often had quite a bit more than 65536 vertices), though the
reduced GPU memory requirements of 16-bit indices does have its
advantages.
Any sun (a directional light) is in the outside node, which due to not
having its own PVS data is visible to all nodes, but that's a tad
excessive. However, any leaf node with sky surfaces will potentially see
any suns, and leaf nodes with no sky surfaces will see the sun only if
they can see a leaf that does have sky surfaces. This can be quite
expensive to calculate (already known to be moderately expensive for
just the camera leaf node (singular!) when checking for in-map lights)
Getting close to understanding (again) how it all works. I only just
barely understood when I got vulkan's renderer running, but I really
need to understand for when I modify things for shadows. The main thing
hurdle was tinst, but that was dealt with in the previous commit, and
now it's just sorting out the mess of elechains and elementss.
Its sole purpose was to pass the newly allocated instsurf when chaining
an instance model (ammo box, etc) surface, but using expresion
statements removes the need for such shenanigans, and even makes
msurface_t that little bit smaller (though a separate array would be
much better for cache coherence).
More importantly, the relevant code is actually easier to understand: I
spent way too long working out what tinst was for and why it was never
cleared.
This reduces the overhead needed to manage the memory blocks as the
blocks are guaranteed to be page-aligned. Also, the superblock is now
alllocated from within one of the memory blocks it manages. While this
does slightly reduce the available cachelines within the first block (by
one or two depending on 32 vs 64 bit pointers), it removes the need for
an extra memory allocation (probably via malloc) for the superblock.
The renderer's LineGraph now takes a height parameter, and netgraph now
uses cl_* cvars instead of r_* (which never really made sense),
including it's own height cvar (the render graphs still use
r_graphheight).
The uptime display had not been updated for the offset Sys_DoubleTime,
so add Sys_DoubleTimeBase to make it easy to use Sys_DoubleTime as
uptime.
Line up the layout of the client list was not consistent for drop and
qport.
The render plugins have made a bit of a mess of getting at the data and
thus it's a tad confusing how to get at it in different places. Really
needs a proper cleanup :(
conwidth and conheight have been moved into vid.conview (probably change
the name at some time), and scr_vrect has been replaced by a view as
well. This makes it much easier to create 2d elements that follow the
screen size (taking advantage of a view's gravity) which will, in the
end, make changing the window size easier.
One moves and resizes the view in one operation as a bit of an
optimization as moving and resizing both update any child views, and
this does only one update.
The other sets the gravity and updates any child views as their
absolute positions would change as well as the updated view's absolute
position.
It now processes 4 pixels at a time and uses a bit mask instead of a
conditional to set 3 of the 4 pixels to black. On top of the 4:1 pixel
processing and avoiding inner-loop conditional jumps, gcc unrolls the
loop, so Draw_FadeScreen itself is more than 4x as fast as it was. The
end result is about 5% (3fps) speedup to timedemo demo1 on my 900MHz
EEE Pc when nq has been hacked to always draw the fade-screen.
qwaq-curses has its place, but its use for running vkgen was really a
placeholder because I didn't feel like sorting out the different
initialization requirements at the time. qwaq-cmd has the (currently
unnecessary) threading power of qwaq-curses, but doesn't include any UI
stuff and thus doesn't need curses. The work also paves the way for
qwaq-x11 to become a proper engine (though sorting out its init will be
taken care of later).
Fixes#15.
This refactors (as such) keys.c so that it no longer depends on console
or gib, and pulls keys out of video targets. The eventual plan is to
move all high-level general input handling into libQFinput, and probably
low-level (eg, /dev/input handling for joysticks etc on Linux).
Fixes#8
I had forgotten to test with shared libs and it turns out jack and alsa
were directly accessing symbols in the renderer (and in jack's case,
linking in a duplicate of the renderer).
Fixes#16.
The JACK Audio Connection Kit support is now just an output target
rather than a full duplicate of the renderer (in pull mode). This is
what I wanted to to back when I first added jack support, but I needed
to get the renderer working asynchronously without affecting any of the
other outputs.
Fixes#16.
on_update is for pull-model outpput targets to do periodic synchronous
checks (eg, checking that the connection to the actual output device is
still alive and reviving it if necessary)
Output plugins can use either a push model (synchronous) or a pull
model (asynchronous). The ALSA plugin now uses the pull model. This
paves the way for making jack output a simple output plugin rather than
the combined render/output plugin it currently is (for #16) as now
snd_dma works with both models.
This gets the alsa target working nicely for mmapped outout. I'm not
certain, but I think it will even deal with NPOT buffer sizes (I copied
the code from libasound's sample pcm.c, thus the uncertainty).
Non-mmapped output isn't supported yet, but the alsa target now works
nicely for pull rendering.
However, some work still needs to be done for recovery failure: either
disable the sound system, or restart the driver entirely (preferable).
This brings the alsa driver in line with the jack render (progress
towards #16), but breaks most of the other drivers (for now: one step at
a time). The idea is that once the pull model is working for at least
one other target, the jack renderer can become just another target like
it should have been in the first place (but I needed to get the pull
model working first, then forgot about it).
Correct state checking is not done yet, but testsound does produce what
seems to be fairly good sound when it starts up correctly (part of the
state checking (or lack thereof), I imagine).
This failed with errors such as:
from ./include/QF/simd/vec4d.h:32,
from libs/util/simd.c:37:
./include/QF/simd/vec4d.h: In function ‘qmuld’:
/usr/lib/gcc/x86_64-pc-linux-gnu/10.3.0/include/avx2intrin.h:1049:1: error: inlining failed in call to ‘always_inline’ ‘_mm256_permute4x64_pd’: target specific option mismatch
1049 | _mm256_permute4x64_pd (__m256d __X, const int __M)
and rename the variable since it's not the size of the frame (may be
from the very early days of ALSA development, and I suspect the
terminology changed a bit).
The calculation was including the bits per sample, which makes no sense
as the period size determines the number of samples in a submission
chunk (and thus latency). For now, set it to around 5.5ms (will probably
need a cvar).
If Sys_Shutdown gets called twice, particularly if a shutdown callback
hangs and the program is killed with INT or QUIT, shutdown_list would be
in an invalid state. Thus, get the required data (function pointer and
data pointer) from the list element, then unlink the element before
calling the function. This ensures that a reinvocation of Sys_Shutdown
continues from the next callback or ends cleanly. Fixes a segfault when
killing testsound while using the oss output (it hangs on shutdown).
Fixes#12
However, this is a bit of a band-aid in that the code for global defs
seems redundant (there is very similar code a little above that is
always executed) and the code for field defs should probably be executed
unconditionally: I suspect the problem fixed by
d5454faeb7 still shows with game coded
compiled with recent versions of the compiler, I just haven't tested
any.
Support for finding the first address associated with a source line was
added to the engine, returning 0 if not found.
A temporary breakpoint is set and the progs allowed to run free.
However, better handling of temporary breakpoitns is needed as currently
a "permanent" breakpoint will be cleared without clearing the temporary
breakpoing if the permanent breakpoing is hit while execut-to-cursor is
running.
For now, just bsearch (normal and fuzzy), qsort, and prefixsum (not in
C's stdlib that I know of, but I think having native implementations of
float and int prefix sums will be useful.