While I doubt the difference is all that significant, this should speed
up entity rendering because it cuts out a lot of branching, and
eliminates scanning the same list multiple times only to not do anything
for large chunks of the list.
It holds the data for a basic 3d camera (transform, fov, near and far
clip). Not used yet as there is much work to be done in cleaning up the
client code.
Regardless of whether the sky is spinning or not, the matrix needs to be
updated with the current origin in order to get the direction vector
right in the shader. Also, it's in the update that the required x-y
plane rotation gets in so the skies move in the correct direction.
This actually has at least two benefits: the transform id is managed by
the scene and thus does not need separate management by the Ruamoko
wrapper functions, and better memory handling of the transform objects.
Another benefit that isn't realized yet is that this is a step towards
breaking the renderers free of quake and quakeworld: although the
clients don't actually use the scene yet, it will be a good place to
store the rendering information (functions to run, etc).
This is the bulk of the work for recording the resource pointer with
with builtin data. I don't know how much of a difference it makes for
most things, but it's probably pretty big for qwaq-curses due to the
very high number of calls to the curses builtins.
Closes#26
It's not enforced a this stage, and it would be easy enough to handle,
but it turns out all the standard quake and quakeworld progs never used
... for the print functions: the behavior of PF_VarString was
undocumented and so... tough :P.
With the return buffer in progs_t, it could not be addressed by the
progs on 64-bit machines (this was intentional, actually), but in order
to get obj_msg_sendv working properly, I needed a way to "bounce" the
return address of a calling function to the called function. The
cleanest solution I could think of was to add a mode to the with
instruction allowing the return pointer to be loaded into a register and
then calling the function with a 0 offset for the return value but using
the relevant register (next few commits). Testing promptly segfaulted
due to the 64-bit offset not fitting into a 32-bit value.
The plan is to use the types to extract the number of parameters for a
selector when it is necessary to know the count. However, it'll probably
become useful for something else alter (these things seem to always do
so).
It's currently only 4 (or even 3 for v6) words, but this fixes false
positives when checking for null pointers in Ruamoko progs due to
pr_return pointing to the return buffer and thus outside the progs
memory map resulting in an impossible to exceed value.
Thanks to the size of the type encoding being explicit in the encoding,
anything that tries to read the encodings without expecting the width
will simply skip over the width, as it is placed after the ev type in
the encoding.
Any code that needs to read both the old encodings and the new can check
the size of the basic encodings to see if the width field is present.
I abandoned the reason for doing it (adding a pile of vector types), but
I liked the cleanup. All the implementations are hand-written still, but
at least the boilerplate stuff is automated.
This cleans up dprograms_t, making it easier to read and see what chunks
are in it (I was surprised to see only 6, the explicit pairs made it
seem to have more).
PR_SetupParams is new and sets up the parameter pointers so older code
that expects only up to 8 parameter will work with both v6p and Ruamoko
progs without having to check what progs are running. PR_SetupParams is
useful even when Ruamoko progs are expected as it reserves the required
space (respecting alignment) on the stack and returns a pointer to the
top (bottom? confusing) of the stack. PR_PushFrame and PR_PopFrame
need to be used around PR_SetupParams, regardless of using temp strings,
to avoid a stack leak (need to do an audit).
This is part of the work for #26 (Record resource pointer with builtin
function data). Currently, the data pointer gets as far as the
per-instance VM function table (I don't feel like tackling the job of
converting all the builtin functions tonight). All the builtin modules
that register a resources data block pass that block on to
PR_RegisterBuiltins.
The builtin and progs function data is overlaid so the extra data
doesn't cause too much memory to be used (it's actually 8 bytes smaller
now). The plan is to pre-compute the offsets based on the parameter
size and alignment data.
This will make it possible for the engine to set up their parameter
pointers when running Ruamoko progs. At this stage, it doesn't matter
*too* much, except for varargs functions, because no builtin yet takes
anything larger than a float quaternion, but it will be critical when
double or long vec3 and vec4 values are passed.
Just 32-bit rounding to next higher power of two, and base 2 logarithm.
Most importantly, they are suitable for use in initializers as they are
constant in, constant out.
I found the docs in PR_ExecuteProgram and PR_CallFunction to be a little
confusing, so making it explicit that PR_ExecuteProgram calls
PR_CallFunction and that PR_CallFunction should be called only in a
builtin seemed like a good idea.
And fix an incorrect definition for RETURN_QUAT.
Prefixed MAX_STACK_DEPTH and LOCALSTACK_SIZE (and LOCALSTACK_SIZE got an
extra _).
The rest is just edits to documentation comments.
Due to how OP_RETURN works, a destination is required for any function
returning data, but the caller may not have allocated any space for the
value. Thus the VM maintains a buffer into which the data can be put and
ignored. It also makes a good place for return values when the engine
calls Ruamoko code as trusting progs code with return sizes seems like a
recipe for disaster, especially if the return location is on the C
stack.
And provide a table for such for qfcc and the like. With this, using
pr_double_t (for example) in C will cause the double value to always be
8-byte aligned and thus structures shared between gcc and qfcc will be
consistent (with a little fuss to take care of the warts).
And other related fields so integer is now int (and uinteger is uint). I
really don't know why I went with integer in the first place, but this
will make using macros easier for dealing with types.
They are both gone, and pr_pointer_t is now pr_ptr_t (pointer may be a
little clearer than ptr, but ptr is consistent with things like intptr,
and keeps the type name short).
This required delaying the setting of the return pointer by call until
after the current pointer had been saved, and thus passing the desired
pointer into PR_CallFunction (which does have some advantages for C
functions calling progs functions, but some dangers too (should ensure a
128 byte (32 word) buffer when calling untrusted code (which is any,
really)).
This fixes the issue of the data stack not being restored properly
because the returning function needs to return a value from its local
variables (stored on the stack) and accessing stack data below the stack
pointer is a bad idea (sure, no interrupts yet, but who knows...).
I don't know why they were ever signed (oversight at id and just
propagated?). Anyway, this resulted in "unsigned" spreading a bit, but
all to reasonable places.
This has been a long-held wishlist item, really, and I thought I might
as well take the opportunity to add the instructions. The double
versions of STATE require both the nextthink field and time global to be
double (but they're not resolved properly yet: marked with
"FIXME double time" comments).
Also, the frame number for double time state is integer rather than
float.
While it doesn't cover the addressing modes, it does match the bit
pattern used in the Ruamoko instruction set. It will make selecting
branch instructions easier (especially for Ruamoko).
In some cases, gcc-11 does a good enough job translating normal looking
C expressions so use just those, but other times need to dig around for
an appropriate intrinsic.
Also, now need to disable psapi warnings when compiling for anything
less than avx.
And partial implementations in qfcc (most places will generate an
internal error (not implemented) or segfault, but some low-hanging fruit
has already been implemented).
As I expect to be tweaking things for a while, it's part of the build
process. This will make it a lot easier to adjust mnemonics and argument
formats (tweaking the old table was a pain when conventions changed).
It's not quite done as it still needs arg widths and types.
While working on the new opcode table, I decided a lot of the names were
not to my liking. Part of the problem was the earlier clash with the
v6p opcode names, but that has been resolved via the v6p tag.
Always setting w to 0 made it impossible to use the resulting 4d vectors
in division-based operations as they would result in divide-by-zero and
thus an unavoidable exception (CPUs don't like integer div-by-zero).
I'll probably add similar for float and double, but they're not as
critical as they'll just give inf or nan. This also increases my doubts
about the value of keeping 3d vector operations.
Float bit-ops as well.
Also, add q*v4 and v4*q instructions. There are currently 48 free
opcodes, and I might remove the scale instructions, but they could be
useful as expanding a single float to a vector would take 3 instructions
(copy to temp, swizzle-expand temp, multiply, vs just scale).
It turns out gcc optimizes the obvious code nicely. It doesn't do so
well for cmul, but I decided to use obvious code anyway (the instruction
counts were the same, so maybe it doesn't get better for a single pair
of operands).
This allows the VM to select the right execution loop and qfcc currently
still produces only the old IS (it doesn't know how to deal with the new
IS yet)
When it's finalized (most of the conversion operations will go, probably
the float bit ops, maybe (very undecided) the 3-component vector ops,
and likely the CALLN ops), this will be the actual instruction set for
Ruamoko.
Main features:
- Significant reduction in redundant instructions: no more multiple
opcodes to move the one operand size.
- load, store, push, and pop share unified addressing mode encoding
(with the exception of mode 0 for load as that is redundant with mode
0 for store, thus load mode 0 gives quick access to entity.field).
- Full support for both 32 and 64 bit signed integer, unsigned integer,
and floating point values.
- SIMD for 1, 2, (currently) 3, and 4 components. Transfers support up
to 128-bit wide operations (need two operations to transfer a full
4-component double/long vector), but all math operations support both
128-bit (32-bit components) and 256-bit (64-bit components) vectors.
- "Interpreted" operations for the various vector sizes: complex dot
and multiplication, 3d vector dot and cross product, quaternion dot
and multiplication, along with qv and vq shortcuts.
- 4-component swizzles for both sizes (not yet implemented, but the
instructions are allocated), with the option to zero or negate (thus
conjugates for complex and quaternion values) individual components.
- "Based offsets": all relevant instructions include base register
indices for all three operands allowing for direct access to any of
four areas (eg, current entity, current stack frame, Objective-QC
self, ...) instructions to set a register and push/pop the four
registers to/from the stack.
Remaining work:
- Implement swizzle operations and a few other stragglers.
= Make a decision about conversion operations (if any instructions
remain, they'll be just single-component (at 14 meaningful pairs,
that's a lot of instructions to waste on SIMD versions).
- Decide whether to keep CALL1-CALL8: probably little point in
supporting two different calling conventions, and it would free up
another eight instructions.
- Unit tests for the instructions.
- Teach qfcc to generate code for the new instruction set (hah, biggest
job, I'm sure, though hopefully not as crazy as the rewrite eleven
years ago).
I wish I'd done it this way years ago (but maybe gcc 2.95 couldn't hack
the casts, I do know there were aliasing problems in the past). Anyway,
this makes operand access much more consistent for variable sized
operands (eg float vs double vs vec4), and is a big part of the new
instruction set implementation.
And add a unary op macro. Having VectorCompOp makes it easy to write
macros that work for multiple data widths, which is why it and its users
now use (dst, ...) instead of (..., dst) as in the past. I'll sort out
the other macros later now that I know the compiler handily gives
messages about the switched order (uninitialized vars etc).
This renames existing VectorCompCompare (and quaternion equivalent) to
VectorCompCompareAll and makes VectorCompCompare produce a vector of
results with optional negation (converting 0,1 to 0,-1 for compatibility
with simd semantics).
For int, long, float and double. I've been meaning to add them for a
while, and they're part of the new Ruamoko instructions set (which is
progressing nicely).
The opcode table is a nightmare to maintain, but this does clean it up
and speed up opcode lookups since they can now be indexed. Of course, it
turns out I had missed adding several instructions, so had to fix that,
and qfcc needed a bit of a re-jigger to get the opcode out of the table.
The switch from using pr_functions (dfunction_t) to function_table
(bfunction_t) for keeping track of the current function (and thus
profiling data) broke PR_Profile as it never saw anything but 0.
PR_LoadDebug now does only the initial version and crc checks, and the
byte-swapping of the loaded symbols file. PR_DebugSetSym sets up all the
pointers.
mtwist_rand_0_1 produces numbers in the range [0, 1) and
mtwist_rand_m1_1 produces numbers in the range (-1, 1). The numbers will
not be denormal, so the distribution should be fairly uniform (as much
as Mersenne Twister itself is), but this needs proper testing.
0 is included for the mtwist_rand_0_1 as it seems useful, but -1 is not
included in mtwist_rand_m1_1 in order to keep the extremes of the
distribution balanced around 0.
And create rua_game to coordinate other game builtins.
Menus are broken for key handling, but have been since the input rewrite
anyway. rua_input adds the ability to create buttons and axes (but not
destroy them). More work needs to be done to flesh things out, though.
This takes care of the global variables to a point (there is still the
global struct shared between the non-vulkan renderers), but it also
takes care of glsl's points-only rendering.
After yesterday's crazy marathon editing all the particles files, and
starting to do another big change to them today, I realized that I
really do need to merge them down. All the actual spawning is now in the
client library (though particle insertion will need to be moved). GLSL
particle rendering is semi-broken in that it now does only points (until
I come up with a way to select between points and quads (probably a
context object, which I need anyway for Vulkan)).
This has the advantage of getting entity_t out of the particle system,
and much easier to read math. Also, it served as a nice test for my
particle physics shaders (implemented the ideas in C). There's a lot of
code that needs merging down: all but the actual drawing can be merged.
There's some weirdness with color ramps, but I'll look into that later.
This was needed to get crosshaircolor working correctly, but is likely
another step towards resizable windows (the listener set types are
generic for any viddef event, not just palette changes).
Holding onto the pointer is not a good idea, and it is read-only as
direct manipulation of the world matrix is not supported. However, this
is useful for passing the matrix to the GPU.
This gets the pipelines loaded (and unloaded on shutdown). Probably the
easy part :P. Still need to sort out the command buffers,
synchronization, and particle generation (and probably a bunch else
that's not coming to mind).
This needed changing Vulkan_CreatePipeline to
Vulkan_CreateGraphicsPipeline for consistency (and parsing the
difference from a plist seemed... not worth thinking about).
It turned out the bindless approach wouldn't work too well for my design
of the sprite objects, but I don't think that's a big issue at this
stage (and it seems bindless is causing problems for brush/alias
rendering via renderdoc and on my versa pro). However, I have figured
out how to make effective use of descriptor sets (finally :P).
The actual normal still needs checking, but the sprites are currently
unlit so not an issue at this stage.
I'm not at all sure what I was thinking when I designed it, but I
certainly designed it wrong (to the point of being fairly useless). It
turns out memory requirements are already aligned in size (so just
multiplying is fine), and what I really wanted was to get the next
offset aligned to the given requirements.
The vertices and frame images are loaded into the one memory object,
with the vertices first followed by the images.
The vertices are 2D xy+uv sets meant to be applied to the model
transform frame, and are pre-computed for the sprite size (this part
does support sprites with varying frame image sizes).
The frame images are loaded into one image with each frame on its own
layer. This will cause some problems if any sprites with varying frame
image sizes are found, but the three sprites in quake are all uniform
size.
As much as it can be since the texture data is interleaved with the
model data in the files (I guess not that bad a design for 25 years ago
with the tight memory constraints), but this paves the way for
supporting sprites in Vulkan.
This is needed for cleaning up excess memsets when loading files because
Hunk_RawAllocName has nonnull on its hunk pointer (as the rest of the
hunk functions really should, but not just yet).
In trying to reduce unnecessary memsets when loading files, I found that
Hunk_RawAllocName already had nonnull on it, so quakefs needed to know
the hunk it was to use. It seemed much better to to go this way (first
step in what is likely to be a lengthy process) than backtracking a
little and removing the nonnull attribute.
This gets the alias pipeline in line with the bsp pipeline, and thus
everything is about as functional as it was before the rework (minus
dealing with large texture sets).
I guess it's not quite bindless as the texture index is a push constant,
but it seems to work well (and I may have fixed some full-bright issues
by accident, though I suspect that's just my imagination, but they do
look good).
This should fix the horrid frame rate dependent behavior of the view
model.
They are also in their own descriptor set so they can be easily shared
between pipelines. This has been verified to work for Draw.
BSP textures are now two-layered with the albedo and emission in the two
layers rather than two separate images. While this does increase memory
usage for the textures themselves (most do not have fullbright pixels),
it cuts down on image and image view handles (and shader resources).
For now, just dot product, trig, and min/max/bound, but it works well as
a proof of concept. The main goal was actually min. Only the list of
symbols is provided, it is the user's responsibility to set up the
symbol table and context.
cexpr's symbol tables currently aren't readily extended, and dynamic
scoping is usually a good thing anyway. The chain of contexts is walked
when a symbol is not found in the current context's symtab, but minor
efforts are made to avoid checking the same symtab twice (usually cased
by cloning a context but not updating the symtab).
Multiple render passes are needed for supporting shadow mapping, and
this is a huge step towards breaking the Vulkan render free of Quake,
and hopefully will lead the way for breaking the GL renderers free as
well.
This is actually a better solution to the renderer directly accessing
client code than provided by 7e078c7f9c.
Essentially, V_RenderView should not have been calling R_RenderView, and
CL_UpdateScreen should have been calling V_RenderView directly. The
issue was that the renderers expected the world entity model to be valid
at all times. Now, R_RenderView checks the world entity model's validity
and immediately bails if it is not, and R_ClearState (which is called
whenever the client disconnects and thus no longer has a world to
render) clears the world entity model. Thus R_RenderView can (and is)
now called unconditionally from within the renderer, simplifying
renderer-specific variants.
When allocating memory for multiple objects that have alignment
requirements, it gets tedious keeping track of the offset and the
alignment. This is a simple function for walking the offset respecting
size and alignment requirements, and doubles as a size calculator.
I'm not sure what I was thinking when I made PL_RemoveObjectForKey take
a const plitem. One of those times where C could do with being a little
more strict.
The stack is arbitrary strings that the validation layer debug callback
prints in reverse order after each message. This makes it easy to work
out what nodes in a pipeline/render pass plist are causing validation
errors. Still have to narrow down the actual line, but the messages seem
to help with that.
Putting qfvPushDebug/qfvPopDebug around other calls to vulkan should
help out a lot, tool.
As a bonus, the stack is printed before debug_breakpoint is called, so
it's immediately visible in gdb.
I'm not at all happy with con_message and con_menu, but fixing them
properly will take a rework of the menus (planned, though). Also, the
Menu_ console command implementations are a bit iffy and could also do
with a rewrite (probably part of the rest of the menu rework) or just
nuking (they were part of Johnny on Flame's work, so I suspect had
something to do with joystick bindings).
An imt switcher automatically changes the context's active imt based on
a user specified list of binary inputs. The inputs may be either buttons
(indicated as +button) or cvars (bare name). For buttons, the
pressed/not pressed state is used, and cvars are interpreted as ints
being 0 or not 0. The order of the inputs determines the bit number of
the input, with the first input being bit 0, second bit 1, third bit 2
etc. A default imt is given so large switchers do not need to be fully
configured (the default imt is written to all states).
A context can have any number of switchers attached. The switchers can
wind up fighting over the active imt, but this seems to be something for
the "user" (eg, configuration system) to sort out rather than the
switcher code enforcing anything.
As a result of the inputs being treated as bits, a switcher with N
inputs will have 2**N states, thus there's a maximum of 16 inputs for
now as 65536 states is a lot of configuration.
Using a switcher, setting up a standard strafe/mouse look configuration
is fairly easy.
imt_create key_game imt_mod
imt_create key_game imt_mod_strafe imt_mod
imt_create key_game imt_mod_freelook
imt_create key_game imt_mod_lookstrafe imt_mod_freelook
imt_switcher_create mouse key_game imt_mod_strafe +strafe lookstrafe +mlook freelook
imt_switcher 0 imt_mod 2 imt_mod 4 imt_mod_freelook 8 imt_mod_freelook 12 imt_mod_freelook
imt_switcher 6 imt_mod_lookstrafe 10 imt_mod_lookstrafe 14 imt_mod_lookstrafe
in_bind imt_mod mouse axis 0 move.yaw
in_bind imt_mod mouse axis 1 move.forward
in_bind imt_mod_strafe mouse axis 0 move.side
in_bind imt_mod_lookstrafe mouse axis 0 move.side
in_bind imt_mod_freelook mouse axis 1 move.pitch
This takes advantage of imt chaining and the default imt for the
switcher (there are 8 states that use imt_mod_strafe).
The switcher name must be unique across all contexts, and every imt used
in a switcher must be in the switcher's context.
The listener is invoked when the axis value changes due to IN_UpdateAxis
or IN_ClampAxis updating the axis. This does mean the listener
invocation make be somewhat delayed. I am a tad uncertain about this
design thus it being a separate commit.
Listeners are separate to the main callback as listeners have only
read-only access to the objects, but the main callback is free to modify
the cvar and thus can act as a parser and validator. The listeners are
invoked after the main callback if the cvar is modified. There does not
need to be a main callback for the listeners to be invoked.
I decided cvars and input buttons/axes need listeners so any changes to
them can be propagated. This will make using cvars in bindings feasible
and I have an idea for automatic imt switching that would benefit from
listeners attached to buttons and cvars.
Combining absolute and relative inputs at the binding does not work well
because absolute inputs generally update only when the physical input
updates, so clearing the axis input each frame results in a brief pulse
from the physical input, but relative inputs must be cleared each frame
(where frame here is each time the axis is read) but must accumulate the
relative updates between frames.
Other than the axis mode being incorrect, this seems to work quite
nicely.