With the return buffer in progs_t, it could not be addressed by the
progs on 64-bit machines (this was intentional, actually), but in order
to get obj_msg_sendv working properly, I needed a way to "bounce" the
return address of a calling function to the called function. The
cleanest solution I could think of was to add a mode to the with
instruction allowing the return pointer to be loaded into a register and
then calling the function with a 0 offset for the return value but using
the relevant register (next few commits). Testing promptly segfaulted
due to the 64-bit offset not fitting into a 32-bit value.
The plan is to use the types to extract the number of parameters for a
selector when it is necessary to know the count. However, it'll probably
become useful for something else alter (these things seem to always do
so).
It's currently only 4 (or even 3 for v6) words, but this fixes false
positives when checking for null pointers in Ruamoko progs due to
pr_return pointing to the return buffer and thus outside the progs
memory map resulting in an impossible to exceed value.
I abandoned the reason for doing it (adding a pile of vector types), but
I liked the cleanup. All the implementations are hand-written still, but
at least the boilerplate stuff is automated.
PR_SetupParams is new and sets up the parameter pointers so older code
that expects only up to 8 parameter will work with both v6p and Ruamoko
progs without having to check what progs are running. PR_SetupParams is
useful even when Ruamoko progs are expected as it reserves the required
space (respecting alignment) on the stack and returns a pointer to the
top (bottom? confusing) of the stack. PR_PushFrame and PR_PopFrame
need to be used around PR_SetupParams, regardless of using temp strings,
to avoid a stack leak (need to do an audit).
This is part of the work for #26 (Record resource pointer with builtin
function data). Currently, the data pointer gets as far as the
per-instance VM function table (I don't feel like tackling the job of
converting all the builtin functions tonight). All the builtin modules
that register a resources data block pass that block on to
PR_RegisterBuiltins.
The builtin and progs function data is overlaid so the extra data
doesn't cause too much memory to be used (it's actually 8 bytes smaller
now). The plan is to pre-compute the offsets based on the parameter
size and alignment data.
This will make it possible for the engine to set up their parameter
pointers when running Ruamoko progs. At this stage, it doesn't matter
*too* much, except for varargs functions, because no builtin yet takes
anything larger than a float quaternion, but it will be critical when
double or long vec3 and vec4 values are passed.
I found the docs in PR_ExecuteProgram and PR_CallFunction to be a little
confusing, so making it explicit that PR_ExecuteProgram calls
PR_CallFunction and that PR_CallFunction should be called only in a
builtin seemed like a good idea.
And fix an incorrect definition for RETURN_QUAT.
Prefixed MAX_STACK_DEPTH and LOCALSTACK_SIZE (and LOCALSTACK_SIZE got an
extra _).
The rest is just edits to documentation comments.
Due to how OP_RETURN works, a destination is required for any function
returning data, but the caller may not have allocated any space for the
value. Thus the VM maintains a buffer into which the data can be put and
ignored. It also makes a good place for return values when the engine
calls Ruamoko code as trusting progs code with return sizes seems like a
recipe for disaster, especially if the return location is on the C
stack.
And other related fields so integer is now int (and uinteger is uint). I
really don't know why I went with integer in the first place, but this
will make using macros easier for dealing with types.
They are both gone, and pr_pointer_t is now pr_ptr_t (pointer may be a
little clearer than ptr, but ptr is consistent with things like intptr,
and keeps the type name short).
This required delaying the setting of the return pointer by call until
after the current pointer had been saved, and thus passing the desired
pointer into PR_CallFunction (which does have some advantages for C
functions calling progs functions, but some dangers too (should ensure a
128 byte (32 word) buffer when calling untrusted code (which is any,
really)).
This fixes the issue of the data stack not being restored properly
because the returning function needs to return a value from its local
variables (stored on the stack) and accessing stack data below the stack
pointer is a bad idea (sure, no interrupts yet, but who knows...).
I don't know why they were ever signed (oversight at id and just
propagated?). Anyway, this resulted in "unsigned" spreading a bit, but
all to reasonable places.
This has been a long-held wishlist item, really, and I thought I might
as well take the opportunity to add the instructions. The double
versions of STATE require both the nextthink field and time global to be
double (but they're not resolved properly yet: marked with
"FIXME double time" comments).
Also, the frame number for double time state is integer rather than
float.
And partial implementations in qfcc (most places will generate an
internal error (not implemented) or segfault, but some low-hanging fruit
has already been implemented).
When it's finalized (most of the conversion operations will go, probably
the float bit ops, maybe (very undecided) the 3-component vector ops,
and likely the CALLN ops), this will be the actual instruction set for
Ruamoko.
Main features:
- Significant reduction in redundant instructions: no more multiple
opcodes to move the one operand size.
- load, store, push, and pop share unified addressing mode encoding
(with the exception of mode 0 for load as that is redundant with mode
0 for store, thus load mode 0 gives quick access to entity.field).
- Full support for both 32 and 64 bit signed integer, unsigned integer,
and floating point values.
- SIMD for 1, 2, (currently) 3, and 4 components. Transfers support up
to 128-bit wide operations (need two operations to transfer a full
4-component double/long vector), but all math operations support both
128-bit (32-bit components) and 256-bit (64-bit components) vectors.
- "Interpreted" operations for the various vector sizes: complex dot
and multiplication, 3d vector dot and cross product, quaternion dot
and multiplication, along with qv and vq shortcuts.
- 4-component swizzles for both sizes (not yet implemented, but the
instructions are allocated), with the option to zero or negate (thus
conjugates for complex and quaternion values) individual components.
- "Based offsets": all relevant instructions include base register
indices for all three operands allowing for direct access to any of
four areas (eg, current entity, current stack frame, Objective-QC
self, ...) instructions to set a register and push/pop the four
registers to/from the stack.
Remaining work:
- Implement swizzle operations and a few other stragglers.
= Make a decision about conversion operations (if any instructions
remain, they'll be just single-component (at 14 meaningful pairs,
that's a lot of instructions to waste on SIMD versions).
- Decide whether to keep CALL1-CALL8: probably little point in
supporting two different calling conventions, and it would free up
another eight instructions.
- Unit tests for the instructions.
- Teach qfcc to generate code for the new instruction set (hah, biggest
job, I'm sure, though hopefully not as crazy as the rewrite eleven
years ago).
I wish I'd done it this way years ago (but maybe gcc 2.95 couldn't hack
the casts, I do know there were aliasing problems in the past). Anyway,
this makes operand access much more consistent for variable sized
operands (eg float vs double vs vec4), and is a big part of the new
instruction set implementation.
The switch from using pr_functions (dfunction_t) to function_table
(bfunction_t) for keeping track of the current function (and thus
profiling data) broke PR_Profile as it never saw anything but 0.
PR_LoadDebug now does only the initial version and crc checks, and the
byte-swapping of the loaded symbols file. PR_DebugSetSym sets up all the
pointers.
I had forgotten that _size was the number of rows in the map, not the
number of objects (1024 objects per row). This fixes the missed device
removal messages. And probably a slew of other bugs I'd yet to encounter
:P
Support for finding the first address associated with a source line was
added to the engine, returning 0 if not found.
A temporary breakpoint is set and the progs allowed to run free.
However, better handling of temporary breakpoitns is needed as currently
a "permanent" breakpoint will be cleared without clearing the temporary
breakpoing if the permanent breakpoing is hit while execut-to-cursor is
running.
The server edict arrays are now stored outside of progs memory, only the
entity data itself (ie data accessible to progs via ent.fld) is stored in
progs memory. Many of the changes were due to code accessing edicts and
entity fields directly rather than through the provided macros.
I never liked that some of the macros needed the type as a parameter
(yay typeof and __auto_type) or those that returned a value hid the
return statement so they couldn't be used in assignments.
Still "some" more to go: a pile to do with transforms and temporary
entities, and a nasty one with host_cbuf. There's also all the static
block-alloc lists :/