It turned out I need locals count and params_start for debugging, so use
the progs version instead to bail early from PR_EnterFunction and
PR_LeaveFunction (which I had forgotten anyway, oops).
This cleans up dprograms_t, making it easier to read and see what chunks
are in it (I was surprised to see only 6, the explicit pairs made it
seem to have more).
This makes return consistent with load, store, etc, though its
addressing mode is encoded in bits 5 and 6 of c rather than the opcode.
It turns out I had no tests for any of return's addressing modes other
than basic def references, so no tests needed changing.
The parameter defs are allocated from the parameter space using a
minimum alignment of 4, and varargs functions get a va_list struct in
place of the ...
An "args" expression is unconditionally injected into the call arguments
list at the place where ... is in the list, with arguments passed
through ... coming after the ...
Arguments get through to functions now, but there's problems with taking
the address of local variables: currently done using constant pointer
defs, which can't work for the base register addressing used in Ruamoko
progs.
With the update to test-bi's printf (and a hack to qfcc for lea),
triangle.r actually works, printing the expected results (but -1 instead
of 1 for equality, though that too is actually expected). qfcc will take
a bit longer because it seems there are some design issues in address
expressions (ambiguity, and a few other things) that have pretty much
always been there.
PR_SetupParams is new and sets up the parameter pointers so older code
that expects only up to 8 parameter will work with both v6p and Ruamoko
progs without having to check what progs are running. PR_SetupParams is
useful even when Ruamoko progs are expected as it reserves the required
space (respecting alignment) on the stack and returns a pointer to the
top (bottom? confusing) of the stack. PR_PushFrame and PR_PopFrame
need to be used around PR_SetupParams, regardless of using temp strings,
to avoid a stack leak (need to do an audit).
The builtin and progs function data is overlaid so the extra data
doesn't cause too much memory to be used (it's actually 8 bytes smaller
now). The plan is to pre-compute the offsets based on the parameter
size and alignment data.
As even the simplest v6p functions that take parameters but have no
local or temporary variables still have locals for the local copy of the
parameters, this is a both a good check for for the Ruamoko ISA as its
functions never have locals (everything's on the progs data stack), and
an optimization for v6p functions that have no params or locals (simple
getters (very rare?), most .ctor, etc).
And fix an incorrect definition for RETURN_QUAT.
Prefixed MAX_STACK_DEPTH and LOCALSTACK_SIZE (and LOCALSTACK_SIZE got an
extra _).
The rest is just edits to documentation comments.
ldconst isn't implemented yet but the plan is to load various constants
(eg, 0, 1, 2, pi, e, ...).
Stack adjust is useful for adding an offset to the stack pointer without
having to worry about finding it (and it checks for alignment).
nop is just that :)
Due to how OP_RETURN works, a destination is required for any function
returning data, but the caller may not have allocated any space for the
value. Thus the VM maintains a buffer into which the data can be put and
ignored. It also makes a good place for return values when the engine
calls Ruamoko code as trusting progs code with return sizes seems like a
recipe for disaster, especially if the return location is on the C
stack.
It turned out that address mode B was redundant as C with 0 offset
(immediate) was the same (except for the underlying C code of course,
but adding st->b is very cheap). This allowed B to be used for
entity.field for all transfer operations. Thus instructions 0-3 are now
free as load E became load B, and other than the specifics of format
codes for statement printing, transfers+lea are unified.
And other related fields so integer is now int (and uinteger is uint). I
really don't know why I went with integer in the first place, but this
will make using macros easier for dealing with types.
They are both gone, and pr_pointer_t is now pr_ptr_t (pointer may be a
little clearer than ptr, but ptr is consistent with things like intptr,
and keeps the type name short).
This required delaying the setting of the return pointer by call until
after the current pointer had been saved, and thus passing the desired
pointer into PR_CallFunction (which does have some advantages for C
functions calling progs functions, but some dangers too (should ensure a
128 byte (32 word) buffer when calling untrusted code (which is any,
really)).
This fixes the issue of the data stack not being restored properly
because the returning function needs to return a value from its local
variables (stored on the stack) and accessing stack data below the stack
pointer is a bad idea (sure, no interrupts yet, but who knows...).
Call's operand c is used to specify where the return value of the
function is to be stored. This gets both the correct function being
called, and the value being returned correctly. Test still fails due to
the stack restoration issue.
This has been a long-held wishlist item, really, and I thought I might
as well take the opportunity to add the instructions. The double
versions of STATE require both the nextthink field and time global to be
double (but they're not resolved properly yet: marked with
"FIXME double time" comments).
Also, the frame number for double time state is integer rather than
float.
In the end, I decided any/all/none should be separate from the other
horizontal ops, if I even do them (can be implemented by first
converting to bool, then using the appropriate horizontal operation (& |
etc).
ANY/ALL/NONE have been temporarily removed until I implement the HOPS
(horizontal operations) sub-instructions, which will all both 32-bit and
64-bit operands and several other operations (eg, horizontal add).
All the fancy addressing modes for the conditional branch instructions
have been permanently removed: I decided the gain was too little for the
cost (24 instructions vs 6). JUMP and CALL retain their addressing
modes, though.
Other instructions have been shuffled around a little to fill most of
the holes in the upper block of 256 instructions: just a single small
7-instruction hole.
Rearrangements in the actual engine are mostly just to keep the code
organized. The only real changes were the various IF statements and
dealing with the resulting changes in their addressing.
When creating the tests for lea, I noticed that B was yet another simple
assign, so I decided it was best to drop it and move E into its place
(freeing up another instruction).
The compare/ne operator returns "random" -ve, 0, +ve values (really,
just the numerical difference between the chars of the strings), but all
the rest return -1 for true and 0 for false, as with the rest of the
comparison operators.
I realized that being able to do bit-wise operations with 64-bit values
(and 256-bit vectors) is far more important than some convenient boolean
logic operators. The logic ops can be handled via the bit-wise ops so
long as the values are all properly boolean, and I plan on adding some
boolean conversion ope, so no real loss.
Both float 2,3,4 vectors and double 2,3,4 vectors (1 would be just a
copy of the mul instructions).
This completes the currently planned instructions. Now for testing.
Not all possibilities are supported because converting between int and
uint, and long and ulong is essentially a no-op. However, thanks to
Deek's suggestion, not only are all reasonable conversions available,
conversions for all widths are available, so vector conversions are
supported.
The code for the conversions is generated.
The call1-8 instructions have been removed as they are really not needed
(they were put in when I had plans of simple translation of v6p progs to
ruamoko, but they joined the dinosaurs).
The call instruction lost mode A (that is now return) and its mode B is
just the regular function access. The important thing is op_c (with
support for with-bases) specifies the location of the return def.
The return instruction packs both its addressing mode and return value
size into st->c as a 3.5 value: 3 bits for the mode (it supports all
five addressing modes with entity.field being mode 4) and 5 for the
size, limiting return sizes to 32 words, which is enough for one 4x4
double matrix.
This, especially with the following convert patch, frees up a lot of
instructions.
The bug (alignment issues with AVX on windows) seems to have in gcc from
the 4.x days, and is still present in 11.2: it does not ensure stack
parameters that need 32 byte alignment are aligned. Telling gcc to use
the sysv abi (safe on a static function) lets gcc do what it does for
linux (usually pass the parameters in registers, which it seems to have
done).
While working on the new opcode table, I decided a lot of the names were
not to my liking. Part of the problem was the earlier clash with the
v6p opcode names, but that has been resolved via the v6p tag.
Use the new "1" versions of loadvec3 to get a 1 in w to avoid
divide-by-zero errors, and use the correct type for longs (forgot to
change i to l on the vector types).
It turned out I had no way of using a pointer or field as the value to
load, so all 4 modes are duplicated with loads from where operand b
points, but the loaded value interpreted the same way. Also, fixed an
error in the calculation of op-b offsets.
Statements can be bounds checked in the one place (jump calculation),
but memory accesses cannot as they can be used in lea instructions which
should never cause an exception (unless one of lea's operands is OOB).
Float bit-ops as well.
Also, add q*v4 and v4*q instructions. There are currently 48 free
opcodes, and I might remove the scale instructions, but they could be
useful as expanding a single float to a vector would take 3 instructions
(copy to temp, swizzle-expand temp, multiply, vs just scale).