Both float 2,3,4 vectors and double 2,3,4 vectors (1 would be just a
copy of the mul instructions).
This completes the currently planned instructions. Now for testing.
Not all possibilities are supported because converting between int and
uint, and long and ulong is essentially a no-op. However, thanks to
Deek's suggestion, not only are all reasonable conversions available,
conversions for all widths are available, so vector conversions are
supported.
The code for the conversions is generated.
The call1-8 instructions have been removed as they are really not needed
(they were put in when I had plans of simple translation of v6p progs to
ruamoko, but they joined the dinosaurs).
The call instruction lost mode A (that is now return) and its mode B is
just the regular function access. The important thing is op_c (with
support for with-bases) specifies the location of the return def.
The return instruction packs both its addressing mode and return value
size into st->c as a 3.5 value: 3 bits for the mode (it supports all
five addressing modes with entity.field being mode 4) and 5 for the
size, limiting return sizes to 32 words, which is enough for one 4x4
double matrix.
This, especially with the following convert patch, frees up a lot of
instructions.
The bug (alignment issues with AVX on windows) seems to have in gcc from
the 4.x days, and is still present in 11.2: it does not ensure stack
parameters that need 32 byte alignment are aligned. Telling gcc to use
the sysv abi (safe on a static function) lets gcc do what it does for
linux (usually pass the parameters in registers, which it seems to have
done).
While working on the new opcode table, I decided a lot of the names were
not to my liking. Part of the problem was the earlier clash with the
v6p opcode names, but that has been resolved via the v6p tag.
Use the new "1" versions of loadvec3 to get a 1 in w to avoid
divide-by-zero errors, and use the correct type for longs (forgot to
change i to l on the vector types).
It turned out I had no way of using a pointer or field as the value to
load, so all 4 modes are duplicated with loads from where operand b
points, but the loaded value interpreted the same way. Also, fixed an
error in the calculation of op-b offsets.
Statements can be bounds checked in the one place (jump calculation),
but memory accesses cannot as they can be used in lea instructions which
should never cause an exception (unless one of lea's operands is OOB).
Float bit-ops as well.
Also, add q*v4 and v4*q instructions. There are currently 48 free
opcodes, and I might remove the scale instructions, but they could be
useful as expanding a single float to a vector would take 3 instructions
(copy to temp, swizzle-expand temp, multiply, vs just scale).
The swizzle instruction is very powerful in that in can do any of the
256 permutations of xyzw, optionally negate any combination of the
resulting components, and zero any combination of the result components
(even all). This means the one instruction can take care of any actual
swizzles, conjugation for complex and quaternion values, zeroing vectors
(not that it's the only way), and probably other weird things.
The python file was used to generate the jump table and actual swizzle
code.
They even found a bug in the addressing mode functions :) (I'd forgotten
that I wanted signed offsets from the pointer and thus forgot to cast
st->b to short in order to get the sign extension)
This allows the VM to select the right execution loop and qfcc currently
still produces only the old IS (it doesn't know how to deal with the new
IS yet)
When it's finalized (most of the conversion operations will go, probably
the float bit ops, maybe (very undecided) the 3-component vector ops,
and likely the CALLN ops), this will be the actual instruction set for
Ruamoko.
Main features:
- Significant reduction in redundant instructions: no more multiple
opcodes to move the one operand size.
- load, store, push, and pop share unified addressing mode encoding
(with the exception of mode 0 for load as that is redundant with mode
0 for store, thus load mode 0 gives quick access to entity.field).
- Full support for both 32 and 64 bit signed integer, unsigned integer,
and floating point values.
- SIMD for 1, 2, (currently) 3, and 4 components. Transfers support up
to 128-bit wide operations (need two operations to transfer a full
4-component double/long vector), but all math operations support both
128-bit (32-bit components) and 256-bit (64-bit components) vectors.
- "Interpreted" operations for the various vector sizes: complex dot
and multiplication, 3d vector dot and cross product, quaternion dot
and multiplication, along with qv and vq shortcuts.
- 4-component swizzles for both sizes (not yet implemented, but the
instructions are allocated), with the option to zero or negate (thus
conjugates for complex and quaternion values) individual components.
- "Based offsets": all relevant instructions include base register
indices for all three operands allowing for direct access to any of
four areas (eg, current entity, current stack frame, Objective-QC
self, ...) instructions to set a register and push/pop the four
registers to/from the stack.
Remaining work:
- Implement swizzle operations and a few other stragglers.
= Make a decision about conversion operations (if any instructions
remain, they'll be just single-component (at 14 meaningful pairs,
that's a lot of instructions to waste on SIMD versions).
- Decide whether to keep CALL1-CALL8: probably little point in
supporting two different calling conventions, and it would free up
another eight instructions.
- Unit tests for the instructions.
- Teach qfcc to generate code for the new instruction set (hah, biggest
job, I'm sure, though hopefully not as crazy as the rewrite eleven
years ago).
I wish I'd done it this way years ago (but maybe gcc 2.95 couldn't hack
the casts, I do know there were aliasing problems in the past). Anyway,
this makes operand access much more consistent for variable sized
operands (eg float vs double vs vec4), and is a big part of the new
instruction set implementation.
There is no reasonable way (due to hardware-enforced alignment issues)
to simply convert old bytecode to new (probably best done with an
off-line tool, preferably just recompiling when I get qfcc up to the
job), so both loops will need to be present. This just moves the
original loop into its own function in order to make it easy to bring in
the new (and iron out integration issues).
The server edict arrays are now stored outside of progs memory, only the
entity data itself (ie data accessible to progs via ent.fld) is stored in
progs memory. Many of the changes were due to code accessing edicts and
entity fields directly rather than through the provided macros.
And rename prd_exit to prd_terminate (the idea is the host will
terminate the VM). This makes it possible for the debugger to pause the
VM before any code, even a builtin function, is executed. Breaks the
debugger source window, but only because it's not updating on file
change (I think).
I decided I want events for VM enter/exit but enter needs to somehow
pass the function which will be executed (even if a builtin). A generic
void * param seemed the best idea, which meant the error string could be
passed via the param instead of a "global" string in the progs struct.
While there was a breakpoint hook, it was for only breakpoints and more
was needed. Now there's a generic hook that is called for tracing,
breakpoints, watch points, runtime errors and VM errors, with the
"event" type passed as the first parameter and a data pointer in the
second.
The memset instructions now match the move* instructions other than the
first operand (always int). Probably breaks much, but fixed in next few
commits.
If a temp string is found in the return slot, PR_FreeTempStrings won't
delete the string. However, PR_PopFrame was blindly stomping on the
possibly surviving temp string with the push strings, which would cause
a leak.
This "pushes" a temp string onto the callee's stack frame after removing
it from the caller's stack frame. This is so builtins can pass
auto-freed memory to called progs code. No checking is done, but mayhem
is likely to ensue if a string is pushed that was allocated in an
earlier frame.
The progs execution code will call a breakpoint handler just before
executing an instruction with the flag set. This means there's no need
for the breakpoint handler to mess with execution state or even the
instruction in order to continue past the breakpoint.
The flag being set in a progs file is invalid.
It is now set to 0 when progs are loaded and every time
PR_ExecuteProgram() returns. This takes care of the default case, but
when setting parameters, pr_argc needs to be set correctly in case a
vararg function is called.
PR_SaveParams() is required for implementing the +initialize diversion
used by Objective-QuakeC because builtins do not have local def spaces
(of course, a normal stack calling convention would help). However, it
is entirely possible for a call to +initialize to trigger another call
to +initialize, thus the need for stacking parameter stashes. As a
bonus, this implementation cleans up some fields in progs_t.
The engine now requires non-v6 progs to store the log2 alignment for the
param struct in .param_alignment.
PR_EnterFunction is clearer and possibly more efficient.
Only as scalars, I still need to think about what to do for vectors and
quaternions due to param size issues. Also, doubles are not yet
guaranteed to be correctly aligned.
The offset to compensate for st++ was missing.
Obviously, the code has never been tested. Found while looking at the
jump code and thinking about using 32-bit addresses for the jump tables.
It was pointed out by Blub\w (gmqcc) that OP_MUL_FV and friends were buggy
when the operands overlapped (eg, x = x.x * x) as the result would become
'x.x*x.x x.y*x.x*x.x x.z*x.x*x.x' (note the x.x squared for y and z). On
testing, sure enough the bug was present (and is a nice demonstration that
QF's VM does NOT have strict-aliasing bugs). As a very nice benefit: the
code produced by the fixes is actually faster than the broken version :).
The ruamoko code used for testing:
void (string fmt, ...) printf = #0;
vector foo (vector x)
{
x = x * x.x;
return x;
}
vector bar (vector x)
{
x = x.x * x;
return x;
}
int main ()
{
vector x = '2 3 4';
vector y = foo (x);
vector z = bar (x);
printf ("x=%v y=%v z=%v 2*x=%v\n", x, y, z, 2*x);
return 0;
}
Normally, the order doesn't matter, but when tracing code, it becomes very
difficult to tell where the trace ends and the dump begins. Printing the
message first puts the message between the trace and the dump: much easier
:)
another builtin by name, and returns it.
Soon I'll change all our new builtins to by allocated dynamically, as
well as changing the number checkfunction uses, and happily break
everything that uses them :D
integer constants and float function args/return values.
pr_comp.h:
o add the integer opcodes to pr_opcode_e
pr_edict.c:
o add "quaternion" and "integer" to type_name[]
o support quatnernion and integers types when printing values
o support the integer opcodes when bounds checking
pr_exec.c
o enable the integer opcodes
pr_opcode:
o add the integer opcodes to the opcode table
o logical operators all result in an integer rather than a value
expr.h:
o rename int_val to integer_val
qfcc.h:
o kill another magic number
expr.c:
o move the opcode to string conversion out of type_mismatch and into
get_op_string
o rename int_val to integer_val
o general integer type support.
o generate an internal comipiler error for null opcodes rather than
segging.
pr_imm.c:
o rename int_val to integer_val
o support integer constants, converting to float when needed.
pr_lex.c:
o magic number death and support quaternions and integers in type_size[]
qc-lex.l
o rename int_val to integer_val
o support quaternion and integer type keywords
qc-parse.y:
o rename int_val to integer_val
o use binary_expr instead of new_binary_expr for local initialized
variables
builtins.c:
o rename int_val to integer_val
o fix most (all?) of the INT related FIXMEs
defs.qc:
o use integer instead of float where it makes sense
main.c:
o read_result is now integer rather than float
main.qc:
o float -> integer where appropriate
o new test for int const to float arg
add no_exec_limit field. Set to 1 to disable the runaway loop check
for unlimited runs (eg, in qwaq)
pr_exec.c:
don't bother checking the profile counter if pr->no_exec_limit is set
pr_strings.c:
free unreferenced dynamic strings rather than referenced.
add OP_ADD_S. WARNING!!! this /will/ move.
progs.h:
add prototype for PR_PrintStatement
pr_edict.c:
add OP_ADD_S support in the progs checker
pr_exec.c:
implement OP_ADD_S
tools/qfcc/include/.gitignore:
add config.h.in
qfcc.h:
nuke PR_NameImmediate and change PR_ParseImmediate's prototype (see
pr_imm.c)
pr_comp.c:
add ADD_S, adjust for PR_ParseImmediate's prototype, make
PR_ParseExpression work with non-sequential opcodes (slow, will work on
that next). Fix up initialised global parsing.
pr_imm.c:
nuke PR_NameImmediate. didn't work well and wasn't such a good idea anyway.
PR_ParseImmediate now accepts a def_t * arg. if null, will allocate a
new global def, otherwise it will initialize the def passed in.
qwaq/main.c:
sports some debugging code (dumps info about the progs it's running)
qwaq/main.qc:
better ADD_S testing