The tests fail due to differences in how clang and gcc treat floating
point to unsigned integral type conversions when the values overflow. It
wouldn't be so bad if clang was consistent with conversions to 32-bit
unsigned integers, like it seems to be with conversion to 64-bit
unsigned integers.
With this, the "get QF building with clang" mini-project is done and I
won't have to panic when someone comes to me and asks if it will work.
At worst, there'll be a little bit-rot.
Only edicts themselves get a smaller alignment (4, 8 or 32 bytes,
depending on hardware and progs version). I didn't want to waste too
much memory on edict alignment for progs that don't need any better than
void *, but the zone really wants 64 bytes, and the stack might as well
have 64 bytes as well. Fixes a segfault when running v6 progs in a clang
build (clang got really agressive with simd in zone.c).
gcc and clang have rather different swizzle builtins, but both do a nice
job of optimizing the intuitive initializer swizzle (I think gcc 8(?)
didn't do such a good job thus my use of __builtin_shuffle).
It seems clang defaults to unsigned for enums. Interestingly, gcc was ok
with the checks being either way. I guess gcc treats enums that *can* be
unsigned as DWIM.
This is the bulk of the work for recording the resource pointer with
with builtin data. I don't know how much of a difference it makes for
most things, but it's probably pretty big for qwaq-curses due to the
very high number of calls to the curses builtins.
Closes#26
These add legacy support for basic float bitops (& | ^ ~). Avoiding the
instructions would require tot only the source to be converted, but also
the servers (as they do access those fields), and this seemed to be too
much.
I had forgotten that unsigned division was different from signed
division (rather silly of me). However, with some testing and analysis,
unsigned true modulo is not needed as it's not possible to have
negative inputs and thus it's the same as remainder.
This loads the current return pointer into the specified register. No
offset is used (should make that an error, but for now any offset is
simply ignored). This is part of the fix for getting obj_msg_sendv to
work with return values.
With the return buffer in progs_t, it could not be addressed by the
progs on 64-bit machines (this was intentional, actually), but in order
to get obj_msg_sendv working properly, I needed a way to "bounce" the
return address of a calling function to the called function. The
cleanest solution I could think of was to add a mode to the with
instruction allowing the return pointer to be loaded into a register and
then calling the function with a 0 offset for the return value but using
the relevant register (next few commits). Testing promptly segfaulted
due to the 64-bit offset not fitting into a 32-bit value.
The plan is to use the types to extract the number of parameters for a
selector when it is necessary to know the count. However, it'll probably
become useful for something else alter (these things seem to always do
so).
It's a bit disconcerting seeing a builtin in the top 10 when builtins
are counted by call while progs functions are counted by instruction.
Also, show the total profile after the function top-10 list.
It's currently only 4 (or even 3 for v6) words, but this fixes false
positives when checking for null pointers in Ruamoko progs due to
pr_return pointing to the return buffer and thus outside the progs
memory map resulting in an impossible to exceed value.
It turns out the return pointer still needs to be saved even when a
builtin sets up a chain call to progs, but rather than the pointer being
simply restored, it needs to be saved in the call stack exactly as if
the function was called directly by progs. This fixes the invalid self
issue quite thoroughly: parameter state seems to be correct across all
calls now.
I should set up an automated test now that I know and understand the
situation.
When calling a builtin, normally the return pointer needs to be
restored, but if the builtin changes the call depth (usually by
effecting "return foo()" as in support for objects, but possibly
setjmp/longjmp when they are implemented), then the return pointer must
not be restored. This gets vkgen past object allocation, but it dies
when trying to send messages to super. This appears to be a compiler
bug.
Since the operand types sort out the difference between asr and shr, no
need to give them different opnames. Means qfcc doesn't need to worry
about which one it's searching for.
Yet another redundant addressing mode (since ptr + 0 can be used), so
replace it with a variable-indexed array (same as in v6p). Was forced
into noticing the problem when trying to compile Machine.r.
I abandoned the reason for doing it (adding a pile of vector types), but
I liked the cleanup. All the implementations are hand-written still, but
at least the boilerplate stuff is automated.
Of course, only in Ruamoko progs, but it works quite nicely.
global_string is now passed the absolute address of the referenced
operand. With a little groveling through the progs stack, it should be
possible to resolve pointers to locals in functions further up the
stack.
This fixes Ruamoko's return format string. It looks like it's producing
the correct address (but doesn't show all the information it should),
but the rest of the debug code needs work locals.
It turned out I need locals count and params_start for debugging, so use
the progs version instead to bail early from PR_EnterFunction and
PR_LeaveFunction (which I had forgotten anyway, oops).
They now include base register index and effective address of the
operands (though it may be wrong for instructions that don't use a base
register for that operand).
This cleans up dprograms_t, making it easier to read and see what chunks
are in it (I was surprised to see only 6, the explicit pairs made it
seem to have more).
Intel hardware requires 32-byte alignment for lvec4 and dvec4.
Unfortunately, it turns out that my attempts to align progs data in qfcc
went awry do to the order block sizes are calculated when writing the
progs.
This makes return consistent with load, store, etc, though its
addressing mode is encoded in bits 5 and 6 of c rather than the opcode.
It turns out I had no tests for any of return's addressing modes other
than basic def references, so no tests needed changing.
The parameter defs are allocated from the parameter space using a
minimum alignment of 4, and varargs functions get a va_list struct in
place of the ...
An "args" expression is unconditionally injected into the call arguments
list at the place where ... is in the list, with arguments passed
through ... coming after the ...
Arguments get through to functions now, but there's problems with taking
the address of local variables: currently done using constant pointer
defs, which can't work for the base register addressing used in Ruamoko
progs.
With the update to test-bi's printf (and a hack to qfcc for lea),
triangle.r actually works, printing the expected results (but -1 instead
of 1 for equality, though that too is actually expected). qfcc will take
a bit longer because it seems there are some design issues in address
expressions (ambiguity, and a few other things) that have pretty much
always been there.
PR_SetupParams is new and sets up the parameter pointers so older code
that expects only up to 8 parameter will work with both v6p and Ruamoko
progs without having to check what progs are running. PR_SetupParams is
useful even when Ruamoko progs are expected as it reserves the required
space (respecting alignment) on the stack and returns a pointer to the
top (bottom? confusing) of the stack. PR_PushFrame and PR_PopFrame
need to be used around PR_SetupParams, regardless of using temp strings,
to avoid a stack leak (need to do an audit).
This is part of the work for #26 (Record resource pointer with builtin
function data). Currently, the data pointer gets as far as the
per-instance VM function table (I don't feel like tackling the job of
converting all the builtin functions tonight). All the builtin modules
that register a resources data block pass that block on to
PR_RegisterBuiltins.
The builtin and progs function data is overlaid so the extra data
doesn't cause too much memory to be used (it's actually 8 bytes smaller
now). The plan is to pre-compute the offsets based on the parameter
size and alignment data.