or 512kW (kilowatts? :P). Barely enough for vkgen to run (it runs out if
auto release is run during scan_types, probably due to fragmentation). I
imagine I need to look into better memory management schemes, especially
since I want to make zone allocations 64-byte aligned (instead of the
current 8). And it doesn't help that 16 words per allocation are
dedicated to the zone management.
Anyway, with this, vgken runs and produces sufficiently correct results
for the rest of QF to build, so long as qfcc is not optimizing.
Since Z_Malloc uses Z_TagMalloc to do the work, this ensures the check
is always run.
Also, add the check to Z_Realloc when it needs to adjust an existing
block.
Builtins that call progs with parameters now must always wrap the call
to PR_ExecuteProgram so that the data stack is properly preserved across
the call.
I need to do an audit of all the calls to PR_ExecuteProgram.
It doesn't do much good for dynamic progs memory because zone currently
aligns to 8 bytes (oops, forgot to fix that), but at least the stack and
globals are properly aligned.
The ABI for the Ruamoko ISA set currently puts va_list into the
parameter stream whenever a varargs function is called. Unfortunately,
IMP is declared with ... and thus cannot be used directly unless all
methods become varargs, but I think that would cause even more
headaches.
It turns out the return pointer still needs to be saved even when a
builtin sets up a chain call to progs, but rather than the pointer being
simply restored, it needs to be saved in the call stack exactly as if
the function was called directly by progs. This fixes the invalid self
issue quite thoroughly: parameter state seems to be correct across all
calls now.
I should set up an automated test now that I know and understand the
situation.
In Ruamoko ISA progs, the param pointers point to the stack and
generally must most be manipulated by builtins, and there is no need
anyway as Ruamoko doesn't have RCALL. Fixes the mangling of .super.
When calling a builtin, normally the return pointer needs to be
restored, but if the builtin changes the call depth (usually by
effecting "return foo()" as in support for objects, but possibly
setjmp/longjmp when they are implemented), then the return pointer must
not be restored. This gets vkgen past object allocation, but it dies
when trying to send messages to super. This appears to be a compiler
bug.
Many math instructions don't care about the difference between signed
and unsigned operands and are thus specified using int, but need to be
usable with uint. div is NOT mapped because there is a difference:
0x8000 / 2 (16-bit) is 0x4000 unsigned but 0xc000 signed, and 0x8000 /
0xfffe is 0 unsigned and 0x4000 signed. This means I'll need to add some
more instructions. Not sure what to do about % and %% though as that's a
lot of instructions (12).
Since the operand types sort out the difference between asr and shr, no
need to give them different opnames. Means qfcc doesn't need to worry
about which one it's searching for.
Yet another redundant addressing mode (since ptr + 0 can be used), so
replace it with a variable-indexed array (same as in v6p). Was forced
into noticing the problem when trying to compile Machine.r.
Thanks to the size of the type encoding being explicit in the encoding,
anything that tries to read the encodings without expecting the width
will simply skip over the width, as it is placed after the ev type in
the encoding.
Any code that needs to read both the old encodings and the new can check
the size of the basic encodings to see if the width field is present.
It's full of evil hacks, but has always been an evil hack relying on
undefined behavior. The weird shenanigans with local variables are
because Ruamoko doesn't copy the parameters like v6p does and thus v and
z are NOT adjacent as parameters. Worse, the padding is uninitialized
and thus should not be relied upon to be any particular value. Still
does a nice job of testing dot products, though.
With explicit operators, even. While they're a tad verbose, they're at
least unambiguous and most importantly have the right precedence (or at
least adjustable precedence if I got it wrong, but vector ops having
high precedence than scalar or component seems reasonable to me).
I don't remember why I did this originally, but it causes the dags code
to lose the offset temp alias when accessing fields on structural temps
(known to be the case for vectors (temp-component.r), and I seem to
remember having problems with structs).
While it specifically checks vectors, I'm pretty sure it applies to
structs, too. Also, it's a little redundant with vecaddr.r, but is much
more specific and far less evil in what it does (no horrible pointer
shenanigans): just something that is fairly common practice.
I abandoned the reason for doing it (adding a pile of vector types), but
I liked the cleanup. All the implementations are hand-written still, but
at least the boilerplate stuff is automated.
Since Ruamoko progs must use lea to get the address of a local variable,
add use/def/kill references to the move instruction in order to inform
flow analysis of the variable since it is otherwise lost via the
resulting pointer (not an issue when direct var reference move can be
used).
The test and digging for the def can probably do with being more
aggressive, but this did nicely as a proof of concept.
This code now reaches into one level of the expression tree and
rearranges the nodes to allow the constant folder to do its things, but
only for ints, and only when the folding is trivially correct (* and *,
+/- and +/-). There may be more opportunities, but these cover what I
needed for now and anything more will need code generation or smarter
tree manipulation as things are getting out of hand.
It now addressing_mode cleaning up store instructions to use ptr+offset
instead of lea;store ptr...
Entity.field addressing has been impelmented as well.
Move instructions still generate sub-optimal code in that they use an
add instruction instead of lea.
This allows the code handling simple pointer dereferences to recurse
along an alias chain that resulted from casting between different
pointer types (such chains could probably be eliminated by replacing the
type in the original pointer expression, but it wasn't worth it at this
stage).
Aliasing an alias expression to the same type as the original aliased
expression is a no-op, so drop the alias entirely in order to simplify
code generation.
Simply dereferencing a pointer does not need to go through array_expr
and thus collect a 0 offset that will only be constant-folded out again.
Really just a minor optimization in qfcc, but at one stage in today's
modification, it resulted in some unwanted aliasing chains.
While this does make the generated code a little worse, load is behaving
nicely), the two are at least consistent with each other and when I fix
one, I'll fix both. I missed this change the other day when I did the
address_expr cleanup. Yay near-duplicate code :P