Having to remember to copy yet another specifier bit was getting
tedious, so use a union of a struct with the bitfields and an unsigned
int to access them in parallel. Makes for a tidier spec_merge, and one
less headache.
The command line option works the same way as
--advanced/traditional/extended, as does the pragma. As well, raumoko
(alternative spelling) can be used because both are legitimate and some
people may prefer one spelling over the other.
As always, use of the pragma is at one's own risk: its intended use is
forcing the target in the unit tests.
dvec4, lvec4 and ulvec4 need to be aligned to 8 words (32 bytes) in
order to avoid hardware exceptions. Rather than dealing with possibly
mixed alignment when a function has 8-word aligned locals but only
4-word aligned parameters, simply keep the stack frame 8-word aligned at
all times.
As for sizes, the temp def recycler was written before the Ruamoko ISA
was even a pipe dream and thus never expected temp def sizes over 4. At
least now any future adjustments can be done in one place.
My quick and dirty test program works :)
dvec4 xy = {1d, 2d, 0d, 0.5};
void printf(string fmt, ...) = #0;
int main()
{
dvec4 u = {3, 4, 3.14};
dvec4 v = {3, 4, 0, 1};
dvec4 w = v * xy + u;
printf ("[%g, %g, %g, %g]\n", w[0], w[1], w[2], w[3]);
return 0;
}
They're now properly part of the type system and can be used for
declaring variables, initialized (using {} block initializers), operated
on (=, *, + tested) though much work needs to be done on binary
expressions, and indexed. So far, only ivec2 has been tested.
When possible, of course. However, this tightens up struct and constant
index array accesses, and avoids issues with flow analysis losing track
of the def (such trucking is something I want to do, but haven't decided
out to get the information out to the right statements).
Since address expressions always product a pointer type, aliasing one to
another pointer type is redundant. Instead, simply return an address
expression with the desired type.
The FIXME was there because I couldn't remember why the test was
type_compatible but the internal error complains about the types being
the same size. The compatibility check is to see if the op can be used
directly or whether a temp is required. The offset check is because
types that are the same size (which they must be if they are
compatible) is because it is not possible to create an offset alias def
that escapes the bounds of the real def, which any non-zero offset will
do if the types are the same size.
This is the intended purpose of the offset field in address expressions,
and will make struct and array accesses more efficient when I sort out
the code generation side.
Ruamoko passes va_list (@args) through the ... parameter (as such), but
IMP uses ... to defeat parameter type and count checking and doesn't
want va_list. While possibly not the best solution, adding a no_va_list
flag to function types and skipping ex_args entirely does take care of
the problem without hard-coding anything specific to IMP.
The system currently just sets some bits in the type specifier (the
attribute list should probably be carried around with the specifier),
but it gets the job done for now, and at least gets things started.
This makes it much easier to check (and more robust to name changes),
allowing for effectively killing the node to which the variable being
addressed is attached. This fixes the incorrect address being used for
va_list, which is what caused double-alias to fail.
In order to not waste instructions, the Ruamoko ISA does not provide 1
and 2 component 64-bit load/store instructions since they can be
implemented using 2 and 4 component 32-bit instructions (load and store
are independent of the interpretation of the data). This fixes the
double test, and technically the double-alias test, but it fails due to
a problem with the optimizer causing lea to use the wrong reference for
the address. It also breaks the quaternion test due to what seems to be
a type error that may have been lurking for a while, further
investigation is needed there.
Since the call instruction in the Ruamoko ISA specifies the destination
of the return value of the called function, it is much like any
expression type instruction in that the def referenced by its c operand
is both defined and killed by the instruction. However, unlike other
instructions, it really has many pseudo-operands: the arguments placed
on the stack. The problem is that when one of the arguments is also the
destination of the return value, the dags code wants to use the stack
argument as it was the last use of the real argument. Thus, instead of
using the value of the child node for the result, use the value label
attached to the call node (there should be only one such label).
This fixes iterfunc, typedef, zerolinker and vkgen when optimizing. Now
all but the double tests and return postop tests pass (and the retun
postop test is not related to the Ruamoko ISA, so fails either way).
I really need to come up with a better way to get the result type into
the flow analyser. However, this fixes the aliasing ICE when optimizing
Ruamoko code that uses struct assignment.
Many math instructions don't care about the difference between signed
and unsigned operands and are thus specified using int, but need to be
usable with uint. div is NOT mapped because there is a difference:
0x8000 / 2 (16-bit) is 0x4000 unsigned but 0xc000 signed, and 0x8000 /
0xfffe is 0 unsigned and 0x4000 signed. This means I'll need to add some
more instructions. Not sure what to do about % and %% though as that's a
lot of instructions (12).
Thanks to the size of the type encoding being explicit in the encoding,
anything that tries to read the encodings without expecting the width
will simply skip over the width, as it is placed after the ev type in
the encoding.
Any code that needs to read both the old encodings and the new can check
the size of the basic encodings to see if the width field is present.
With explicit operators, even. While they're a tad verbose, they're at
least unambiguous and most importantly have the right precedence (or at
least adjustable precedence if I got it wrong, but vector ops having
high precedence than scalar or component seems reasonable to me).
I don't remember why I did this originally, but it causes the dags code
to lose the offset temp alias when accessing fields on structural temps
(known to be the case for vectors (temp-component.r), and I seem to
remember having problems with structs).
Since Ruamoko progs must use lea to get the address of a local variable,
add use/def/kill references to the move instruction in order to inform
flow analysis of the variable since it is otherwise lost via the
resulting pointer (not an issue when direct var reference move can be
used).
The test and digging for the def can probably do with being more
aggressive, but this did nicely as a proof of concept.
This code now reaches into one level of the expression tree and
rearranges the nodes to allow the constant folder to do its things, but
only for ints, and only when the folding is trivially correct (* and *,
+/- and +/-). There may be more opportunities, but these cover what I
needed for now and anything more will need code generation or smarter
tree manipulation as things are getting out of hand.
It now addressing_mode cleaning up store instructions to use ptr+offset
instead of lea;store ptr...
Entity.field addressing has been impelmented as well.
Move instructions still generate sub-optimal code in that they use an
add instruction instead of lea.
This allows the code handling simple pointer dereferences to recurse
along an alias chain that resulted from casting between different
pointer types (such chains could probably be eliminated by replacing the
type in the original pointer expression, but it wasn't worth it at this
stage).
Aliasing an alias expression to the same type as the original aliased
expression is a no-op, so drop the alias entirely in order to simplify
code generation.
Simply dereferencing a pointer does not need to go through array_expr
and thus collect a 0 offset that will only be constant-folded out again.
Really just a minor optimization in qfcc, but at one stage in today's
modification, it resulted in some unwanted aliasing chains.
While this does make the generated code a little worse, load is behaving
nicely), the two are at least consistent with each other and when I fix
one, I'll fix both. I missed this change the other day when I did the
address_expr cleanup. Yay near-duplicate code :P