The merge_blocks function wasn't reporting whether it had done anything
so the thread/merge/dead blocks loop was bailing early. With this,
simple functions (ie, no control flow) are fully visible to the CSE
optimizer and it can get quite aggressive (removed 3 assignments and a
cross product from my barycenter test code).
Because the aliases were treated as live, every alias of a temp resulted
in an assignment, which proved to be quite significant (4-5 assignments
in some simple GA expressions). By using an alias node in the dag, the
unaliased temp can be marked live while the alias is treated as an
operation rather than an operand. Now my GA expressions have no
superfluous assignments (generally no assignments at all).
That was surprisingly harder than expected due to recursion and a
not-so-good implementation in expr_negate (it went too high-level thus
resulting in multivec expressions getting to the code generator).
It turned out they were always using floats for the source type (meaning
doubles were broken), and not shifting the component in the final sizzle
code meaning all swizzles were ?xxx (neglecting minus or 0). I'd make
tests, but I plan on modifying the instruction set a little bit.
Due to joys of pointers and the like, it's a bit of a bolt-on for now,
but it works nicely for basic math ops which is what I wanted, and the
code is generated from the expression.
I never liked it, but with C2x coming out, it's best to handle bools
properly. I haven't gone through all the uses of int as bool (I'll leave
that for fixing when I encounter them), but this gets QF working with
both c2x (really, gnu2x because of raw strings).
Like defs, a partial write should not define the whole temp. Thus, copy
the "don't visit main" behavior recently added to def_visit_all. Fixes
missing ud-chains for component-by-component assignments to temporary
vectors.
Def and kill are still handled in flow_analyze_statement, but this makes
call meta data more consistent between v6 and ruamoko progs, allowing
the statement use chain to be used for call argument analysis. It even
found a bug in the extraction of param counts from the call instruction.
That is, `array + offset`. This actually works around the bug
highlighted by arraylife.r (because the array is explicitly used), but
is not a proper solution, so that test still fails of course. However,
with this, it's no longer necessary to use `&array[index]` instead of
`array + index`.
While the option to make '*' mean dot product for vectors is important,
it breaks vector scaling in ruamoko progs as the resultant vector op
becomes a dot product instead of the indented hadamard product (ie,
component-wise).
For now, anyway, as the generated code looks good. There might be
problems with actual pointer expressions, but it allows entity.field to
work as expected rather than generate an ICE.
While swizzle does work, it requires the source to be properly aligned
and thus is not really the best choice. The extend instruction has no
alignment requirements (at all) and thus is much better suited to
converting a scalar to a vector type.
Fixes#30
Most were pretty easy and fairly logical, but gib's regex was a bit of a
pain until I figured out the real problem was the conditional
assignments.
However, libs/gamecode/test/test-conv4 fails when optimizing due to gcc
using vcvttps2dq (which is nice, actually) for vector forms, but not the
single equivalent other times. I haven't decided what to do with the
test (I might abandon it as it does seem to be UD).
The destination operand must be a full four component vector, but the
source can be smaller and small sources do not need to be aligned: the
offset of the source operand and the swizzle indices are adjusted. The
adjustments are done during final statement emission in order to avoid
confusing the data flow analyser (and that's when def offsets are known).
While the code would handle int vector types, there aren't any such
instructions, and the expression code shouldn't generate them, but all
float (32 and 64 bit) vector types do have a dot product instruction, so
check width rather than just vector/quaternion.
Having three very similar sets of code for outputting values (just for
debug purposes even) got to be a tad annoying. Now there's only one, and
in the right place, too (with the other value code).
Use with quaternions and vectors is a little broken in that
vec4/quaternion and vec3/vector are not the same types (by design) and
thus a cast is needed (not what I want, though). However, creating
vectors (that happen to be int due to int constants) does seem to be
working nicely otherwise.
When the def can be found. This fixes direct assignments to arrays (and
probably structs) getting lost when the array is later read using a
variable index.
This is achieved by marking a void function with the void_return
attribute and then calling that function in an @return expression.
@return can be used only inside a void function and only with void
functions marked with the void_return attribute. As this is intended for
Objective-QC message forwarding, it is deliberately "difficult" to use
as returning a larger than expected value is unlikely to end well for
the calling function.
However, as a convenience, "@return nil" is allowed (in a void
function). It always returns an integer (which, of course,can be
interpreted as a pointer). This is safe because if the return value is
ignored, it will go into the progs return buffer, and if it is not
ignored, it is the smallest value that can be returned.
When possible, of course. However, this tightens up struct and constant
index array accesses, and avoids issues with flow analysis losing track
of the def (such trucking is something I want to do, but haven't decided
out to get the information out to the right statements).
The FIXME was there because I couldn't remember why the test was
type_compatible but the internal error complains about the types being
the same size. The compatibility check is to see if the op can be used
directly or whether a temp is required. The offset check is because
types that are the same size (which they must be if they are
compatible) is because it is not possible to create an offset alias def
that escapes the bounds of the real def, which any non-zero offset will
do if the types are the same size.
This makes it much easier to check (and more robust to name changes),
allowing for effectively killing the node to which the variable being
addressed is attached. This fixes the incorrect address being used for
va_list, which is what caused double-alias to fail.
I really need to come up with a better way to get the result type into
the flow analyser. However, this fixes the aliasing ICE when optimizing
Ruamoko code that uses struct assignment.
With explicit operators, even. While they're a tad verbose, they're at
least unambiguous and most importantly have the right precedence (or at
least adjustable precedence if I got it wrong, but vector ops having
high precedence than scalar or component seems reasonable to me).
Since Ruamoko progs must use lea to get the address of a local variable,
add use/def/kill references to the move instruction in order to inform
flow analysis of the variable since it is otherwise lost via the
resulting pointer (not an issue when direct var reference move can be
used).
The test and digging for the def can probably do with being more
aggressive, but this did nicely as a proof of concept.
It now addressing_mode cleaning up store instructions to use ptr+offset
instead of lea;store ptr...
Entity.field addressing has been impelmented as well.
Move instructions still generate sub-optimal code in that they use an
add instruction instead of lea.
This allows the code handling simple pointer dereferences to recurse
along an alias chain that resulted from casting between different
pointer types (such chains could probably be eliminated by replacing the
type in the original pointer expression, but it wasn't worth it at this
stage).
This is what using new_ret_expr would result in, but new_ret_expr is no
longer used for referencing .return (except in pascal, but I haven't
gotten around to sorting that out) due to the recent changes for Ruamoko
progs. Fixes an ICE when compiling (with optimization) something like
the following (dir is a vector):
dir /= sqrt (dir * dir);
return dir * speed;