The syntax is not at all correct at this stage (really, just a copy of
Ruamoko), but the keyword table exists (in the wrong place) and the
additional basic types (bool, bvecN and (d)matNxM) have been added.
Boolean base type is currently just int, and matrices have 0 width while
I think about what to use, but finally some progress after several
months' hiatus.
It's disabled by default because it's a runtime thing and I'm not sure I
want to keep it enabled, but it did find some issues (which I've cleaned
up), although it didn't find the problem I was looking for :P
This allows the dags code to optimize the return values and, when I make
the node killing by function calls less aggressive, should make for many
more potential CSE optimizations.
Offset casts are used heavily in geometric algebra, but doing an offset
cast on a dereferenced pointer (array index) doesn't work well for
assignments. Fixes yet another bug in my test scene :)
This takes care of ptrmove instructions, fixing the printf problem in
ptrstructinit. It might even make it possible to not break basic blocks
on function calls, which should make for some more optimization
opportunities, but that can come later.
For this to work properly, it was necessary to avoid converting alias
defs to st_alias nodes.
It also fixes ptrstructinit itself (at first, I hadn't noticed that the
test passed entirely).
It turns out the bug I was chasing in set_poses was not the data getting
lost or incorrect data being read, but the arguments to printf being set
from the parent pose `p` before `p` had actually been set.
This gets rid of the funny little self-loop edge in dags dot files that
has bothered me for a while. It was just because the node to which an
identifier was attached happened to be the parent of the identifier's
leaf node.
The fix in bdafdad0d5 for
`while (count--)` never did appeal to me. I think I understood the core
problem at the time, but I hadn't figured out how to use a var's
use/define sets to detect the write-before-read. Using them allows the
special handling for flow control to be removed, making things more
robust. The function call handling has been superfluous since the
Ruamoko instruction set required the auxiliary operands on the call
statements.
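For concreteness, a minimal sketch (hypothetical types and names, not
qfcc's actual structures) of the use/define-set test: `while (count--)`
both reads and writes `count` in a single statement, and a var appearing
in both sets is exactly the write-before-read case that must not be
reordered away.

    /* hypothetical sketch: a var in both the use and define sets of one
     * statement is a write-before-read hazard (as with `count--`) */
    #include <stdbool.h>

    typedef struct {
        unsigned use;   /* bitmask of vars read by the statement */
        unsigned def;   /* bitmask of vars written by the statement */
    } stmt_sets_t;

    static bool
    write_before_read (const stmt_sets_t *s, unsigned var_bit)
    {
        return (s->use & var_bit) && (s->def & var_bit);
    }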
Not by much really, but using one "aux" count instead of different
types, and using functions to iterate through the statement's aux
operands makes the code easier to read.
Two birds with one stone: eliminates most of the problems with going
const-correct with expr_t, and it makes dealing with internally generated
expressions at random locations much easier, as the set source location
affects all new expressions created within that scope, to any depth.
Debug output is much easier to read now.
ptrmove was not treating its indirect source operand as used because the
pointer wasn't being checked as a source.
This fixes part of ptrstructinit, but there's a lot more breakage in
that test: it looks like all sorts of fun with arrays.
There were a few places where some const-casts were needed, but they're
localized to code that's supposed to manipulate types (but I do want to
come up with something to clean that up).
When I implemented the st_alias handling (d8a78fc849) I was
unsure if I needed to unalias aliased defs too, but it turns out to have
been necessary: this is what caused my 2d PGA dynamics test to blow up
strangely. The GA vector loaded from an array into a local variable had
the local var replaced by a temp, but the var itself was still read later
in the code (an uninitialized variable due to incorrect
optimization... oops).
When sum_expr gets a null expression as one of its args, it simply
returns the other arg, but that arg needs to have the correct type
applied.
Handle zero (null result) cross product in bivector geometric product.
Clean up types in bivector * vector geometric product.
I'm not sure the regressive product is right (overall sign), but that's
actually partly a problem in the math itself (duals and the regressive
product still get poked at, so it may be just a matter of
interpretation).
I'm not sure anything other than == or != has much meaning on anything
but scalars and pseudo scalars, but all comparisons are supported as a
simple boolean test. Any missing components are assumed to be 0. If
nothing else, it makes unit tests easier to write.
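A sketch of the "missing components are 0" rule (layout and names are
illustrative, not qfcc's actual representation): compare the two
multi-vectors component by component, substituting 0 for any component
the group mask says is absent.

    #include <stdbool.h>

    #define NUM_COMPONENTS 16   /* eg, a full 3d PGA multi-vector */

    typedef struct {
        unsigned mask;                  /* which components are present */
        float    comp[NUM_COMPONENTS];  /* valid where mask bit is set */
    } mvec_t;

    static bool
    mvec_eq (const mvec_t *a, const mvec_t *b)
    {
        for (int i = 0; i < NUM_COMPONENTS; i++) {
            float av = (a->mask & (1u << i)) ? a->comp[i] : 0;
            float bv = (b->mask & (1u << i)) ? b->comp[i] : 0;
            if (av != bv)
                return false;   /* != is just the negation of this */
        }
        return true;
    }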
It turns out the algebra types make expression dag creation much more
difficult, resulting in missed optimizations (eg, recognizing `a × a`).
This fixes the dead cross products in `⋆(s.B × ⋆s.B)`.
The switch to using expression dags instead of trees meant that the
statement generator could traverse sub-expressions multiple times. This
is inefficient but usually ok if there are no side effects. However,
side effects and branches (usually from ?:, due to labels) break: side
effects happen more than once, and labels get emitted multiple times
resulting in orphaned statement blocks (and, in the end, uninitialized
temporaries).
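The usual cure (a hypothetical sketch, not the actual fix): emit each dag
node once and cache its result operand, so shared sub-expressions reuse
the temp instead of re-running side effects or re-emitting labels.

    typedef struct expr_s {
        struct expr_s    *left, *right;
        struct operand_s *emitted;  /* result operand once generated */
    } expr_t;

    struct operand_s *emit_op (expr_t *e);  /* generates the statement */

    struct operand_s *
    emit_expr (expr_t *e)
    {
        if (e->emitted)             /* shared node: reuse, don't re-emit */
            return e->emitted;
        if (e->left)
            emit_expr (e->left);
        if (e->right)
            emit_expr (e->right);
        return e->emitted = emit_op (e);
    }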
Just running through the list of expressions in a block expression
results in label expressions within the block getting printed by the
expressions that reference them; they thus don't receive the correct next
pointer and wind up pointing to themselves. Printing the labels first
ensures they have the correct next pointer. However, I suspect there are
other ways things will get tangled.
I'm surprised it took almost two years to discover that I had no
quaternion multiplications in any test code, but getting an ICE for a
quaternion-vector product, and the Hadamard product for
quaternion-quaternion was a bit of a nasty surprise.
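For reference, the quaternion product the code should produce (the math
is standard; the x,y,z imaginary / w scalar layout is an assumption), as
opposed to the component-wise Hadamard product the bug generated:

    typedef struct { float x, y, z, w; } quat_t;

    static quat_t
    quat_mul (quat_t a, quat_t b)
    {
        /* i*j = k, j*k = i, k*i = j, i*i = j*j = k*k = -1 */
        return (quat_t) {
            .x = a.w*b.x + a.x*b.w + a.y*b.z - a.z*b.y,
            .y = a.w*b.y + a.y*b.w + a.z*b.x - a.x*b.z,
            .z = a.w*b.z + a.z*b.w + a.x*b.y - a.y*b.x,
            .w = a.w*b.w - a.x*b.x - a.y*b.y - a.z*b.z,
        };
    }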
This makes a slight improvement to the commutator product in that it
removes the expand statement, but there's still the problem of (a+a)/2.
However, at least now the product is correct and slightly less abysmal.
This takes advantage of evaluate_constexpr to do all the work. Necessary
for use of basis blade constants in algebra contexts (avoids an internal
error).
This fixes the upostop-- test by auto-casting implicit constants to
unsigned (and it gives a warning for signed-unsigned comparisons
otherwise). The generated code isn't quite the best, but the fix for
that is next.
Also clean up the resulting mess, though not properly. There are a few
bogus warnings, and the legit ones could do with a review.
It's actually good enough for building qfcc: it fails a couple of
obscure C23 preprocessor examples that can be ignored for now (the rest
of QF builds without any apparent issue).
The expansion is necessary for the final test in preproc-2.r, but breaks
preproc-1.r because the closing ')' is *not* visible to collect_args
(its assumption is incorrect). This needs reworking (and probably
rethinking) of the entire macro argument collection, but I need a little
break from the preprocessor (and it's good enough for *most* uses), so
I'm adding the code (disabled) in order to avoid losing it and my notes
about the problem.
That is, if anything other than '(' (even a macro/argument that expands
to '(') is seen while checking for a function-type macro, the expansion
fails. This gets preproc-1.r working properly.
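This is the standard C rule; the classic demonstration is parenthesizing
a name to suppress the macro and get at the real function:

    #include <stdio.h>

    int
    main (void)
    {
        int c1 = getchar ();    /* may expand a getchar macro */
        int c2 = (getchar) ();  /* next token after getchar is ')', not
                                   '(': no expansion, the real library
                                   function is called */
        return c1 == c2;
    }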
It really affected only collect_args, but in fixing that I noticed I was
very inconsistent with scanner's type (should be yyscan_t or void *, but
not yyscan_t *).
There's no guarantee the source file is in a writable directory (in
fact, it is very definitely in a read-only directory when running
`make distcheck`). However, it is reasonable to assume the output file
is being written to a writable directory thus default the object file
directory to that of the output file, but still use the source file's
name for the object file name.
Fixes #51
This gets most of the second preprocessor test working, apparently just
some problems with macro arguments not getting expanded for ## (unless
there's more lurking, of course, which I know there is for __LINE__).
It just feels cleaner than unnecessarily copying token chains. It turns
out that the core problem was just order of operations in next_token:
moving the pending_macro code to after arg/macro detection seems to be
correct (even bare `G LPAREN() 0)` is *not* expanding `G`, as expected).
It turned out I had simply forgotten to ensure the token chains were
properly terminated (the struct copy would copy the next of the source
token and thus macro args always expanded to the last token of the
parent macro). And then I'd missed saving the token text when parsing
predefined macros. __VA_OPT__ is still a problem, but this work was for
making that a little easier.
I got tired of the way the separate token types for macro expansion and
the rest of the preprocessor parser were handled. This makes them a
little more unified. Macro expansion seems to be slightly broken again
in that min/max/bound mess up badly, and __VA_OPT__ does things in the
wrong order, but I wanted to get this in as a checkpoint.
__VA_ARGS__ seems to be working but __VA_OPT__ still needs a lot of work
for dealing with its expansions, but basic error checking and simple
expansions seem to work.
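The C23 semantics being chased: the tokens inside __VA_OPT__(...) appear
only when the variable arguments are non-empty.

    #include <stdio.h>

    #define LOG(fmt, ...) printf (fmt __VA_OPT__(,) __VA_ARGS__)

    int
    main (void)
    {
        LOG ("plain\n");        /* -> printf ("plain\n"); */
        LOG ("n = %d\n", 42);   /* -> printf ("n = %d\n", 42); */
        return 0;
    }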
Macros now store their arguments and have a cursor pointing to the next
token to take from their expansion list. While not checked yet, this
will make avoiding recursive macro invocations much easier. More
importantly, it's a step closer to correct argument expansion (though
token pasting is currently broken).
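Roughly the shape of it (a hypothetical sketch, names illustrative): the
macro keeps its collected argument chains and a cursor, and because
expanding macros stay on an "active" stack, a self-reference can be
recognized before re-expanding.

    typedef struct token_s {
        struct token_s *next;
        const char     *text;
    } token_t;

    typedef struct macro_s {
        struct macro_s *next_active;    /* stack of in-progress macros */
        const char     *name;
        token_t       **args;           /* collected argument chains */
        token_t        *cursor;         /* next expansion token to emit */
    } macro_t;

    static token_t *
    next_macro_token (macro_t *m)
    {
        token_t *t = m->cursor;
        if (t)
            m->cursor = t->next;
        return t;                       /* null at end of expansion */
    }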
This makes it much easier to keep track of end of file in a conditional
block (#if...#endif) as #include in non-suppressed code would result in
spurious eof errors otherwise. I'm a little concerned about correctness,
but everything seems to work, and it should be right: suppressed include
directives do not change the state at all, and suppression is its own
flag, not part of the condition stack.
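The case being guarded against is simply:

    #if 0
    #include "does-not-even-exist.h"  /* suppressed: must not be opened,
                                         and must not touch the
                                         conditional state */
    #endif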
The op code needs to be set just before being passed to the qc parser so
it doesn't get lost in macro expansion.
And vector values need to not be processed when recording, otherwise they
get lost.
I'm not sure what exactly caused vector literals to break, but bailing
out of the vector ops section on conversion to vector or quaternion
fixes game-source.
Allow 32-bit positive values without a warning and warn on conversion
issues for float. The whole conversion system needs cleaning up for
v6/v6p/ruamoko.
-D options weren't counting correctly, so build_cpp_args was writing past
the end of the array allocated for command line arguments.
parse_cpp_name had an out-by-one, resulting in reading past the end of
the string.
The qfcc system include path was being set in the wrong place (not sure
why I thought that was right), and not respecting no_default_paths.
-M was generating preprocessor output when it should not have been,
resulting in corrupted dependency files.
I don't remember why I couldn't get this to work last time I tried, but
it went well this time, and made a significant difference to compiling
vulkan.r (from 1.1s to 0.15s unoptimized or 0.8s optimized).
This fixes the problem with // comments after the file in #include and
the core problem the complicated // handling tried to fix with
suppressed directives. Funny how it's always the simpler code that works
better :/
I'm undecided about @ in macro names, but treating @id as one token in
the body is necessary with the single-pass tokenizing. Fixes an infinite
macro expansion loop in vecaddr.r (`#define dot @dot`), but that's
really only a bandaid for *that* issue as there are plenty of other
cases where macros will loop.
It turns out that the recursive lexing was over-complicated as the
tokens for nested macros need to come from the expanded stream, not the
raw input stream.
cpp does this for us so the double-generation is redundant (and
currently wrong anyway with things like <built-in> and <command-line>
getting into the list of dependencies).
Or at least mostly so. The __QFCC__ define isn't visible, and it seems
undef might not be working properly (ruamoko/lib/types.r doesn't
compile). Of course, there's still the issue of whether it's compiling
correctly.
If ID gets to the preprocessor parser in expressions, the ID is not
defined because if it was defined, it would have been expanded. Thus,
all IDs are 0.
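Which is also standard C behavior in #if expressions:

    #if FOO         /* FOO not defined: behaves as #if 0 */
    #error "never reached"
    #endif
    #if BAR + 1     /* behaves as #if 0 + 1, ie taken */
    /* reached */
    #endif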
In addition to cleaning up the old flex line rules, this improves
handling of the '# num "file" flags' from cpp to at least parse the
additional flags (support for the system header flag might come later,
but I doubt the extern-c flag will have much meaning).
QuakePascal has lost its line directive handling (no errors, but dead
rules) for now. Eventually the lexers will be merged.
I had wanted to do this earlier but shied away from the large edit. Now
it became more necessary (and will become even more necessary when I get
to the glsl front-end).
Really, function-type macros expand too, but incorrectly as the
parameters are not parsed and thus not expanded, but this gets the basic
handling implemented, including # and ## processing.
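The # and ## processing in question, in standard C form: # stringizes an
argument, ## pastes tokens.

    #define STR(x)    #x
    #define CAT(a, b) a##b

    const char *s = STR (hello);    /* -> "hello" */
    int foobar = 1;
    int n = CAT (foo, bar);         /* -> foobar, ie 1 */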
Converting ID and char constants too early resulted in poor handling of
keywords and spurious diagnostics about multi-byte character constants,
particularly with -E (preprocess-only).
Mostly white space, but a little more consistency in handling and remove
the use of REJECT (which does actually seem to make a difference, but it
could be just noise).
As far as I can tell, the preprocessor numbers conform with C23 except
for a couple of extensions (both ' and _ work for digit separators, and
d/D work for explicit doubles (since qfcc currently defaults to float
instead of double)). This massively cleaned up the numeric rules and
even took care of some UB in the vector parsing code (I'm not sure which
is more surprising: that I didn't see it at the time, or that it was
blindingly obvious now).
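Putting the accepted forms side by side (the exact separator and suffix
spellings here are inferred from the description above):

    int    a = 1'000'000;   /* C23 digit separator */
    int    b = 1_000_000;   /* extension: _ works as a separator too */
    float  f = 1.5;         /* qfcc defaults float literals to float */
    double d = 1.5d;        /* extension: d/D forces double */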
This will be used for unifying preprocessing and parsing, the idea being
that the tokens will be recorded for later expansion via macros, without
the need to retokenize.
While my modified version is needed to actually avoid warnings (vs
upstream git flex), the files still work with Debian's flex (with no
warnings). I needed to update (and fix) flex so the lexer line numbers
would be correct.
The improved location tracking isn't used yet, but was fairly invasive
on the bison-flex api.
It also cleans up some of the ancient workarounds for bad flex cores.
It turns out I need to create my own cpp in order to handle glsl's
directives. I've decided to make a unified lexer, and continuation lines
seemed a good place to start.
This fixes the really odd bug of certain string values getting swapped
in vkgen when DEBUG_QF_MEMORY was defined in expr.c. It will also
prevent a lot of fun with floats in the future, I imagine.
It's now meant only for ALLOC. Interestingly, when DEBUG_QF_MEMORY is
defined in expr.c, something breaks badly with vkgen (no sniffles out of
valgrind, though), but everything is fine with it not defined. It seems
there may be some unpleasant UB going on somewhere.
I'm not sure why this showed up only now (I guess there just weren't
enough large immediate values before), but this fixes a segfault in the
algtypes test.
This gets my `m * p * ~m` code as optimal as possible if my counting is
correct (this does not include the extra extends and add needed to merge
the values). Also, there might be a possibility of recombining some ops
into a vector op, but I'm happy with this.
That is, `x+x -> 2*x` (and similar for higher counts). Doesn't make much
difference for just 2, but it will make collecting scales easier and I
remember some testing showing that `2*x` is faster than `x+x` for
floating point.
Of course, motor-point keeps bouncing around numerically :/
This fixes the motor-point.r test (ie, the sub-type field selector works
on mono-group types now). Still need to sort out something for scalars
(but I imagine that can work only in an @algebra context).
This allows them to be matched with cancelling factors. My fancy zero
test is now just that: a fancy zero:
    typedef @algebra(float(3,0,1)) PGA;
    typedef PGA.group_mask(0xa) bivector_t;
    typedef PGA.group_mask(0x1e) motor_t;
    typedef PGA.tvec point_t;
    typedef PGA.vec plane_t;

    plane_t
    apply_motor (motor_t m, point_t p)
    {
        return (m * p * ~m).vec;
    }
    0000 nop there were plums...
    0001 adjstk 0, 0
    apply_motor:
    motor.r:32:{
    0002 with 2, 0, 1
    motor.r:33: return (m * p * ~m).vec;
    0003 return (<void>)
The motor-point.r test fails because it uses (m * p * ~m).tvec to get
the value but the type system is slightly broken in that a mono-group
algebra type does not have a structure associated with it and thus the
"missing" field results in 0. Yes, I spent too long chasing that one,
too.
I'd missed this in the previous commit, which was a good thing, really,
as it turns out this was the trigger of the bug that causes my fancy
zero test to become non-zero. It seems the bug is in either
component_sum or in the extend merging.
Or really, the implementers. This gets my fancy zero test down to just
unrecognized permutations of commutative multiplies and dot products
(with the multiplies above the dot products).
While this does "explode" the instruction count (I'll have to collect
like terms later), it does allow for many more opportunities for things
to cancel out to 0 (once (pseudo)commutativity is taken care of).
That is, dot(scale(A,a),scale(B,b)) -> (a*b)*dot(A,B). Does the right
thing when only one side is a scale. No change to the instruction count
in my fancy zero, but it does open more opportunities when I distribute
products.
That is, scale(scale(A,b),c) becomes scale(A,b*c), thus giving the
expression dag more opportunities to find common sub-expressions. My
fancy zero test is down to 20 total instructions (including overhead, or
16 for the actual algebra).
While splitting up the scaled vector into scaled xyz and scaled w does
cost an extra instruction, it allows for other optimizations to be
applied. For one, extends get all the way to the top now, and there are
at most two (in my test cases), thus either break-even or even a slight
reduction in instruction count. However, in the initial implementation,
I forgot to do the actual scaling, and 12 instructions were removed from
my fancy zero case, but real tests failed :P It looks like it's just
distributivity and commutativity holding things back (eg,
omega*gamma*sigma - gamma*omega*sigma: should be 0, but not recognized
as that).
This fixes the motor test :) It turns out that every lead I had
previously was due to the disabling of that feature "breaking" dags
(such that expressions wouldn't be found) and it was the dagged
multi-vector components getting linked by expr->next that made a mess of
things.
Or at least mostly so (there are a few casts). This doesn't fix the
motor bug, but I've wanted to do this for over twenty years and at least
I know what's not causing the bug. However, disabling fold_constants in
expr_algebra.c does "fix" things, so it's still a good place to look.
This doesn't fix the motor bug, but it doesn't make it worse. However,
it does simplify the trees quite a bit, so it should be easier to debug.
It seems the problem has something to do with fold_constants messing up
dagged aliases: in particular, const-folding multiplication by e0123 in
3d PGA as fold_constants sees it as 1.
I'm not yet sure what went wrong, but the introduction of dags broke
something in my set_transform function (perhaps the dual?), but it's
something to do with the symbol being dagged (I guess because it's
required for everything else to dag). However, the strangest thing is
the error shows up with 155a8cbcda, which
is before dags had any direct effect on the geometric algebra code. I
have a sneaking suspicion it's yet another convert_name issue.
They should be treated as such only when merging vector components. This
fixes a bug that doesn't actually exist (it's in experimental code),
where the sum of two 3-component vectors was getting lost.
For cross products: remove any a from a×(...+/-a...)
For dot products: remove any a×b from a•(...+/-a×b...) (or b×a)
This removed another 2 instructions :)
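For reference, the identities that make both rewrites safe:

    a×a = 0                      (cross of anything with itself)
    a•(a×b) = 0                  (a×b is perpendicular to both a and b)
    so a×(u ± a) = a×u    and    a•(u ± a×b) = a•u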
They don't have much effect that I've noticed, but the expression dags
code does check for commutative expressions. The algebra code uses the
anticommutative flag for cross, wedge and subtract (unconditional at
this stage). Integer ops that are commutative are always treated as
commutative (or anticommutative). Floating point ops can be controlled
(they default to non-commutative), but there's currently no way to set
the options.
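A sketch of how such flags typically feed dag matching (hypothetical
code, assuming nodes carry a stable id): order commutative operands
canonically so `a*b` and `b*a` land on the same node, with an
anticommutative swap flipping the sign.

    typedef struct daglabel_s {
        int number;     /* stable id assigned at node creation */
    } daglabel_t;

    static void
    canonicalize (int commute, int anticommute, int *negate,
                  daglabel_t **l, daglabel_t **r)
    {
        /* any stable total order works; node numbers keep it
           deterministic */
        if ((commute || anticommute) && (*l)->number > (*r)->number) {
            daglabel_t *t = *l;
            *l = *r;
            *r = t;
            if (anticommute)
                *negate = !*negate; /* b×a = -(a×b) */
        }
    }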
This takes advantage of the expression dag to detect when an expression
is on both sides of a cross product (which always results in 0). This
removes 3 instructions from my motor test (28 to go).
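With sub-expressions dagged, both sides of `a × a` are literally the same
node, so the test is pointer equality (hypothetical helpers, just to show
the shape):

    typedef struct expr_s expr_t;
    const expr_t *new_zero_expr (void);         /* hypothetical */
    const expr_t *cross_expr (const expr_t *, const expr_t *);

    const expr_t *
    fold_cross (const expr_t *a, const expr_t *b)
    {
        if (a == b)                 /* a × a is always 0 */
            return new_zero_expr ();
        return cross_expr (a, b);   /* hypothetical constructor */
    }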
Especially binary expressions. That expressions can now be reused is
what caused the need to make expression lists non-invasive: the reuse
resulted in loops in the lists. This doesn't directly affect code
generation at this stage but it will help with optimizing algebraic
expressions.
The dags are per sequence point (as per my reading of the C spec).
The removal of the e tag from expr_t necessitated making convert_name
return a new expression, which resulted in get_type no longer being
enough to both convert a name expression and get the type. This was just
another missed spot. With this, all of game-source except ctf builds.
Finally, that little e. is cleaned up. convert_name was a bit of a pain
in that it relied on modifying the expression rather than returning a
new one (or rather, that such behavior was relied on).
Sum expressions pull the negation through extend expressions allowing
them to switch to subtraction when appropriate, and offset_cast reaches
past negation to check for extend expressions. This has eliminated all
negation-only instructions from the motor-point, shaving off more
instructions (now 27 not including the return).
I don't know why I didn't think to do it this way before, but simply
recursing into each operand for + or - expressions makes it much easier
to generate correct code. Fixes the motor-point test.
That is, passing int constants through ... in Ruamoko progs is no longer
a warning (still is for v6p and v6 progs). I got tired of getting the
warning for sizeof expressions when int through ... hasn't been a
problem for even most v6p progs, and was intended to not be a problem
for Ruamoko progs.
But really only for memset and memmove because they need to use an int
alias of the variable and it may be only that alias that sets a much
larger variable.
This removes all the special cases and thus it should be more robust. It
did show up some out-by-one (or a factor of two!) errors elsewhere in
the group mask calculations.
I'm uncertain about the names, but this makes it much easier to get at
specific subtypes (eg, PGA.tvec as a type) rather than having to know
the group mask.
This makes working with the plethora of types a little easier still. The
check for an algebra expression in field_expr needed to be moved up
because the struct code thought the algebra type was a normal vector.
And vice versa.
I'm not sure what I was thinking, but I've decided that not being able
to cast the pseudo-scalar from float to double (for printf etc) was a
bug.
The merge_blocks function wasn't reporting whether it had done anything
so the thread/merge/dead blocks loop was bailing early. With this,
simple functions (ie, no control flow) are fully visible to the CSE
optimizer and it can get quite aggressive (removed 3 assignments and a
cross product from my barycenter test code).
Failing to promote ints to the algebra type results in a segfault in
assignment of a multi-vector due to the symbol pointer walking off the
end of the list of symbols.
And convert addition to subtraction when extend expressions are not
involved. This has taken my little test down to 56 instructions total
(21 for `l p ~l`), down from 74 (39).
This takes care of chained sums of extend expressions. Now `l p ~l` has
only four extend instructions which is expected for the code not
detecting the cross product that always produces 0.
This goes a ways towards minimizing extend expressions, even finding
zeros and thus eliminating whole branches of expressions, but shows the
need for better handling of chained sums of extends.
Any geometric algebra product of two negatives cancels out the negative,
and if the result is negative (because only one operand was negative),
the negation is migrated to above the operation. This resulted in
removing 2 instructions from one of my mini-tests (went from 74 to 78
with the addition/subtraction change, but this takes it back to 76
instructions).
Summed extend expressions are used for merging a sub-vector with a
scalar. Putting the vector first in the sum will simplify checks later
on (it really doesn't matter which is first so long as it's consistent).
Subtraction is implemented as adding a negative (with the plan of
optimizing it later). The idea is to give tree inspection and
manipulation a more consistent view without having to worry about
addition vs subtraction.
Negation is moved as high as possible in the expression, but is always
below an extend expression. The plan here is that the manipulation code
can bypass an alias-add-extend combo and see the negation.
This doesn't affect the generated code (aliases are free), but it does
simplify the dag significantly, optimizing the compiler somewhat, and it
also makes dags much easier to read, which speeds up debugging.
Because the aliases were treated as live, every alias of a temp resulted
in an assignment, which proved to be quite significant (4-5 assignments
in some simple GA expressions). By using an alias node in the dag, the
unaliased temp can be marked live while the alias is treated as an
operation rather than an operand. Now my GA expressions have no
superfluous assignments (generally no assignments at all).
Simple k-vectors don't use structs for their layout since they're just
an array of scalars, but having the structs for group sets or full
multi-vectors makes the system alignment agnostic.