dvec4, lvec4 and ulvec4 need to be aligned to 8 words (32 bytes) in
order to avoid hardware exceptions. Rather than dealing with possibly
mixed alignment when a function has 8-word aligned locals but only
4-word aligned parameters, simply keep the stack frame 8-word aligned at
all times.
As for sizes, the temp def recycler was written before the Ruamoko ISA
was even a pipe dream and thus never expected temp def sizes over 4. At
least now any future adjustments can be done in one place.
My quick and dirty test program works :)
dvec4 xy = {1d, 2d, 0d, 0.5};
void printf(string fmt, ...) = #0;
int main()
{
dvec4 u = {3, 4, 3.14};
dvec4 v = {3, 4, 0, 1};
dvec4 w = v * xy + u;
printf ("[%g, %g, %g, %g]\n", w[0], w[1], w[2], w[3]);
return 0;
}
They're now properly part of the type system and can be used for
declaring variables, initialized (using {} block initializers), operated
on (=, *, + tested) though much work needs to be done on binary
expressions, and indexed. So far, only ivec2 has been tested.
When possible, of course. However, this tightens up struct and constant
index array accesses, and avoids issues with flow analysis losing track
of the def (such trucking is something I want to do, but haven't decided
out to get the information out to the right statements).
Since address expressions always product a pointer type, aliasing one to
another pointer type is redundant. Instead, simply return an address
expression with the desired type.
The FIXME was there because I couldn't remember why the test was
type_compatible but the internal error complains about the types being
the same size. The compatibility check is to see if the op can be used
directly or whether a temp is required. The offset check is because
types that are the same size (which they must be if they are
compatible) is because it is not possible to create an offset alias def
that escapes the bounds of the real def, which any non-zero offset will
do if the types are the same size.
This is the intended purpose of the offset field in address expressions,
and will make struct and array accesses more efficient when I sort out
the code generation side.
Ruamoko passes va_list (@args) through the ... parameter (as such), but
IMP uses ... to defeat parameter type and count checking and doesn't
want va_list. While possibly not the best solution, adding a no_va_list
flag to function types and skipping ex_args entirely does take care of
the problem without hard-coding anything specific to IMP.
The system currently just sets some bits in the type specifier (the
attribute list should probably be carried around with the specifier),
but it gets the job done for now, and at least gets things started.
This makes it much easier to check (and more robust to name changes),
allowing for effectively killing the node to which the variable being
addressed is attached. This fixes the incorrect address being used for
va_list, which is what caused double-alias to fail.
In order to not waste instructions, the Ruamoko ISA does not provide 1
and 2 component 64-bit load/store instructions since they can be
implemented using 2 and 4 component 32-bit instructions (load and store
are independent of the interpretation of the data). This fixes the
double test, and technically the double-alias test, but it fails due to
a problem with the optimizer causing lea to use the wrong reference for
the address. It also breaks the quaternion test due to what seems to be
a type error that may have been lurking for a while, further
investigation is needed there.
Since the call instruction in the Ruamoko ISA specifies the destination
of the return value of the called function, it is much like any
expression type instruction in that the def referenced by its c operand
is both defined and killed by the instruction. However, unlike other
instructions, it really has many pseudo-operands: the arguments placed
on the stack. The problem is that when one of the arguments is also the
destination of the return value, the dags code wants to use the stack
argument as it was the last use of the real argument. Thus, instead of
using the value of the child node for the result, use the value label
attached to the call node (there should be only one such label).
This fixes iterfunc, typedef, zerolinker and vkgen when optimizing. Now
all but the double tests and return postop tests pass (and the retun
postop test is not related to the Ruamoko ISA, so fails either way).
I really need to come up with a better way to get the result type into
the flow analyser. However, this fixes the aliasing ICE when optimizing
Ruamoko code that uses struct assignment.
Many math instructions don't care about the difference between signed
and unsigned operands and are thus specified using int, but need to be
usable with uint. div is NOT mapped because there is a difference:
0x8000 / 2 (16-bit) is 0x4000 unsigned but 0xc000 signed, and 0x8000 /
0xfffe is 0 unsigned and 0x4000 signed. This means I'll need to add some
more instructions. Not sure what to do about % and %% though as that's a
lot of instructions (12).
Thanks to the size of the type encoding being explicit in the encoding,
anything that tries to read the encodings without expecting the width
will simply skip over the width, as it is placed after the ev type in
the encoding.
Any code that needs to read both the old encodings and the new can check
the size of the basic encodings to see if the width field is present.
With explicit operators, even. While they're a tad verbose, they're at
least unambiguous and most importantly have the right precedence (or at
least adjustable precedence if I got it wrong, but vector ops having
high precedence than scalar or component seems reasonable to me).
I don't remember why I did this originally, but it causes the dags code
to lose the offset temp alias when accessing fields on structural temps
(known to be the case for vectors (temp-component.r), and I seem to
remember having problems with structs).
Since Ruamoko progs must use lea to get the address of a local variable,
add use/def/kill references to the move instruction in order to inform
flow analysis of the variable since it is otherwise lost via the
resulting pointer (not an issue when direct var reference move can be
used).
The test and digging for the def can probably do with being more
aggressive, but this did nicely as a proof of concept.
This code now reaches into one level of the expression tree and
rearranges the nodes to allow the constant folder to do its things, but
only for ints, and only when the folding is trivially correct (* and *,
+/- and +/-). There may be more opportunities, but these cover what I
needed for now and anything more will need code generation or smarter
tree manipulation as things are getting out of hand.
It now addressing_mode cleaning up store instructions to use ptr+offset
instead of lea;store ptr...
Entity.field addressing has been impelmented as well.
Move instructions still generate sub-optimal code in that they use an
add instruction instead of lea.
This allows the code handling simple pointer dereferences to recurse
along an alias chain that resulted from casting between different
pointer types (such chains could probably be eliminated by replacing the
type in the original pointer expression, but it wasn't worth it at this
stage).
Aliasing an alias expression to the same type as the original aliased
expression is a no-op, so drop the alias entirely in order to simplify
code generation.
Simply dereferencing a pointer does not need to go through array_expr
and thus collect a 0 offset that will only be constant-folded out again.
Really just a minor optimization in qfcc, but at one stage in today's
modification, it resulted in some unwanted aliasing chains.
While this does make the generated code a little worse, load is behaving
nicely), the two are at least consistent with each other and when I fix
one, I'll fix both. I missed this change the other day when I did the
address_expr cleanup. Yay near-duplicate code :P
This is what using new_ret_expr would result in, but new_ret_expr is no
longer used for referencing .return (except in pascal, but I haven't
gotten around to sorting that out) due to the recent changes for Ruamoko
progs. Fixes an ICE when compiling (with optimization) something like
the following (dir is a vector):
dir /= sqrt (dir * dir);
return dir * speed;
It turns out the sorting wasn't working properly and I've decided that
anything that actually needs the defs to be sorted by address (such as a
debugger searching for defs by address) can do the sorting itself. Fixes
a weird swapping of def names.
This is necessary to get statement disassembly working, and likely
debugging in general. locals is the total size of the stack frame and
thus reaches above the function-entry stack pointer, and params_start is
the local space relative start of the parameters. Thus, knowing the
function-entry stack pointer, the bottom of the locals space can be
found by subtracting params_start, and the top of the locals space by
adding (locals - params_start).
This gets all the sections of the progs file nicely aligned and the code
easier to read with the offset and size calculations not being spread
through the function. ivar-struct-return now works when compiled for
Ruamoko.
This cleans up dprograms_t, making it easier to read and see what chunks
are in it (I was surprised to see only 6, the explicit pairs made it
seem to have more).
While I think the reason the dags code moved an instruction before
adjstk and with was they shared a constant with that instruction (which
is a different bug), this ensures other instructions cannot get
reordered in front of adjstk and with, as doing so would cause any such
instructions to access incorrect data.
The goal was to get lea being used for locals in ruamoko progs because
lea takes the base registers into account while the constant pointer
defs used by v6p cannot. Pointer defs are still used for gobals as they
may be out of reach of 16-bit addressing.
address_expr() has been simplified in that it no longer takes an offset:
the vast majority of the callers never passed one, and the few that did
have been reworked to use other mechanisms. In particular,
offset_pointer_expr does the manipulations needed to add an offset
(unscaled by type size) to a pointer. High-level pointer offsets still
apply a scale, though.
Alias expressions now do a better job of hanling aliasing of aliases by
simply replacing the target type when possible.
It's possible I lost the child printing when creating the return
expressions, but dot diagrams are much more useful when they don't have
nodes with just pointer values.
The parameter defs are allocated from the parameter space using a
minimum alignment of 4, and varargs functions get a va_list struct in
place of the ...
An "args" expression is unconditionally injected into the call arguments
list at the place where ... is in the list, with arguments passed
through ... coming after the ...
Arguments get through to functions now, but there's problems with taking
the address of local variables: currently done using constant pointer
defs, which can't work for the base register addressing used in Ruamoko
progs.
With the update to test-bi's printf (and a hack to qfcc for lea),
triangle.r actually works, printing the expected results (but -1 instead
of 1 for equality, though that too is actually expected). qfcc will take
a bit longer because it seems there are some design issues in address
expressions (ambiguity, and a few other things) that have pretty much
always been there.
The aux use ops need to be counted and given nodes explicitly as they
may refer to defs that are not accessed by other statements other than
by aliases, and those aliases need to be marked live as well as the used
def.
The problem was a missed change when switching the internal statement
format to Ruamoko: I "used" the statement's operands directly rather
than the rotated ones when emitting v6p progs. Fixes a compile segfault
when NOT optimizing.
While all base registers can be used for any purpose at any time (this
is why the with instruction has hard-absolute modes: you can never get
permanently lost), qfcc currently uses the convention of register 0 for
globals and register 1 for stack locals (params, locals, function args).
The register used to access a def is stored in the def and that is used
to set the register bits in the instruction opcode.
The def code actually doesn't know anything about any conventions: it
assumes all defs are global for non-temp defs (the function code updates
the defs before emitting code) and the current function provides the
register to use for any temp defs allocated while emitting code.
Seems to work well, but debug is utterly messed up (not surprised, that
will be tricky).
Still need to get the base register index into the instructions, but I
think this is it for basic code generation. I should be able to start
testing Ruamoko properly fairly soon :)
Thanks to the use/def/kill lists attached to statements for pseudo-ops,
it turned out to be a lot easier to implement flow analysis (and thus
dags processing) than I expected. I suspect I should go back and make
the old call code use them too, and probably several other places, as
that will greatly simplify the edge setting.
The means that the actual call expression is not in the statement lint
of the enclosing block expression, but just its result, whether the call
is void or not. This actually simplifies several things, but most
importantly will make Ruamoko calls easier to implement.
The test is because I had some trouble with double-calls, and is how I
found the return-postop issue :P
Since Ruamoko now uses the stack for parameters and locals, parameters
need to come after locals in the address space (instead of before, as in
v6 progs). Thus use separate spaces for parameters and locals regardless
of the target, then stitch them together appropriately for the target.
The third space is used for allocating stack space for arguments to
called functions. It us not used for v6 progs, and comes before locals
in Ruamoko progs.
Other than the return value, and optimization (ice, not implemented)
calls in Ruamoko look like they'll work.
Thanks to me having done something right 20 years ago, that was pretty
easy :). The two boolean types aren't supported yet because I haven't
decided on just how to represent their types in qfcc.
This seems to be the most reasonable approach to allocating space for
function call parameters without using push and pop (or adding to the
stack pointer), though it's probably good even when using push and pop
to help keep things aligned.
My little test program now builds with the Ruamoko ISA :)
void cp (int *dst, int *src, int count)
{
while (count--) {
*dst++ = *src++;
}
}
Calls are broken (unimplemented), and non-void returns are not likely to
work either (only partially implemented).
Operand width is encoded in the instruction opcode, so the width needs
to be accounted for in order to select the correct instruction. With
this, my little test generates correct code for the ruamoko ISA (except
for return, still fails).
For the most part, it wasn't too bad as it's just a rotation of the
operands for some instructions (store, assign, branch), but dealing with
all the direct accesses to specific operands was a small pain. I am very
glad I made all those automated tests :)
This makes the v6p instruction table consistent with the ruamoko
instruction table, and clears up some of the ugliness with the load,
store, and assign instructions (. .= and = are now spelled out). I think
I'd still prefer an enum code (faster) but at least this is more
readable.
Missed this case in duplicate_type. Allows "short foo" and
"sizeof(short)" (even though qfcc and the engine have two ideas of the
size: I expect trouble later).
long is ignored for double, and v6p progs are stuck with 32 bits for
longs (don't feel like extending v6p any further), but the basics are
there for Ruamoko.
short is ignored for ints because the minimum size is 32, and signed is
just noise for ints anyway (and no chars, so...).
unsigned, however, is finally implemented properly (or at least seems to
be working correctly: tests pass after getting things compiling again,
and lt.u is used where it should be :)
Attempting to add ev_ushort caused ptraliasenc to break, but that was
because it was already broken: I had implemented the scan of the xdef
table incorrectly, thus adding only 1 ev type resulted in the walked
pointer being out of phase with its data due to it first passing over
the type encodings (which is why adding long and ulong didn't cause any
obvious trouble).
And other related fields so integer is now int (and uinteger is uint). I
really don't know why I went with integer in the first place, but this
will make using macros easier for dealing with types.
They are both gone, and pr_pointer_t is now pr_ptr_t (pointer may be a
little clearer than ptr, but ptr is consistent with things like intptr,
and keeps the type name short).
I don't know why they were ever signed (oversight at id and just
propagated?). Anyway, this resulted in "unsigned" spreading a bit, but
all to reasonable places.
This includes calls and unconditional jumps, relative and through a
table. The parameters are all lumped into the one object, with some
being unused by the different types (eg, args and ret_type used only by
call expressions). Just having nice names for the parameters (instead of
e1 and e2) makes it nice, even with all the sub-types lumped together.
No mysterious type aliasing bugs this time ;)
The move operator names are definitely obsolete (due to dropping the
expressions a year or two ago) and the precedence checks seem to be
handled elsewhere. Memset and state expressions went away a while back
too.
While this was a pain to get working, that pain only went to prove the
value of using proper "types" (even if only an enum) for different
expression types: just finding all the places to edit was a chore, and
easy to make mistakes (forgetting bits here and there).
Strangely enough, this exposed a pile of *type* aliasing bugs (next
commit).
v6 vs v6p are more or less as before, with ruamoko added in. qfcc will
now try (and fail, due to the opcode table opnames being wrong) to
create ruamoko progs when given the ruamoko target option.
At this stage, I doubt emit.c will need to know the details of the
target (v6, v6p, ruamoko) since the instruction formats are identical,
just different meanings for the opcode itself.
This allows v6, v6p (older QF VM) or ruamoko (new QF VM) to be targeted.
Currently defaults to v6p to allow QF to continue building without too
much hassle.
While qfcc dealing sensibly with mixed target VMs in the object files
has always been an outstanding issue, with the new instruction set it
has become a priority. Most importantly, this should allow QF to
continue building while I work on qfcc targeting the new IS.
And partial implementations in qfcc (most places will generate an
internal error (not implemented) or segfault, but some low-hanging fruit
has already been implemented).
This allows the VM to select the right execution loop and qfcc currently
still produces only the old IS (it doesn't know how to deal with the new
IS yet)
build_struct was unconditionally setting the type's alignment. This was
not a problem before because no types were requesting alignments larger
than those requested by their members (for structs). However, with the
upcoming new instruction set, quaternions need to be 4-word aligned.
The opcode table is a nightmare to maintain, but this does clean it up
and speed up opcode lookups since they can now be indexed. Of course, it
turns out I had missed adding several instructions, so had to fix that,
and qfcc needed a bit of a re-jigger to get the opcode out of the table.
The assignment to the node's variable must come after any uses of that
node, which the node's parent set indicates. In the swap test, this was
not a problem as the node had no parents, and in the link order test, it
just happened(?) to work.
While using just the label node's reachable set was sufficient for a
simple swap (t = a; a = b; b = t;), it is not sufficient for
read-before-write dependencies such as found in linked-list building:
{ o = array[ind]; o.next = obj; obj = o; }
The assignment to o.next uses obj, but that use is hidden because obj's
reachable nodes does not include o thus assigning o to obj causes the
array dereference to be assigned directly to obj and thus o.next winds
up pointing to o instead of whatever obj was. The parent nodes of obj's
node are its users, so any new assigned to obj must come after those
parents as well as any node reachable by obj's node.
Fixes a runaway loop error when adding a frikbot to the server.
qfo_to_progs was modifying the space data pointers in the input qfo,
making it impossible to reuse the qfo. However, qfo_relocate_refs needs
the updated pointers, thus do a shallow copy of the qfo and its spaces
(but not any of the data)
build_builtin_function does the right thing, and it was only legacy
syntax functions that were affected anyway. Certainly, external
variables should not be initialized, but klik uses @extern { } wrapped
around several builtin functions and I had added the feature to allow
just this as it is rather convenient.
I decided that the check for whether control reaches the end of the
function without performing some necessary action (eg, invoking
[super dealoc] in a derived -dealoc) is conceptually the return
statement using a pseudo operand and the necessary action defining that
pseudo operand and thus is the same as checking for uninitialised
variables. Thus, add a pseudo operand type and use one to represent the
invocation of [super alloc], with a special function to call when the
"used" pseudo operand is "uninitialised".
While I currently don't know what else pseudo operands could be used
for, the system should be flexible enough to add any check.
Fixes#24
I want to use the function's pseudo address that was used for managing
aliased temporary variables for other pseudo operands as well. The new
name seems to better reflect the variable's purpose even without the
other pseudo operands as temporary variables are, effectively, pseudo
operands until they are properly allocated.
Forgetting to invoke [super dealloc] in a derived class's -dealloc
method has caused me to waste far too much time chasing down the
resulting memory leaks and crashes. This is actually the main focus of
issue #24, but I want to take care of multiple paths before I consider
the issue to be done.
However, as a bonus, four cases were found :)
While get_selector does the job of getting a selector from a selector
reference expression, I have long considered lumping various expression
types under ex_expr to be a mistake. Not only is this a step towards
sorting that out, it will make working on #24 easier.
After seeing set_size and thinking it redundant (thought it returned the
capacity of the set until I checked), I realized set_count would be a
much better name (set_count (node->successors) in qfcc does make much
more sense).
When moving an identifier label from one node to another, the first node
must be evaluated before the second node, which the edge guarantees.
However, code for swapping two variables
t = a; a = b; b = t;
creates a dependency cycle. The solution is to create a new leaf node
for the source operand of the assignment. This fixes the swap.r test
without pessimizing postop code.
This takes care of the core problem in #3, but there is still room for
improvement in that the load/store can be combined into a move.
This reverts commit 2fcda44ab0.
Killing the node is not the correcgt answer as it blocks many
optimization opportunities. The correct answer is adding edges to
describe the temporal dependencies. Of course, this breaks the swap.r
test.
In order to correctly handle swap-style code
{ t = a; a = b; b = t; }
edges need to be created for each of the assignments moving an
identifier lable, but the dag must remain acyclic (the above example
wants to create a cycle). Having the reachable nodes recorded makes
checking for potential loops a quick operation.
Identifiers can be constants. I don't remember quite what it fixed other
than some bogus kill relations in the dags (which might have caused
issues later).
If the src type is not a class, there is no inheritance chain to walk.
Fixes a segfault when returning self after a syntax error in the
following:
+(EditStatus *)withRect:(Rect)rect
{
return [[[self alloc] initWithRect:rect]:
}
-setCursorMode:(CursorMode)mode
{
cursorMode = mode;
return self;
}
The server edict arrays are now stored outside of progs memory, only the
entity data itself (ie data accessible to progs via ent.fld) is stored in
progs memory. Many of the changes were due to code accessing edicts and
entity fields directly rather than through the provided macros.
It now takes a context pointer (opaque data) that holds the buffers it
uses for the temporary strings. If the context pointer is null, a static
context is used (making those uses of va NOT thread-safe). Most calls to
va use the static context, but all such calls have been formatted
consistently so they are easy to find when it comes time to do a full
audit.
Block expressions hide ex_error, but get_type() always returns null when
it finds one (which it does by recursing into block expression), so just
check the type itself.
When a global variable is accessed via only an alias in a function the
actual def's flowvar would remain in the state it was from the last
function that accessed the global normally. This would result in invalid
flowvar accesses which can be difficult to reproduce (thus no test
case).
There's still some cleanup to do, but everything seems to be working
nicely: `make -j` works, `make distcheck` passes. There is probably
plenty of bitrot in the package directories (RPM, debian), though.
The vc project files have been removed since those versions are way out
of date and quakeforge is pretty much dependent on gcc now anyway.
Most of the old Makefile.am files are now Makemodule.am. This should
allow for new Makefile.am files that allow local building (to be added
on an as-needed bases). The current remaining Makefile.am files are for
standalone sub-projects.a
The installable bins are currently built in the top-level build
directory. This may change if the clutter gets to be too much.
While this does make a noticeable difference in build times, the main
reason for the switch was to take care of the growing dependency issues:
now it's possible to build tools for code generation (eg, using qfcc and
ruamoko programs for code-gen).
This should keep things nicely extensible, since additional data can be
done in the data space and found using defs. This gets the compilation
units into the sym file.
The compilation unit stores the directory from which qfcc was run and
any source files mentioned. This is similar to dwarf's compilation unit.
Right now, this is the only data in the new debug space, but more might
come in the future so it seems best to treat the debug space separately
in the object files.
getcwd is assumed to use malloc if its buff param is null. This may need
fixing in the future, but it's in one spot. The result in "saved" in the
non-progs pool.
It never really helped sort out the path issues when using build
directories. It worked well enough for single directory projects, but
things got messy very quickly, especially when mixing ruamoko libs with
external progs. A better method based on dwarf is coming.
Killed nodes can leave stray (dangling) edges that cause some confusion
in the dot graphs and may cause problems later on down the track, so
ensure there are no dangling edges.
The reason double-alias fails is when the double assignment occurs, the
int operands don't yet have leaf nodes and thus the nodes cannot be
killed. This doesn't fix the bug.
Because type aliases need to be unaliased, the type pointers in the type
encodings need to be correct when it comes to linking defs and
functions. This fixes the linking errors in ruamoko/game.
I was very uncertain about the validity of messing with the old type
encoding that way, but adding the check to ensure the type has been
processed never fired, so it seems ok. And the comments help me a lot :)
When aliasing a type that already has aliases, the top node needs to be
replaced if it is unnamed, or the alias-free branch of the new node
needs to reach around to the alias-free branch of the existing node.
This fixes the bogus param counts in qwaq.
This fixes the typelinker test, but not the linking error in
ruamoko/game that it was supposed to represent. I guess there's
something more going on (maybe type encoding relocation issues).
Fixes#6
It turned out that the problem with @zero was caused by initial type
chaining occurring before the structures had been initialized and thus
the linker's @zero type encoding string was incorrect: {?=} instead of
{tag @zero-}, thus when the actual type encoding supplied by an object
file came along (with the correct encoding string), it wasn't found.
It is now "consistent" with the rest of the type building in that it
uses find_type(append_type(return, params)) like the C version, thus
allowing append_type to do its thing with type aliases. This fixes the
overload test.
The full_type branch of an alias splitter (alias with null name) needs
to mirror the clean side up to the type alias. It is causing problems
with functions, but that's expected because parameters complicate
things.
It's not connected up yet because I'm unsure of just where to put things
(it gets messy fast), but just being able to see the structure of
complex types is nice.
This eases type unaliasing on functions a little.
Still more to to go, but this fixes a really hair-pulling bug: linux's
heap randomiser was making the typedef test fail randomly whenever
typedef.qfo was compiled.
When a type is aliased, the alias has two type chains: the simple type
chain with all other aliases stripped, and the full type chain. There
are still plenty of bugs in it, but having the clean type chain takes
care of the major issue that was in the previous attempt as only the
head of the type-chain needs to be skipped for type comparison.
Most of the bugs are in finding the locations where the head needs to be
skipped.
All simple type checks are now done using is_* helper functions. This
will help hide the implementation details of the type system from the
rest of the compiler (especially the changes needed for type aliasing).
They take a pointer to a free-list used for hashlinks so the hashlink
pools can be per-thread. However, hash tables that are not updated are
always thread-safe, so this affects only updates. progs_t has been set
up such that it is easy for multiple progs within one thread can share
hashlinks.
and its usage. The parts of flow_analyze_statement that use it know
where the returned operand needs to go. Unfortunately, this breaks dags
pretty hard, but that's because dags needs to learn about the fancy
assignment-type statements.