The array access code was loading the vector, modifying the element,
then forgetting to write the modified vector back to whence it came.
However, that would be rather sub-optimal, so now when the vector is
accessed by a pointer, the array code switches to field access to get at
the vector element thus avoiding the need to copy the whole vector.
Needed for proper analysis (ud-chains etc). Of course, it was then
necessary to remove the parameter defs from the uninitialized defs.
Also, plug a couple of memory leaks (forgot to free some temporary
sets).
That is, `array + offset`. This actually works around the bug
highlighted by arraylife.r (because the array is explicitly used), but
is not a proper solution, so that test still fails of course. However,
with this, it's no longer necessary to use `&array[index]` instead of
`array + index`.
I could never remember what any of the numbers meant. While define is
still a little fuzzy (they're (pseudo)statement numbers), at least now
I'll always know that the numbers are the define set. Also, having the
flow address of the variable helps with understanding the reaching defs
output.
It seems that the optimizer keeps array assignments live when passing
the array as a pointer, but not when passing the address of an element.
Found when testing the following code:
BasisBlade *pga_blades[16] = {
blades[1], blades[2], blades[3], blades[4],
blades[7], blades[6], blades[5], blades[0],
blades[8], blades[9], blades[10], blades[15],
blades[14], blades[13], blades[12], blades[11],
};
BasisGroup *pga_groups[4] = {
[BasisGroup new:4 basis:&pga_blades[ 0]],
[BasisGroup new:4 basis:&pga_blades[ 4]],
[BasisGroup new:4 basis:&pga_blades[ 8]],
[BasisGroup new:4 basis:&pga_blades[12]],
};
Only the first element of pga_blades is being assigned in the optimized
code, but everything is correct when not optimizing.
I had messed up the handling of declarators for combinations of pointer,
function, and array: the pointer would get lost (and presumably arrays
of functions etc). I think I had gotten confused and thought things were
a tree rather than a simple list, but Holub set me straight once again
(I've never regretted getting that book). Once I understood that, it was
just a matter of finding all the places that needed to be fixed. Nicely,
most of the duplicated code has been refactored and should be easier to
debug in the future.
It turns out I broke the type system when it comes to pointers to
functions and arrays. This test checks basic function and array pointers
and passes with qfcc from before the type system rework.
The type system rewrite had lost some of the checks for function fields.
This puts the actual code in the one place and covers parameters as well
as globals.
Internally, * is not really a valid operator for vectors since it can
have many meanings. This didn't cause trouble until trying to build
everything in game-source (since there's still a lot of legacy code in
there).
The precedence check changes done in
63795e790b seem to have been incorrect
(game-source/ctf produced many false positives), so putting that check
against '=' back into the code seems like a good idea (no more false
positives). That sounds a bit cargo-cult, but I'm really not sure what I
was thinking when I did the changes (probably just tired).
This applies only to the top-level scope of the function. I'm not sure
if it's right for traditional quakec code, but that can be adjusted
easily enough.
The symtab code itself cares only about global/not global for the size
of the hash table, but other code can use the symtab type for various
checks (eg, parameter shadowing).
Along with QuakeC's, of course. This fixes type typeredef2 test (a lot
of work for one little syntax error). Unfortunately, it came at the cost
of requiring `>>` in front of state expressions on C-style functions
(QuakeC-style functions are unaffected). Also, there are now two
shift/reduce conflicts with structs and unions (but these same conflicts
are in gcc 3.4).
This has highlighted the need for having the equivalent of the
expression tree for the declaration system as there are now several
hacks to deal with the separation of types and declarators. But that's a
job for another week.
The grammar constructs for declarations come from gcc 3.4's parser (I
think it's the last version of gcc that used bison. Also, 3.4 is still
GPL 2, so no chance of an issue there).
This simplifies type type_specifier rule significantly as now TYPE_SPEC
(was TYPE) includes all types and their basic modifiers (long, short,
signed, unsigned). This should allow me to make the type system closer
to gcc's (as of 3.4 as that seems to be the last version that used a
bison parser) and thus fix typeredef2.
typeredef1 parses properly but fails due to it erroneously complaining
that foo is redeclared as a different kind of object (it's the same
kind).
typeredef2 is the real problem in that it's a syntax error when it
should not be. This has proven to be a show-stopper for development on
my laptop as it has very recent vulkan headers which have such a
duplicate typedef.
Once a unicode char (ie, > 127) was used, any ascii chars would get the
tail of the last unicode char resulting in broken utf-8 streams. The
resulting null glyph boxes were not very appealing.
Because of the way the plane normal is used (front/on/back checks, and
midpoint calculation), other than possible precision, there is no need
to normalize the normal. Removing the square root and division resulted
in a huge boost: from 34s to 14 seconds. The average clusters visible
hasn't change much, and a quick check in-game didn't show any issues.
At least modern gcc produces nice code for ?: (cmov), and a SIMD
cross-product uses several fewer instructions. The cross-product shaved
off 0.5-1s, but the modulo -> ?: shaved off about 3-4s, for a total of
about 10% speedup (1.09 insn/cyc vs 1.01 insn/cyc, so even perf agrees).
This fixes the basic vecconst test (extending it to other types breaks
because long and ulong are not properly supported yet). The conversion
is done by the progs VM rather than writing another 256 conversions
(though loops could be used). This works nicely as a test for using the
VM to help with compiling.
Raw 'x y z' style vector constants that look like ints (no fractional
parts) used to initialize vector globals/constants don't get converted
to float vectors, resulting in nans for negative values and denormals
for positive values. This tends to make game physics... interesting.
While the option to make '*' mean dot product for vectors is important,
it breaks vector scaling in ruamoko progs as the resultant vector op
becomes a dot product instead of the indented hadamard product (ie,
component-wise).
The common idiom for self init (below) causes a double-call when
compiling with --advanced, resulting in an incorrect retain count.
if (!(self = [super init])) {
return nil;
}
The support for the new vector types broke compiling code using
--advanced. Thus it's necessary to ensure vector constants are
float-type and vec3 and vec4 are treated as vector and quaternion, which
meant resurrecting the old vector expression code for v6p progs.
Id's comments are a little inconsistent, but for the most part usable
info can be extracted. While not yet supported, Arcane Dimensions'
comments are extremely consistent (just some issues with hyphen counts
in separators), so parsing out usable info will be fairly easy. The hard
part will be presenting it.
The method is still held by known_methods, so freeing it causes grief.
However, it may cause a leak thus the free is only commented out. More
investigation is needed. I'm surprised the problem didn't show on linux,
but cygwin-native hit it and valgrind on linux found the spot :)
While it does get a bit cluttered currently, being able to see the
contents of structures makes a huge difference. Also highlights that
vector immediates do not get the correct type encodings.
This fixes the internal error generated by the likes of
`(sv_gravity * '0 0 1')` where sv_gravity is a float and `'0 0 1'` is an
ivec3: the vector is promoted to vec3 first so that expanding sv_gravity
is expanded to vec3 instead of ivec3 (which is not permitted for a
float: expansion requires the destination base type to be the same as
the source).
For now, anyway, as the generated code looks good. There might be
problems with actual pointer expressions, but it allows entity.field to
work as expected rather than generate an ICE.
The resultant unicode is encoded as utf-8, which does conflict with the
quake character map, but right now unicode is useful only with font
text, and those support only standard unicode (currently only as utf-8),
but something will need to be sorted out.
Arrays are passed as a pointer to the first element, so are always valid
parameters. Fixes a bogus "formal parameter N is too large to be passed
by value" error.
While swizzle does work, it requires the source to be properly aligned
and thus is not really the best choice. The extend instruction has no
alignment requirements (at all) and thus is much better suited to
converting a scalar to a vector type.
Fixes#30
It seems clang loses track of the usage of the referenced unions by the
time the code leaves the switch. Due to the misoptimization, "random"
values would get into the vector constants. This puts the usages in the
same blocks as the unions, causing clang to "get it right" (though I
strongly suspect I was running into UB).
While I might need to tighten up the rules later, this allows binary
operations between vector (the type) and explicitly typed vec3 constants
(and non-constants, about which I am undecided). The idea is that
explicit constants such as '1 2 3'f should be compatible with either
type.
This applies to quaternions as well.
As a class's ivars are built up by inheritance, but with only that
class's ivars in the symbol table, is is necessary to include an offset
based on the super class's ivars in order to ensure alignments are
respected. This is achieved via the new `base` parameter to
build_struct(), which is used to offset the current size while
calculating the aligned offset of the symbols. The parameter is ignored
for unions, as they always start at 0. The ivars for the current class
still have a base offset of 0 until they are actually added to the
class.
Fixes#29
The alignment is specified as a power of 2 (ie, actual alignment = 1 <<
alignment) allowing old object files to be compatible (as their
alignment is 0). This is necessary for (in part for #30) as it turned
out even global vectors were not aligned correctly.
Currently, only data spaces even vaguely respect alignment. This may
need to be fixed in the future.
Most were pretty easy and fairly logical, but gib's regex was a bit of a
pain until I figured out the real problem was the conditional
assignments.
However, libs/gamecode/test/test-conv4 fails when optimizing due to gcc
using vcvttps2dq (which is nice, actually) for vector forms, but not the
single equivalent other times. I haven't decided what to do with the
test (I might abandon it as it does seem to be UD).
At at some stage blender enforced frames being integers (In the past,
there was support for fractional, I think, but I also seem to remember
it not working) (yes, for anybody looking, this commit message is more
or less copied from io_object_mu).
Defs and symbols benefit from swizzling as that's one instruction vs 2-3
for loading a scalar into a vector component by component. Constants are
ok because the result gets converted to a vector constant.
qfcc is putting two temps in the same location due to
defspace_alloc_aligned_loc returning the same address when there was a
hole caused by an earlier aligned alloc: specifically, a size-3 hole and
a size-2 allocation with alignment-2.
The destination operand must be a full four component vector, but the
source can be smaller and small sources do not need to be aligned: the
offset of the source operand and the swizzle indices are adjusted. The
adjustments are done during final statement emission in order to avoid
confusing the data flow analyser (and that's when def offsets are known).
This reverts commit 2904c619c1.
In order to support swizzle operations, I need to be able to alias defs
to larger types (eg, float to vec4), but alias_def rightly won't allow
this. However, as the plan is to do this in the final steps before
emitting the instruction, I plan on creating an alias to a float then
adjusting the type in the alias, but to do so without extra shenanigans,
I need alias_def to allow aliases to the same type. As a fringe benefit,
it makes the code agree with the comment in def.h :P
This came up when investigating an internal error from the line above.
It turned out the error was correct (problem with converting scalars to
vectors), but the break was not.
Currently, only vector/vec3 and quaternion/vec4 can be printed anyway,
but I plan on making explicit format strings for the types, so there
should be no need to promote any vector types (and really, any hidden
promotion is a bit of a pain, but standards...).
While the code would handle int vector types, there aren't any such
instructions, and the expression code shouldn't generate them, but all
float (32 and 64 bit) vector types do have a dot product instruction, so
check width rather than just vector/quaternion.
This fixes an error that's been lurking for over two years (since I made
parameters unlimited internally). The problem was the array was being
allocated on the stack and a simple struct copy was used to store type
type, resulting in a dangling pointer onto the stack. I'm surprised it
didn't cause more problems.
This allows all the tests to build and pass. I'll need to add tests to
ensure warnings happen when they should and that all vec operations are
correct (ouch, that'll be a lot of work), but vectors and quaternions
are working again.
Vector expressions no longer auto-widen due to the new vector types (I
might add such later, but for now this lets the tests try to build
(minus actual fixes in qfcc)).
With this, all vector widths and types are supported: 2, 3, 4 and int,
uint, long, ulong, float and double, along with support for suffixes to
make the type explicit: '1 2'd specifies a dvec2 constant, while '1 2 3'u
is a uivec3 constant. Default types are double (dvec2, dvec3, dvec4) for
literals with float-type components, and int (ivec2...) for those with
integer-type components.
Having three very similar sets of code for outputting values (just for
debug purposes even) got to be a tad annoying. Now there's only one, and
in the right place, too (with the other value code).
I'd created new_value_expr some time ago, but never used it...
Also, replace convert_* with cast_expr to the appropriate type (removes
a pile of value check and create code).
Use with quaternions and vectors is a little broken in that
vec4/quaternion and vec3/vector are not the same types (by design) and
thus a cast is needed (not what I want, though). However, creating
vectors (that happen to be int due to int constants) does seem to be
working nicely otherwise.
Nicely, I was able to reuse the generated conversion code used by the
progs engine to do the work in qfcc, just needed appropriate definitions
for the operand macros, and to set up the conversion code. Helped
greatly by the new value load/store functions.
pr_type_t now contains only the one "value" field, and all the access
macros now use their PACKED variant for base access, making access to
larger types more consistent with the smaller types.
This is an extremely extensive patch as it hits every cvar, and every
usage of the cvars. Cvars no longer store the value they control,
instead, they use a cexpr value object to reference the value and
specify the value's type (currently, a null type is used for strings).
Non-string cvars are passed through cexpr, allowing expressions in the
cvars' settings. Also, cvars have returned to an enhanced version of the
original (id quake) registration scheme.
As a minor benefit, relevant code having direct access to the
cvar-controlled variables is probably a slight optimization as it
removed a pointer dereference, and the variables can be located for data
locality.
The static cvar descriptors are made private as an additional safety
layer, though there's nothing stopping external modification via
Cvar_FindVar (which is needed for adding listeners).
While not used yet (partly due to working out the design), cvars can
have a validation function.
Registering a cvar allows a primary listener (and its data) to be
specified: it will always be called first when the cvar is modified. The
combination of proper listeners and direct access to the controlled
variable greatly simplifies the more complex cvar interactions as much
less null checking is required, and there's no need for one cvar's
callback to call another's.
nq-x11 is known to work at least well enough for the demos. More testing
will come.
This means that a tex_t object is passed in instead of just raw bytes
and width and height, but it means the texture can specify whether it's
flipped or uses BGR instead of RGB. This fixes the upside down
screenshots for vulkan.
QFS_NextFilename was renamed to QFS_NextFile to reflect the fact it now
returns a QFile pointer for the newly created file (as well as the
name). This necessitated updating WritePNG to take a file pointer
instead of a file name, with the advantage that WritePNGqfs is no longer
necessary and callers have much more control over the creation of the
file.
This makes QFS_NextFile much more secure against file system race
conditions and attacks (at least in theory). If nothing else, it will
make it more robust in a multi-threaded environment.
The "not" because I'm pretty sure they're false positives due to when
the function is called, but clang doesn't know that (wonder why gcc was
ok with it).
clang doesn't like anything but a bare 0 as null (and in some of the
cases, it was quite right: '\0' should not be treated as a null
pointer). And the crashers were just for paranoia and probably aren't
needed any more (kept for now, though).
It seems clang defaults to unsigned for enums. Interestingly, gcc was ok
with the checks being either way. I guess gcc treats enums that *can* be
unsigned as DWIM.
In working with vectors and matrices while testing the scene wrappers, I
found that there was a fair bit of confusion about how large something
could be. Return values can be up to 32 words (but qfcc wasn't aware of
that), parameters were limited to 4 words still (and possibly should be
for varargs), and temp defs were limited to 8 words (1 lvec4). Temps are
used for handling return values (at least when not optimizing) and thus
must be capable of holding a return value, and passing large arguments
through *formal* parameters should be allowed. It seems reasonable to
limit parameter sizes to return value sizes.
A temp and a move are still used for large return values (4x4 matrix),
but that's an optimization issue: the code itself is at least correct.
This is the bulk of the work for recording the resource pointer with
with builtin data. I don't know how much of a difference it makes for
most things, but it's probably pretty big for qwaq-curses due to the
very high number of calls to the curses builtins.
Closes#26
When the def can be found. This fixes direct assignments to arrays (and
probably structs) getting lost when the array is later read using a
variable index.
Float is not int, and Ruamoko has only int ifz/ifnz, which will fail for
-0.0 (0x80000000 when viewed as an int). And then there's nan, but I
haven't seen too many of those in quake.
I suspect this is an ancient bug that wasn't noticed due to not looking
at progs.src compiled code enough, but it makes the first statements of
the function point to the correct line instead of a forward declaration.
Currently only via pragma (not command line options), but I needed to
test the concept. Converting legacy code is just too error prone.
Telling the compiler how to treat the operator makes more sense. When *
acts as @dot with Ruamoko progs, the result is automatically aliased as
a float as this is the legacy meaning (ie, float result for dot
product).
This is a very tiny optimization, but there's no point in adjust the
stack if there's no actual adjustment. I didn't bother with it initially
because I thought it wouldn't happen (and I was more interested in
getting things working first), but it turns out that simple getters that
result in a zero adjustment are quite common (70/535 in qwaq-app.dat).
It now takes the function name to print in error message (passed on to
PR_Sprintf) and the argument number of the format string. The variable
arguments (in ...) are assumed to be immediately after the format
argument.
This is achieved by marking a void function with the void_return
attribute and then calling that function in an @return expression.
@return can be used only inside a void function and only with void
functions marked with the void_return attribute. As this is intended for
Objective-QC message forwarding, it is deliberately "difficult" to use
as returning a larger than expected value is unlikely to end well for
the calling function.
However, as a convenience, "@return nil" is allowed (in a void
function). It always returns an integer (which, of course,can be
interpreted as a pointer). This is safe because if the return value is
ignored, it will go into the progs return buffer, and if it is not
ignored, it is the smallest value that can be returned.
Having to remember to copy yet another specifier bit was getting
tedious, so use a union of a struct with the bitfields and an unsigned
int to access them in parallel. Makes for a tidier spec_merge, and one
less headache.
The command line option works the same way as
--advanced/traditional/extended, as does the pragma. As well, raumoko
(alternative spelling) can be used because both are legitimate and some
people may prefer one spelling over the other.
As always, use of the pragma is at one's own risk: its intended use is
forcing the target in the unit tests.
dvec4, lvec4 and ulvec4 need to be aligned to 8 words (32 bytes) in
order to avoid hardware exceptions. Rather than dealing with possibly
mixed alignment when a function has 8-word aligned locals but only
4-word aligned parameters, simply keep the stack frame 8-word aligned at
all times.
As for sizes, the temp def recycler was written before the Ruamoko ISA
was even a pipe dream and thus never expected temp def sizes over 4. At
least now any future adjustments can be done in one place.
My quick and dirty test program works :)
dvec4 xy = {1d, 2d, 0d, 0.5};
void printf(string fmt, ...) = #0;
int main()
{
dvec4 u = {3, 4, 3.14};
dvec4 v = {3, 4, 0, 1};
dvec4 w = v * xy + u;
printf ("[%g, %g, %g, %g]\n", w[0], w[1], w[2], w[3]);
return 0;
}
They're now properly part of the type system and can be used for
declaring variables, initialized (using {} block initializers), operated
on (=, *, + tested) though much work needs to be done on binary
expressions, and indexed. So far, only ivec2 has been tested.
When possible, of course. However, this tightens up struct and constant
index array accesses, and avoids issues with flow analysis losing track
of the def (such trucking is something I want to do, but haven't decided
out to get the information out to the right statements).
Since address expressions always product a pointer type, aliasing one to
another pointer type is redundant. Instead, simply return an address
expression with the desired type.
The FIXME was there because I couldn't remember why the test was
type_compatible but the internal error complains about the types being
the same size. The compatibility check is to see if the op can be used
directly or whether a temp is required. The offset check is because
types that are the same size (which they must be if they are
compatible) is because it is not possible to create an offset alias def
that escapes the bounds of the real def, which any non-zero offset will
do if the types are the same size.
This is the intended purpose of the offset field in address expressions,
and will make struct and array accesses more efficient when I sort out
the code generation side.
Ruamoko passes va_list (@args) through the ... parameter (as such), but
IMP uses ... to defeat parameter type and count checking and doesn't
want va_list. While possibly not the best solution, adding a no_va_list
flag to function types and skipping ex_args entirely does take care of
the problem without hard-coding anything specific to IMP.
The system currently just sets some bits in the type specifier (the
attribute list should probably be carried around with the specifier),
but it gets the job done for now, and at least gets things started.
This makes it much easier to check (and more robust to name changes),
allowing for effectively killing the node to which the variable being
addressed is attached. This fixes the incorrect address being used for
va_list, which is what caused double-alias to fail.
In order to not waste instructions, the Ruamoko ISA does not provide 1
and 2 component 64-bit load/store instructions since they can be
implemented using 2 and 4 component 32-bit instructions (load and store
are independent of the interpretation of the data). This fixes the
double test, and technically the double-alias test, but it fails due to
a problem with the optimizer causing lea to use the wrong reference for
the address. It also breaks the quaternion test due to what seems to be
a type error that may have been lurking for a while, further
investigation is needed there.