The singleton alias resulted in the adjusted swizzles being corrupted
when for the same def. Other than adding properly sized swizzles
(planned), the simplest solution is to (separately) allow alias that
stick out from from the def.
While the progs engine itself implements the instructions correctly, the
opcode specs (and thus qfcc) treated the results as 32-bit (which was,
really, a hidden fixme, it seems).
I didn't particularly like that solution due to the implied extra
bandwidth (probably should profile such sometime), but I think the
extend operations could be merged into simple assignments by the
optimizer at some stage (or further cleaned up when this stuff gets
moved to actual code gen where it should be).
Currently via only the group mask (which is really horrible to work
with: requires too much knowledge of implementation details, but does
the job for testing), but it got some basics working.
Using swizzle for some of the geometric product operations was a bit
clunky and has problems with alignment. While I will still need it in
the future, it turns out that the extend instruction almost did what I
needed. However, I'm unsure that reversing components within an extended
vector is the way to go.
It turned out they were always using floats for the source type (meaning
doubles were broken), and not shifting the component in the final sizzle
code meaning all swizzles were ?xxx (neglecting minus or 0). I'd make
tests, but I plan on modifying the instruction set a little bit.
Also, correct the handling of scalars in dot and wedge products: it
turns out s.v and s^v both scale. However, it seems the CSE code loses
things sometimes.
This has shown the need for more instructions, such as a 2d wedge
product and narrower swizzles. Also, making dot product produce a vector
instead of a scalar was a big mistake (works nicely in C, but not so
well in Ruamoko).
The current code is pretty broken when it comes to vector types (losing
the vector and bogus errors among other issues). The whole thing needs a
rework or even just to be tossed in favor of better DAG processing.
I guess Hamish's suggestion made sense at the time, but I found that
with the current instructions, the reversed bivector wasn't so nice to
implement it would need a swizzle as well as the cross-product.
By default. Conversion of quake strings needs to be requested (which is
done by nq and qw clients and servers, as well as qfprogs via an
option). I got tired of seeing mangled source code in the disassembly.
I'm not sure if that was a thinko, typo, or something else, but judging
by the relevant commit message, the use of quaternion and vector was
intended only for advanced progs (v6p).
This makes working with them much easier, and the type system reflects
what's in the multi-vector. Unfortunately, that does mean that large
algebras will wind up having a LOT of types, but it allows for efficient
storage of sparse multi-vectors:
auto v = 4*(e1 + e032 + e123);
results in:
0005 0213 1:0008<00000008>4:void 0:0000<00000000>?:invalid
0:0044<00000044>4:void assign (<void>), v
0006 0213 1:000c<0000000c>4:void 0:0000<00000000>?:invalid
0:0048<00000048>4:void assign (<void>), {v + 4}
Where the two source vectors are:
44:1 0 .imm float:18e [4, 0, 0, 0]
48:1 0 .imm float:1aa [4, 0, 0, 4]
They just happen to be adjacent, but don't need to be.
Scaling now works for multi-vector expressions, and always subtracting
even when addition is wanted doesn't work too well. However, now there's
the problem of multi-vectors very quickly becoming full algebra vectors,
which means certain things need a rethink.
This gets only some very basics working:
* Algebra (multi-vector) types: eg @algebra(float(3,0,1)).
* Algebra scopes (using either the above or @algebra(TYPE_NAME) where
the above was used in a typedef.
* Basis blades (eg, e12) done via procedural symbols that evaluate to
suitable constants based on the basis group for the blade.
* Addition and subtraction of multi-vectors (only partially tested).
* Assignment of sub-algebra multi-vectors to full-algebra multi-vectors
(missing elements zeroed).
There's still much work to be done, but I thought it time to get
something into git.
If a symbol is not found in the table and a callback is provided, the
callback will be used to check for a valid procedural symbol before
moving on to the next table in the chain. This allows for both tight
scoping of the procedural symbols and caching.
Due to joys of pointers and the like, it's a bit of a bolt-on for now,
but it works nicely for basic math ops which is what I wanted, and the
code is generated from the expression.
Only · (dot product) and × (cross product for vector, commutator product
for geometric algebra) have been tested so far, but that involved
fighting with cpp to get it to not convert the · to \U000000b7, which
was rather annoying.
I realized recently that I had made a huge mistake making Ruamoko's
based addressing use unsigned offsets as it makes stack-relative
addressing more awkward when it comes to runtime-determined stack frames
(eg, using alloca). This does put a bit of an extra limit on directly
addressable globals, but that's what the based addressing is meant to
help with anyway.
I'm not sure what's up, but arm gcc thinks the array isn't properly
initialized even though x86_64 gcc does. Maybe something with padding.
At least c23 makes it easy to 0-initialize VLAs.
I'm actually surprised anything worked, though I guess it was just the
one entry getting corrupted (and not 32, but I figured allocate slots
for all of the dynamic lights just in case). Or none, really, since
larger scenes (ie, those with multiple lights that fit in the same image
size) would result in not all the maps getting used and thus one spare
for dynamic lights.
Probably not a good idea for large maps, but handy for generating C
structs for small test maps. Does not include vertices or surfaces, just
the bsp tree itself for now.
This seems excessive, but gmsp3v2 map has 1399 lights. Worse, it has a
lot of different light sizes that go up by small increments (generally
around 10) resulting in 33 shadow map images (1 too many). Quantizing
the sizes to 32 drops this nicely to 20, and reduces memory consumption
slightly too (image buffer overhead, I guess).