This seems to be pretty close to as fast as it gets (might be able to do
better with some shuffles of the negation constants instead of loading
separate constants).
Care needs to be taken to ensure the right function is used with the
right arguments, but with these, the need to use qconj(d|f) for a
one-off inverse rotation is removed.
I think the sub-line allocator falling over is the final source of
qfvis's leaks. It certainly causes a mess of the sub-lines. But having
some tests to get working sure beats scratching my head over qfvis :)
The idea is to not search through blocks for an available allocation.
While the goal was to speed up allocation of cache lines of varying
cluster sizes, it's not enough due to fragmentation.
They take advantage of gcc's vector_size attribute and so only cross,
dot, qmul, qvmul and qrot (create rotation quaternion from two vectors)
are needed at this stage as basic (per-component) math is supported
natively by gcc.
The provided functions work on horizontal (array-of-structs) data, ie a
vec4d_t or vec4f_t represents a single vector, or traditional vector
layout. Vertical layout (struct-of-arrays) does not need any special
functions as the regular math can be used to operate on four vectors at
a time.
Functions are provided for loading a vec4 from a vec3 (4th element set
to 0) and storing a vec4 into a vec3 (discarding the 4th element).
With this, QF will require AVX2 support (needed for vec4d_t). Without
support for doubles, SSE is possible, but may not be worthwhile for
horizontal data.
Fused-multiply-add is NOT used because it alters the results between
unoptimized and optimized code, resulting in -mfma really meaning
-mfast-math-anyway. I really do not want to have to debug issues that
occur only in optimized code.
It is capable of parsing single expressions with fairly simple
operations. It current supports ints, enums, cvars and (external) data
structs. It is also thread-safe (in theory, needs proper testing) and
the memory it uses can be mass-freed.
This was inspired by
Hoard: A Scalable Memory Allocator
for Multithreaded Applications
Emery D. Berger, Kathryn S. McKinley, Robert D. Blumofe, Paul R.
Wilson,
It's not anywhere near the same implementation, but it did take a few
basic concepts. The idea is twofold:
1) A pool of memory from which blocks can be allocated and then freed
en-mass and is fairly efficient for small (4-16 byte) blocks
2) Tread safety for use with the Vulkan renderer (and any other
multi-threaded tasks).
However, based on the Hoard paper, small allocations are cache-line
aligned. On top of that, larger allocations are page aligned.
I suspect it would help qfvis somewhat if I ever get around to tweaking
qfvis to use cmem.
The calculation fails (produces NaN) if the vectors are anti-parallel,
but works for all other combinations. I came up with this implementation
when I discovered Unity's Quaternion.FromToRotation could did not work
with very small angles. This implementation will produce a usable
quaternion below 0.00255 degrees (though it will be slightly larger than
unit). Unity's failed such that I could see KSP's skybox snap while it
rotated around my test vessel.
There's still some cleanup to do, but everything seems to be working
nicely: `make -j` works, `make distcheck` passes. There is probably
plenty of bitrot in the package directories (RPM, debian), though.
The vc project files have been removed since those versions are way out
of date and quakeforge is pretty much dependent on gcc now anyway.
Most of the old Makefile.am files are now Makemodule.am. This should
allow for new Makefile.am files that allow local building (to be added
on an as-needed bases). The current remaining Makefile.am files are for
standalone sub-projects.a
The installable bins are currently built in the top-level build
directory. This may change if the clutter gets to be too much.
While this does make a noticeable difference in build times, the main
reason for the switch was to take care of the growing dependency issues:
now it's possible to build tools for code generation (eg, using qfcc and
ruamoko programs for code-gen).
When I ported SEB to python, I discovered that I apparently didn't
really understand the paper's description of the end condition and the
usage of the affine and convex sets for center testing. This cleans up
the test and makes SEB more correct for the cases that have less than 4
supporting points (especially when there are less than 4 points total).
The better accuracy is for specific cases (90 degree rotations around a
main axis: the matrix element for that axis is now 1 instead of
0.99999994). The speedup comes from doing fewer additions (multiply
seems to be faster than add for fp, at least in this situation).
After messing with SIMD stuff for a little, I think I now understand why
the industry went with xyzw instead of the mathematical wxyz. Anyway, this
will make for less pain in the future (assuming I got everything).
The idea comes from The OpenGL Shader Wrangler
(http://prideout.net/blog/?p=11). Text files are broken up into chunks via
lines beginning with -- (^-- in regex). The chunks are optionally named
with tags of the form: [0-9A-Za-z._]+. Unnamed chunks cannot be found.
Searching for chunks looks for the longest tag that matches the beginning
of the search tag (eg, a chunk named "Vertex" will be found with a search
tag of "Vertex.foo"). Note that '.' forms the units for the searc
("Vertex.foo" will not find "Vertex.f").
Unlike glsw, this implementation does not have the concept of effects keys
as that will be separate. Also, this implementation takes strings rather
than file names (thus is more generally useful).
set_bits_t is now 64 bits for x86_64 machines (in linux, anyway). This gave
qfvis a huge speed boost: from ~815s to ~720s.
Also, expose some of the set internals so custom set operators can be
created.
Now we can get tight (<1e-6 * radius_squared error) bounding spheres. More
importantly (for qfvis, anyway) very quickly: 1.7Mspheres/second for a 5
point cloud on my 2.33GHz Core 2 :)
It "works" for lines, triangles and tetrahedrons. For lines and triangles,
it gives the barycentric coordinates of the perpendicular projection of the
point onto to features. Only tetrahedrons are guaranteed to reproduce the
original point.
Getting everything right with an enum proved to be too difficult if not
impossible. Also use better tests for equivalence and intersection.
Many more tests have been added. All pass :)
And the tests really exercised VectorShear (first attempt had things
messed up when more than one shear value was non-zero). Also,
Mat4Decompose wasn't orthogonalizing the z axis row. Oops. Anyway,
Mat4Decompose is now known to work well, and the usage of its output is
understood :)
I got the idea from blender when I discovered by accident that quat * vect
produces the same result as quat * qvect * quat* and looked up the code to
check what was going on. While matrix/vector multiplication still beats the
pants off quaternion/vector multiplication, QuatMultVec is a slight
optimization over quat * qvect * quat* (17+,24* vs 24+,32*, plus no need to
to generate quat*).
Buffer underflow and though strcpy has always been safe there, change to
memmove. Had the added benefit of helping me create more test cases for
better coverage.