A bit of a mess for optimized vs unoptimized, but the tests acknowledge
the differences in precision while checking that the code produces the
right results allowing for that precision.
It seems that i686 code generation is all over the place reguarding sse2
vs fp, with the resulting differences in carried precision. I'm not sure
I'm happy with the situation, but at least it's being tested to a
certain extent. Not sure if this broke basic (no sse) i686 tests.
GCC does a fairly nice job of producing code for vector types when the
hardware doesn't support SIMD, but it seems to break certain math
optimization rules due to excess precision (?). Still, it works well
enough for the core engine, but may not be well suited to the tools.
However, so far, only qfvis uses vector types (and it's not tested yet),
and tools should probably be used on suitable machines anyway (not
forces, of course).
While this caused some trouble for pr_strings and configurable strftime
(evil hacks abound), it's the result of discovering an ancient (from
maybe as early as 2004, definitely before 2012) bug in qwaq's printing
that somehow got past months of trial-by-fire testing (origin understood
thanks to the warning finding it).
They take advantage of gcc's vector_size attribute and so only cross,
dot, qmul, qvmul and qrot (create rotation quaternion from two vectors)
are needed at this stage as basic (per-component) math is supported
natively by gcc.
The provided functions work on horizontal (array-of-structs) data, ie a
vec4d_t or vec4f_t represents a single vector, or traditional vector
layout. Vertical layout (struct-of-arrays) does not need any special
functions as the regular math can be used to operate on four vectors at
a time.
Functions are provided for loading a vec4 from a vec3 (4th element set
to 0) and storing a vec4 into a vec3 (discarding the 4th element).
With this, QF will require AVX2 support (needed for vec4d_t). Without
support for doubles, SSE is possible, but may not be worthwhile for
horizontal data.
Fused-multiply-add is NOT used because it alters the results between
unoptimized and optimized code, resulting in -mfma really meaning
-mfast-math-anyway. I really do not want to have to debug issues that
occur only in optimized code.
It seems -ffast-math is not necessarily faster, and the errors it causes
may not be worth the gains, but I'm not sure I want to nuke it completely,
so instead disabling the worst offender of it
(-fno-unsafe-math-optimizations) seems to be the best option. qfbsp now
produces identical text output between optimized and unoptimized builds,
and may be slightly faster than before the change (1.9s for start.map vs
2.0s)
While it further breaks RPM building, all AC_SUBST(HAVE_*) have been nuked.
When AM_SUBST_NOTMAKE, tell automake to not generate var = @var@ in
Makefile.in for qf specific vars (QF_SUBST is a wrapper for AC_SUBST that
also calls AM_SUBST_NOTMAKE).
Most of the guts of configure.ac have been moved to config.d and are then
brought in by m4_include. This will make maintaining configure.ac much easier.
Also drop use of PROGRAM and VERSION, using PACKAGE_NAME, PACKAGE_VERSION, and
on occasion, PACKAGE_STRING instead, and clean out some old files we no longer
need.