Most of the set ops were always endian-agnostic since they were simply
operating on multiple bits in parallel, but individual element
add/remove/test was very endian-dependent. For the most part, this
didn't matter, but it does matter very much when loading external data
into a set or writing the data out (eg, for PVS).
Attempting to vis ad_tears drags a few lurking bugs out of
SmallestEnclosingBall_vf: poor calculation of 2-point affine space, poor
handling of duplicate points and dropped support points, poor
calculation of the new center (related to duplicate points), and
insufficient iterations for large point sets. qfvis (modified for
cluster spheres) now loads ad_tears.
As per usual, fp math finds a way to confound any epsilon test. So
rather than relying entirely on test_support_points, check the distance
from the sphere center to the affine point and break out of the loop if
the distance is small enough (< 1% of the current radius). This allows
qfvis to load ad_tears without hacks.
Scaling the checks by 1e-6 was a little too tight for very small
triangles, but 1e-5 seems to work well. This fixes SEB getting stuck for
a ridiculously small (for quake) triangle in ad_tears (probably resulted
from some bad math in qfbsp when generating the portal file from the
bsp).
For now, the functions check for a null hunk pointer and use the global
hunk (initialized via Memory_Init) if necessary. However, Hunk_Init is
available (and used by Memory_Init) to create a hunk from any arbitrary
memory block. So long as that block is 64-byte aligned, allocations
within the hunk will remain 64-byte aligned.
I need to write some automated tests for this, and reading of course,
but 1 and two byte outputs look correct. Kind of sad it took sixteen
years to get around to attempting to use the code :(
I knew counting bits individually was slow, but it never really mattered
until now. However, I didn't expect such a dramatic boost just by going
to mapping bytes to bit counts. 16-bit words would be faster still, but
the 64kB lookup table would probably start hurting cache performance,
and 32-bit words (4GB table) definitely would ruin the cache. The
universe isn't big enough for 64-bits :)
After seeing set_size and thinking it redundant (thought it returned the
capacity of the set until I checked), I realized set_count would be a
much better name (set_count (node->successors) in qfcc does make much
more sense).
Having set_expand exposed is useful for loading data into a set.
However, it turns out there was a bug in its size calculation in that
when the requested set size was a multiple of SET_BITS (and greater than
the current set size), the new set size one be SET_BITS larger than
requested. There's now some tests for this :)
This fixes a bug when loading bsp29 files that resulted in leaf nodes
having bogus bounding boxes if any coordinates were negative (and thus
dynamic lights, and probably all sorts of other things) being broken.
And it took me only 9 years to notice :P
This reduces the overhead needed to manage the memory blocks as the
blocks are guaranteed to be page-aligned. Also, the superblock is now
alllocated from within one of the memory blocks it manages. While this
does slightly reduce the available cachelines within the first block (by
one or two depending on 32 vs 64 bit pointers), it removes the need for
an extra memory allocation (probably via malloc) for the superblock.
The uptime display had not been updated for the offset Sys_DoubleTime,
so add Sys_DoubleTimeBase to make it easy to use Sys_DoubleTime as
uptime.
Line up the layout of the client list was not consistent for drop and
qport.
This failed with errors such as:
from ./include/QF/simd/vec4d.h:32,
from libs/util/simd.c:37:
./include/QF/simd/vec4d.h: In function ‘qmuld’:
/usr/lib/gcc/x86_64-pc-linux-gnu/10.3.0/include/avx2intrin.h:1049:1: error: inlining failed in call to ‘always_inline’ ‘_mm256_permute4x64_pd’: target specific option mismatch
1049 | _mm256_permute4x64_pd (__m256d __X, const int __M)
If Sys_Shutdown gets called twice, particularly if a shutdown callback
hangs and the program is killed with INT or QUIT, shutdown_list would be
in an invalid state. Thus, get the required data (function pointer and
data pointer) from the list element, then unlink the element before
calling the function. This ensures that a reinvocation of Sys_Shutdown
continues from the next callback or ends cleanly. Fixes a segfault when
killing testsound while using the oss output (it hangs on shutdown).
Fuzzy bsearch is useful for finding an entry in a prefix sum array
(value is >= ele[0], < ele[1]), and the reentrant version is good when
data needs to be passed to the compare function. Adapted from the code
used in pr_resolve.
A bit of a mess for optimized vs unoptimized, but the tests acknowledge
the differences in precision while checking that the code produces the
right results allowing for that precision.
It seems that i686 code generation is all over the place reguarding sse2
vs fp, with the resulting differences in carried precision. I'm not sure
I'm happy with the situation, but at least it's being tested to a
certain extent. Not sure if this broke basic (no sse) i686 tests.
GCC does a fairly nice job of producing code for vector types when the
hardware doesn't support SIMD, but it seems to break certain math
optimization rules due to excess precision (?). Still, it works well
enough for the core engine, but may not be well suited to the tools.
However, so far, only qfvis uses vector types (and it's not tested yet),
and tools should probably be used on suitable machines anyway (not
forces, of course).
I don't know that the cache line size is 64 bytes on 32 bit systems, but
it should be ok to assume that 64-byte alignment behaves well on systems
with smaller cache lines so long as they are powers of two. This does
mean there is some waste on 32-bit systems, but it should be fairly
minimal (32 bytes per memblock, which manages page sized regions).
QF now uses its own configuration file (quakeforge.cfg for now) rather
than overwriting config.cfg so that people trying out QF in their normal
quake installs don't trash their config.cfg for other quake clients. If
quakeforge.cfg is present, all other config files are ignored except
that quake.rc is scanned for a startdemos command and that is executed.
And improve the generated code for MSG_ReadShort
I suspect gcc didn't like all the excess pointer dereferences and so
couldn't assume that the bytes were being read sequentially.
And improve the generated code as well (ie, use a code sequence that gcc
recognizes and optimizes to a single 32-bit read and a byte-swap).
nq uses big-endian for its packet headers (arg, though it is consistent
with IP, it's not with the rest of quake).
I'm not sure that the mismatch between refdef_t and the assembly defines
was a problem (many fields unused), but the main problem was due to
execute permission on the pages: one chunk of asm was in the data
section, and the patched code was not marked as being executable (due to
such a thing not existing when quake was written).
While this caused some trouble for pr_strings and configurable strftime
(evil hacks abound), it's the result of discovering an ancient (from
maybe as early as 2004, definitely before 2012) bug in qwaq's printing
that somehow got past months of trial-by-fire testing (origin understood
thanks to the warning finding it).
Still "some" more to go: a pile to do with transforms and temporary
entities, and a nasty one with host_cbuf. There's also all the static
block-alloc lists :/
Useful for avoiding a pile of wrapper functions that merely pass on
command-specific data to the actual implementation. Used to clean up the
wrappers in nq and qw cl_input.c
This seems to be pretty close to as fast as it gets (might be able to do
better with some shuffles of the negation constants instead of loading
separate constants).
The main purpose is to allow fluent-style:
const char *targetname = PL_String (PL_ObjectForKey (entity, "targetname"));
if (targetname && !PL_ObjectForKey (targets, targetname)) {
PL_D_AddObject (targets, targetname, entity);
}
[note: the above is iffy due to ownership of entity, but the code from
which the above comes works around the issue]
Setting the result type cexpr_exprval tells cexpr to simply return whoe
exprval object rather than the referenced value, thus allowing the
caller to check the type when the expression is context sensitive.
The order in which keys are added to the dictionary object is
maintained. Adding a key after removing an old key adds the new key to
the end of the list rather than reusing the old key's spot.
PL_ParseLabeledArray works the same way as PL_ParseArray, but instead
takes a dictionary object. The keys of the items are ignored, and the
order is not preserved (at this stage), but this is a cleaner solution
to getting an array of objects when the definitions of those objects
need to be accessible by name as well.
This allows plist objects to be accessed directly from cexpr expressions
using struct.field syntax for dictionary objects and array[index] syntax
for array objects.
It now takes a context pointer (opaque data) that holds the buffers it
uses for the temporary strings. If the context pointer is null, a static
context is used (making those uses of va NOT thread-safe). Most calls to
va use the static context, but all such calls have been formatted
consistently so they are easy to find when it comes time to do a full
audit.