This seems to be pretty close to as fast as it gets (might be able to do
better with some shuffles of the negation constants instead of loading
separate constants).
The main purpose is to allow fluent-style:
const char *targetname = PL_String (PL_ObjectForKey (entity, "targetname"));
if (targetname && !PL_ObjectForKey (targets, targetname)) {
PL_D_AddObject (targets, targetname, entity);
}
[note: the above is iffy due to ownership of entity, but the code from
which the above comes works around the issue]
Setting the result type cexpr_exprval tells cexpr to simply return whoe
exprval object rather than the referenced value, thus allowing the
caller to check the type when the expression is context sensitive.
The order in which keys are added to the dictionary object is
maintained. Adding a key after removing an old key adds the new key to
the end of the list rather than reusing the old key's spot.
PL_ParseLabeledArray works the same way as PL_ParseArray, but instead
takes a dictionary object. The keys of the items are ignored, and the
order is not preserved (at this stage), but this is a cleaner solution
to getting an array of objects when the definitions of those objects
need to be accessible by name as well.
This allows plist objects to be accessed directly from cexpr expressions
using struct.field syntax for dictionary objects and array[index] syntax
for array objects.
It now takes a context pointer (opaque data) that holds the buffers it
uses for the temporary strings. If the context pointer is null, a static
context is used (making those uses of va NOT thread-safe). Most calls to
va use the static context, but all such calls have been formatted
consistently so they are easy to find when it comes time to do a full
audit.
The way I wound up using the field meant that exprctx should not "own"
the hashlinks chain, but rather just point to it. This fixes the nasty
access errors I had.
I want to be able to use name references, but that requires string
items, so anything that would normally be dictionary or array (or
binary, even) would also need to accept string. This seemed to be the
cleanest solution. Any custom parser would then need to check the type
and act appropriately, but any inappropriate types have already been
pre-filtered by the standard parsers.
Care needs to be taken to ensure the right function is used with the
right arguments, but with these, the need to use qconj(d|f) for a
one-off inverse rotation is removed.
I forgot to right-shift the value so offsets were becoming 0 or 8
instead of 0-15. This fixes the management of small objects. It turns
out that after this fix, qfvis's problems were caused by fragmentation
in the windings. Need to revisit line allocation and use POT-specialized
pools.
I think the sub-line allocator falling over is the final source of
qfvis's leaks. It certainly causes a mess of the sub-lines. But having
some tests to get working sure beats scratching my head over qfvis :)
They're binned by powers of two (with in between sizes going to the
smaller bin should I make cache-line allocations NPOT (which I think
might be worthwhile). However, there seems to still be a bug somewhere
causing a nasty leak as now my hacked qfvis consumes 40G in less than a
minute.
The idea is to not search through blocks for an available allocation.
While the goal was to speed up allocation of cache lines of varying
cluster sizes, it's not enough due to fragmentation.
They take advantage of gcc's vector_size attribute and so only cross,
dot, qmul, qvmul and qrot (create rotation quaternion from two vectors)
are needed at this stage as basic (per-component) math is supported
natively by gcc.
The provided functions work on horizontal (array-of-structs) data, ie a
vec4d_t or vec4f_t represents a single vector, or traditional vector
layout. Vertical layout (struct-of-arrays) does not need any special
functions as the regular math can be used to operate on four vectors at
a time.
Functions are provided for loading a vec4 from a vec3 (4th element set
to 0) and storing a vec4 into a vec3 (discarding the 4th element).
With this, QF will require AVX2 support (needed for vec4d_t). Without
support for doubles, SSE is possible, but may not be worthwhile for
horizontal data.
Fused-multiply-add is NOT used because it alters the results between
unoptimized and optimized code, resulting in -mfma really meaning
-mfast-math-anyway. I really do not want to have to debug issues that
occur only in optimized code.