While the main bulk of the improvement (36s down from 42s for
gmsp3v2.bsp on my i7-6850K) comes from using a high-tide allocator for
the windings (which necessitated using a fixed size), it is ever so
slightly faster than using malloc as the back-end.
This is for the conversion /to/ paletted textures. The conversion is
necessary for csqc support. In the process, the conversion has been sped up
by implementing a color cache for the conversion process. I haven't
measured the difference yet, but Mr Fixit does seem to load much faster for
the sw renderer than it did before the change (many months old memory).
This separate the FOV calculations from other refdef calcs, cleaning up the
renderer proper and making it easier for other parts of the engine (eg,
csqc) to update the fov.
The server edict arrays are now stored outside of progs memory, only the
entity data itself (ie data accessible to progs via ent.fld) is stored in
progs memory. Many of the changes were due to code accessing edicts and
entity fields directly rather than through the provided macros.
Loading is broken for multi-file image sets due to the way images are
loaded (this needs some thought for making it effecient), but the
Blender environment map loading works.
They're unlit (fullbright, but that's nothing new for quake), but
working nicely. As a bonus, sort out the sky pass (forced to due to the
way command buffers are used).
There were actually several problems: translucency wasn't using or
depending on the depth buffer, and the depth buffer wasn't marked as
read-only in the g-buffer pass. Getting that correct seems to have given
bigass1 a 0.5% boost (hard to say, could be the usual noise).
That was... easier than expected. A little more tedious that I would
have liked, but my scripting system isn't perfect (I suspect it's best
suited as the output of a code generator), and the C side could do with
a little more automation.
Other than dealing with shader data alignment issues, that went well :).
Nicely, the implementation gets the explicit scaling out of the shader,
and allows for a directional flag.
I never liked that some of the macros needed the type as a parameter
(yay typeof and __auto_type) or those that returned a value hid the
return statement so they couldn't be used in assignments.
Still "some" more to go: a pile to do with transforms and temporary
entities, and a nasty one with host_cbuf. There's also all the static
block-alloc lists :/
Light styles and shadows aren't implemented yet.
The map's entities are used to create the lights, and the PVS used to
determine which lights might be visible (ie, the surfaces they light).
That could do with some more improvements (eg, checking if a leaf is
outside a spotlight's cone), but the concept seems to work.
Double benefit, actually: faster when building a fat PVS (don't need to
copy as much) and can be used in multiple threads. Also, default visiblity
can be set, and the buffer size has its own macro.
Useful for avoiding a pile of wrapper functions that merely pass on
command-specific data to the actual implementation. Used to clean up the
wrappers in nq and qw cl_input.c
This is the first step towards component-based entities.
There's still some transform-related stuff in the struct that needs to
be moved, but it's all entirely client related (rather than renderer)
and will probably go into a "client" component. Also, the current
components are directly included structs rather than references as I
didn't want to deal with the object management at this stage.
As part of the process (because transforms use simd) this also starts
the process of moving QF to using simd for vectors and matrices. There's
now a mess of simd and sisd code mixed together, but it works
surprisingly well together.
The plan is to have a fully component based entity system. This adds
hierarchical transforms. Not particularly useful for quake itself at
this stage, but it will allow for much more flexibility later on,
especially when QuakeForge becomes more general-purpose.
This seems to be pretty close to as fast as it gets (might be able to do
better with some shuffles of the negation constants instead of loading
separate constants).
It's not used yet as work needs to be done to better support generic
entities, but this is the next step to real-time lighting (though, to be
honest, I expect it will be too slow to be usable).
The main purpose is to allow fluent-style:
const char *targetname = PL_String (PL_ObjectForKey (entity, "targetname"));
if (targetname && !PL_ObjectForKey (targets, targetname)) {
PL_D_AddObject (targets, targetname, entity);
}
[note: the above is iffy due to ownership of entity, but the code from
which the above comes works around the issue]
Static lights are yet to come (so the screen is black most of the time),
but dynamic lights work very nicely (and look very good) despite the
falloff being incorrect.
While I could reconstruct the position from the screen coords and depth,
this is easier and good enough for now. Reconstruction is an
optimization thing.
Lighting doesn't actually do lights yet, but it's producing pixels.
Translucent seems to be working (2d draw uses it), and compose seems to
be working.
After getting lights even vaguely working for alias models, I realized
that it just wasn't going to be feasible to do nice lighting with
forward rendering. This gets the bulk of the work done for deferred
rendering, but still need to sort out the shaders before any real
testing can be done.
The order in which keys are added to the dictionary object is
maintained. Adding a key after removing an old key adds the new key to
the end of the list rather than reusing the old key's spot.
PL_ParseLabeledArray works the same way as PL_ParseArray, but instead
takes a dictionary object. The keys of the items are ignored, and the
order is not preserved (at this stage), but this is a cleaner solution
to getting an array of objects when the definitions of those objects
need to be accessible by name as well.
It turns out I had conflated frame buffers with frames and wound up
making a minor mess when separating the number of frames the renderer
could have in flight from the number of swap-chain images. This is the
first step towards correcting that mistake.
It's not entirely there yet, but the basics are working. Work is still
needed for avoiding duplication of objects (different threads will have
different contexts and thus different tables, so necessary per-thread
duplication should not become a problem) and general access to arbitrary
fields (mostly just parsing the strings)
This allows plist objects to be accessed directly from cexpr expressions
using struct.field syntax for dictionary objects and array[index] syntax
for array objects.
The node struct was 72 bytes thus two cache line. Moving the pointer
into the brush model data block allows nodes to fit in a single cache
line (not that they're aligned yet, but that's next). It doesn't seem to
have made any difference to performance (at least in the vulkan
renderer), but it hasn't hurt, either, as the only place that needed the
parent pointer was R_MarkLeaves.
It's not quite as expected, but that may be due to one of msaa, the 0-15
range in the palette not being all the way to white, the color gradients
being not quite linear (haven't checked yet) or some combination of the
above. However, it's that what should be yellow is more green. At least
the zombies are no longer white and the ogres don't look like they're
wearing skeleton suits.
Doesn't seem to make much difference performance-wise, but speed does
seem to be fill-rate limited due to the 8x msaa. Still, it does mean
fewer bindings to worry about.
This is a big step towards a cleaner api. The struct reference in
model_t really should be a pointer, but bsp submodel(?) loading messed
that up, though that's just a matter of taking more care in the loading
code. It seems sensible to make that a separate step.
I've decided that alias model skins should be in a single four-level
array texture rather than spread over four textures, but there's no way
I want to write that code again: getting it right was hard enough the
first time :P
It now takes a context pointer (opaque data) that holds the buffers it
uses for the temporary strings. If the context pointer is null, a static
context is used (making those uses of va NOT thread-safe). Most calls to
va use the static context, but all such calls have been formatted
consistently so they are easy to find when it comes time to do a full
audit.
It's a tad bogus as it's the lights close to the camera, but it should
at least be a good start once things are working. There's currently
something very wrong with the state of things.
The dynamic array macros made this much easier than last time I looked
at it, especially when it came to figuring out the bad memory accesses
that I seem to remember from my last attempt 9 years ago.
This makes tex_t more generally useable and probably more portable. The
goal was to be able to use tex_t with data that is in a separate chunk
of memory.