Like everything else related to doing standard math with SSE2 vs. x87, there's nothing to be gained here with anything but first generation SSE2 systems which are irrelevant these days.
Taking 'thespir2.wad' from https://forum.zdoom.org/viewtopic.php?f=1&t=10655 the SSE2 version is reproducably ~3% slower than the x87 version on my Core i7, which quite closely mirrors all my previous tests since 2007.
Overall this just looks like an optimization not worth doing.
- got rid of glsegextras.
This was probably one of the most ill-conceived means to save some memory in ZDoom, but now, when a pure software rendered engine no longer needs to be considered it's just totally useless to keep this mess in.
The only reason this even existed was that ZDoom's original VC projects used __fastcall. The CMake generated project do not, they stick to __cdecl.
Since no performance gain can be seen by using __fastcall the best course of action is to just remove all traces of it from the source and forget that it ever existed.
- Added additional debug spew for the nodebuilder.
- Restore the nodebuilder's debug spew that was present in ZDBSP but not the internal version.
Use the CRT's printf for this output to ensure that it is identical to ZDBSP's output for the
same input.
SVN r3980 (trunk)
warnings. At first, I was going to try and clean them all up. Then I decided
that was a worthless cause and went about just acting on the ones that
might actually be helpful:
C4189 (local variable is initialized but not referenced)
C4702 (unreachable code)
C4512 (assignment operator could not be generated)
SVN r420 (trunk)
or not SSE2 is available at runtime. Since most of the time is spent in
ClassifyLine, using SSE2 in just this one function helps the most.
- Nodebuilding is a little faster if we inline PointOnSide.
- Changed FEventTree into a regular binary tree, since there just aren't enough
nodes inserted into it to make a red-black tree worthwhile.
- Added more checks at the start of ClassifyLine so that it has a better chance
of avoiding the more complicated checking, and it seems to have paid off with
a reasonably modest performance boost.
- Added a "vertex map" for ZDBSP's vertex selection. (Think BLOCKMAP for
vertices instead of lines.) On large maps, this can result in a very
significant speed up. (In one particular map, ZDBSP had previously
spent 40% of its time just scanning through all the vertices in the
map. Now the time it spends finding vertices is immeasurable.) On small maps,
this won't make much of a difference, because the number of vertices to search
was so small to begin with.
SVN r173 (trunk)
- Added code to explicitly handle outputting overlapping segs when
building GL nodes with ZDBSP, removing the check that discarded
them early on.
- AddIntersection() should convert to doubles before subtracting the vertex
from the node, not after, to avoid integer overflow. (See cah.wad, MAP12
and MAP13.) A simpler dot product will also suffice for distance calculation.
- Splitters that come too close to a vertex should be avoided. (See cata.wad.)
- Red-Black Tree implementation was broken and colored every node red.
- Moved most of the code for outputting degenerate GL subsectors into another
function.
SVN r160 (trunk)