Like everything else related to doing standard math with SSE2 vs. x87, there's nothing to be gained here on anything but first-generation SSE2 systems, which are irrelevant these days.
Taking 'thespir2.wad' from https://forum.zdoom.org/viewtopic.php?f=1&t=10655, the SSE2 version is reproducibly ~3% slower than the x87 version on my Core i7, which closely mirrors all my previous tests since 2007.
Overall this just looks like an optimization not worth doing.
or not SSE2 is available at runtime. Since most of the time is spent in
ClassifyLine, using SSE2 in just this one function helps the most.
- Nodebuilding is a little faster if we inline PointOnSide.
- Changed FEventTree into a regular binary tree, since there just aren't enough
nodes inserted into it to make a red-black tree worthwhile.
- Added more checks at the start of ClassifyLine so that it has a better chance
of avoiding the more complicated checking, and it seems to have paid off with
a reasonably modest performance boost.
- Added a "vertex map" for ZDBSP's vertex selection. (Think BLOCKMAP for
vertices instead of lines.) On large maps, this can result in a very
significant speedup. (In one particular map, ZDBSP had previously
spent 40% of its time just scanning through all the vertices in the
map. Now the time it spends finding vertices is too small to measure.) On
small maps, this won't make much of a difference, because the number of
vertices to search is so small to begin with.
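
For illustration only, the "vertex map" idea can be sketched as a grid of
buckets keyed by block coordinates, so a lookup only scans the vertices in
one block instead of every vertex in the map. This is not ZDBSP's actual
code: the names VertexMap, SelectVertex and cellSize are hypothetical, the
hash map stands in for whatever block storage the real node builder uses,
and only exact coordinate matches are handled (no tolerance for nearby
vertices).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Vertex { int x, y; };

// Hypothetical sketch: buckets vertices by grid block, like a BLOCKMAP for points.
class VertexMap {
public:
    explicit VertexMap(int cellSize) : cellSize_(cellSize) {
        assert(cellSize > 0);
    }

    // Returns the index of the vertex at (x, y), adding it if it is new.
    // Only the block containing (x, y) is scanned.
    std::size_t SelectVertex(int x, int y) {
        std::vector<std::size_t>& block = blocks_[BlockKey(x, y)];
        for (std::size_t index : block) {
            if (vertices_[index].x == x && vertices_[index].y == y) {
                return index;
            }
        }
        vertices_.push_back(Vertex{x, y});
        block.push_back(vertices_.size() - 1);
        return vertices_.size() - 1;
    }

    std::size_t Count() const { return vertices_.size(); }

private:
    // Division that rounds toward negative infinity (divisor must be > 0),
    // so negative coordinates land in their own blocks.
    static int FloorDiv(int a, int b) {
        int q = a / b;
        return (a % b != 0 && a < 0) ? q - 1 : q;
    }

    // Packs the block column and row into one 64-bit key.
    std::uint64_t BlockKey(int x, int y) const {
        std::uint32_t bx = static_cast<std::uint32_t>(FloorDiv(x, cellSize_));
        std::uint32_t by = static_cast<std::uint32_t>(FloorDiv(y, cellSize_));
        return (static_cast<std::uint64_t>(bx) << 32) | by;
    }

    int cellSize_;
    std::vector<Vertex> vertices_;
    std::unordered_map<std::uint64_t, std::vector<std::size_t>> blocks_;
};
```

With a block size that keeps each bucket small, the cost of finding a vertex
depends on how many vertices share its block rather than on the total vertex
count, which is consistent with the gain showing up mainly on large maps.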
SVN r173 (trunk)