Like everything else related to doing standard math with SSE2 vs. x87, there's nothing to be gained here with anything but first generation SSE2 systems which are irrelevant these days.
Taking 'thespir2.wad' from https://forum.zdoom.org/viewtopic.php?f=1&t=10655 the SSE2 version is reproducably ~3% slower than the x87 version on my Core i7, which quite closely mirrors all my previous tests since 2007.
Overall this just looks like an optimization not worth doing.
gl/data/gl_setup.cpp:430:11: warning: comparison of integers of different signs: 'int' and 'unsigned int' [-Wsign-compare]
gl/data/gl_setup.cpp:527:19: warning: comparison of integers of different signs: 'int' and 'unsigned int' [-Wsign-compare]
gl/data/gl_setup.cpp:542:19: warning: comparison of integers of different signs: 'int' and 'unsigned int' [-Wsign-compare]
nodebuild.cpp:1056:63: warning: format specifies type 'ptrdiff_t' (aka 'long') but the argument has type 'int' [-Wformat]
p_glnodes.cpp:379:50: warning: comparison of integers of different signs: 'int' and 'unsigned int' [-Wsign-compare]
p_saveg.cpp:381:18: warning: comparison of integers of different signs: 'int' and 'unsigned int' [-Wsign-compare]
p_scroll.cpp:532:11: warning: comparison of integers of different signs: 'int' and 'unsigned int' [-Wsign-compare]
p_setup.cpp:2304:43: warning: format specifies type 'int' but the argument has type 'unsigned long' [-Wformat]
p_setup.cpp:2302:12: warning: comparison of integers of different signs: 'int' and 'unsigned long' [-Wsign-compare]
scripting/codegeneration/codegen.cpp:8488:20: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
scripting/codegeneration/codegen.cpp:8606:15: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
The only reason this even existed was that ZDoom's original VC projects used __fastcall. The CMake generated project do not, they stick to __cdecl.
Since no performance gain can be seen by using __fastcall the best course of action is to just remove all traces of it from the source and forget that it ever existed.
- Added additional debug spew for the nodebuilder.
- Restore the nodebuilder's debug spew that was present in ZDBSP but not the internal version.
Use the CRT's printf for this output to ensure that it is identical to ZDBSP's output for the
same input.
SVN r3980 (trunk)
- The stat meters now return an FString instead of sprintfing into a fixed
output buffer.
- NOASM is now automatically defined when compiling for a non-x86 target.
- Some changes have been made to the integral types in doomtype.h:
- For consistancy with the other integral types, byte is no longer a
synonym for BYTE.
- Most uses of BOOL have been change to the standard C++ bool type. Those
that weren't were changed to INTBOOL to indicate they may contain values
other than 0 or 1 but are still used as a boolean.
- Compiler-provided types with explicit bit sizes are now used. In
particular, DWORD is no longer a long so it will work with both 64-bit
Windows and Linux.
- Since some files need to include Windows headers, uint32 is a synonym
for the non-Windows version of DWORD.
- Removed d_textur.h. The pic_t struct it defined was used nowhere, and that
was all it contained.
SVN r326 (trunk)
or not SSE2 is available at runtime. Since most of the time is spent in
ClassifyLine, using SSE2 in just this one function helps the most.
- Nodebuilding is a little faster if we inline PointOnSide.
- Changed FEventTree into a regular binary tree, since there just aren't enough
nodes inserted into it to make a red-black tree worthwhile.
- Added more checks at the start of ClassifyLine so that it has a better chance
of avoiding the more complicated checking, and it seems to have paid off with
a reasonably modest performance boost.
- Added a "vertex map" for ZDBSP's vertex selection. (Think BLOCKMAP for
vertices instead of lines.) On large maps, this can result in a very
significant speed up. (In one particular map, ZDBSP had previously
spent 40% of its time just scanning through all the vertices in the
map. Now the time it spends finding vertices is immeasurable.) On small maps,
this won't make much of a difference, because the number of vertices to search
was so small to begin with.
SVN r173 (trunk)
- Added code to explicitly handle outputting overlapping segs when
building GL nodes with ZDBSP, removing the check that discarded
them early on.
- AddIntersection() should convert to doubles before subtracting the vertex
from the node, not after, to avoid integer overflow. (See cah.wad, MAP12
and MAP13.) A simpler dot product will also suffice for distance calculation.
- Splitters that come too close to a vertex should be avoided. (See cata.wad.)
- Red-Black Tree implementation was broken and colored every node red.
- Moved most of the code for outputting degenerate GL subsectors into another
function.
SVN r160 (trunk)
- Changed f_finale.cpp/atkstates[] into a static variable, since its
anonymous type prevents it from being accessed from other files anyway.
- Fixed: The behavior of the eventtail advancement in d_net.cpp/CheckAbort()
was compiler-dependant.
- Fixed warnings GCC 4 threw up while compiling re2c and lemon.
- Removed __cdecl from makewad.c again. This is already defined as a builtin
for MinGW, and redefining it produces a warning. (Why is main explicitly
declared __cdecl anyway?)
- Fixed building ccdv-win32 with GCC 4. GCC 4 creates a memcpy call, which
won't work because it doesn't get linked with the standard C library.
SVN r135 (trunk)