mirror of
https://github.com/dhewm/dhewm3.git
synced 2024-11-26 14:21:18 +00:00
Fix dmap bug if compiled with GCC and x86 FMA enabled
Only happend if `ONATIVE` was enabled (or some other flag was set that enables the FMA extension), the root cause was that the cross product didn't return 0 when it should, but a small value < 0. Caused some faces to be missing in maps compiled with dmap. https://github.com/RobertBeckebans/RBDOOM-3-BFG/issues/436#issuecomment-851061826 has lots of explanation. I think this is a compiler bug, this commit works around it. fixes #147
This commit is contained in:
parent
f2731821a9
commit
320c15f63a
1 changed files with 26 additions and 0 deletions
|
@ -621,6 +621,27 @@ ID_INLINE bool idVec3::FixDenormals( void ) {
|
|||
return denormal;
|
||||
}
|
||||
|
||||
#if defined(__GNUC__) && !defined(__clang__) && defined(__OPTIMIZE__) && (defined(__FP_FAST_FMA) || defined(__FP_FAST_FMAF))
|
||||
// DG: If the x86/x86_64 FMA extension is enabled and -O2 or higher is used, GCC miscompiles the
|
||||
// cross-product in a ways that even v.Cross(v) isn't exactly (0, 0, 0).
|
||||
// This happens because it uses vfmsub* for the first part of a.x*b.y - (a.y*b.x)
|
||||
// (and normal multiplication for the a.y*b.x part), which uses an intermediate with
|
||||
// "infinite precision" for the multiplication part, while the (a.y*b.x) part was previously
|
||||
// stored with single precision.
|
||||
// So if the result should be 0 we instead get the rounding error between double(?) and
|
||||
// single-precision float results of the multiplications.
|
||||
// This caused problems with dmap's PointInTri().
|
||||
// The specific flag (implied by -O2) that triggers this behavior is -fexpensive-optimizations,
|
||||
// so as a workaround disable that flag for the cross product functions.
|
||||
// Not limiting this to x86 in case GCC does similar things with other CPUs FMA operations.
|
||||
// Not doing it for clang because there apparently this optimization is only done
|
||||
// with -ffast-math which we don't use. Only doing this if some kind of "fast" FMA is available
|
||||
// because I assume that otherwise GCC won't try this optimization
|
||||
|
||||
#pragma GCC push_options
|
||||
#pragma GCC optimize("-fno-expensive-optimizations")
|
||||
#endif
|
||||
|
||||
ID_INLINE idVec3 idVec3::Cross( const idVec3 &a ) const {
|
||||
return idVec3( y * a.z - z * a.y, z * a.x - x * a.z, x * a.y - y * a.x );
|
||||
}
|
||||
|
@ -633,6 +654,11 @@ ID_INLINE idVec3 &idVec3::Cross( const idVec3 &a, const idVec3 &b ) {
|
|||
return *this;
|
||||
}
|
||||
|
||||
#if defined(__GNUC__) && !defined(__clang__) && defined(__OPTIMIZE__) && (defined(__FP_FAST_FMA) || defined(__FP_FAST_FMAF))
|
||||
// DG: restore normal optimization settings
|
||||
#pragma GCC pop_options
|
||||
#endif
|
||||
|
||||
ID_INLINE float idVec3::Length( void ) const {
|
||||
return ( float )idMath::Sqrt( x * x + y * y + z * z );
|
||||
}
|
||||
|
|
Loading…
Reference in a new issue