Commit graph

35 commits

Author SHA1 Message Date
Bill Currie
8ff60b4603 [simd] Add unsigned vector types
Mostly as a convenience for working with very small arrays of unsigned
ints.
2023-06-15 09:36:50 +09:00
Bill Currie
ef2793010c [simd] Improve portability for aarch64
No need for compiler flags for 64-bit arm, and the few intrinsics that
are actually needed are in a different header.
2023-03-25 21:21:13 +09:00
Bill Currie
87f3c206f1 [simd] Remove some intrinsics uses
GCC does a nice enough job compiling the more readable form (though
admittedly, hadd is possibly more readable than what's there for
dot[fd], hadd is supposedly slower than the shuffles and adds, and qfvis
seems to support that).
2022-05-20 11:09:15 +09:00
Bill Currie
ef6e4722d5 [simd] Add some comments to mat4fquat
Having to refigure out what values are going into the vectors got old
very fast. The comments don't help with verifying the values, but at
least I can tell at a glance where 2(xy - wz) goes and thus determine
the "orientation" of the matrix.
2022-05-04 13:48:34 +09:00
Bill Currie
8ac53664d7 [simd] Rename VEC_TYPE to QF_VEC_TYPE
Makes it less likely to clash with anything (like qfcc's VEC_TYPE :P)
2022-04-29 16:59:55 +09:00
Bill Currie
7a6ca0ebcb [simd] Use portable swizzles
gcc and clang have rather different swizzle builtins, but both do a nice
job of optimizing the intuitive initializer swizzle (I think gcc 8(?)
didn't do such a good job thus my use of __builtin_shuffle).
2022-03-31 02:25:33 +09:00
Bill Currie
c467217b4b [simd] Use inttypes.h for long vector formats
Much nicer than cluttering up the headers with word-size checks.
2022-01-30 10:50:14 +09:00
Bill Currie
23613ca985 [simd] Get the new functions working on older hardware
In some cases, gcc-11 does a good enough job translating normal looking
C expressions so use just those, but other times need to dig around for
an appropriate intrinsic.

Also, now need to disable psapi warnings when compiling for anything
less than avx.
2022-01-07 11:48:28 +09:00
Bill Currie
80c5e2c3f6 [simd] Remove requirements for AVX2 for vec4d
It seems gcc-11 does a pretty good job of emulating the instructions (it
no longer requires avx2 for 256-bit wide vectors).
2022-01-06 18:06:56 +09:00
Bill Currie
63db48bf42 [simd] Add integral loadvec3 versions that set w to 1
Always setting w to 0 made it impossible to use the resulting 4d vectors
in division-based operations as they would result in divide-by-zero and
thus an unavoidable exception (CPUs don't like integer div-by-zero).
I'll probably add similar for float and double, but they're not as
critical as they'll just give inf or nan. This also increases my doubts
about the value of keeping 3d vector operations.
2022-01-04 18:23:32 +09:00
Bill Currie
09b029d82c [simd] Correct result for cmuld
I must have had quite the brain-fart when I wrote that. Yay for tests :)
2022-01-03 23:27:01 +09:00
Bill Currie
9084121ad2 [simd] Correct result for dot2f
It turns out gcc optimizes the obvious code nicely. It doesn't do so
well for cmul, but I decided to use obvious code anyway (the instruction
counts were the same, so maybe it doesn't get better for a single pair
of operands).
2022-01-03 23:27:01 +09:00
Bill Currie
0e1964bf74 [simd] Split out the ivec implementations
And add any/all/none functions.
2022-01-02 16:02:57 +09:00
Bill Currie
97034d9dde [simd] Add 2d vector types
For int, long, float and double. I've been meaning to add them for a
while, and they're part of the new Ruamoko instructions set (which is
progressing nicely).
2022-01-02 00:57:55 +09:00
Bill Currie
e062163aa4 [simd] Remove some intrinsics dependencies
Not sure why I used intrinsics at the time. Probably wasn't comfortable
getting gcc to do what I wanted.
2021-12-26 12:50:46 +09:00
D G Turner
b799d48ccb [simd] fix build when avx2 is not available, but avx is.
This failed with errors such as:
                 from ./include/QF/simd/vec4d.h:32,
                 from libs/util/simd.c:37:
./include/QF/simd/vec4d.h: In function ‘qmuld’:
/usr/lib/gcc/x86_64-pc-linux-gnu/10.3.0/include/avx2intrin.h:1049:1: error: inlining failed in call to ‘always_inline’ ‘_mm256_permute4x64_pd’: target specific option mismatch
 1049 | _mm256_permute4x64_pd (__m256d __X, const int __M)
2021-06-23 01:10:42 +01:00
Bill Currie
93167279fc Fix a bunch of issues found by gcc-11 2021-06-13 14:30:59 +09:00
Bill Currie
778c07e91f [util] Get vectors working for non-SSE archs
GCC does a fairly nice job of producing code for vector types when the
hardware doesn't support SIMD, but it seems to break certain math
optimization rules due to excess precision (?). Still, it works well
enough for the core engine, but may not be well suited to the tools.
However, so far, only qfvis uses vector types (and it's not tested yet),
and tools should probably be used on suitable machines anyway (not
forces, of course).
2021-06-01 18:53:53 +09:00
Bill Currie
cb7d2671d8 [simd] Make mmulf safe for src=dst
I guess I'd forgotten that the parameters are actually pointers.
2021-04-29 19:25:31 +09:00
Bill Currie
e6bc5e3e11 [simd] Add qexpf function 2021-04-25 15:02:08 +09:00
Bill Currie
9ac4cdc6bd [simd] Fix more portability issues
I had missed vec4d.h because it's mostly unused at this stage.
2021-04-02 23:25:14 +09:00
Bill Currie
b6ab832ed4 [simd] Add vabsf and some more tests 2021-03-28 19:49:43 +09:00
Bill Currie
29e029c792 [util] Add float a simd version of the SEB
And its support functions. I can't tell if it's any faster (mtwist_rand
is a significant chunk of the benchmark timings, oops), but it's nice to
have.
2021-03-27 23:38:10 +09:00
Bill Currie
8309e1852a [simd] Fix some portability issues
Use [u]int64_t instead of long, and fix some incorrect attribute usage
(I had misread the gcc docs at the time).
2021-03-27 20:04:10 +09:00
Bill Currie
5158cc5527 [util] Add normal and magnitude float vector functions 2021-03-19 11:09:57 +09:00
Bill Currie
5949753579 Make m3vmulf return v[3] unchanged 2021-03-10 19:40:19 +09:00
Bill Currie
09e1a63470 [util] Add a simd mat4 transpose function 2021-03-09 23:50:32 +09:00
Bill Currie
4a97bc3ba5 [util] Create simd quaternion to matrix function
This seems to be pretty close to as fast as it gets (might be able to do
better with some shuffles of the negation constants instead of loading
separate constants).
2021-03-04 17:45:10 +09:00
Bill Currie
45c0255643 [util] Add simd 4x4 matrix functions
Currently just add, subtract, multiply (m m and m v).
2021-03-03 16:34:16 +09:00
Bill Currie
9039c6975a [util] Clean up some missed vsqrt changes 2021-01-05 08:35:53 +09:00
Bill Currie
015cee7b6f [util] Add vector-quaternion shortcut functions
Care needs to be taken to ensure the right function is used with the
right arguments, but with these, the need to use qconj(d|f) for a
one-off inverse rotation is removed.
2021-01-02 10:44:45 +09:00
Bill Currie
7bf90e5f4a [util] Sort out implementation issues for simd 2021-01-02 09:55:59 +09:00
Bill Currie
3125009a7c [util] Add vector and quaternion types to cexpr
Although there's no distinction between the two at the C level, I think
it's probably best to separate them in a scripting language.
2020-12-30 18:20:11 +09:00
Bill Currie
1ddd57b09e [util] Add qconj, vtrunc, vceil and vfloor functions
I had forgotten these rather critical functions. Both double and float
versions are included.
2020-12-30 18:20:11 +09:00
Bill Currie
09a10f80e1 [util] Add basic SIMD implemented vector functions
They take advantage of gcc's vector_size attribute and so only cross,
dot, qmul, qvmul and qrot (create rotation quaternion from two vectors)
are needed at this stage as basic (per-component) math is supported
natively by gcc.

The provided functions work on horizontal (array-of-structs) data, ie a
vec4d_t or vec4f_t represents a single vector, or traditional vector
layout. Vertical layout (struct-of-arrays) does not need any special
functions as the regular math can be used to operate on four vectors at
a time.

Functions are provided for loading a vec4 from a vec3 (4th element set
to 0) and storing a vec4 into a vec3 (discarding the 4th element).

With this, QF will require AVX2 support (needed for vec4d_t). Without
support for doubles, SSE is possible, but may not be worthwhile for
horizontal data.

Fused-multiply-add is NOT used because it alters the results between
unoptimized and optimized code, resulting in -mfma really meaning
-mfast-math-anyway. I really do not want to have to debug issues that
occur only in optimized code.
2020-12-30 18:20:11 +09:00