After messing with SIMD stuff for a little while, I think I now understand
why the industry went with xyzw instead of the mathematical wxyz. Anyway,
this will make for less pain in the future (assuming I got everything).
I got the idea from Blender when I discovered by accident that quat * vect
produces the same result as quat * qvect * quat* and looked up the code to
check what was going on. While matrix/vector multiplication still beats the
pants off quaternion/vector multiplication, QuatMultVec is a slight
optimization over quat * qvect * quat* (17+,24* vs 24+,32*, plus no need
to generate quat*).
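
For reference, here is a rough sketch of where the savings come from. It
is not the actual QuatMultVec or Blender code; the function names are
made up for illustration, and it assumes a unit quaternion stored as xyzw
(q[3] is the scalar part). The vector, taken as a quaternion, has a zero
scalar part, so the first product needs only 12 multiplies, and the scalar
part of the final result is never needed, so the second product needs only
12 more.

```c
/* Sketch only: names are hypothetical, layout is xyzw (q[3] = scalar). */

/* Full Hamilton product r = a * b: 16 multiplies, 12 adds.
 * r must not alias a or b. */
void quat_mul(const float a[4], const float b[4], float r[4])
{
    r[0] = a[3]*b[0] + a[0]*b[3] + a[1]*b[2] - a[2]*b[1];
    r[1] = a[3]*b[1] - a[0]*b[2] + a[1]*b[3] + a[2]*b[0];
    r[2] = a[3]*b[2] + a[0]*b[1] - a[1]*b[0] + a[2]*b[3];
    r[3] = a[3]*b[3] - a[0]*b[0] - a[1]*b[1] - a[2]*b[2];
}

/* Reference version: quat * qvect * quat* using two full products
 * (32 multiplies, 24 adds, plus building the conjugate). */
void quat_rotate_vec_full(const float q[4], const float v[3], float out[3])
{
    float qv[4] = { v[0], v[1], v[2], 0.0f };    /* vector as quaternion */
    float qc[4] = { -q[0], -q[1], -q[2], q[3] }; /* conjugate of q */
    float t[4], r[4];

    quat_mul(q, qv, t);
    quat_mul(t, qc, r);
    out[0] = r[0];
    out[1] = r[1];
    out[2] = r[2];
}

/* Reduced version: same result with 24 multiplies and 17 adds
 * (negations not counted), and no explicit conjugate. */
void quat_rotate_vec(const float q[4], const float v[3], float out[3])
{
    /* t = q * (v, 0): the terms multiplied by the zero scalar part of
     * the vector drop out, leaving 12 multiplies and 8 adds. */
    float tx = q[3]*v[0] + q[1]*v[2] - q[2]*v[1];
    float ty = q[3]*v[1] - q[0]*v[2] + q[2]*v[0];
    float tz = q[3]*v[2] + q[0]*v[1] - q[1]*v[0];
    float tw = -q[0]*v[0] - q[1]*v[1] - q[2]*v[2];

    /* Vector part of t * conj(q), with the conjugate's sign flips folded
     * into the expressions: 12 multiplies and 9 adds. The scalar part of
     * the product is skipped entirely. */
    out[0] = tx*q[3] - tw*q[0] - ty*q[2] + tz*q[1];
    out[1] = ty*q[3] - tw*q[1] + tx*q[2] - tz*q[0];
    out[2] = tz*q[3] - tw*q[2] - tx*q[1] + ty*q[0];
}
```

With q = (0, 0, 1, 0), a half turn about z, both versions should map
(1, 2, 3) to (-1, -2, 3).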