This seems to be pretty close to as fast as it gets. It might be possible to do better by generating the negation constants with shuffles instead of loading each one as a separate constant.
Currently supports only add, subtract, and multiply (matrix-matrix and matrix-vector).
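
For checking results, a plain scalar reference of the same four operations can be handy. The sketch below is not the code from this project: the 4x4 size, the row-major layout, the column-vector convention, and every name (`mat4`, `vec4`, `mat4_add`, `mat4_sub`, `mat4_mul_mm`, `mat4_mul_mv`, `mat4_selfcheck`) are assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical types: 4x4 row-major matrix and 4-component column vector. */
typedef struct { float m[4][4]; } mat4;
typedef struct { float v[4]; } vec4;

/* Element-wise sum a + b. */
mat4 mat4_add(mat4 a, mat4 b) {
    mat4 r;
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            r.m[i][j] = a.m[i][j] + b.m[i][j];
    return r;
}

/* Element-wise difference a - b. */
mat4 mat4_sub(mat4 a, mat4 b) {
    mat4 r;
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            r.m[i][j] = a.m[i][j] - b.m[i][j];
    return r;
}

/* Matrix-matrix product a * b. */
mat4 mat4_mul_mm(mat4 a, mat4 b) {
    mat4 r;
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += a.m[i][k] * b.m[k][j];
            r.m[i][j] = sum;
        }
    return r;
}

/* Matrix-vector product a * x, treating x as a column vector. */
vec4 mat4_mul_mv(mat4 a, vec4 x) {
    vec4 r;
    for (int i = 0; i < 4; ++i) {
        float sum = 0.0f;
        for (int k = 0; k < 4; ++k)
            sum += a.m[i][k] * x.v[k];
        r.v[i] = sum;
    }
    return r;
}

/* Small sanity check: multiplying by the identity leaves a matrix unchanged. */
void mat4_selfcheck(void) {
    mat4 id = {{{1, 0, 0, 0}, {0, 1, 0, 0}, {0, 0, 1, 0}, {0, 0, 0, 1}}};
    mat4 a  = {{{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}, {13, 14, 15, 16}}};
    mat4 p  = mat4_mul_mm(a, id);
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            assert(p.m[i][j] == a.m[i][j]);
}
```

Comparing the fast routines against something like this on a few small integer-valued inputs should catch sign or ordering mistakes, since those values are exact in single precision.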