the VM compiler uses SSE for floating-point ops when possible
aside from the speed improvements, this also makes for nicer code where the renderer interacts with libjpeg, thanks to mem_dest support etc.