dvec4, lvec4 and ulvec4 need to be aligned to 8 words (32 bytes) in
order to avoid hardware exceptions. Rather than dealing with possibly
mixed alignment when a function has 8-word aligned locals but only
4-word aligned parameters, simply keep the stack frame 8-word aligned at
all times.
As for sizes, the temp def recycler was written before the Ruamoko ISA
was even a pipe dream and thus never expected temp def sizes over 4. At
least now any future adjustments can be done in one place.
My quick and dirty test program works :)
dvec4 xy = {1d, 2d, 0d, 0.5};
void printf(string fmt, ...) = #0;
int main()
{
dvec4 u = {3, 4, 3.14};
dvec4 v = {3, 4, 0, 1};
dvec4 w = v * xy + u;
printf ("[%g, %g, %g, %g]\n", w[0], w[1], w[2], w[3]);
return 0;
}