Just head and tail are atomic, but it seems to work nicely (at least on
intel). I actually had more trouble with gcc (due to accidentally
testing lock-free with the wrong ring buffer... oops, but yup, gcc will
happily optimize your loop to spin really really fast). Also served as a
nice test for C11 threading.
clang doesn't like the same variable name being used in nested
expression statements, so give the "safety" variables in reused macros
semi-meaningful (based on macro name) tails to keep them separate.