I guess it's kind of UB, but it's handy for images that will be
conditionally written by the GPU but need to be in shader-read-only for
draw calls and the validation layers can't tell that the layers won't be
used.
It's a bit flaky for particles, especially at higher frame rates, but
that's due to supporting only 64 overlapping pixels. A reasonable
solution is probably switching to a priority heap for the "sort" and
upping the limit.
Mostly, this gets the stage flags in with the barrier, but also adds a
couple more barrier templates. It should make for slightly less verbose
code, and one less opportunity for error (mismatched barrier/stages).