- clamp scissors fully to avoid NVidia's awful drivers locking up the entire system if they end up out of bounds
- perform buffer clears as part of the render pass. this puts some restrictions on how FRenderState.Clear can be used
- add an offset uniform to the present shaders so the vulkan target can flip the image during presentation
* Colors can npw be defined per sidedef, not only per sector.
* Gradients can be selectively disabled or vertically flipped per wall tier.
* Gradients can be clamped to their respective tier, i.e top and bottom of the tier, not the front sector defines where it starts.
The per-wall colors are implemented for hardware and softpoly renderer only, but not for the classic software renderer, because its code is far too scattered to do this efficiently.
* disable the dynamic light code if no buffers are available
* added a duplicate of the getTexel function which cannot be patched without creating syntax problems.
* fixe int<->float conversion warning, which on some compilers may be an error.
Sadly, anything else makes no sense.
All the recently made changes live or die, depending on this extension's presence.
Without it, there are major performance issues with the buffer uploads. All of the traditional buffer upload methods are without exception horrendously slow, especially in the context of a Doom engine where frequent small updates are required.
It could be solved with a complete restructuring of the engine, of course, but that's hardly worth the effort, considering it's only for legacy hardware whose market share will inevitably shrink considerably over the next years.
And even then, under the best circumstances I'd still get the same performance as the old immediate mode renderer in GZDoom 1.x and still couldn't implement the additions I'd like to make.
So, since I need to keep GZDoom 1.x around anyway for older GL 2.x hardware, it may as well serve for 3.x hardware, too. It's certainly less work than constantly trying to find workarounds for the older hardware's limitations that cost more time than working on future-proofing the engine.
This new, trimmed down 4.x renderer runs on a core profile configuration and uses persistently mapped buffers for nearly everything that is getting transferred to the GPU. (The global uniforms are still being used as such but they'll be phased out after the first beta release.
- reactivate alpha testing per fixed function pipeline
- use the 'modern' way to define clip planes (GL_CLIP_DISTANCE). This is far more portable than the old glClipPlane method and a lot more robust than checking this in the fragment shader.
Due to the way the engine works it needs to render a lot of small primitives with frequent state changes.
But due to the performance of buffer uploads it is impossible to upload each primitive's vertices to a buffer separately because buffer uploads nearly always stall the GPU.
On the other hand, in order to reduce the amount of buffer uploads all the necessary state changes would have to be saved in an array until they can finally be used. This method also imposed an unacceptable overhead.
Fortunately, uploading uniform arrays is very fast and doesn't cause GPU stalls, so now the engine puts the vertex data per primitive into a uniform array and uses a static vertex buffer to index the array in the vertex shader.
This method offers the same performance as immediate mode but only uses core profile features.