That's again one less write access to the buffer. The uniform method was chosen because this way a buffer update can be completely avoided, and setting a single uniform is a lot cheaper and simpler to handle.
This eliminates most behavioral differences for FFlatVertexBuffer between both operating modes, now the only difference is where the buffer is located.
Unfortunately the math behind the old clip planes is utterly impenetrable and so poorly documented that I have no idea how to set that up, so it is deactivated for now. It wasn't working anyway.
# Conflicts:
# src/CMakeLists.txt
# src/p_setup.cpp
# src/r_defs.h
# src/version.h
This only updates to a compileable state. The new portals are not yet functional in the hardware renderer because they require some refactoring in the data management first.
There was one issue preventing the previous 2.0 betas from running under GL 3.x: The lack of persistently mapped buffers.
For the dynamic light buffer today's changes take care of that problem.
For the vertex buffer there is no good workaround but we can use immediate mode render calls instead which have been reinstated.
To handle the current setup, the engine first tries to get a core profile context and checks for presence of GL 4.4 or the GL_ARB_buffer_storage extension.
If this fails the context is deleted again and a compatibility context retrieved which is then used for 'old style' rendering which does work on older GL versions.
This new version does not support GL 3.2 or lower, meaning that Intel GMA 3000 or lower is not supported. The reason for this is that the engine uses a few GL 3.3 features which are not present in the latest Intel driver.
In general the Intel GMA 3000 is far too weak, though, to run the demanding shader of GZDoom 2.x, so this is no real loss. Performance would be far from satisfying.
A command line option '-gl3' exists to force the fallback render path. On my Geforce 550Ti there's approx. 10% performance loss on this path.
Sadly, anything else makes no sense.
All the recently made changes live or die, depending on this extension's presence.
Without it, there are major performance issues with the buffer uploads. All of the traditional buffer upload methods are without exception horrendously slow, especially in the context of a Doom engine where frequent small updates are required.
It could be solved with a complete restructuring of the engine, of course, but that's hardly worth the effort, considering it's only for legacy hardware whose market share will inevitably shrink considerably over the next years.
And even then, under the best circumstances I'd still get the same performance as the old immediate mode renderer in GZDoom 1.x and still couldn't implement the additions I'd like to make.
So, since I need to keep GZDoom 1.x around anyway for older GL 2.x hardware, it may as well serve for 3.x hardware, too. It's certainly less work than constantly trying to find workarounds for the older hardware's limitations that cost more time than working on future-proofing the engine.
This new, trimmed down 4.x renderer runs on a core profile configuration and uses persistently mapped buffers for nearly everything that is getting transferred to the GPU. (The global uniforms are still being used as such but they'll be phased out after the first beta release.
- reactivate alpha testing per fixed function pipeline
- use the 'modern' way to define clip planes (GL_CLIP_DISTANCE). This is far more portable than the old glClipPlane method and a lot more robust than checking this in the fragment shader.
Due to the way the engine works it needs to render a lot of small primitives with frequent state changes.
But due to the performance of buffer uploads it is impossible to upload each primitive's vertices to a buffer separately because buffer uploads nearly always stall the GPU.
On the other hand, in order to reduce the amount of buffer uploads all the necessary state changes would have to be saved in an array until they can finally be used. This method also imposed an unacceptable overhead.
Fortunately, uploading uniform arrays is very fast and doesn't cause GPU stalls, so now the engine puts the vertex data per primitive into a uniform array and uses a static vertex buffer to index the array in the vertex shader.
This method offers the same performance as immediate mode but only uses core profile features.
After thinking about it for a day or so I believe it's the best option to remove all compatibility code because it's a major obstacle for a transition to a core profile.
This means it won't work anymore on anything that doesn't support OpenGL 4.0, but I don't think this is a problem. On older NVidia cards performance gains could not be seen and on older AMDs using the vertex buffer was even worse as long as it got mixed with immediate mode rendering.
- replaced GLUs texture scaling with our own function. This is only used to scale down textures larger than what the hardware can handle so we do not need a dependency to an essentially deprecated library for it.