Drawing sky chains first ensures that sky surfaces correctly block parts
of the map that should not be visible (by writing the correct depth to
the depth buffer when doing box or dome skies). Writing brush models
first means that the models (ammo boxes etc) could be visible when they
should not be.
While there's currently only the one still, this will allow the entities
to be multiply queued for multi-pass rendering (eg, shadows). As the
avoidance of putting an entity in the same queue more than once relies
on the entity id, all entities now come from the scene (which is stored
in cl_world in the client code for nq and qw), thus the extensive
changes in the clients.
GL and GLSL were drawing the view model after particles instead of
before. For GL, this is likely due to avoiding fog affecting the view
model (which I think is not the right thing to do), and GLSL due to
copying GL (because I had no idea at the time). This makes the two
renderers consistent with the software renderers, and might even speed
things up a little as that's one less set of blends to do when the
particles are covered by the view model (I don't expect much
difference).
While I doubt the difference is all that significant, this should speed
up entity rendering because it cuts out a lot of branching, and
eliminates scanning the same list multiple times only to not do anything
for large chunks of the list.
The actual view and projection matrices are now consistent with vulkan,
with the vulkan-gl disparity moved into adjustment matrices. The goal is
to allow the same camera data and code to be used in all renderers. The
extra matrix multiplication shouldn't be too expensive as it occurs only
when the field of view (not often, under user control) or near and far
clip distances (very rarely) change.
While both matrices had positive determinants in the first place, I find
the projection matrix easier to understand without all the negatives,
and having quake-x/vulkan-z positively parallel in the z-up matrix makes
that a lot easier to think about.
Regardless of whether the sky is spinning or not, the matrix needs to be
updated with the current origin in order to get the direction vector
right in the shader. Also, it's in the update that the required x-y
plane rotation gets in so the skies move in the correct direction.
This is the bulk of the work for recording the resource pointer with
with builtin data. I don't know how much of a difference it makes for
most things, but it's probably pretty big for qwaq-curses due to the
very high number of calls to the curses builtins.
Closes#26
This is part of the work for #26 (Record resource pointer with builtin
function data). Currently, the data pointer gets as far as the
per-instance VM function table (I don't feel like tackling the job of
converting all the builtin functions tonight). All the builtin modules
that register a resources data block pass that block on to
PR_RegisterBuiltins.
This will make it possible for the engine to set up their parameter
pointers when running Ruamoko progs. At this stage, it doesn't matter
*too* much, except for varargs functions, because no builtin yet takes
anything larger than a float quaternion, but it will be critical when
double or long vec3 and vec4 values are passed.
long is ignored for double, and v6p progs are stuck with 32 bits for
longs (don't feel like extending v6p any further), but the basics are
there for Ruamoko.
short is ignored for ints because the minimum size is 32, and signed is
just noise for ints anyway (and no chars, so...).
unsigned, however, is finally implemented properly (or at least seems to
be working correctly: tests pass after getting things compiling again,
and lt.u is used where it should be :)
And other related fields so integer is now int (and uinteger is uint). I
really don't know why I went with integer in the first place, but this
will make using macros easier for dealing with types.
Forgetting to invoke [super dealloc] in a derived class's -dealloc
method has caused me to waste far too much time chasing down the
resulting memory leaks and crashes. This is actually the main focus of
issue #24, but I want to take care of multiple paths before I consider
the issue to be done.
However, as a bonus, four cases were found :)
This takes care of the global variables to a point (there is still the
global struct shared between the non-vulkan renderers), but it also
takes care of glsl's points-only rendering.
After yesterday's crazy marathon editing all the particles files, and
starting to do another big change to them today, I realized that I
really do need to merge them down. All the actual spawning is now in the
client library (though particle insertion will need to be moved). GLSL
particle rendering is semi-broken in that it now does only points (until
I come up with a way to select between points and quads (probably a
context object, which I need anyway for Vulkan)).
This may seem a little contradictory, but it's due to the difference
between a high level (engine) render pass and a Vulkan render pass
object (and quite likely a poor choice in names for the high level
object). This is necessary for supporting compute shader dispatches as
they cannot be submitted inside a Vulkan render pass.
This has the advantage of getting entity_t out of the particle system,
and much easier to read math. Also, it served as a nice test for my
particle physics shaders (implemented the ideas in C). There's a lot of
code that needs merging down: all but the actual drawing can be merged.
There's some weirdness with color ramps, but I'll look into that later.
They should increment by one for each pic, not 4 (I think some fluff
remaining from copying glsl's draw code).
I noticed the problem when I saw large gaps of 0s in the vertex data in
renderdoc.
This gets the crosshair working in Vulkan (next commit) and fixes issues
with changing the palette (though I've never seen a different palette
for quate, there's still the change from "all black" to an actual
palette).
This means color, emission, and translucent. Fixes the HOM issues on my
VersaPro (but halves the frame-rate... definitely need to bring back the
forward renderer as an option).
This gets the pipelines loaded (and unloaded on shutdown). Probably the
easy part :P. Still need to sort out the command buffers,
synchronization, and particle generation (and probably a bunch else
that's not coming to mind).
This needed changing Vulkan_CreatePipeline to
Vulkan_CreateGraphicsPipeline for consistency (and parsing the
difference from a plist seemed... not worth thinking about).
It turned out the bindless approach wouldn't work too well for my design
of the sprite objects, but I don't think that's a big issue at this
stage (and it seems bindless is causing problems for brush/alias
rendering via renderdoc and on my versa pro). However, I have figured
out how to make effective use of descriptor sets (finally :P).
The actual normal still needs checking, but the sprites are currently
unlit so not an issue at this stage.
I'm not at all sure what I was thinking when I designed it, but I
certainly designed it wrong (to the point of being fairly useless). It
turns out memory requirements are already aligned in size (so just
multiplying is fine), and what I really wanted was to get the next
offset aligned to the given requirements.
This adds the shaders and the pipeline specs. I'm not sure that the
deferred rendering side of the render pass is appropriate, but I thought
I'd give it a go, since quake sprites are really cutoff rather than
translucent.
With the switch to multi-layer textures for brush models, the bsp and
alias texture descriptor sets became identical and thus the definitions
shareable. However, due to complications I don't want to address yet,
they're still separately identified, but I should be able to use the
texture set for most, if not all, pipelines.
The vertices and frame images are loaded into the one memory object,
with the vertices first followed by the images.
The vertices are 2D xy+uv sets meant to be applied to the model
transform frame, and are pre-computed for the sprite size (this part
does support sprites with varying frame image sizes).
The frame images are loaded into one image with each frame on its own
layer. This will cause some problems if any sprites with varying frame
image sizes are found, but the three sprites in quake are all uniform
size.
As much as it can be since the texture data is interleaved with the
model data in the files (I guess not that bad a design for 25 years ago
with the tight memory constraints), but this paves the way for
supporting sprites in Vulkan.
As the sw renderer's implementation was the closest to id's, it was used
as the model (thus a fair bit of cleanup is still needed). This fixes
some incorrect implementations in glsl and gl.
I'd forgotten (when doing the original brush texture loader) that
turbulent surfaces were unlit and thus always full-bright, then never
wrote the turb shader to take care of it. The best solution seems to be
to just mix the two colors in the shader as it will allow turb surfaces
to be lit in the future (probably with severely limited light counts due
to being a forward renderer).
This gets the alias pipeline in line with the bsp pipeline, and thus
everything is about as functional as it was before the rework (minus
dealing with large texture sets).
I guess it's not quite bindless as the texture index is a push constant,
but it seems to work well (and I may have fixed some full-bright issues
by accident, though I suspect that's just my imagination, but they do
look good).
This should fix the horrid frame rate dependent behavior of the view
model.
They are also in their own descriptor set so they can be easily shared
between pipelines. This has been verified to work for Draw.
BSP textures are now two-layered with the albedo and emission in the two
layers rather than two separate images. While this does increase memory
usage for the textures themselves (most do not have fullbright pixels),
it cuts down on image and image view handles (and shader resources).
Smashing everything in the process :P (need to work on the C side).
However, while bindless is supposedly good for performance, the biggest
gain this will bring is portability: the texture counts are
automatically limited to what the hardware can handle, and the reliance
on push descriptors is removed (though they were nice and did help get
things up and running).
I had forgotten that the parameters are in reverse order, and even if I
had remembered, I forgot to reset offset before the second loop.
Pre-decrementing offset takes care of both issues at once.
My VersaPro doesn't support more than 32 per-stage samplers (lavapipe).
This is a small part of getting Vulkan to run on lavapipe and even in
itself is rather incomplete.
Fixes the warning about parse_fixed_array not being used (oops, the
problem with partial commits), but more importantly, gives access to
things like maxDescriptorSetSamplers.
This will make property list expressions easier to work with. The
library is rather limited right now (trig, dot, min/max/bound) but even
just min adds a lot of functionality.
I want to support reading VkPhysicalDeviceLimits but it has some arrays.
While I don't need to parse them (VkPhysicalDeviceLimits should be
treated as read-only), I do need to be able to access them in property
list expressions, and vkgen generates the cexpr type descriptors too.
However, I will probably want to parse arrays some time in the future.
This ensures that unused parser blocks do not get emitted. In the
testing of the upcoming support for fixed arrays, the blend color
constants were being double emitted (both as custom and normal parser)
due to being an array. gcc did not like that (what with all those
warning flags).
Multiple render passes are needed for supporting shadow mapping, and
this is a huge step towards breaking the Vulkan render free of Quake,
and hopefully will lead the way for breaking the GL renderers free as
well.
This is actually a better solution to the renderer directly accessing
client code than provided by 7e078c7f9c.
Essentially, V_RenderView should not have been calling R_RenderView, and
CL_UpdateScreen should have been calling V_RenderView directly. The
issue was that the renderers expected the world entity model to be valid
at all times. Now, R_RenderView checks the world entity model's validity
and immediately bails if it is not, and R_ClearState (which is called
whenever the client disconnects and thus no longer has a world to
render) clears the world entity model. Thus R_RenderView can (and is)
now called unconditionally from within the renderer, simplifying
renderer-specific variants.
While using binary data objects for specialization data works for bools
(as they can be 0 or -1), they don't work so well for numeric values due
to having to get the byte order correct and thus are not portable, and
difficult to get right.
Binary data is still supported, but the data can be written as a string
with an array(...) "constructor" expression taking any number of
parameters, with each parameter itself being an expression (though
values are limited at this stage).
Due to the plist format, quotes are required around the expression
("array(...)")
Sets never shrink, so assigning a dynamically created set to a
statically created set after the working size has reduced (going from
demo2 to demo3) causes the set code to attempt to resize the statically
created set, which leads to libc having a bad time.
Why nvidia's drivers accepted double-destroyed framebuffers is beyond
me, but this fixes the Intel drivers complaining about such (and the
subsequent segfault).
When I changed the matrices from an array of floats to an array of
vec4f_t, I forgot to update the flush offsets. Yay for having a
Vulkan-capable Intel device with its different alignment requirements.
When allocating memory for multiple objects that have alignment
requirements, it gets tedious keeping track of the offset and the
alignment. This is a simple function for walking the offset respecting
size and alignment requirements, and doubles as a size calculator.
The stack is arbitrary strings that the validation layer debug callback
prints in reverse order after each message. This makes it easy to work
out what nodes in a pipeline/render pass plist are causing validation
errors. Still have to narrow down the actual line, but the messages seem
to help with that.
Putting qfvPushDebug/qfvPopDebug around other calls to vulkan should
help out a lot, tool.
As a bonus, the stack is printed before debug_breakpoint is called, so
it's immediately visible in gdb.
Rather than just 0/1, it now acts as flags to control what messages are
printed. In addition to the Vulkan enum names (long and short), none and
all are supported (as well as raw numbers, but they're not checked for
validity). This makes vulkan_use_validation a bit easier to use and less
verbose by default.
Now, if only it was easier to remember the name :P
Well... it could be done better, but this works for now assuming it's in
/usr/include (and it's correct for mxe builts). Does need proper
autoconfiscation, though.
For now, the functions check for a null hunk pointer and use the global
hunk (initialized via Memory_Init) if necessary. However, Hunk_Init is
available (and used by Memory_Init) to create a hunk from any arbitrary
memory block. So long as that block is 64-byte aligned, allocations
within the hunk will remain 64-byte aligned.
The fact that numleafs did not include leaf 0 actually caused in many
places due to never being sure whether to add 1. Hopefully this fixes
some of the confusion. (and that comment in sv_init didn't last long :P)
Modern maps can have many more leafs (eg, ad_tears has 98983 leafs).
Using set_t makes dynamic leaf counts easy to support and the code much
easier to read (though set_is_member and the iterators are a little
slower). The main thing to watch out for is the novis set and the set
returned by Mod_LeafPVS never shrink, and may have excess elements (ie,
indicate that nonexistent leafs are visible).
-999999 seems to be a hold-over from the software renderer passed
through both gl renderers. I guess it didn't matter in the gl renderers
due to various draw hacks, but it made quite a difference in vulkan.
Fixes the view model covering the hud.
Quake just looked wrong without the view model. I can't say I like the
way the depth range is hacked, but it was necessary because the view
model needs to be processed along with the rest of the alias models
(didn't feel like adding more command buffers, which I imagine would be
expensive with the pipeline switching).
Without shadows, this is quite the cheat, but noclip is a cheat anyway,
so probably not that big a deal. It does, however, make noclip usable
for debugging.
Since vulkan supports 32-bit indexes, there's no need for the
shenanigans the EGL-based glsl renderer had to go through to render bsp
models (maps often had quite a bit more than 65536 vertices), though the
reduced GPU memory requirements of 16-bit indices does have its
advantages.
Any sun (a directional light) is in the outside node, which due to not
having its own PVS data is visible to all nodes, but that's a tad
excessive. However, any leaf node with sky surfaces will potentially see
any suns, and leaf nodes with no sky surfaces will see the sun only if
they can see a leaf that does have sky surfaces. This can be quite
expensive to calculate (already known to be moderately expensive for
just the camera leaf node (singular!) when checking for in-map lights)
Getting close to understanding (again) how it all works. I only just
barely understood when I got vulkan's renderer running, but I really
need to understand for when I modify things for shadows. The main thing
hurdle was tinst, but that was dealt with in the previous commit, and
now it's just sorting out the mess of elechains and elementss.
Its sole purpose was to pass the newly allocated instsurf when chaining
an instance model (ammo box, etc) surface, but using expresion
statements removes the need for such shenanigans, and even makes
msurface_t that little bit smaller (though a separate array would be
much better for cache coherence).
More importantly, the relevant code is actually easier to understand: I
spent way too long working out what tinst was for and why it was never
cleared.
The renderer's LineGraph now takes a height parameter, and netgraph now
uses cl_* cvars instead of r_* (which never really made sense),
including it's own height cvar (the render graphs still use
r_graphheight).
The render plugins have made a bit of a mess of getting at the data and
thus it's a tad confusing how to get at it in different places. Really
needs a proper cleanup :(
conwidth and conheight have been moved into vid.conview (probably change
the name at some time), and scr_vrect has been replaced by a view as
well. This makes it much easier to create 2d elements that follow the
screen size (taking advantage of a view's gravity) which will, in the
end, make changing the window size easier.
It now processes 4 pixels at a time and uses a bit mask instead of a
conditional to set 3 of the 4 pixels to black. On top of the 4:1 pixel
processing and avoiding inner-loop conditional jumps, gcc unrolls the
loop, so Draw_FadeScreen itself is more than 4x as fast as it was. The
end result is about 5% (3fps) speedup to timedemo demo1 on my 900MHz
EEE Pc when nq has been hacked to always draw the fade-screen.
qwaq-curses has its place, but its use for running vkgen was really a
placeholder because I didn't feel like sorting out the different
initialization requirements at the time. qwaq-cmd has the (currently
unnecessary) threading power of qwaq-curses, but doesn't include any UI
stuff and thus doesn't need curses. The work also paves the way for
qwaq-x11 to become a proper engine (though sorting out its init will be
taken care of later).
Fixes#15.
This refactors (as such) keys.c so that it no longer depends on console
or gib, and pulls keys out of video targets. The eventual plan is to
move all high-level general input handling into libQFinput, and probably
low-level (eg, /dev/input handling for joysticks etc on Linux).
Fixes#8
Standard quake has just linear, but the modding community added inverse,
inverse-square (raw and offset (1/(r^2+1)), infinite (sun), and
ambient (minlight). Other than the lack of shadows, marcher now looks
really good.
Because LoadImage uses Hunk_TempAlloc, the face images need to be copied
individually. Really, what's neeeded is to be able to load the image
data into a pre-allocated buffer (ideally, the staging buffer for
vulkan, but that's for later).
Mostly, this gets the stage flags in with the barrier, but also adds a
couple more barrier templates. It should make for slightly less verbose
code, and one less opportunity for error (mismatched barrier/stages).
This gets the shaders needed for creating shadow maps, and the changes
to the lighting pipeline for binding the shadow maps, but no generation
or reading is done yet. It feels like parts of various systems are
getting a little big for their britches and I need to do an audit of
various things.
The built up "path" name of the handle resource was not always surviving
the intervening call to cexpr_eval_string (in particular, when other
handles were created in the process of creating a handle). Rather than
simply increase the number of va buffers (where would it end?), just
regenerate the path when adding the new handle. It's probably quick
enough, and the code is not usually not on a critical path.
I was reading about multi-pass rendering on mobile devices
(https://developer.oculus.com/blog/loads-stores-passes-and-advanced-gpu-pipelines/)
and discovered that I had used the wrong flags (but then, I think Graham
Sellers had, too, since used his Vulkan Programming Guide as a
reference). Doesn't seem to make any difference on desktop, but as
there's no loss there, but potential gains on mobile, I'd say it's a
win.
I'm not sure that the mismatch between refdef_t and the assembly defines
was a problem (many fields unused), but the main problem was due to
execute permission on the pages: one chunk of asm was in the data
section, and the patched code was not marked as being executable (due to
such a thing not existing when quake was written).
This ensures that fov_y is not calculated until after the render view
size is known and thus doesn't become some crazy angle (that happens to
result in a negative tan). Fixes upside-down-quake :)
vid.aspect is removed (for now) as it was not really the right idea (I
really didn't know what I was doing at the time). Nicely, this *almost*
fixes the fov bug on fresh installs: the view is now properly
upside-down rather than just flipped vertically (ie, it's now rotated
180 degrees).
Not only does it makes sense to centralize the setting of viewport and
scissor, but it's actually necessary in order to fix the upside-down
rendering on windows.
Only 64-bit windows is tested, and there are still various failures, but
QF is limping along in windows again.
nq-sdl works for sw, and sw32, gl and glsl are mostly black (but not
entirely for gl?), vulkan is not supported with sdl.
nq-win works for sw and sw32, and sort of for vulkan (very dark and
upside-down?). gl and glsl complain about vid mode,
qw-client-[sdl,win] seem to be the same, but something is wrong with the
console (reading keyboard input).
While this caused some trouble for pr_strings and configurable strftime
(evil hacks abound), it's the result of discovering an ancient (from
maybe as early as 2004, definitely before 2012) bug in qwaq's printing
that somehow got past months of trial-by-fire testing (origin understood
thanks to the warning finding it).
This separate the FOV calculations from other refdef calcs, cleaning up the
renderer proper and making it easier for other parts of the engine (eg,
csqc) to update the fov.
Loading is broken for multi-file image sets due to the way images are
loaded (this needs some thought for making it effecient), but the
Blender environment map loading works.
They're unlit (fullbright, but that's nothing new for quake), but
working nicely. As a bonus, sort out the sky pass (forced to due to the
way command buffers are used).
There were actually several problems: translucency wasn't using or
depending on the depth buffer, and the depth buffer wasn't marked as
read-only in the g-buffer pass. Getting that correct seems to have given
bigass1 a 0.5% boost (hard to say, could be the usual noise).
While being able to write pipeline specs like this was the end goal of
the parsing sub-project, I didn't realize it was already usable. This
sure makes going through the pipeline specs much easier.
That was... easier than expected. A little more tedious that I would
have liked, but my scripting system isn't perfect (I suspect it's best
suited as the output of a code generator), and the C side could do with
a little more automation.
Other than dealing with shader data alignment issues, that went well :).
Nicely, the implementation gets the explicit scaling out of the shader,
and allows for a directional flag.
I never liked that some of the macros needed the type as a parameter
(yay typeof and __auto_type) or those that returned a value hid the
return statement so they couldn't be used in assignments.
Still "some" more to go: a pile to do with transforms and temporary
entities, and a nasty one with host_cbuf. There's also all the static
block-alloc lists :/