It turned out the bindless approach wouldn't work too well for my design
of the sprite objects, but I don't think that's a big issue at this
stage (and it seems bindless is causing problems for brush/alias
rendering via renderdoc and on my versa pro). However, I have figured
out how to make effective use of descriptor sets (finally :P).
The actual normal still needs checking, but the sprites are currently
unlit so not an issue at this stage.
I'm not at all sure what I was thinking when I designed it, but I
certainly designed it wrong (to the point of being fairly useless). It
turns out memory requirements are already aligned in size (so just
multiplying is fine), and what I really wanted was to get the next
offset aligned to the given requirements.
This adds the shaders and the pipeline specs. I'm not sure that the
deferred rendering side of the render pass is appropriate, but I thought
I'd give it a go, since quake sprites are really cutoff rather than
translucent.
With the switch to multi-layer textures for brush models, the bsp and
alias texture descriptor sets became identical and thus the definitions
shareable. However, due to complications I don't want to address yet,
they're still separately identified, but I should be able to use the
texture set for most, if not all, pipelines.
The vertices and frame images are loaded into the one memory object,
with the vertices first followed by the images.
The vertices are 2D xy+uv sets meant to be applied to the model
transform frame, and are pre-computed for the sprite size (this part
does support sprites with varying frame image sizes).
The frame images are loaded into one image with each frame on its own
layer. This will cause some problems if any sprites with varying frame
image sizes are found, but the three sprites in quake are all uniform
size.
As much as it can be since the texture data is interleaved with the
model data in the files (I guess not that bad a design for 25 years ago
with the tight memory constraints), but this paves the way for
supporting sprites in Vulkan.
As the sw renderer's implementation was the closest to id's, it was used
as the model (thus a fair bit of cleanup is still needed). This fixes
some incorrect implementations in glsl and gl.
I'd forgotten (when doing the original brush texture loader) that
turbulent surfaces were unlit and thus always full-bright, then never
wrote the turb shader to take care of it. The best solution seems to be
to just mix the two colors in the shader as it will allow turb surfaces
to be lit in the future (probably with severely limited light counts due
to being a forward renderer).
This gets the alias pipeline in line with the bsp pipeline, and thus
everything is about as functional as it was before the rework (minus
dealing with large texture sets).
I guess it's not quite bindless as the texture index is a push constant,
but it seems to work well (and I may have fixed some full-bright issues
by accident, though I suspect that's just my imagination, but they do
look good).
This should fix the horrid frame rate dependent behavior of the view
model.
They are also in their own descriptor set so they can be easily shared
between pipelines. This has been verified to work for Draw.
BSP textures are now two-layered with the albedo and emission in the two
layers rather than two separate images. While this does increase memory
usage for the textures themselves (most do not have fullbright pixels),
it cuts down on image and image view handles (and shader resources).
Smashing everything in the process :P (need to work on the C side).
However, while bindless is supposedly good for performance, the biggest
gain this will bring is portability: the texture counts are
automatically limited to what the hardware can handle, and the reliance
on push descriptors is removed (though they were nice and did help get
things up and running).
I had forgotten that the parameters are in reverse order, and even if I
had remembered, I forgot to reset offset before the second loop.
Pre-decrementing offset takes care of both issues at once.
My VersaPro doesn't support more than 32 per-stage samplers (lavapipe).
This is a small part of getting Vulkan to run on lavapipe and even in
itself is rather incomplete.
Fixes the warning about parse_fixed_array not being used (oops, the
problem with partial commits), but more importantly, gives access to
things like maxDescriptorSetSamplers.
This will make property list expressions easier to work with. The
library is rather limited right now (trig, dot, min/max/bound) but even
just min adds a lot of functionality.
I want to support reading VkPhysicalDeviceLimits but it has some arrays.
While I don't need to parse them (VkPhysicalDeviceLimits should be
treated as read-only), I do need to be able to access them in property
list expressions, and vkgen generates the cexpr type descriptors too.
However, I will probably want to parse arrays some time in the future.
This ensures that unused parser blocks do not get emitted. In the
testing of the upcoming support for fixed arrays, the blend color
constants were being double emitted (both as custom and normal parser)
due to being an array. gcc did not like that (what with all those
warning flags).
Multiple render passes are needed for supporting shadow mapping, and
this is a huge step towards breaking the Vulkan render free of Quake,
and hopefully will lead the way for breaking the GL renderers free as
well.
This is actually a better solution to the renderer directly accessing
client code than provided by 7e078c7f9c.
Essentially, V_RenderView should not have been calling R_RenderView, and
CL_UpdateScreen should have been calling V_RenderView directly. The
issue was that the renderers expected the world entity model to be valid
at all times. Now, R_RenderView checks the world entity model's validity
and immediately bails if it is not, and R_ClearState (which is called
whenever the client disconnects and thus no longer has a world to
render) clears the world entity model. Thus R_RenderView can (and is)
now called unconditionally from within the renderer, simplifying
renderer-specific variants.
While using binary data objects for specialization data works for bools
(as they can be 0 or -1), they don't work so well for numeric values due
to having to get the byte order correct and thus are not portable, and
difficult to get right.
Binary data is still supported, but the data can be written as a string
with an array(...) "constructor" expression taking any number of
parameters, with each parameter itself being an expression (though
values are limited at this stage).
Due to the plist format, quotes are required around the expression
("array(...)")
Sets never shrink, so assigning a dynamically created set to a
statically created set after the working size has reduced (going from
demo2 to demo3) causes the set code to attempt to resize the statically
created set, which leads to libc having a bad time.
Why nvidia's drivers accepted double-destroyed framebuffers is beyond
me, but this fixes the Intel drivers complaining about such (and the
subsequent segfault).
When I changed the matrices from an array of floats to an array of
vec4f_t, I forgot to update the flush offsets. Yay for having a
Vulkan-capable Intel device with its different alignment requirements.
When allocating memory for multiple objects that have alignment
requirements, it gets tedious keeping track of the offset and the
alignment. This is a simple function for walking the offset respecting
size and alignment requirements, and doubles as a size calculator.
While using barriers is a zillion times better than actually grabbing
the mouse and keyboard, they're still a pain when debugging as qf is not
able to respond to the barrier-hit events. All the other logic is still
there so even when "grabbing", the mouse will not be blocked if the
window doesn't have focus.
The stack is arbitrary strings that the validation layer debug callback
prints in reverse order after each message. This makes it easy to work
out what nodes in a pipeline/render pass plist are causing validation
errors. Still have to narrow down the actual line, but the messages seem
to help with that.
Putting qfvPushDebug/qfvPopDebug around other calls to vulkan should
help out a lot, tool.
As a bonus, the stack is printed before debug_breakpoint is called, so
it's immediately visible in gdb.
Rather than just 0/1, it now acts as flags to control what messages are
printed. In addition to the Vulkan enum names (long and short), none and
all are supported (as well as raw numbers, but they're not checked for
validity). This makes vulkan_use_validation a bit easier to use and less
verbose by default.
Now, if only it was easier to remember the name :P
It seems X11 does not like creating barriers entirely off the screen,
though the error seems to be a little unreliable (however, off the left
edge was definitely bad).
For now, only the first two axis (mouse X and Y) are supported (XInput
treats the scroll wheel events as axes too, so mice have up to 4!), but
most importantly, this prevents the scroll wheel from being seen as the
X axis. Oops.
With the old headers removed, X11_SetGamma became a stub and gcc
complained about it wanting the const attribute. On investigation, it
turned out the X_XF86VidModeSetGamma was a holdover from the initial
implementation of hardware gamma support.
UI key presses are still handled by regular X events, but in-game
"button" presses arrive via raw keyboard events. This gives transparent
handling of keyboard repeat (UI keys see repeat, game keys do not),
without messing with the server's settings (yay, that was most annoying
when it came to debugging), and the keyboard is never grabbed, so this
is a fairly user-friendly setup.
At first, I wasn't too keen on capturing them from the root window
(thinking about the user's security), but after a lot of investigation,
I found a post by Peter Hutterer
(http://who-t.blogspot.com/2011/09/whats-new-in-xi-21-raw-events.html)
commenting that root window events were added to XInput2 specifically
for games. Since application focus is tracked and unfocused key events
are dropped very early on, there's no way for code further down the
food-chain to know there even was an event, abusing the access would
require modifying the x11 input code, in which case all bets are off
anyway and any attempt at security anywhere in the code will fail,
meaning that nefarious progs code and the like shouldn't be a problem.
After a lot of thought, it really doesn't make sense to have an option
to block mouse input in x11 (not grabbing or similar does make sense, of
course). Not initializing mouse input made perfect sense in DOS and even
console Linux (SVGA) what with the low level access.
It turns out that if the barriers are set on the app window, and the app
grabs the pointer (even passively), barrier events will no longer be
sent to the app. However, creating the barriers on the root window and
the events are selected on the root window, the barrier events are sent
regardless of the grab state.
Other subsystems, especially low-level input drivers, need to know when
the app has input focus. eg, as the evdev driver uses the raw stream
from the kernel, which has no idea about X application focus (in fact,
it seems the events are shared across multiple apps without any issue),
the evdev driver sees all the events thus needs to know when to drop
them.
It turns out to be possible to get a barrier event at the same time as a
configure notify event (which rebuilds the barriers), and trying to
release the pointer at such a time results in a bad barrier error and
program crash. Thus check the event barrier against the currently
existing barriers before attempting to release the pointer.
This does mean that a better mechanism for sequencing window
repositioning and barrier creation may be required.
This should be a much friendlier way of "grabbing" input, though I
suspect that using raw keyboard events will result in a keyboard grab,
which is part of the reason for wanting a friendly grab.
There does seem to be a problem with the mouse sneaking out of the
top-right and bottom-left corners. I currently suspect a bug in the X
server, but further investigation is needed.
This is needed for getting window position info into in_x11 without
exposing more globals, and is likely to be useful for other things,
especially as it doubles as a resize event when that's eventually
supported.
This is necessary in focus-follows-mouse environments (at least for
openbox, but it wouldn't surprise me if most other WMs behave the same
way) because the WMs don't set focus when the pointer is grabbed (which
XInput does before the WM sees the enter event). This is especially
important when the window is fullscreen on a multi-monitor setup as
there is no border to *maybe* catch the mouse before it enters the
window.
Right now, only raw pointer motion and button events are handled, and
the mouse escapes the window, and there are some issues with focus in
focus-follows-mouse environments. However, this should be a much nicer
setup than DGA.
The current limit is still 32. Dealing with it properly will take some
rather advanced messing with XInput, and will be necessary assuming
non-XInput support is continued.
There's now IN_X11_Preinit, IN_X11_Postinit (both for want of better
names), and in_x11_init. The first two are for taking care of
initialization that needs to be done before window creation and between
window creation and mapping (ie, are very specific to X11 stuff) while
in_x11_init takes care of the setup for the input system. This proved
necessary in my XInput experimentation: a passive enter grab takes
effect only when the pointer enters the window, thus setting up the grab
with the pointer already in the window has no effect until the pointer
leaves the window and returns.
This was always a horrible hack just to get the screen centered on the
window back when we were doing fullscreen badly. With my experiments
with XInput, it has proven to be a liability (I'd forgotten it was even
there until it started imposing a 2s delay to QF's startup).
Input driver can now have an optional init_cvars function. This allows
them to create all their cvars before the actual init pass thus avoiding
some initialization order interdependency issues (in this case, fixing a
segfault when starting x11 clients fullscreen due to the in_dga cvar not
existing yet).
Well... it could be done better, but this works for now assuming it's in
/usr/include (and it's correct for mxe builts). Does need proper
autoconfiscation, though.
Seems to work nicely for keyboard (though key bindings are not
cross-platform). Mouse not tested yet, and I expect there are problems
with it for absolute inputs (yay mouse warp :P).
Mouse axis and button names are handled internally (and thus
case-insensitive).
Key names are handled by X11. Case-sensitivity is currently determined
by Xlib.
The cooked inputs (ie_key, ie_mouse) are intended for UI interaction, so
generally should have priority over the raw events, which are intended
for game interaction.
This has smashed the keydest handling for many things, and bindings, but
seems to be a good start with the new input system: the console in
qw-client-x11 is usable (keyboard-only).
The button and axis values have been removed from the knum_t enum as
mouse events are separate from key events, and other button and axis
inputs will be handled separately.
keys.c has been disabled in the build as it is obsolute (thus much of
the breakage).
I'm undecided on how to handle application focus (probably gain/lose
events), and the destination handler has been a stub for a while. One less
dependency on the "old" key handling code.
I'm undecided if the pasted text should be sent as a string rather than
individual key events, but this will do the job for now as it gets me
closer to being able to test everything.
It seems that under certain circumstances (window managers?), select is not
reliable for getting key events, so use of select has been disabled until I
figure out what's going on and how to fix it.
For the mouse in x11, I'm not sure which is more cooked: deltas or
window-relative coordinates, but I don't imagine that really matters too
much. However, keyboard and mouse events suitable for 2D user interfaces
are sent at the same time as the more game oriented button and axis events.
The x11 keyboard and mouse devices are really core input devices rather
than x11 input devices in that keyboard and mouse will be present on most
systems and thus not specific to the main user interface (x11, windows,
etc).
Now nothing works at all ;) However, that's only because the binding
system is incomplete: the X11 input events are getting through to the
binding system, so now it's just a matter of getting that to work.
The common input code (input outer loop and event handling) has been
moved into libQFinput, and modified to have the concept of input drivers
that are registered by the appropriate system-level code (x11, win,
etc).
As well, my evdev input library code (with hotplug support) has been
added, but is not yet fully functional. However, the idea is that it
will be available on all systems that support evdev (Linux, and from
what I've read, FreeBSD).
For now, the functions check for a null hunk pointer and use the global
hunk (initialized via Memory_Init) if necessary. However, Hunk_Init is
available (and used by Memory_Init) to create a hunk from any arbitrary
memory block. So long as that block is 64-byte aligned, allocations
within the hunk will remain 64-byte aligned.
The fact that numleafs did not include leaf 0 actually caused in many
places due to never being sure whether to add 1. Hopefully this fixes
some of the confusion. (and that comment in sv_init didn't last long :P)
Modern maps can have many more leafs (eg, ad_tears has 98983 leafs).
Using set_t makes dynamic leaf counts easy to support and the code much
easier to read (though set_is_member and the iterators are a little
slower). The main thing to watch out for is the novis set and the set
returned by Mod_LeafPVS never shrink, and may have excess elements (ie,
indicate that nonexistent leafs are visible).
-999999 seems to be a hold-over from the software renderer passed
through both gl renderers. I guess it didn't matter in the gl renderers
due to various draw hacks, but it made quite a difference in vulkan.
Fixes the view model covering the hud.
Quake just looked wrong without the view model. I can't say I like the
way the depth range is hacked, but it was necessary because the view
model needs to be processed along with the rest of the alias models
(didn't feel like adding more command buffers, which I imagine would be
expensive with the pipeline switching).
The recent changes to key handling broke using escape to get out of the
console (escape would toggle between console and menu). Thus take care
of the menu (escape) part of the coupling FIXME by implementing a
callback for the escape key (and removing key_togglemenu) and sorting
out the escape key handling in console. Seems to work nicely
Without shadows, this is quite the cheat, but noclip is a cheat anyway,
so probably not that big a deal. It does, however, make noclip usable
for debugging.
Since vulkan supports 32-bit indexes, there's no need for the
shenanigans the EGL-based glsl renderer had to go through to render bsp
models (maps often had quite a bit more than 65536 vertices), though the
reduced GPU memory requirements of 16-bit indices does have its
advantages.
Any sun (a directional light) is in the outside node, which due to not
having its own PVS data is visible to all nodes, but that's a tad
excessive. However, any leaf node with sky surfaces will potentially see
any suns, and leaf nodes with no sky surfaces will see the sun only if
they can see a leaf that does have sky surfaces. This can be quite
expensive to calculate (already known to be moderately expensive for
just the camera leaf node (singular!) when checking for in-map lights)
Getting close to understanding (again) how it all works. I only just
barely understood when I got vulkan's renderer running, but I really
need to understand for when I modify things for shadows. The main thing
hurdle was tinst, but that was dealt with in the previous commit, and
now it's just sorting out the mess of elechains and elementss.
Its sole purpose was to pass the newly allocated instsurf when chaining
an instance model (ammo box, etc) surface, but using expresion
statements removes the need for such shenanigans, and even makes
msurface_t that little bit smaller (though a separate array would be
much better for cache coherence).
More importantly, the relevant code is actually easier to understand: I
spent way too long working out what tinst was for and why it was never
cleared.
The renderer's LineGraph now takes a height parameter, and netgraph now
uses cl_* cvars instead of r_* (which never really made sense),
including it's own height cvar (the render graphs still use
r_graphheight).
The render plugins have made a bit of a mess of getting at the data and
thus it's a tad confusing how to get at it in different places. Really
needs a proper cleanup :(
conwidth and conheight have been moved into vid.conview (probably change
the name at some time), and scr_vrect has been replaced by a view as
well. This makes it much easier to create 2d elements that follow the
screen size (taking advantage of a view's gravity) which will, in the
end, make changing the window size easier.
It now processes 4 pixels at a time and uses a bit mask instead of a
conditional to set 3 of the 4 pixels to black. On top of the 4:1 pixel
processing and avoiding inner-loop conditional jumps, gcc unrolls the
loop, so Draw_FadeScreen itself is more than 4x as fast as it was. The
end result is about 5% (3fps) speedup to timedemo demo1 on my 900MHz
EEE Pc when nq has been hacked to always draw the fade-screen.
qwaq-curses has its place, but its use for running vkgen was really a
placeholder because I didn't feel like sorting out the different
initialization requirements at the time. qwaq-cmd has the (currently
unnecessary) threading power of qwaq-curses, but doesn't include any UI
stuff and thus doesn't need curses. The work also paves the way for
qwaq-x11 to become a proper engine (though sorting out its init will be
taken care of later).
Fixes#15.
This refactors (as such) keys.c so that it no longer depends on console
or gib, and pulls keys out of video targets. The eventual plan is to
move all high-level general input handling into libQFinput, and probably
low-level (eg, /dev/input handling for joysticks etc on Linux).
Fixes#8
Standard quake has just linear, but the modding community added inverse,
inverse-square (raw and offset (1/(r^2+1)), infinite (sun), and
ambient (minlight). Other than the lack of shadows, marcher now looks
really good.
Because LoadImage uses Hunk_TempAlloc, the face images need to be copied
individually. Really, what's neeeded is to be able to load the image
data into a pre-allocated buffer (ideally, the staging buffer for
vulkan, but that's for later).
Mostly, this gets the stage flags in with the barrier, but also adds a
couple more barrier templates. It should make for slightly less verbose
code, and one less opportunity for error (mismatched barrier/stages).
This gets the shaders needed for creating shadow maps, and the changes
to the lighting pipeline for binding the shadow maps, but no generation
or reading is done yet. It feels like parts of various systems are
getting a little big for their britches and I need to do an audit of
various things.
The built up "path" name of the handle resource was not always surviving
the intervening call to cexpr_eval_string (in particular, when other
handles were created in the process of creating a handle). Rather than
simply increase the number of va buffers (where would it end?), just
regenerate the path when adding the new handle. It's probably quick
enough, and the code is not usually not on a critical path.
I was reading about multi-pass rendering on mobile devices
(https://developer.oculus.com/blog/loads-stores-passes-and-advanced-gpu-pipelines/)
and discovered that I had used the wrong flags (but then, I think Graham
Sellers had, too, since used his Vulkan Programming Guide as a
reference). Doesn't seem to make any difference on desktop, but as
there's no loss there, but potential gains on mobile, I'd say it's a
win.
I'm not sure that the mismatch between refdef_t and the assembly defines
was a problem (many fields unused), but the main problem was due to
execute permission on the pages: one chunk of asm was in the data
section, and the patched code was not marked as being executable (due to
such a thing not existing when quake was written).
This ensures that fov_y is not calculated until after the render view
size is known and thus doesn't become some crazy angle (that happens to
result in a negative tan). Fixes upside-down-quake :)
vid.aspect is removed (for now) as it was not really the right idea (I
really didn't know what I was doing at the time). Nicely, this *almost*
fixes the fov bug on fresh installs: the view is now properly
upside-down rather than just flipped vertically (ie, it's now rotated
180 degrees).
Not only does it makes sense to centralize the setting of viewport and
scissor, but it's actually necessary in order to fix the upside-down
rendering on windows.
This gets the GL and GLSL renderers working for the -win targets... sort
of: they are upside down and GLSL's bsp surfaces are black (same as
Vulkan). However, with this, all 5 renderers at least limp along for
-win, 4/5 work for -sdl.
It turns out the dd and dib "driver" code is very specific to the
software renderer. This does not fix the segfault on changing video
mode, but I do know where the problem lies: the window is being
destroyed and recreated without recreating the buffers. I suspect a
clean solution to this will allow for window resizing in X as well.
Only 64-bit windows is tested, and there are still various failures, but
QF is limping along in windows again.
nq-sdl works for sw, and sw32, gl and glsl are mostly black (but not
entirely for gl?), vulkan is not supported with sdl.
nq-win works for sw and sw32, and sort of for vulkan (very dark and
upside-down?). gl and glsl complain about vid mode,
qw-client-[sdl,win] seem to be the same, but something is wrong with the
console (reading keyboard input).
While this caused some trouble for pr_strings and configurable strftime
(evil hacks abound), it's the result of discovering an ancient (from
maybe as early as 2004, definitely before 2012) bug in qwaq's printing
that somehow got past months of trial-by-fire testing (origin understood
thanks to the warning finding it).
It looks like choosing a visual is not necessary (at least for normal
apps, VR might be another matter). Still no idea if anything works (for
-win support in general, let alone vulkan).
This separate the FOV calculations from other refdef calcs, cleaning up the
renderer proper and making it easier for other parts of the engine (eg,
csqc) to update the fov.
Loading is broken for multi-file image sets due to the way images are
loaded (this needs some thought for making it effecient), but the
Blender environment map loading works.
They're unlit (fullbright, but that's nothing new for quake), but
working nicely. As a bonus, sort out the sky pass (forced to due to the
way command buffers are used).
There were actually several problems: translucency wasn't using or
depending on the depth buffer, and the depth buffer wasn't marked as
read-only in the g-buffer pass. Getting that correct seems to have given
bigass1 a 0.5% boost (hard to say, could be the usual noise).
While being able to write pipeline specs like this was the end goal of
the parsing sub-project, I didn't realize it was already usable. This
sure makes going through the pipeline specs much easier.
That was... easier than expected. A little more tedious that I would
have liked, but my scripting system isn't perfect (I suspect it's best
suited as the output of a code generator), and the C side could do with
a little more automation.
Other than dealing with shader data alignment issues, that went well :).
Nicely, the implementation gets the explicit scaling out of the shader,
and allows for a directional flag.
I never liked that some of the macros needed the type as a parameter
(yay typeof and __auto_type) or those that returned a value hid the
return statement so they couldn't be used in assignments.
Still "some" more to go: a pile to do with transforms and temporary
entities, and a nasty one with host_cbuf. There's also all the static
block-alloc lists :/
Light styles and shadows aren't implemented yet.
The map's entities are used to create the lights, and the PVS used to
determine which lights might be visible (ie, the surfaces they light).
That could do with some more improvements (eg, checking if a leaf is
outside a spotlight's cone), but the concept seems to work.
This is the first step towards component-based entities.
There's still some transform-related stuff in the struct that needs to
be moved, but it's all entirely client related (rather than renderer)
and will probably go into a "client" component. Also, the current
components are directly included structs rather than references as I
didn't want to deal with the object management at this stage.
As part of the process (because transforms use simd) this also starts
the process of moving QF to using simd for vectors and matrices. There's
now a mess of simd and sisd code mixed together, but it works
surprisingly well together.
It's not used yet as work needs to be done to better support generic
entities, but this is the next step to real-time lighting (though, to be
honest, I expect it will be too slow to be usable).
There's still the memory management itself to clean up, but the main
code no longer uses any static/global variables (holdover from when the
function was recursive rather).
The static libs are used to build the plugins, but make it easy to use
only those modules needed for tests. Fixes the link error when running
"make check" with non-static plugins.
Static lights are yet to come (so the screen is black most of the time),
but dynamic lights work very nicely (and look very good) despite the
falloff being incorrect.
While I could reconstruct the position from the screen coords and depth,
this is easier and good enough for now. Reconstruction is an
optimization thing.
Lighting doesn't actually do lights yet, but it's producing pixels.
Translucent seems to be working (2d draw uses it), and compose seems to
be working.
This gets the alias model render pass and pipeline passing validation.
I don't know why I didn't add the subpass field to the
VkGraphicsPipelineCreateInfo parser def, though it could be I simply
missed it, or I thought I wouldn't need it at the time.
Due to wanting to access array sizes when parsing uint32_t type values,
parse_uint32_t needs to handle size_t values even though it throws out
any excess bits.
After getting lights even vaguely working for alias models, I realized
that it just wasn't going to be feasible to do nice lighting with
forward rendering. This gets the bulk of the work done for deferred
rendering, but still need to sort out the shaders before any real
testing can be done.
Never really wanted in the first place (back when I did the plugin
renderers), but I didn't feel like doing the required work to avoid it
at the time. At least with Vulkan being a fresh start in an environment
that's already plugin-friendly, there was no real work involved. I'll
get to the other renderers eventually (especially now that I know gdb
does the right thing when there are multiple functions with the same
name).
It turns out I had conflated frame buffers with frames and wound up
making a minor mess when separating the number of frames the renderer
could have in flight from the number of swap-chain images. This is the
first step towards correcting that mistake.
It's not entirely there yet, but the basics are working. Work is still
needed for avoiding duplication of objects (different threads will have
different contexts and thus different tables, so necessary per-thread
duplication should not become a problem) and general access to arbitrary
fields (mostly just parsing the strings)
The node struct was 72 bytes thus two cache line. Moving the pointer
into the brush model data block allows nodes to fit in a single cache
line (not that they're aligned yet, but that's next). It doesn't seem to
have made any difference to performance (at least in the vulkan
renderer), but it hasn't hurt, either, as the only place that needed the
parent pointer was R_MarkLeaves.
It's not quite as expected, but that may be due to one of msaa, the 0-15
range in the palette not being all the way to white, the color gradients
being not quite linear (haven't checked yet) or some combination of the
above. However, it's that what should be yellow is more green. At least
the zombies are no longer white and the ogres don't look like they're
wearing skeleton suits.
Doesn't seem to make much difference performance-wise, but speed does
seem to be fill-rate limited due to the 8x msaa. Still, it does mean
fewer bindings to worry about.
This is a big step towards a cleaner api. The struct reference in
model_t really should be a pointer, but bsp submodel(?) loading messed
that up, though that's just a matter of taking more care in the loading
code. It seems sensible to make that a separate step.
I've decided that alias model skins should be in a single four-level
array texture rather than spread over four textures, but there's no way
I want to write that code again: getting it right was hard enough the
first time :P
It now takes a context pointer (opaque data) that holds the buffers it
uses for the temporary strings. If the context pointer is null, a static
context is used (making those uses of va NOT thread-safe). Most calls to
va use the static context, but all such calls have been formatted
consistently so they are easy to find when it comes time to do a full
audit.
I had messed up my index array creation, but once that was fixed the
textures worked well other than a lot of pixels are shades of grey due
to being in the top or bottom color map range.
I don't really know why (I need to do some research), but this fixes the
lockups when accessing the matrices UBO. It has made a mess of my
carefully designed uniform binding layout, so I hope I can get bound
descriptor sets working the way I want, but I really need to progress on
the rest of the project.
It's a tad bogus as it's the lights close to the camera, but it should
at least be a good start once things are working. There's currently
something very wrong with the state of things.
This makes tex_t more generally useable and probably more portable. The
goal was to be able to use tex_t with data that is in a separate chunk
of memory.
The sky texture is loaded with black's alpha set to 0. While this does
hit both layers, the screen is cleared to black so it shouldn't be a
problem (and will allow having a skybox behind the sheets).
Glow map and sky sheet and cube need to wait until I can get some
default textures going, but the world is rendering correctly otherwise
(though a tad dark: need to do a gamma setting).
It now uses the ring buffer code I wrote for qwaq (and forgot about,
oops) to handle the packets themselves, and the logic for allocating and
freeing space from the buffer is a bit simpler and seems to be more
reliable. The automated test is a bit of a joke now, though, but coming
up with good tests for it... However, nq now cycles through the demos
without obvious issue under the same conditions that caused the light
map update code to segfault.
Needed to use an rgba format to use floats (and optimal layout), but
having to set the alpha to 1 even for full-dark luxels is not very
efficient. Better to just ignore the alpha in the shader. Fixes the
occasional transparent surface in shadowed areas.
Many surfaces are missing (I suspect it's due to transform stage
management in the index emitter), and currently only the light maps are
rendered (still not binding the correct textures), but the basics are
working.
Vulkan validation (quite rightly) doesn't like it when the flush range
goes past the end of the buffer, but also doesn't like it when the flush
range isn't cache-line aligned, so align the size of the buffer, too.
Copying data from the wrong buffer was the cause of the corrupted brush
model vertices, and then lots of little errors (mostly forgetting to
multiply by bpp) for textures.
I had originally planned on mixing the stage management with general
texture support code like I did in glsl, but I think that was a mistake
and I did keep looking for scrap.[ch] when I wanted to edit something to
do with the scrap...
There's still a problem with the vertex data itself not getting sent to
the GPU properly, but vulkan is now happy with my tiny test map (which
required disabling skies entirely until I get null textures working).