The pic is scaled to fill the specified rect (then clipped to the
screen (effectively)). Done just for the console background for now, but
it will be used for slice-pics as well.
Not implemented for vulkan yet as I'm still thinking about the
descriptor management needed for the instanced rendering.
Making the conback rendering conditional gave an approximately 3% speed
boost to glsl with the GL stub (~12200fps to ~12550fps), for either
conback render method.
The wording might seem a little odd, but cl_screen is really the full 2D
client HUD while the console is completely independent of the client and
shouldn't know that the client even exists. Ideally, the resize events
would be handled by the canvas system, to which end this is a small
step.
This is the beginning of supporting 2d rendering in 3d space. The idea
is that a canvas can be in 2d orthographic space (not attached to any
entity with a 3d transform), or in 3d perspective space (attached to an
entity with a 3d transform, either as a child of the camera, or of some
object in 3d space).
It will replace the current HUD code when it's working.
I found I needed the subrange start as well as the end, but I liked that
the subpools themselves used only the end of the range, so switching to
just a unint32_t for the value and adding a function to return a tuple
made sense. I had kept the struct because I thought I might want to
store additional information (eg, the entity "owning" the subpool), but
found that I didn't need such information as the systems using subpools
that way would have access to the entity by other means.
Interestingly, the change found a bug in subpool creation: I really
don't know why things worked before, but they work better now :)
Subpools are for grouping components by some criterion. Any component
that has a rangeid callback will be grouped with other components that
return the same render id. Note that the ordering of components within a
group will be affected by adding a component into a group that comes
before that group (or removing a component).
Component pools can have multiple groups, added and removed dynamically,
but removing a group should (currently) be done only when empty.
While "set" is a tad strong (there's just the one component for now), I
had missed the changes when adding ECS systems. Fixes the segfault at
the end of demo1 (ie, when any center text is printed).
Instead of creating new entities for the text views. This approximately
halves the number of entities required to display flowed text, but also
tests the ability to have an entity in multiple hierarchies (the goal of
the ECS component and system changes).
The system struct bundles the registry and component base together,
making it easier to reuse systems in multiple registries, or really,
easier to separate one set of ECS system components from those of other
systems.
While this does require an extra call after registering components, it
allows for multiple component sets (ie, sub-systems) to be registered
before the component pools are created. The base id for the registered
component set is returned so it can be passed to the subsystem as
needed.
There's now a main ecs.h file that includes the sub-system headers,
removing the need to explicitly include several header files, but the
sub-systems are a less cluttered.
This means that the component id used for hierarchy references must be
passed to Hierarchy_New and Hierarchy_Copy, but does all an entity to
have more than one hierarchy, which is useful for canvases (hierarchies
of views) in the 3d world (the canvas root would have a 3d hierarchy
reference and a 2d (view) hierarchy reference).
While Draw_Glyph does draw only one glyph at a time, it doesn't shape
the text every time, so is a major win for performance (especially
coupled with pre-shaped text).
Font cannot be overridden yet, but script attributes (language, script
type, direction) and features can be set at all three levels in a
passage. Attributes on the root level act as defaults for the paragraph
and word levels, and paragraph attributes act as defaults for the word
level.
This causes some problems with linking if libQFgui is linked with
libQFrenderer (which is necessary in the long run), but it seems
everything gets away with it for now (which, tbh, I don't like).
And add a function to process a passage into a set of views with glyphs.
The views can be flowed: they have flow gravity and their sizes set to
contain all the glyphs within each view (nominally, words). Nothing is
tested yet, and font rendering is currently broken completely.
Font and text handling is very much part of user interface and at least
partially independent of rendering, but does fit it better with GUI than
genera UI (ie, both graphics and text mode), thus libQFgui as well as
libQFui are built in the ui directory.
The existing font related builtins have been moved into the ruamoko
client library.
In theory, it supports all the non-palette formats, but only luminance
and alpha (tex_l and tex_a) have been tested. Fixes the rather broken
glyph rendering.
World scale can only be approximate if non-uniform scales and
non-orthogonal rotations are involved, but it is still useful
information sometimes.
However, the calculation is expensive (needs a square root), so remove
world scale as a component and instead calculate it on an as-needed
basis because it is quite expensive to do for every transform when it is
used only by the legacy-GL alias model renderer.
Thanks to the 3d frame buffer output being separate from the swap chain,
it's possible to have a different frame buffer size from the window
size, allowing for a smaller buffer and thus my laptop can cope (mostly)
with the vulkan renderer.
I had debated putting the blending in the compose subpass or a separate
pass but went with the separate pass originally, but it turns out that
removing the separate pass gains 1-3% (5-15/545 fps in a timedemo of
demo1).
It's a bit flaky for particles, especially at higher frame rates, but
that's due to supporting only 64 overlapping pixels. A reasonable
solution is probably switching to a priority heap for the "sort" and
upping the limit.
I don't yet know whether they actually work (not rendering yet), but the
system isn't locking up, and shutdown is clean, so at least resources
are handled correctly.
This splits up render pass creation so that the creation of the various
resources can be tailored to the needs of the actual render pass
sub-system. In addition, it gets window resizing mostly working (just
some problems with incorrect rendering).
This is the minimum maximum count for sampled images, and with layered
shadow maps (with a minimum of 2048 layers supported), that's really way
more than enough.
Things are a bit of a mess with interdependence between sub-module
initialization and render pass initialization, and window resizing is
broken, but the main render pass rendering to an image that is then
post-processed (currently just blitted) is working. This will make it
possible to implement fisheye and water warp (and other effects, of
course).
When working, this will handle the output to the swap-chain images and
any final post-processing effects (gamma correction, screen scaling,
etc). However, currently the screen is just black because the image
for getting the main render pass output isn't hooked up yet.
Now each (high level) render pass can have its own frame buffer. The
current goal is to get the final output render pass to just transfer the
composed output to the swap chain image, potentially with scaling (my
laptop might be able to cope).
While the HUD and status bar don't cut out a lot of screen (normally),
they might start to make a difference when I get transparency working
properly. The main thing is this is a step towards pulling the 2d
rendering into another render pass so the main deferred pass is
world-only.
Using swizzles in an image view allows the same shader to be used with
different image "types" (eg, color vs coverage).
Of course, this needed to abandon QFV_CreateImageView, but that is
likely for the best.
While simple component pools can be cleared simply by zeroing their
counts, ones that have a delete function need that function to be called
for all the components in the pool otherwise leaks can happen.
It's currently used only by the vulkan renderer, as it's the only
renderer that can make good use of it for alias models, but now vulkan
show shirt/pants colors (finally).
As the RGB curves for many of the color rows are not linearly related,
my idea of scaling the brightest color in the row just didn't work.
Using a masked palette lookup works much better as it allows any curves.
Also, because the palette is uploaded as a grid and the coordinates are
calculated on the CPU, the system is extendable beyond 8-bit palettes.
This isn't quite complete as the top and bottom colors are still in
separate layers but their indices and masks can fit in just one, but
this requires reworking the texture setup (for another commit).
It turns out my approach to alias skin coloring just doesn't work for
the quake data due to the color curves not having a linear relationship,
especially the bottom colors.
It works on only one layer and one mip, and assumes the provided texture
data is compatible with the image, but does support sub-image updates
(x, y location as parameters, width and height in the texture data).
Currently only for gl/glsl/vulkan. However, rather than futzing with
con_width and con_height (and trying to guess good values), con_scale
(currently an integer) gives consistent pixel scaling regardless of
window size.
Well, sort of: it's still really in the renderer, but now calling
R_AddEfrags automatically updates the visibility structure as necessary,
and deleting an entity cleans up the efrags automatically. I wanted this
over twenty years ago.
While an entity with a null registry is null regardless of the id,
setting the id to nullent is useful for other purposes.
Fixes the disappearing brush models in the vulkan renderer (it uses
entity id + 1 for indexing), and prevents similar issues in the future.
It should have always been here, but when I first created the hierarchy
and transform objects, I didn't know where things would go. Having two
chunks of code for setting an entity's parent was too already too much,
and I expect to have other hierarchy types. Doesn't fix the issues
encountered with sbar, of course.
The text object covering the whole passage was not being initialized,
thus center print tried to print rubbish when (incorrectly) printing the
entire message.
I'm not particularly happy with the way onresize is handled, but at this
stage a better way of dealing with resizing views and getting the child
views to flow correctly hasn't come to mind. However, the system should
at least be usable.
This includes moving the related cvars from botn nq and qw into the
client hud code. In addition, the hud code supports update and
update-once function components. The update component is for updates
that occur every frame, but update-once components (not used yet) are
for one-shot updates (eg, when a value updates very infrequently).
Much of the nq/qw HUD system is quite broken, but the basic status bar
seems to be working nicely. As is the console (both client and server).
Possibly the biggest benefit is separating the rendering of HUD elements
from the updating of them, and much less traversing of invisible views
whose only purpose is to control the positioning of the visible views.
The view flow tests are currently disabled until I adapt the flow code
to ECS.
There seems to be a problem with view resizing in that some gravities
don't follow resizing correctly.
As the bookkeeping data is spread between three arrays, sorting a
component pool is not trivial and thus not something to duplicate around
the codebase.
It doesn't check that the entity itself is valid, but it does at least
check that the index fetched from the sparse array is valid. Fixes a
segfault when a valid entity never had the component.
It's not quite complete in that entities need to be created for the
objects, and passage text object might get additional components in the
hierarchy, but the direct use of views has been replaced by the use of a
hierarchy object with the same tree structure, and now has text objects
for paragraphs and the entire passage.
Another step towards moving all resource creation into the one place.
The motivation for doing the change was getting my test scene to work
with only ambient lights or no lights at all.
While the libraries are probably getting a little out of hand, the
separation into its own directory is probably a good thing as an ECS
should not be tied to scenes. This should make the ECS more generally
useful.
This fixes the segfault due to the world entity not actually existing,
without adding a world entity. It takes advantage of the ECS in that the
edge renderer needs only the world matrix, brush model pointer, and the
animation frame number (which is just 0/1 for brush models), thus the
inherent SOA of ECS helps out, though benchmarking is needed to see if
it made any real difference.
With this, all 4 renderers are working again.
Since entity_t has a pointer to the registry owning the entity, there's
no need to access a global to get at the registry. Also move component
getting closer to where it's used.
It no longer initializes the new component. For that, use
Ent_SetComponent which will copy in the provided data or call the
component's create function if the data pointer is null (in which case,
Ent_SetComponent acts as Ent_SetComponent used to).
This puts the hierarchy (transform) reference, animation, visibility,
renderer, active, and old_origin data in separate components. There are
a few bugs (crashes on grenade explosions in gl/glsl/vulkan, immediately
in sw, reasons known, missing brush models in vulkan).
While quake doesn't really need an ECS, the direction I want to take QF
does, and it does seem to have improved memory bandwidth a little
(uncertain). However, there's a lot more work to go (especially fixing
the above bugs), but this seems to be a good start.
Hierarchies are now much closer to being more general in that they are
not tied to 3d transforms. This is a major step to moving the whole
entity/transform system into an ECS.
That does feel a little redundant, but I think the System in ECS is
referring to the systems that run on the components, while the other
system is the support code for the ECS. Anyway...
This is based heavily on the information provided by @skipjack in his
github blog about EnTT. Currently, entity recycling and sparse arrays
for component pools have been implemented, and adding components to an
entity has been tested.
The inconsistencies in clang's handling of casts was bad enough, and its
silliness with certain extensions, but discovering that it doesn't
support raw strings was just too much. Yes, it gives a 3s boost to qfvis
on gmsp3v2.bsp, but it's not worth the hassle.
This is the beginning of adding ECS to QF. While the previous iteration
of hierarchies was a start in the direction towards ECS, this pulls most
of the 3d-specific transform stuff out of the hierarchy "objects",
making all the matrices and vectors/quaternions actual components (in
the ECS sense). There's more work to be done with respect to the
transform and entity members of hierarchy_t (entity should probably go
away entirely, and transform should become hierref_t (or whatever its
final name becomes), but I wanted to get things working sooner than
later.
The motivation for the effort was to allow views to use hierarchy_t,
which should be possible once I get entity and transform sorted out.
I am really glad I already had automated tests for hierarchies, as
things proved to be a little tricky to get working due to forgetting why
certain things were there.
Its value on input is ignored. QFV_CreateResource writes the resource
object's offset relative to the beginning of the shared memory block.
Needed for the Draw overhaul.
I got tired of writing the same 13 or so lines of code over and over (it
actually put me off experimenting with Vulkan). Thus...
QFV_PacketCopyBuffer does the work of handling barriers and a (full
packet) copy from the staging buffer to a GPU buffer.
QFV_PacketCopyImage does a similar job, but for images. However, it
still needs a lot of work, but it does make getting a basic texture onto
the GPU much less of a hassle.
Both functions should make staging data much less error-prone.
This moves the qfv_resobj_t image initialization code from the IQM
loader into the resource management code. This will allow me to reuse
the code for setting up glyph data. As a bonus, it cleans up the IQM
code nicely.
A passage object has a list of all the text objects in the given string,
where the objects represent either white space or "words", as well as a
view_t object representing the entire passage, with paragraphs split
into child views of the passage view, and each paragraph has a child
view for every text/space object in the paragraph.
Paragraphs are split by '\n' (not included in any object).
White space is grouped into clumps such that multiple adjacent spaces
form a single object. The standard ASCII space (0x20) and all of the
Unicode characters marked "WS;<compat> 0020" are counted as white space.
Unless a white space object is the first in the paragraph, its view is
marked for suppression by the view flow code.
Contiguous non-white space characters are grouped into single objects,
and their views are not suppressed.
All text object views (both white space and "word") have their data
pointer set to the psg_text_t object representing the text for that
view. This should be suitable for simple text-mode unattributed display.
More advanced rendering would probably want to create suitable objects
and set the view data pointers to those objects.
No assumption is made about text direction.
Passage and paragraph views need to have their primary axis sizes set
appropriately, as well as their resize flags. Their xlen and ylen are
both set to 10, and xpos,ypos is 0,0. Paragraph views need their
setgeometry pointer set to the appropriate view_flow_* function.
However, they are set up to have their secondary axis set automatically
when flowed.
Text object views are set up for automatic flowing: grav_flow, 0,0 for
xpos,ypos. However, xlen and ylen are also both 0, so need to be set by
the renderer before attempting to flow the text.
Adjusting the size of the parent (container) view to the views it
contains will be useful for automatic layout and knowing how large the
view is for scrolling. New tests added so testing both with and without
the option is still possible.
This should be suitable for laying out text objects with word-wrap,
where each view is a "word" or break between "words". This should be
useful for any other objects that could benefit from similar layout
rules. All eight flows are supported left-right-top-down (English and
most European languages), right-left-top-down (Arabic and similar),
top-down-right-left (Chinese, Japanese, Korean), top-down-left-right,
as well as bottom-up variants of those four.
More work is needed for support of things like views being centered on
the flow line rather than on one edge (depends on flow direction),
offset views, and others. Suppression of "spaces" at the beginning of a
line is supported but not tested.
I had missed that vkCmdCopyImage requires the source and destination
images to have exactly the same size, and I guess assumed that the
swapchain images would always be the size they said they were, but this
is not the case for tiled-optimal images. However,
vkCmdCopyImageToBuffer does the right thing regardless of the source
image size.
This fixes the skewed screenshots when the window size is not a multiple
of 8 (for me, might differ for others).
There's a problem with screenshot capture in that the image is sheared
after window resize, but the screen view looks good, and vulkan is happy
with the state changes.
I've found and mostly isolated the parts of the code that will be
affected by window resizing, minus pipelines but they use dynamic
viewport and scissor settings and thus shouldn't be affected so long as
the swapchain format doesn't change (how does that happen?)
This did involve changing some field names and a little bit of cleanup,
but I've got a better handle on what's going on (I think I was in one of
those coding trances where I quickly forget how things work).
Just head and tail are atomic, but it seems to work nicely (at least on
intel). I actually had more trouble with gcc (due to accidentally
testing lock-free with the wrong ring buffer... oops, but yup, gcc will
happily optimize your loop to spin really really fast). Also served as a
nice test for C11 threading.
This makes bsp traversal even more re-entrant (needed for shadows).
Everything needed for a "pass" is taken from bsp_pass_t (or indirectly
through bspctx_t (though that too might need some revising)).
There are some issues with the light renderers getting mangled, and only
the first light is even touched (just begin and end of render pass), but
this gets a lot of the framework into place.
Sounds odd, but it's part of the problem with calling two different
things with essentially the same name. The "high level" render pass in
question may be a compute pass, or a complex series of (Vulkan) render
passes and so won't create a Vulkan render pass for the "high level"
render pass (I do need to come up with a better name for it).
I really don't remember why I made it separate, though it may have been
to do with r_ent_queue. However, putting it together with the rest is
needed for the "render pass" rework.
It now lives in vulkan_renderpass.c and takes most of its parameters
from plist configs (just the name (which is used to find the config),
output spec, and draw function from C). Even the debug colors and names
are taken from the config.
QFV_CreateRenderPass is no longer used, and QFV_CreateFramebuffer hasn't
been used for a long time. The C file is still there for now but is
basically empty.
The software renderer uses Bresenham's line slice algorithm as presented
by Michael Abrash in his Graphics Programming Black Book Special Edition
with the serial numbers filed off (as such, more just so *I* can read
the code easily), along with the Chen-Sutherland line clipping
algorithm. The other renderers were more or less trivial in comparison.
Surfaces marked with SURF_DRAWALPHA but not SURF_DRAWTURB are put in a
separate queue for the water shader and run with a turb scale of 0.
Also, entities with colormod alpha < 1 are marked to go in the same
queue as SURF_DRAWALPHA surfaces (ie, no SURF_DRAWTURB unless the
model's texture indicated such).
Textures whose names start with a { are meant to be rendered with
transparency. Surfaces using those textures are marked with
SURF_DRAWALPHA.
Unfortunately, the mip levels of ad_tears' transparent textures use the
wrong color so only the highest LOD works properly, but those textures
are meant to be loaded from external files anyway, it seems.
A listener is used instead of (really, as well as) ie_app_window events
because systems that need to know about windows sizes may not have
anything to do with input and the event system.
This breaks console scaling for now (con_width and con_height are gone),
but is a major step towards window resize support as console stuff
should never have been in viddef_t in the first place.
The client screen init code now sets up a screen view (actually the
renderer's scr_view) that is passed to the client console so it can know
the size of the screen. The same view is used by the status bar code.
Also, the ram/cache/paused icon drawing is moved into the client screen
update code. A bit of duplication, but I do plan on merging that
eventually.
view_new sets the geometry, but any setgeometry that need a valid data
pointer would get null. It might be better to always have the data
pointer, but I didn't feel like doing such a change at this stage as
there are quite a lot of calls to view_new. Thus view_new_data which
sets the data pointer before calling setgeometry.
This replaces old_console_t with con_buffer_t for managing scrollback,
and draw_charbuffer_t for actual character drawing, reducing the number
of calls into the renderer. There are numerous issues with placement and
sizing, but the basics are working nicely.
I really don't know why I tried to do ring-buffers without gaps, the
code complication is just not worth the tiny savings in memory. In fact,
just the switch from pointers to 32-bit offsets saves more memory than
not having gaps (on 64-bit systems, no change on 32-bit).
It handles basic cursor motion respecting \r \n \f and \t (might be a
problem for id chars), wraps at the right edge, and automatically
scrolls when the cursor tries to pass the bottom of the screen.
Clearing the buffer resets its cursor to the upper left.
This is intended for the built-in 8x8 bitmap characters and quake's
"conchars", but could potentially be used for any simple (non-composed
characters) mono-spaced font. Currently, the buffers can be created,
destroyed, cleared, scrolled vertically in either direction, and
rendered to the screen in a single blast.
One of the reasons for creating the buffer is to make it so scaling can
be supported in the sw renderer.
PR_Debug_ValueString prints the value at the given offset using the
provided type to format the string. The formatted string is appended to
the provided dstring.
PR_Debug_ValueString prints the value at the given offset using the
provided type to format the string. The formatted string is appended to
the provided dstring.
If no handler has been registered, then the corresponding parameter is
printed as a pointer but with surrounding brackets (eg, [0xfc48]). This
will allow the ruamoko runtime to implement object printing.
If no handler has been registered, then the corresponding parameter is
printed as a pointer but with surrounding brackets (eg, [0xfc48]). This
will allow the ruamoko runtime to implement object printing.
While VRect_Difference worked for subrect allocation, it wasn't ideal as
it tended to produce a lot of long, narrow strips that were difficult to
reuse and thus wasted a lot of the super-rectangle's area. This is
because it always does horizontal splits first. However, rewriting
VRect_Difference didn't seem to be the best option.
VRect_SubRect (the new function) takes only width and height, and splits
the given rectangle such that if there are two off-cuts, they will be
both the minimum and maximum possible area. This does seem to make for
much better utilization of the available area. It's also faster as it
does only the two splits, rather than four.
It's implemented only in the Vulkan renderer, partly because there's a
lot of experimenting going on with it, but the glyphs do get transferred
to the GPU (checked in render doc). No rendering is done yet: still
thinking about whether to do a quick-and-dirty test, or to add HarfBuzz
immediately, and the design surrounding that.
The software renderer uses Bresenham's line slice algorithm as presented
by Michael Abrash in his Graphics Programming Black Book Special Edition
with the serial numbers filed off (as such, more just so *I* can read
the code easily), along with the Chen-Sutherland line clipping
algorithm. The other renderers were more or less trivial in comparison.
Most were pretty easy and fairly logical, but gib's regex was a bit of a
pain until I figured out the real problem was the conditional
assignments.
However, libs/gamecode/test/test-conv4 fails when optimizing due to gcc
using vcvttps2dq (which is nice, actually) for vector forms, but not the
single equivalent other times. I haven't decided what to do with the
test (I might abandon it as it does seem to be UD).
This gets ambient sounds (in particular, water and sky) working again
for quakeworld after the recent sound changes, and again for nq after I
don't know how long.
Getting the tag is possibly useful in general and definitely in
debugging. Setting, I'm not so sure as it should be done when allocated,
but that's not always possible.
Also, correct the return type of z_block_size, though it affected only
Z_Print. While an allocation larger than 4GB is... big for zone, the
blocks do support it, so printing should too.
And use it for Ruamoko object reference counts.
I need reference counts for dealing with block sound buffers since they
can be shared by many channels. I figured I take care of Ruamoko's
reference count location at the same time.
Fixes#27.
sfx_t is now private, and cd_file no longer accesses channel_t's
internals. This is necessary for hiding the code needed to make mixing
and channel management *properly* lock-free (I've been getting away with
murder thanks to x86's strong memory model and just plain luck with
gcc).
And make Sys_MaskPrintf take the developer enum rather than just a raw
int.
It was actually getting some nasty hunk corruption errors when under
memory pressure that made it clear the sound system needs some work.
I always wanted it there, there were dependency issues at the time. I
guess they got cleaned up for the most part since then (other than
cd_file, but it's on my hit-list).
The texture animation data is compacted into a small struct for each
texture, resulting in much less data access when animating the texture.
More importantly, no looping over the list of frames. I plan on
migrating this to at least the other hardware renderers.
The models are broken up into N sub-(sub-)models, one for each texture,
but all faces using the same texture are drawn as an instance, making
for both reduced draw calls and reduced index buffer use (and thus,
hopefully, reduced bandwidth). While texture animations are broken, this
does mark a significant milestone towards implementing shadows as it
should now be possible to use multiple threads (with multiple index and
entid buffers) to render the depth buffers for all the lights.
This allows the use of an entity id to index into the entity data and
fetch the transform and colormod data in the vertex shader, thus making
instanced rendering possible. Non-world brush entities are still not
rendered, but the world entity is using both the entity data buffer and
entid buffer.
Sub-models and instance models need an instance data buffer, but this
gets the basics working (and the proof of concept). Using arrays like
this actually simplified a lot of the code, and will make it easy to get
transparency without turbulence (just another queue).
The gl water warp ones have been useless since very early on due to not
doing water warp in gl (vertex warping just didn't work well), and the
recent water warp implementation doesn't need those hacks. The rest of
the removed flags just aren't needed for anything. SURF_DRAWNOALPHA
might get renamed, but should be useful for translucent bsp surfaces
(eg, vines in ad_tears).
One more step towards BSP thread-safety. This one brought with it a very
noticeable speed boost (ie, not lost in the noise) thanks to the face
visframes being in tightly packed groups instead of 128 bytes apart,
though the sw render's boost is lost in the noise (but it's very
fill-rate limited).
This is next critical step to making BSP rendering thread-safe.
visframe was replaced with cluster (not used yet) in anticipation of BSP
cluster reconstruction (which will be necessary for dealing with large
maps like ad_tears).
The main goal was to get visframe out of mnode_t to make it thread-safe
(each thread can have its own visframe array), but moving the plane info
into mnode_t made for better data access patters when traversing the bsp
tree as the plane is right there with the child indices. Nicely, the
size of mnode_t is the same as before (64 bytes due to alignment), with
4 bytes wasted.
Performance-wise, there seems to be very little difference. Maybe
slightly slower.
The unfortunate thing about the change is the plane distance is negated,
possibly leading to some confusion, particularly since the box and
sphere culling functions were affected. However, this is so point-plane
distance calculations can be done with a single 4d dot product.
GCC does a nice enough job compiling the more readable form (though
admittedly, hadd is possibly more readable than what's there for
dot[fd], hadd is supposedly slower than the shuffles and adds, and qfvis
seems to support that).
This fixes the annoying persistence of inputs when respawning and
changing levels. Axis input clearing is hooked up but does nothing as of
yet. Active device input clearing has always been hooked up, but also
does nothing in the evdev and x11 drivers.
It was added only because FitzQuake used it in its pre-bsp2 large-map
support. That support has been hidden in bspfile.c for some time now.
This doesn't gain much other than having one less type to worry about.
Well tested on Conflagrant Rodent (the map that caused the need for
mclipnode_t in the first place).
This was one of the biggest reasons I had trouble understanding the bsp
display list code, but it turns out it was for dealing with GLES's
16-bit limit on vertex indices. Since vulkan uses 32-bit indices,
there's no need for the extra layer of indirection. I'm pretty sure it
was that lack of understanding that prevented me from removing it when I
first converted the glsl bsp code to vulkan (ie, that 16-bit indices
were the only reason for elements_t).
It's hard to tell whether the change makes much difference to
performance, though it seems it might (noisy stats even over 50 timedemo
loops) and the better data localization indicate it should at least be
just as good if not better. However, the reason for the change is
simplifying the data structures so I can make bsp rendering thread-safe
in preparation for rendering shadow maps.
They should probably be cause leafsurfaces since they are the actual
surfaces of the leaf: ie, the faces of the leaf mesh if each leaf was
sub-sub-model.
For now, at least (I have some ideas to possibly reduce the numbers and
also to avoid the need for actual limits). I've seen gmsp3v2 use over
500 lights at once (it has over 1300), and I spent too long figuring out
that weird light behavior was due to the limit being hit and lights
getting dropped (and even longer figuring out that more weird behavior
was due to the lack of shadows and the world being too bright in the
first place).
Since the staging buffer allocates the command buffers it uses, it
needs to free them when it is freed. I think I was confused by the
validation layers not complaining about unfreed buffers when shutting
down, but that's because destroying the pool (during program shutdown,
when the validation layers would complain) frees all the buffers. Thus,
due to staging buffers being created and destroyed during the level load
process, (rather large) command buffers were piling up like imps in a
Doom level.
In the process, it was necessary to rearrange some of the shutdown code
because vulkan_vid_render_shutdown destroys the shared command pool, but
the pool is required for freeing the command buffers, but there was a
minor mess of long-lived staging buffers being freed afterwards. That
didn't end particularly well.
While gcc was quite correct in its warning, all I needed was to
explicitly truncate the string. I don't remember why I didn't do that
back when I made the changes in 4f58429137, but it works now, and the
surrounding code does expect the string to be no more than 15 chars
long. This fixes yet another memory leak (but timedemo over multiple
runs still leaks like a sieve).
This is meant for a "permanent" tear-down before freeing the memory
holding the VM state or at program shutdown. As a consequence, builtin
sub-systems registering resources are now required to pass a "destroy"
function pointer that will be called just before the memory holding
those resources is freed by the VM resource manager (ie, the manager
owns the resource memory block, but each subsystem is responsible for
cleaning up any resources held within that block).
This even enhances thread-safety in rua_obj (there are some problems
with cmd, cvar, and gib).
This gives a rather significant speed boost to timedemo demo1: from
about 2300-2360fps up to 2520-2600fps, at least when using
multi-texture.
Since it was necessary for testing the scrap, gl got the ability to set
the console background texture, too.
It's down to 128 bytes from 184, which fits nicely in two cache lines.
This made a nice difference to glsl, unknown to vulkan (it crashed after
about 31/51 timedemo loops), and was a was for sw and gl.
While it takes one extra step to grab the marksurface pointer,
R_MarkLeaves and R_MarkLights (the two actual users) seem to be either
the same speed or fractionally faster (by a few microseconds). I imagine
the loss gone to the extra fetch is made up for by better bandwidth
while traversing the leafs array (mleaf_t now fits in a single cache
line, so leafs are cache-aligned since hunk allocations are aligned).
It copies an entire hierarchy (minus actual entities, but I'm as yet
unsure how to proceed with them), even across scenes as the source scene
is irrelevant and the destination scene is used for creating the new
transforms.
Brush models looked a little too tricky due to the very different style
of command queue, so that's left for now, but alias, iqm and sprite
entities are now labeled. The labels are made up of the lower 5 hex
digits of the entity address, the position, and colored by the
normalized position vector. Not sure that's the best choice as it does
mean the color changes as the entity moves, and can be quite subtle
between nearby entities, but it still helps identify the entities in the
command buffer.
And, as I suspected, I've got multiple draw calls for the one ogre. Now
to find out why.
The bones aren't animated yet (and I realized I made the mistake of
thinking the bone buffer was per-model when it's really per-instance (I
think this mistake is in the rest of QF, too)), skin rendering is a
mess, need to default vertex attributes that aren't in the model...
Still, it's quite satisfying seeing Mr Fixit on screen again :)
I wound up moving the pipeline spec in with the rest of the pipelines as
the system isn't really ready for separating them.
Abyss of Pandemonium uses global ambient light a lot, but doesn't
specify it in every map (nothing extracting entities and adding a
reasonable value can't fix). I imagine some further tweaking will be
needed.
The parsing of light data from maps is now in the client library, and
basic light management is in scene. Putting the light loading code into
the Vulkan renderer was a mistake I've wanted to correct for a while.
The client code still needs a bit of cleanup, but the basics are working
nicely.
This replaces *_NewMap with *_NewScene and adds SCR_NewScene to handle
loading a new map (for quake) in the renderer, and will eventually be
how any new scene is loaded.
This leaves only the one conditional in the shader code, that being the
distance check. It doesn't seem to make any noticeable difference to
performance, but other than explosion sprites being blue, lighting
quality seems to have improved. However, I really need to get shadows
working: marcher is just silly-bright without them, and light levels
changing as I move around is a bit disconcerting (but reasonable as
those lights' leaf nodes go in and out of visibility).
Id Software had pretty much nothing to do with the vulkan renderer (they
still get credit for code that's heavily based on the original quake
code, of course).
The model system is rather clunky as it is focused around caching, so
unloading is more of a suggestion than anything, but it was good enough
for testing loading and unloading of IQM models in Vulkan.
Despite the base IQM specification not supporting blend-shapes, I think
IQM will become the basis for QF's generic model representation (at
least for the more advanced renderers). After my experience with .mu
models (KSP) and unity mesh objects (both normal and skinned), and
reviewing the IQM spec, it looks like with the addition of support for
blend-shapes, IQM is actually pretty good.
This is just the preliminary work to get standard IQM models loading in
vulkan (seems to work, along with unloading), and they very basics into
the renderer (most likely not working: not tested yet). The rest of the
renderer seems to be unaffected, though, which is good.
The resource subsystem creates buffers, images, buffer views and image
views in a single batch operation, using a single memory object to back
all the buffers and images. I had been doing this by hand for a while,
but got tired of jumping through all those vulkan hoops. While it's
still a little tedious to set up the arrays for QFV_CreateResource (and
they need to be kept around for QFV_DestroyResource), it really eases
calculation of memory object size and sub-resource offsets. And
destroying all the objects is just one call to QFV_DestroyResource.
I might need to do similar for other formats, but i ran into the problem
of the texture type being tex_palette instead of the expected tex_rgba
when pre-(no-)loading a tga image resulting in Vulkan not liking my
attempt at generating mipmaps.
Having to refigure out what values are going into the vectors got old
very fast. The comments don't help with verifying the values, but at
least I can tell at a glance where 2(xy - wz) goes and thus determine
the "orientation" of the matrix.
pr_type_t now contains only the one "value" field, and all the access
macros now use their PACKED variant for base access, making access to
larger types more consistent with the smaller types.
They're really redundant, and removing the next pointer makes for a
slightly smaller cvar struct. Cvar_Select was added to allow finding
lists of matching cvars.
The tab-completion and config saving code was reworked to use the hash
table DO functions. Comments removed since the code was completely
rewritten, but still many thanks to EvilTypeGuy and Fett.
Hash_Select returns a list of elements that match a given criterion
(select callback returning non-0).
Hash_ForEach simply calls a function for every element.
And use it for hud_scoreboard_gravity. Putting the enum def in view made
the most sense as view does own the base type and the enum is likely to
be by useful for other settings.
My script didn't know what type to make the cvars since they're not used
directly by the code, so they got treated as strings instead of ints or
floats.
This is an extremely extensive patch as it hits every cvar, and every
usage of the cvars. Cvars no longer store the value they control,
instead, they use a cexpr value object to reference the value and
specify the value's type (currently, a null type is used for strings).
Non-string cvars are passed through cexpr, allowing expressions in the
cvars' settings. Also, cvars have returned to an enhanced version of the
original (id quake) registration scheme.
As a minor benefit, relevant code having direct access to the
cvar-controlled variables is probably a slight optimization as it
removed a pointer dereference, and the variables can be located for data
locality.
The static cvar descriptors are made private as an additional safety
layer, though there's nothing stopping external modification via
Cvar_FindVar (which is needed for adding listeners).
While not used yet (partly due to working out the design), cvars can
have a validation function.
Registering a cvar allows a primary listener (and its data) to be
specified: it will always be called first when the cvar is modified. The
combination of proper listeners and direct access to the controlled
variable greatly simplifies the more complex cvar interactions as much
less null checking is required, and there's no need for one cvar's
callback to call another's.
nq-x11 is known to work at least well enough for the demos. More testing
will come.
The prefix gives more context to the error messages, making the system a
lot easier to use (it was especially helpful when getting my cvar revamp
into shape).
Based on the flags type used in vkparse (difference is the lack of
support for plists). Having this will make supporting named flags in
cvars much easier (though setting up the enum type is a bit of a chore).
This allows for easy (and safe) printing of cexpr values where the type
supports it. Types that don't support printing would be due to being too
complex or possibly write-only (eg, password strings, when strings are
supported directly).
Really, this won't make all that much difference because alias models
with more than one skin are quite rare, and those with animated skin
groups are even rarer. However, for those models that do have more than
one skin, it will allow for reduced allocation overheads, and when
supported (glsl, vulkan, maybe gl), loading all the skins into an array
texture (since all skins are the same size, though external skins may
vary), but that's not implemented yet, this just wraps the old one skin
at a time code.
This makes much more sense as they are intimately tied to the frame
buffer on which a render pass is working. Now, just the window width
and height are stored in vulkan_ctx_t. As a side benefit,
QFV_CreateSwapchain no long references viddef (now just palette and
conview in vulkan_draw.c to go).
While I have trouble imagining it making that much performance
difference going from 4 verts to 3 for a whopping 2 polygons, or even
from 2 triangles to 1 for each poly, using only indices for the vertices
does remove a lot of code, and better yet, some memory and buffer
allocations... always a good thing.
That said, I guess freeing up a GPU thread for something else could make
a difference.
I think I had gotten lucky with captures not being corrupt due to them
being much bigger than all but the L3 cache (and then they're over 1/2
the size), so the memory was being automatically invalidated by other
activity. Don't want to trust such luck, though.
This means that a tex_t object is passed in instead of just raw bytes
and width and height, but it means the texture can specify whether it's
flipped or uses BGR instead of RGB. This fixes the upside down
screenshots for vulkan.
This fixes (*ahem*) the vulkan renderer segfaulting when attempting to
take a screenshot. However, the image is upside down. Also, remote
snapshots and demo capture are broken for the moment.
QFS_NextFilename was renamed to QFS_NextFile to reflect the fact it now
returns a QFile pointer for the newly created file (as well as the
name). This necessitated updating WritePNG to take a file pointer
instead of a file name, with the advantage that WritePNGqfs is no longer
necessary and callers have much more control over the creation of the
file.
This makes QFS_NextFile much more secure against file system race
conditions and attacks (at least in theory). If nothing else, it will
make it more robust in a multi-threaded environment.
QF currently uses unique file names for screenshots and server-side
demos (and remote snapshots), but they're generally useful.
QFS_NextFilename has been filling this role, but it is highly insecure
in its current implementation. This is the first step to taking care of
that.
clang doesn't like the same variable name being used in nested
expression statements, so give the "safety" variables in reused macros
semi-meaningful (based on macro name) tails to keep them separate.
gcc and clang have rather different swizzle builtins, but both do a nice
job of optimizing the intuitive initializer swizzle (I think gcc 8(?)
didn't do such a good job thus my use of __builtin_shuffle).
Viewport and FOV updates are now separate so updating one doesn't cause
recalculations of the other. Also, perspective setup is now done
directly from the tangents of the half angles for fov_x and fov_y making
the renderers independent of fov/aspect mode. I imagine things are a bit
of a mess with view size changes, and especially screen size changes
(not supported yet anyway), and vulkan winds up updating its projection
matrices every frame, but everything that's expected to work does
(vulkan errors out for fisheye or warp due to frame buffer creation not
being supported yet).
Definitely not something for the renderer to care about directly (ie, at
most, a post-process filter setting or palette update, which is how it
actually is currently).
I meant to do this a while ago but forgot about it. Things are a bit of
a mess in that the renderer knows too much about entities, but
eventually the renderer will know about only things to render (meshes,
particles, etc).
The quake-specific enums are now in the client header, and the particle
system now has a gravity field rather than getting it from
vid_render_data (which I hope to eventually get rid of entirely).
r_refdef is really meant for holding the various screen "constants" for
the software renderer rather than the more generic scene stuff. All the
fields referenced by the low level rendering code (especially assembly)
have been moved to the beginning of the struct (and nicely fit within 64
bytes). The other fields should be moved elsewhere, but not this commit.
On top of that, R_ViewChanged is much easier to read, and there are
fewer static globals.
Of course, it's not as correct as glsl or sw due to using polygons and
uvs rather than a fragment shader (not that such is out of the question
since GL 3.0 is requested, but I don't feel like getting shaders going
just for a couple of post-processing effects in an obsolete renderer).
While it's not where I want it to be, it at least now no longer messes
with frame buffer binding or the view ports. This involved switching
around buffers in D_WarpScreen so that the main buffer could be bound
before post-processing.
The code dealing with state is a bit of a mess, but everything is
working nicely. Get around 400fps when all 6 faces need to be rendered
(no surprise: it should be about 1/6 of that for normal rendering). The
messy state handling code did not come as a surprise as I suspected
there were various mistakes in my scene rendering "recipe", and fisheye
highlighted them nicely (I'm sure getting this stuff working in Vulkan
will highlight even more issues).
Finally, after a decade :P Looks pretty good, too, and is (almost)
properly scaled to the resolution (almost because the effect is a little
squashed, but I think the sw renderer does the same).
Again, gl/vulkan not working yet (on the assumption that sw would be
trickier).
Fisheye overrides water warp because updating the projection map every
frame is far too expensive.
I've added a post-process pass to the interface in order to hide the
implementation details, but I'm not sure I'm happy about how the
multi-pass rendering for cube maps is handled (or having the frame
buffers as exposed as they are), but mainly because Vulkan will make
implementation interesting.
For now, OpenGL and Vulkan renderers are broken as I focused on getting
the software renderer working (which was quite tricky to get right).
This fixes a couple of issues: the segfault when warping the screen (due
to the scene rendering move invalidating the warp buffer), and warp
always having 320x200 resolution. There's still the problem of the
effect being too subtle at high resolution, but that's just a matter of
updating the tables and tweaking the code in D_WarpScreen.
Another issue is the Draw functions should probably write directly to
the main frame buffer or even one passed in as a parameter. This would
remove the need for binding the main buffer at the beginning and end of
the frame.