11499 commits

Author SHA1 Message Date
Bill Currie
8a5c3c1ac1 [util] Add sys function to get cpu count
And use it in qfvis.
2021-08-13 21:26:48 +09:00
Bill Currie
d88bd96390 [qfvis] Use cmem for portal flow stack node allocation
The portal flow stack nodes contain a simd vector, which requires
16-byte alignment. However, on 32-bit Windows, malloc returns 8-byte
aligned memory, leading to eventual segfaults. Since pstack_t is 48
bytes on 32-bit systems, it fits nicely into a 64-byte aligned cache
line (or two on 64-bit systems due to being 80 bytes).
2021-08-13 11:33:03 +09:00
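The alignment problem described in the commit above can be illustrated with a small sketch. This is not QuakeForge's cmem allocator: `alloc_aligned` and `free_aligned` are hypothetical helpers that over-allocate with `malloc` and round the address up, assuming only standard C99.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not QuakeForge's cmem API: return `size` bytes whose
 * address is a multiple of `align` (a power of two, at least sizeof (void *)).
 * The pointer malloc returned is stashed just below the aligned block. */
static void *
alloc_aligned (size_t size, size_t align)
{
    assert (align >= sizeof (void *) && (align & (align - 1)) == 0);
    /* Room for the worst-case padding plus the stashed pointer. */
    void       *raw = malloc (size + align - 1 + sizeof (void *));
    if (!raw) {
        return NULL;
    }
    uintptr_t   addr = (uintptr_t) raw + sizeof (void *);
    addr = (addr + align - 1) & ~(uintptr_t) (align - 1);
    /* Remember the original pointer so it can be handed back to free. */
    ((void **) addr)[-1] = raw;
    return (void *) addr;
}

/* Release a block obtained from alloc_aligned. */
static void
free_aligned (void *ptr)
{
    if (ptr) {
        free (((void **) ptr)[-1]);
    }
}
```

With this sketch, `alloc_aligned (48, 64)` returns a block that starts on a 64-byte boundary regardless of what alignment the platform's `malloc` provides.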
Bill Currie
6fb6885b88 [qfvis] Allocate only 128MB for the main hunk
Even ad_tears didn't really need 1GB, and 32-bit machines can't really
handle 1GB (at least on windows).
2021-08-13 11:33:03 +09:00
Bill Currie
a01cafe972 [util] Minimize set growth
At the low level, only unions can cause a set to grow. Of course, things
get interesting at the higher level when infinite (inverted) sets are
mixed in.
2021-08-11 12:31:03 +09:00
Bill Currie
37a5b475c0 [util] Minimize the string for infinite sets
Instead of printing every representable member of an infinite set (ie,
up to element 63 in a set that can hold 64 elements), only those
elements up to one after the last non-member are listed. For example,

    {...} - {2 3} -> {0 1 4 ...}

This makes reading (and testing!) infinite sets much easier.
2021-08-11 12:31:03 +09:00
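The truncation rule from the commit above can be sketched as follows. The `tinyset_t` type and `tinyset_string` function are hypothetical, assuming a 64-element set held in one word with an `inverted` flag; the real set_t is not shown in this log.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical 64-element set. When `inverted` is non-zero the set is
 * infinite and the stored bits mark its NON-members. */
typedef struct {
    uint64_t    bits;
    int         inverted;
} tinyset_t;

/* Format the set into buf. An infinite set lists members only up to one
 * element after its last non-member, then "..." for everything beyond. */
static const char *
tinyset_string (const tinyset_t *set, char *buf, size_t size)
{
    unsigned    end = 64;       /* one past the last element to consider */
    size_t      len = 0;
    const char *sep = "";

    assert (size >= 256);       /* ample for 64 two-digit elements */
    if (set->inverted) {
        end = 0;
        for (unsigned i = 0; i < 64; i++) {
            if ((set->bits >> i) & 1) {
                end = i + 2 > 64 ? 64 : i + 2;
            }
        }
    }
    len += (size_t) snprintf (buf + len, size - len, "{");
    for (unsigned i = 0; i < end; i++) {
        int         stored = (int) ((set->bits >> i) & 1);
        /* A stored bit means "member" for finite sets and "non-member"
         * for inverted ones. */
        if (stored != !!set->inverted) {
            len += (size_t) snprintf (buf + len, size - len, "%s%u", sep, i);
            sep = " ";
        }
    }
    if (set->inverted) {
        len += (size_t) snprintf (buf + len, size - len, "%s...", sep);
    }
    snprintf (buf + len, size - len, "}");
    return buf;
}
```

For an inverted set with bits 2 and 3 stored, this yields `{0 1 4 ...}`, matching the example in the commit message.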
Bill Currie
a6a273bb07 [vulkan] Fix up test function api
While I can't automatically run the tests in windows builds, at least I
can make sure they build (and run individual ones by hand as necessary).
2021-08-11 12:31:03 +09:00
Bill Currie
aa72f1dc31 [util] Fix reversed finite-infinite set union ops
It looks like I tried to test it, but my tests weren't so good. This
seems to cover everything for the three main set ops.
2021-08-11 12:31:03 +09:00
Bill Currie
c81f2d4b52 [video] Mark dga funcs as const when dga not available
Fixes a compile issue (warning about attribute const) when dga is not
available, thanks to spiritiit for finding it :)
2021-08-11 12:09:07 +09:00
Bill Currie
2828500f04 [qfvis] Delay freeing of winding memory
If anything, this is probably a nano-optimization, depending on how
often portals are vis-rejected. I couldn't see any actual difference.
2021-08-08 12:34:18 +09:00
Bill Currie
9a93bf8d4a [qfvis] Make cluster reconstruction O(N)
For most (if not all) maps. The heapsort is needed only if the clustered
leafs are not contiguous, but most bsp compilers output contiguous leaf
clusters, so it is just a bit of protection. The difference isn't really
noticeable on a fast machine, but no point in doing more work than
necessary.
2021-08-08 12:34:18 +09:00
Bill Currie
b320c3352f [util] Make set_t endian-agnostic
Most of the set ops were always endian-agnostic since they were simply
operating on multiple bits in parallel, but individual element
add/remove/test was very endian-dependent. For the most part, this
didn't matter, but it does matter very much when loading external data
into a set or writing the data out (eg, for PVS).
2021-08-08 12:34:18 +09:00
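One way to make per-element operations independent of host byte order is to address the set a byte at a time, which is also the layout Quake's PVS data uses (element i lives in byte i >> 3, bit i & 7). The sketch below assumes that approach; `byteset_t` and its functions are hypothetical, and the commit may well have solved the problem differently.

```c
#include <assert.h>
#include <stdint.h>

#define BYTESET_BYTES 8         /* hypothetical fixed capacity: 64 elements */

typedef struct {
    uint8_t     bytes[BYTESET_BYTES];
} byteset_t;

/* Element x always lives in byte x / 8, bit x % 8, so the in-memory layout
 * is identical on little- and big-endian hosts. */
static void
byteset_add (byteset_t *set, unsigned x)
{
    assert (x < BYTESET_BYTES * 8);
    set->bytes[x >> 3] |= (uint8_t) (1u << (x & 7));
}

static void
byteset_remove (byteset_t *set, unsigned x)
{
    assert (x < BYTESET_BYTES * 8);
    set->bytes[x >> 3] &= (uint8_t) ~(1u << (x & 7));
}

static int
byteset_test (const byteset_t *set, unsigned x)
{
    assert (x < BYTESET_BYTES * 8);
    return (set->bytes[x >> 3] >> (x & 7)) & 1;
}
```

By contrast, indexing 32-bit words with `1u << (x % 32)` would place element 0 in the first byte on a little-endian host but in the fourth byte on a big-endian one, which matters as soon as the bytes are read from or written to a file.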
Bill Currie
648ae3f877 [qfvis] Clean up the code and output a little
Dead code removed, and the job progress lines are now consistent and
have a job completion time when done.
2021-08-03 21:52:24 +09:00
Bill Currie
fe998f41b4 [qfvis] Convert leaf vis to cluster vis
Now that only 3852 clusters need to be checked for each cluster, fat-pvs
construction for ad_tears completes in about 0.7s, most of which seems
to be loading, conversion, compression and writing. O(N^3) cuts both
ways (hurts like crazy when N increases, does wonders when N decreases,
especially by a factor of 25). And then throw in improved cache
performance...

I suspect having an off-line compiler is still useful, but even if
qfvis's implementation never actually gets used, if cluster
reconstruction is put in the engine, large maps will be feasible even
for quakeworld. Just the reduced memory requirements alone will be a
huge benefit (~3GB down to 1.8MB).
2021-08-03 15:48:45 +09:00
Bill Currie
8da019e31c [qfvis] Reconstruct the leaf clusters in a bsp
This is only the first half (vertical) in that the vis bits are still
for the leafs rather than the clusters, but ad_tears goes from 500s to
7s for calculating the fat pvs (3852 clusters).
2021-08-03 11:37:24 +09:00
Bill Currie
42dc30ec29 [vulkan] Increase main staging buffer to 32MB
1k 32bpp sky textures need 24MB to load. Though there's always better
handling of running out of staging buffer (ie, flushing and trying
again).
2021-08-02 23:17:55 +09:00
Bill Currie
421047328a [qfvis] Catch a missed winding mark stat 2021-08-02 23:17:55 +09:00
Bill Currie
d56d8ac707 [util] Loosen up the epsilon on simd seb tests
It seems my eeepc's SSE units don't get quite the same answers as does
my i7's (maybe due to lack of hadd?).
2021-08-02 23:15:20 +09:00
Bill Currie
80b17623b1 [util] Fix an out-by-one in pqueue tests
Showed up only when the data arrays were packed.
2021-08-02 23:08:14 +09:00
Bill Currie
ec54c54226 [build] Fix some windows bitrot 2021-08-02 14:02:41 +09:00
Bill Currie
d99fb01b65 [build] Fix some distcheck bitrot 2021-08-02 13:47:00 +09:00
Bill Currie
f76964b86b [util] Add a priority queue implementation
Done via macros (like darray and ringbuffer). Might prove useful for
qfvis and maybe dynamic lights.
2021-08-02 13:29:55 +09:00
Bill Currie
4f2113bc05 [util] Enable accidentally disabled seb tests 2021-08-02 12:44:08 +09:00
Bill Currie
e4984aad17 [util] Add functions for binary heaps
Sink, swim, build and sort, both "simple" and with a data parameter for
the compare function.
2021-08-02 12:44:08 +09:00
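For readers unfamiliar with the four operations named above, here is a minimal sketch of a max-heap over an `int` array. The function names are hypothetical and the comparison is hard-coded; the actual functions take a compare function (optionally with a data parameter), which is omitted here for brevity.

```c
#include <assert.h>
#include <stddef.h>

static void
swap_int (int *a, int *b)
{
    int         t = *a;
    *a = *b;
    *b = t;
}

/* Move heap[index] up until its parent is no smaller. */
static void
heap_swim (int *heap, size_t index)
{
    while (index > 0) {
        size_t      parent = (index - 1) / 2;
        if (heap[parent] >= heap[index]) {
            break;
        }
        swap_int (&heap[parent], &heap[index]);
        index = parent;
    }
}

/* Move heap[index] down until both children are no larger. */
static void
heap_sink (int *heap, size_t index, size_t count)
{
    for (;;) {
        size_t      child = 2 * index + 1;
        if (child >= count) {
            break;
        }
        if (child + 1 < count && heap[child + 1] > heap[child]) {
            child++;            /* pick the larger child */
        }
        if (heap[index] >= heap[child]) {
            break;
        }
        swap_int (&heap[index], &heap[child]);
        index = child;
    }
}

/* Turn an arbitrary array into a max-heap in O(N). */
static void
heap_build (int *heap, size_t count)
{
    for (size_t i = count / 2; i-- > 0; ) {
        heap_sink (heap, i, count);
    }
}

/* Heapsort: sorts the array into ascending order in place. */
static void
heap_sort (int *heap, size_t count)
{
    assert (heap != NULL || count == 0);
    heap_build (heap, count);
    while (count > 1) {
        count--;
        swap_int (&heap[0], &heap[count]);
        heap_sink (heap, 0, count);
    }
}
```

Swim is what a priority-queue insert would use after appending an element, and sink is what a remove-top would use after moving the last element to the root.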
Bill Currie
f514345d77 [qfbsp] Show correct object counts for bsp29 files
Yes, this was the goal of that size_t change, but it made sense anyway.
2021-08-01 22:05:31 +09:00
Bill Currie
674ffa0941 [util] Make bsp_t counts size_t
and other bsp data counts unsigned, and clean up the resulting mess.
2021-08-01 21:54:05 +09:00
Bill Currie
40367e5bca [qfvis] Thread the ambient sounds computation 2021-08-01 18:14:28 +09:00
Bill Currie
e671b3f230 [qfvis] Thread the portal vis compaction
The compaction deals with merging all the portal visibility into cluster
visibility, expanding out to leafs, and final compression.
2021-08-01 17:06:13 +09:00
Bill Currie
80a89c5e1e [util] Write the correct bsp format id for bsp2
Oops :P
2021-08-01 14:07:24 +09:00
Bill Currie
523ab007d6 [qfvis] Produce more detailed base-vis stats
And nicely, things add up (after fixing 32-bit overflows :P)
2021-07-30 23:07:12 +09:00
Bill Currie
ca8dcf3fa9 [qfvis] Use cluster sphere culling for base vis
While this doesn't give as much of a boost as does basic sphere culling
(since it's just culling sphere tests), it took ad_tears' base vis from
1000s to 720s on my machine.
2021-07-30 18:52:47 +09:00
Bill Currie
9d819254d4 [util] Make a number of improvements to SEB
Attempting to vis ad_tears drags a few lurking bugs out of
SmallestEnclosingBall_vf: poor calculation of 2-point affine space, poor
handling of duplicate points and dropped support points, poor
calculation of the new center (related to duplicate points), and
insufficient iterations for large point sets. qfvis (modified for
cluster spheres) now loads ad_tears.
2021-07-30 14:57:47 +09:00
Bill Currie
9461779ba7 [qfvis] Remove the cluster portals limit
This removes the last of the arbitrary limits from qfvis. The goal is
not so much supporting crazy maps, but more about better data usage
(cluster_t is now 24 (or 16) bytes instead of 1048 (or 528)). And
passages isn't used (yet?)...
2021-07-29 21:03:07 +09:00
Bill Currie
fe98a513bc [util] Add a function to check hunk pointers
Its only real utility is to check that a pointer is not pointing into
freed space.
2021-07-29 15:27:48 +09:00
Bill Currie
756214ca8e [qfvis] Use unsigned for the plane side tests
Doesn't make any difference to the number of instructions, but seeing
sar instead of shr bothers me when working with bit patterns.
2021-07-29 15:25:37 +09:00
Bill Currie
6d312aaa63 [simd] Check the distance to the affine point
As per usual, fp math finds a way to confound any epsilon test. So
rather than relying entirely on test_support_points, check the distance
from the sphere center to the affine point and break out of the loop if
the distance is small enough (< 1% of the current radius). This allows
qfvis to load ad_tears without hacks.
2021-07-29 15:15:14 +09:00
Bill Currie
45aa8e6504 [util] Loosen affine test epsilon for SEB
Scaling the checks by 1e-6 was a little too tight for very small
triangles, but 1e-5 seems to work well. This fixes SEB getting stuck for
a ridiculously small (for quake) triangle in ad_tears (probably resulted
from some bad math in qfbsp when generating the portal file from the
bsp).
2021-07-29 15:03:54 +09:00
Bill Currie
4f51a3b406 [utils] Fix set tests for 32-bit machines 2021-07-29 14:10:18 +09:00
Bill Currie
72a1fef714 [qfvis] Use hunk to manage winding memory
It turns out cmem is not so good for many large allocations (probably a
bug in handling the blocks), but was really meant for lots of little
churning allocations anyway. After an analysis of winding lifetimes, it
became clear that the hunk allocator would work very well. The base
windings are allocated from a global hunk (currently 1GB, plenty for
even ad_tears), and ephemeral windings are allocated from a per-thread
hunk of 1MB (seems to be way more than enough: gmsp3v2 uses a maximum of
only 56064 bytes, and ad_tears got through 30% before I gave up on it).
Any speed difference (for gmsp3v2) seems to be lost in the noise: still
completing in 38.4s on my machine.
2021-07-29 11:49:18 +09:00
Bill Currie
8f376a48f8 [util] Add raw versions of hunk alloc and free
They do not clear memory and thus are good for situations where speed is
more critical.
2021-07-29 11:44:10 +09:00
Bill Currie
ca63c0360a Do an audit of hunk mark usage
I realized that after making the hunk 64-bit clean, I had forgotten to
go through and convert all the saved marks to size_t.
2021-07-29 11:43:27 +09:00
Bill Currie
54604d9aa2 [util] Make hunk (optionally) thread-safe
For now, the functions check for a null hunk pointer and use the global
hunk (initialized via Memory_Init) if necessary. However, Hunk_Init is
available (and used by Memory_Init) to create a hunk from any arbitrary
memory block. So long as that block is 64-byte aligned, allocations
within the hunk will remain 64-byte aligned.
2021-07-29 11:43:27 +09:00
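The idea of building a hunk on top of an arbitrary memory block can be sketched as a simple bump allocator. Everything here is hypothetical (`minihunk_t` and its functions are illustration only, with no locking and none of the real hunk's bookkeeping); it shows how rounding each size up to 64 bytes keeps allocations aligned when the block itself is 64-byte aligned, and how a "raw" allocation differs from a clearing one.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bump allocator over a caller-supplied block. */
typedef struct {
    uint8_t    *base;
    size_t      size;
    size_t      used;
} minihunk_t;

static void
minihunk_init (minihunk_t *hunk, void *mem, size_t size)
{
    /* Alignment of allocations depends on the block being aligned. */
    assert (((uintptr_t) mem & 63) == 0);
    hunk->base = mem;
    hunk->size = size;
    hunk->used = 0;
}

/* Allocate without clearing; returns NULL when the block is exhausted. */
static void *
minihunk_alloc_raw (minihunk_t *hunk, size_t size)
{
    size_t      avail = hunk->size - hunk->used;
    if (size > avail) {
        return NULL;
    }
    size = (size + 63) & ~(size_t) 63;  /* keep the next allocation aligned */
    if (size > avail) {
        return NULL;
    }
    void       *ptr = hunk->base + hunk->used;
    hunk->used += size;
    return ptr;
}

/* Allocate and zero the memory. */
static void *
minihunk_alloc (minihunk_t *hunk, size_t size)
{
    void       *ptr = minihunk_alloc_raw (hunk, size);
    if (ptr) {
        memset (ptr, 0, size);
    }
    return ptr;
}

/* Marks are plain offsets, so they are stored as size_t. */
static size_t
minihunk_mark (const minihunk_t *hunk)
{
    return hunk->used;
}

/* Release everything allocated after the mark was taken. */
static void
minihunk_free_to_mark (minihunk_t *hunk, size_t mark)
{
    assert (mark <= hunk->used);
    hunk->used = mark;
}
```

Giving each thread its own `minihunk_t` over its own block is one simple way to avoid contention, since no state is shared between hunks.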
Bill Currie
8fdd9c1f5a [util] Write some tests for utf8 r/w
And fix some errors with 5-byte encodings.
2021-07-27 23:29:14 +09:00
Bill Currie
e39bc83a6a [qfvis] Optionally use utf8 to encode run lengths
Adds 50 bytes to marcher's fat-pvs, but removes about 4.7MB from
ad_tears' fat-pvs.
2021-07-27 23:29:14 +09:00
Bill Currie
5b4428420e [utils] Get utf-8 writing working for up to 11 bits
I need to write some automated tests for this, and reading of course,
but one- and two-byte outputs look correct. Kind of sad it took sixteen
years to get around to attempting to use the code :(
2021-07-27 23:29:02 +09:00
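As a reference for the one- and two-byte cases mentioned above, here is a minimal sketch of writing a value of up to 11 bits using the standard UTF-8 bit layout. `utf8_write_11` is a hypothetical name, not the function from this commit, and longer encodings are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write `value` to `out` using the UTF-8 bit layout and
 * return the number of bytes written (1 or 2). Returns 0 for values that
 * need more than 11 bits, which this sketch does not cover. */
static size_t
utf8_write_11 (uint32_t value, uint8_t *out)
{
    assert (out != NULL);
    if (value < 0x80) {
        /* 0xxxxxxx: 7 bits in one byte */
        out[0] = (uint8_t) value;
        return 1;
    }
    if (value < 0x800) {
        /* 110xxxxx 10xxxxxx: 5 + 6 = 11 bits in two bytes */
        out[0] = (uint8_t) (0xc0 | (value >> 6));
        out[1] = (uint8_t) (0x80 | (value & 0x3f));
        return 2;
    }
    return 0;
}
```

A run length of 300, for example, does not fit in a single byte but encodes as the two bytes 0xc4 0xac.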
Bill Currie
b9d2882e02 [qfvis] Write out the fat-pvs file
The output fat-pvs data is the *difference* between the base pvs and fat
pvs. This currently makes for about 64kB savings for marcher.bsp, and
about 233MB savings for ad_tears.bsp (or about 50% (470.7MB->237.1MB)).
I expect using utf-8 encoding for the run lengths to make for even
bigger savings (the second output fat-pvs leaf of marcher.bsp is all 0s,
or 6 bytes in the file, which would reduce to 3 bytes using utf-8).
2021-07-27 20:04:19 +09:00
Bill Currie
9e2c474d38 [model] Ensure the pvs is not inverted
Mod_DecompressVis_set (via Mod_LeafPVS_set) can be used to recycle pvs
sets, but the set may have been set to everything at some stage, which
is implemented by inverting the set (making the set infinite) and having
1-bits remove elements from the set. This is most definitely not wanted
for pvs :)

Currently undecided what to do about Mod_DecompressVis_mix, thus the
fixme.

Fixes the flickering lights in any map where the camera is out of the
map for a single frame (eg, start.bsp, The Catacombs (hipnotic, hip2m3)).
2021-07-27 17:54:50 +09:00
Bill Currie
1f57f81a20 [qw] Use set_count to count the pvs/phs leafs 2021-07-27 14:01:08 +09:00
Bill Currie
163d147044 [util] Give set_count a >8x speed boost
I knew counting bits individually was slow, but it never really mattered
until now. However, I didn't expect such a dramatic boost just by going
to mapping bytes to bit counts. 16-bit words would be faster still, but
the 64kB lookup table would probably start hurting cache performance,
and 32-bit words (4GB table) definitely would ruin the cache. The
universe isn't big enough for 64-bits :)
2021-07-27 13:54:22 +09:00
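The byte-to-bit-count mapping described above can be sketched as follows. The names `bit_counts`, `init_bit_counts` and `count_bits` are hypothetical, and the table is built at run time here purely to keep the example short; the actual set_count may use a static table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 256-entry table: bit_counts[b] is the number of 1-bits in byte b. */
static uint8_t bit_counts[256];

/* Fill the table once. Each entry reuses the count for the value shifted
 * right by one and adds the bit that was shifted out. */
static void
init_bit_counts (void)
{
    for (unsigned i = 1; i < 256; i++) {
        bit_counts[i] = (uint8_t) ((i & 1) + bit_counts[i >> 1]);
    }
}

/* Count the set bits in a buffer with one table lookup per byte instead of
 * one test per bit. */
static size_t
count_bits (const uint8_t *data, size_t size)
{
    size_t      count = 0;

    assert (data != NULL || size == 0);
    for (size_t i = 0; i < size; i++) {
        count += bit_counts[data[i]];
    }
    return count;
}
```

The table occupies 256 bytes (four 64-byte cache lines), which is why it stays cache-friendly where a 64kB table for 16-bit words might not.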
Bill Currie
50740c1014 [model] Remove the confusion about numleafs
The fact that numleafs did not include leaf 0 actually caused confusion in
many places due to never being sure whether to add 1. Hopefully this fixes
some of the confusion. (and that comment in sv_init didn't last long :P)
2021-07-27 12:38:08 +09:00
Bill Currie
49c3dacbbc [util] Rename set_size to set_count
After seeing set_size and thinking it redundant (thought it returned the
capacity of the set until I checked), I realized set_count would be a
much better name (set_count (node->successors) in qfcc does make much
more sense).
2021-07-27 11:52:21 +09:00