Commit Graph

167 Commits

Author SHA1 Message Date
Bill Currie 8a5c3c1ac1 [util] Add sys function to get cpu count
And use it in qfvis.
2021-08-13 21:26:48 +09:00
Bill Currie d88bd96390 [qfvis] Use cmem for portal flow stack node allocation
The portal flow stack nodes contain a simd vector, which requires
16-byte alignment. However, on 32-bit Windows, malloc returns 8-byte
aligned memory, leading to eventual segfaults. Since pstack_t is 48
bytes on 32-bit systems, it fits nicely into a 64-byte aligned cache
line (or two on 64-bit systems due to being 80 bytes).
2021-08-13 11:33:03 +09:00
Bill Currie 6fb6885b88 [qfvis] Allocate only 128MB for the main hunk
Even ad_tears didn't really need 1GB, and 32-bit machines can't really
handle 1GB (at least on windows).
2021-08-13 11:33:03 +09:00
Bill Currie 2828500f04 [qfvis] Delay freeing of winding memory
If anything, this is probably a nano-optimization, depending on how
often portals are vis-rejected. I couldn't see any actual difference.
2021-08-08 12:34:18 +09:00
Bill Currie 9a93bf8d4a [qfvis] Make cluster reconstruction O(N)
For most (if not all) maps. The heapsort is needed only if the clustered
leafs are not contiguous, but most bsp compilers output contiguous leaf
clusters, so is just a bit of protection. The difference isn't really
noticeable on a fast machine, but no point in doing more work than
necessary.
2021-08-08 12:34:18 +09:00
Bill Currie 648ae3f877 [qfvis] Clean up the code and output a little
Dead code removed, and the job progress lines are now consistent and
have a job completion time when done.
2021-08-03 21:52:24 +09:00
Bill Currie fe998f41b4 [qfvis] Convert leaf vis to cluster vis
Now that only 3852 clusters need to be checked for each cluster, fat-pvs
construction for ad_tears completes in about 0.7s, most of which seems
to be loading, conversion, compression and writing. O(N^3) cuts both
ways (hurts like crazy when N increases, does wonders when N decreases,
especially by a factor of 25). And then throw in improved cache
performance...

I suspect having an off-line compiler is still useful, but even if
qfvis's implementation never actually gets used, if cluster
reconstruction is put in the engine, large maps will be feasible even
for quakeworld. Just the reduced memory requirements alone will be a
huge benefit (~3GB down to 1.8MB).
2021-08-03 15:48:45 +09:00
Bill Currie 8da019e31c [qfvis] Reconstruct the leaf clusters in a bsp
This is only the first half (vertical) in that the vis bits are still
for the leafs rather than the clusters, but ad_tears goes from 500s to
7s for calculating the fat pvs (3852 clusters).
2021-08-03 11:37:24 +09:00
Bill Currie 421047328a [qfvis] Catch a missed winding mark stat 2021-08-02 23:17:55 +09:00
Bill Currie ec54c54226 [build] Fix some windows bitrot 2021-08-02 14:02:41 +09:00
Bill Currie 674ffa0941 [util] Make bsp_t counts size_t
and other bsp data counts unsigned, and clean up the resulting mess.
2021-08-01 21:54:05 +09:00
Bill Currie 40367e5bca [qfvis] Thread the ambient sounds computation 2021-08-01 18:14:28 +09:00
Bill Currie e671b3f230 [qfvis] Thread the portal vis compaction
The compaction deals with merging all the portal visibility into cluster
visibility, expanding out to leafs, and final compression.
2021-08-01 17:06:13 +09:00
Bill Currie 523ab007d6 [qfvis] Produce more details base-vis stats
And nicely, things add up (after fixing 32-bit overflows :P)
2021-07-30 23:07:12 +09:00
Bill Currie ca8dcf3fa9 [qfvis] Use cluster sphere culling for base vis
While this doesn't give as much of a boost as does basic sphere culling
(since it's just culling sphere tests), it took ad_tears' base vis from
1000s to 720s on my machine.
2021-07-30 18:52:47 +09:00
Bill Currie 9461779ba7 [qfvis] Remove the cluster portals limit
This removes the last of the arbitrary limits from qfvis. The goal is
not so much supporting crazy maps, but more about better data usage
(cluster_t is now 24 (or 16) bytes instead of 1048 (or 528). And
passages isn't used (yet?)...
2021-07-29 21:03:07 +09:00
Bill Currie 756214ca8e [qfvis] Use unsigned for the plane side tests
Doesn't make any difference to the number of instructions, but seeing
sar instead of shr bothers me when working with bit patterns.
2021-07-29 15:25:37 +09:00
Bill Currie 72a1fef714 [qfvis] Use hunk to manage winding memory
It turns out cmem is not so good for many large allocations (probably a
bug in handling the blocks), but was really meant for lots of little
churning allocations anyway. After an analysis of winding lifetimes, it
became clear that the hunk allocator would work very well. The base
windings are allocated from a global hunk (currently 1GB, plenty for
even ad_tears), and ephemeral windings are allocated from a per-thread
hunk of 1MB (seems to be way more than enough: gmsp3v2 uses a maximum of
only 56064 bytes, and ad_tears got through 30% before I gave up on it).
Any speed difference (for gmsp3v2) seems to be lost in the noise: still
completing in 38.4s on my machine.
2021-07-29 11:49:18 +09:00
Bill Currie e39bc83a6a [qfvis] Optionally use utf8 to encode run lengths
Adds 50 bytes to marcher's fat-pvs, but removes about 4.7MB from
ad_tear's fat-pvs.
2021-07-27 23:29:14 +09:00
Bill Currie b9d2882e02 [qfvis] Write out the fat-pvs file
The output fat-pvs data is the *difference* between the base pvs and fat
pvs. This currently makes for about 64kB savings for marcher.bsp, and
about 233MB savings for ad_tears.bsp (or about 50% (470.7MB->237.1MB)).
I expect using utf-8 encoding for the run lengths to make for even
bigger savings (the second output fat-pvs leaf of marcher.bsp is all 0s,
or 6 bytes in the file, which would reduce to 3 bytes using utf-8).
2021-07-27 20:04:19 +09:00
Bill Currie 49c3dacbbc [util] Rename set_size to set_count
After seeing set_size and thinking it redundant (thought it returned the
capacity of the set until I checked), I realized set_count would be a
much better name (set_count (node->successors) in qfcc does make much
more sense).
2021-07-27 11:52:21 +09:00
Bill Currie 946867c82e [qfvis] Start work on an off-line fat pvs compiler
Extremely large maps take a very long time to process their PVS sets for
PHS or shadows, so having an off-line compiler seems like a good idea.
The data isn't written out yet, and the fat pvs code may not be optimal
for cache access, but it gets through ad_tears in about 500s (12
threads, compared to 2100s single-threaded in the qw server).
2021-07-26 22:42:03 +09:00
Bill Currie 0a847f92f1 [util] Use mmap/munmap for cmem internal alloc/free
This reduces the overhead needed to manage the memory blocks as the
blocks are guaranteed to be page-aligned. Also, the superblock is now
alllocated from within one of the memory blocks it manages. While this
does slightly reduce the available cachelines within the first block (by
one or two depending on 32 vs 64 bit pointers), it removes the need for
an extra memory allocation (probably via malloc) for the superblock.
2021-07-12 16:33:47 +09:00
Bill Currie 778c07e91f [util] Get vectors working for non-SSE archs
GCC does a fairly nice job of producing code for vector types when the
hardware doesn't support SIMD, but it seems to break certain math
optimization rules due to excess precision (?). Still, it works well
enough for the core engine, but may not be well suited to the tools.
However, so far, only qfvis uses vector types (and it's not tested yet),
and tools should probably be used on suitable machines anyway (not
forces, of course).
2021-06-01 18:53:53 +09:00
Bill Currie 574a123716 [qfvis] Remove obsolete notes file
While some of it is still correct, I'd rather start afresh next time I
need to sort that stuff out.
2021-03-28 21:14:17 +09:00
Bill Currie 634219ea06 [qfvis] Add set debug prints (disabled)
They were useful for narrowing down why mightsee wasn't being updated.
2021-03-28 21:11:13 +09:00
Bill Currie 9f42943589 [qfvis] Reset portal status after base vis
This fixes the mightsee updates never occurring, but it doesn't make a
huge difference (though I suppose it might have back in the 90s, or with
a different map).
2021-03-28 21:06:50 +09:00
Bill Currie 0fa65be106 [qfvis] Fix stats collection for mightseeupdate
The stats were being updated before UpdateMightsee was getting called,
and it was incrementing the wrong value (so it would not have been
thread-safe).
2021-03-28 21:06:50 +09:00
Bill Currie ff4cd84891 [qfvis] Use simd vector code
While whether it's any faster is debatable (it's slightly slower, but
many more portals are being tested due to different rounding in the base
vis stage), it's certainly easier to read.
2021-03-28 19:55:47 +09:00
Bill Currie eb325376b1 [qfvis] Collect base vis culling stats
Specifically, just how many are culled by sphere and winding tests.
2021-03-28 12:17:15 +09:00
Bill Currie d072a7b99c [qfvis] Add stats for memory usage
Verbosity levels probably need more tweaking, but -v is at least a
little more usable.
2021-03-27 23:04:13 +09:00
Bill Currie 3ef38188ce [qfvis] Add an option to limit the processed portals
It's not documented as I needed it for debugging memory allocations and
it causes qfvis to error out due to unprocessed portals.
2021-03-27 20:59:56 +09:00
Bill Currie f2b6b23acc [qfvis] Switch to unsigned for various counts 2021-03-27 20:55:15 +09:00
Bill Currie 72280186bf [qfvis] Use cmem for memory management
While the main bulk of the improvement (36s down from 42s for
gmsp3v2.bsp on my i7-6850K) comes from using a high-tide allocator for
the windings (which necessitated using a fixed size), it is ever so
slightly faster than using malloc as the back-end.
2021-03-27 20:30:35 +09:00
Bill Currie 238e80c89b [build] Fix selective build of tools
A couple of things get built when they shouldn't (eg, vkgen) but this
gets the build system back to its pre-non-recursive-make
configurability.
2021-03-26 16:11:29 +09:00
Bill Currie c901fe74f9 [qfvis] Fix pthread portability macros
Those that were defined were incorrectly defined (didn't swallow the
parameter), and portal lock macros were missing.
2021-03-26 15:27:48 +09:00
Bill Currie e3444b726f [model] Add a re-entrant Mod_LeafPVS
Double benefit, actually: faster when building a fat PVS (don't need to
copy as much) and can be used in multiple threads. Also, default visiblity
can be set, and the buffer size has its own macro.
2021-03-20 12:13:58 +09:00
Bill Currie 6d5ffa9f8e [build] Move to non-recursive make
There's still some cleanup to do, but everything seems to be working
nicely: `make -j` works, `make distcheck` passes. There is probably
plenty of bitrot in the package directories (RPM, debian), though.

The vc project files have been removed since those versions are way out
of date and quakeforge is pretty much dependent on gcc now anyway.

Most of the old Makefile.am files  are now Makemodule.am.  This should
allow for new Makefile.am files that allow local building (to be added
on an as-needed bases).  The current remaining Makefile.am files are for
standalone sub-projects.a

The installable bins are currently built in the top-level build
directory. This may change if the clutter gets to be too much.

While this does make a noticeable difference in build times, the main
reason for the switch was to take care of the growing dependency issues:
now it's possible to build tools for code generation (eg, using qfcc and
ruamoko programs for code-gen).
2020-06-25 11:35:37 +09:00
Bill Currie 34bcf7faab Do a pure/const/noreturn/format attribute pass.
I always wanted these, but as gcc now provides warnings for functions that
could do with such attributes, finding all the functions is much easier.
2018-10-09 12:42:21 +09:00
Bill Currie aebd9288cd Force thread count to 1 when pthreads is unavailable.
Don't want the thread count being misreported.
2018-09-09 13:41:22 +09:00
Bill Currie fa1514798b Print the number of threads used by qfvis. 2018-09-09 13:41:00 +09:00
Bill Currie 06ab36de3d Slight cleanup of winding allocation.
It seems gcc doesn't care if the & is present when calculating field
offsets, but it not being there bothered me very much and might as well use
our "standard" macro anyway.
2018-09-09 13:38:32 +09:00
Bill Currie c71eccfb10 Remove MAX_THREADS.
This fixes a buffer overflow with more than 4 threads.
2015-08-14 10:57:51 +09:00
Bill Currie f5501fbf24 Fix a pile of automake deprecation warnings.
s/INCLUDES/AM_CPPFLAGS/g

I <3 sed :)
2013-11-24 13:11:50 +09:00
Bill Currie 125ef1f0ff Move the whole separator test/creation into a function.
This will make the next stage easier. (except that seems to be slower)
2013-03-19 20:39:01 +09:00
Bill Currie f2452eb3c3 Rewrite the inner-loop of FindSeparators.
For the most part, it's just refactoring the code so the plane creation and
testing are in separate functions, but there is one important difference:
the plane test now checks only the two points on either side of the point
used to create the plane.

Because the portal winding is guaranteed to be convex and planar, if both
points are on the plane, all points are, and if neither point is behind the
plane, no points are.a

This shaved about 5 seconds off the level 4 run using 4 threads (~198s to
~193s) and about 12s from the single threaded run (~682s to ~670s (hmm,
gained some time in recent changes)).
2013-03-19 17:00:00 +09:00
Bill Currie d7c1bc8d02 Correct a comment.
I had gotten confused between figuring out the windings and writing the
comments, I guess.
2013-03-19 16:23:47 +09:00
Bill Currie 8938870e46 Make the default output a little nicer. 2013-03-19 13:07:44 +09:00
Bill Currie dff0b89a6c Detect the number of CPUs available.
Now qfvis will default to multi-threaded on multi-core machines.
2013-03-19 12:05:50 +09:00
Bill Currie 88e5adcec6 Make the base vis multi-threaded.
Now multi-threaded qfvis is on par with tyrutils vis (differences usually
<1s, sometimes more, sometimes less).
2013-03-19 11:42:09 +09:00