Commit graph

134 commits

Author SHA1 Message Date
Bill Currie
0a847f92f1 [util] Use mmap/munmap for cmem internal alloc/free
This reduces the overhead needed to manage the memory blocks as the
blocks are guaranteed to be page-aligned. Also, the superblock is now
allocated from within one of the memory blocks it manages. While this
does slightly reduce the available cachelines within the first block (by
one or two depending on 32 vs 64 bit pointers), it removes the need for
an extra memory allocation (probably via malloc) for the superblock.
2021-07-12 16:33:47 +09:00
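As a rough illustration of why mmap helps here, the following sketch maps an anonymous block and checks that it starts on a page boundary. It assumes a POSIX system that provides `MAP_ANONYMOUS`. The names `block_alloc`, `block_free` and `is_page_aligned` are hypothetical and are not the cmem API.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE     // exposes MAP_ANONYMOUS on glibc in strict modes
#endif
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper: map `size` bytes of zero-filled memory.
// The kernel hands back page-aligned addresses. Returns NULL on failure.
static void *
block_alloc (size_t size)
{
    void *mem = mmap (NULL, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return mem == MAP_FAILED ? NULL : mem;
}

// Hypothetical helper: release a block obtained from block_alloc.
// Returns 0 on success, -1 on failure (as munmap does).
static int
block_free (void *mem, size_t size)
{
    return munmap (mem, size);
}

// True if `mem` sits exactly on a page boundary.
static int
is_page_aligned (const void *mem)
{
    uintptr_t page_size = (uintptr_t) sysconf (_SC_PAGESIZE);
    return ((uintptr_t) mem & (page_size - 1)) == 0;
}
```

With malloc as the back-end, the allocator would have to over-allocate and align by hand to get the same guarantee.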
Bill Currie
778c07e91f [util] Get vectors working for non-SSE archs
GCC does a fairly nice job of producing code for vector types when the
hardware doesn't support SIMD, but it seems to break certain math
optimization rules due to excess precision (?). Still, it works well
enough for the core engine, but may not be well suited to the tools.
However, so far, only qfvis uses vector types (and it's not tested yet),
and tools should probably be used on suitable machines anyway (not
forced, of course).
2021-06-01 18:53:53 +09:00
Bill Currie
574a123716 [qfvis] Remove obsolete notes file
While some of it is still correct, I'd rather start afresh next time I
need to sort that stuff out.
2021-03-28 21:14:17 +09:00
Bill Currie
634219ea06 [qfvis] Add set debug prints (disabled)
They were useful for narrowing down why mightsee wasn't being updated.
2021-03-28 21:11:13 +09:00
Bill Currie
9f42943589 [qfvis] Reset portal status after base vis
This fixes the mightsee updates never occurring, but it doesn't make a
huge difference (though I suppose it might have back in the 90s, or with
a different map).
2021-03-28 21:06:50 +09:00
Bill Currie
0fa65be106 [qfvis] Fix stats collection for mightseeupdate
The stats were being updated before UpdateMightsee was getting called,
and it was incrementing the wrong value (so it would not have been
thread-safe).
2021-03-28 21:06:50 +09:00
Bill Currie
ff4cd84891 [qfvis] Use simd vector code
While whether it's any faster is debatable (it's slightly slower, but
many more portals are being tested due to different rounding in the base
vis stage), it's certainly easier to read.
2021-03-28 19:55:47 +09:00
Bill Currie
eb325376b1 [qfvis] Collect base vis culling stats
Specifically, just how many are culled by sphere and winding tests.
2021-03-28 12:17:15 +09:00
Bill Currie
d072a7b99c [qfvis] Add stats for memory usage
Verbosity levels probably need more tweaking, but -v is at least a
little more usable.
2021-03-27 23:04:13 +09:00
Bill Currie
3ef38188ce [qfvis] Add an option to limit the processed portals
It's not documented as I needed it for debugging memory allocations and
it causes qfvis to error out due to unprocessed portals.
2021-03-27 20:59:56 +09:00
Bill Currie
f2b6b23acc [qfvis] Switch to unsigned for various counts
2021-03-27 20:55:15 +09:00
Bill Currie
72280186bf [qfvis] Use cmem for memory management
While the main bulk of the improvement (36s down from 42s for
gmsp3v2.bsp on my i7-6850K) comes from using a high-tide allocator for
the windings (which necessitated using a fixed size), it is ever so
slightly faster than using malloc as the back-end.
2021-03-27 20:30:35 +09:00
Bill Currie
238e80c89b [build] Fix selective build of tools
A couple of things get built when they shouldn't (eg, vkgen) but this
gets the build system back to its pre-non-recursive-make
configurability.
2021-03-26 16:11:29 +09:00
Bill Currie
e3444b726f [model] Add a re-entrant Mod_LeafPVS
Double benefit, actually: faster when building a fat PVS (don't need to
copy as much) and can be used in multiple threads. Also, default visibility
can be set, and the buffer size has its own macro.
2021-03-20 12:13:58 +09:00
Bill Currie
6d5ffa9f8e [build] Move to non-recursive make
There's still some cleanup to do, but everything seems to be working
nicely: `make -j` works, `make distcheck` passes. There is probably
plenty of bitrot in the package directories (RPM, debian), though.

The vc project files have been removed since those versions are way out
of date and quakeforge is pretty much dependent on gcc now anyway.

Most of the old Makefile.am files are now Makemodule.am. This should
allow for new Makefile.am files that allow local building (to be added
on an as-needed basis). The current remaining Makefile.am files are for
standalone sub-projects.

The installable bins are currently built in the top-level build
directory. This may change if the clutter gets to be too much.

While this does make a noticeable difference in build times, the main
reason for the switch was to take care of the growing dependency issues:
now it's possible to build tools for code generation (eg, using qfcc and
ruamoko programs for code-gen).
2020-06-25 11:35:37 +09:00
Bill Currie
aebd9288cd Force thread count to 1 when pthreads is unavailable.
Don't want the thread count being misreported.
2018-09-09 13:41:22 +09:00
Bill Currie
fa1514798b Print the number of threads used by qfvis.
2018-09-09 13:41:00 +09:00
Bill Currie
06ab36de3d Slight cleanup of winding allocation.
It seems gcc doesn't care if the & is present when calculating field
offsets, but its absence bothered me very much, and we might as well use
our "standard" macro anyway.
2018-09-09 13:38:32 +09:00
Bill Currie
c71eccfb10 Remove MAX_THREADS.
This fixes a buffer overflow with more than 4 threads.
2015-08-14 10:57:51 +09:00
Bill Currie
f5501fbf24 Fix a pile of automake deprecation warnings.
s/INCLUDES/AM_CPPFLAGS/g

I <3 sed :)
2013-11-24 13:11:50 +09:00
Bill Currie
125ef1f0ff Move the whole separator test/creation into a function.
This will make the next stage easier. (except that seems to be slower)
2013-03-19 20:39:01 +09:00
Bill Currie
f2452eb3c3 Rewrite the inner-loop of FindSeparators.
For the most part, it's just refactoring the code so the plane creation and
testing are in separate functions, but there is one important difference:
the plane test now checks only the two points on either side of the point
used to create the plane.

Because the portal winding is guaranteed to be convex and planar, if both
points are on the plane, all points are, and if neither point is behind the
plane, no points are.

This shaved about 5 seconds off the level 4 run using 4 threads (~198s to
~193s) and about 12s from the single threaded run (~682s to ~670s (hmm,
gained some time in recent changes)).
2013-03-19 17:00:00 +09:00
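The two-neighbour test described above can be sketched as follows. This is not the actual FindSeparators code: the types, the `ON_EPSILON` value and the function names are assumptions made for illustration. It relies on the same premise as the commit message, namely that the winding is convex and planar and that the plane passes through the point at index `pivot`.

```c
// Hypothetical minimal types for the sketch.
typedef struct { float x, y, z; } vec3_t;
typedef struct { vec3_t normal; float dist; } plane_t;

#define ON_EPSILON 0.1f     // assumed tolerance, not taken from qfvis

enum {
    WINDING_COPLANAR,       // both neighbours on the plane: all points are
    WINDING_IN_FRONT,       // neither neighbour behind: no points are
    WINDING_HAS_BACK,       // at least one neighbour is behind the plane
};

// Signed distance of `v` from `plane` (positive means in front).
static float
plane_distance (const plane_t *plane, const vec3_t *v)
{
    return plane->normal.x * v->x + plane->normal.y * v->y
         + plane->normal.z * v->z - plane->dist;
}

// Classify a convex, planar winding against a plane through points[pivot]
// by checking only the two points adjacent to the pivot.
static int
classify_winding (const vec3_t *points, int numpoints, int pivot,
                  const plane_t *plane)
{
    int   prev = (pivot + numpoints - 1) % numpoints;
    int   next = (pivot + 1) % numpoints;
    float d1 = plane_distance (plane, &points[prev]);
    float d2 = plane_distance (plane, &points[next]);

    if (d1 < -ON_EPSILON || d2 < -ON_EPSILON)
        return WINDING_HAS_BACK;
    if (d1 > ON_EPSILON || d2 > ON_EPSILON)
        return WINDING_IN_FRONT;
    return WINDING_COPLANAR;
}
```

The saving comes from replacing a loop over every winding point with two distance evaluations per candidate plane.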
Bill Currie
d7c1bc8d02 Correct a comment.
I had gotten confused between figuring out the windings and writing the
comments, I guess.
2013-03-19 16:23:47 +09:00
Bill Currie
8938870e46 Make the default output a little nicer.
2013-03-19 13:07:44 +09:00
Bill Currie
dff0b89a6c Detect the number of CPUs available.
Now qfvis will default to multi-threaded on multi-core machines.
2013-03-19 12:05:50 +09:00
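One common way to pick a default thread count is `sysconf` with `_SC_NPROCESSORS_ONLN`, which is widely available (glibc, the BSDs, macOS) though not required by POSIX. Whether qfvis uses this exact call is an assumption, and `default_thread_count` is a hypothetical name.

```c
#include <unistd.h>

// Hypothetical helper: number of online CPUs, falling back to 1 when the
// query is unsupported or fails.
static int
default_thread_count (void)
{
#ifdef _SC_NPROCESSORS_ONLN
    long n = sysconf (_SC_NPROCESSORS_ONLN);
    if (n > 0)
        return (int) n;
#endif
    return 1;
}
```

Falling back to 1 keeps the single-threaded path as the safe default on systems where the count cannot be determined.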
Bill Currie
88e5adcec6 Make the base vis multi-threaded.
Now multi-threaded qfvis is on par with tyrutils vis (differences usually
<1s, sometimes more, sometimes less).
2013-03-19 11:42:09 +09:00
Bill Currie
32b6d15931 Use a sorted queue for portals.
qsort is used to sort the queue by nummightsee. At ~4ms for 20k portals, I
think it's affordable. Using a queue rather than scanning the portal list
each time loses the dynamic sorting when mightsee gets updated, but it
seemed to shave off 4s anyway (~207s to ~203s (maybe, yay random times)).

Another step towards threaded base-vis.
2013-03-18 21:14:12 +09:00
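Sorting a queue of portal pointers with qsort might look like the sketch below. The trimmed-down `portal_t`, the helper names and the ascending order (cheapest portals first) are assumptions. The commit only states that the queue is sorted by nummightsee.

```c
#include <stdlib.h>

// Hypothetical, trimmed-down portal holding only what the sort needs.
typedef struct {
    int      id;
    unsigned nummightsee;
} portal_t;

// qsort comparator for an array of portal_t pointers, ascending by
// nummightsee. Comparing instead of subtracting avoids unsigned wrap-around.
static int
portal_compare (const void *a, const void *b)
{
    const portal_t *p1 = *(const portal_t *const *) a;
    const portal_t *p2 = *(const portal_t *const *) b;

    return (p1->nummightsee > p2->nummightsee)
         - (p1->nummightsee < p2->nummightsee);
}

// Sort the queue once up front. Later changes to nummightsee do not
// reorder it, which is the lost "dynamic sorting" mentioned above.
static void
sort_portal_queue (portal_t **queue, size_t count)
{
    qsort (queue, count, sizeof (portal_t *), portal_compare);
}
```

Worker threads can then take the next portal from the front of the queue instead of rescanning the whole portal list.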
Bill Currie
7e40981dcd Move the LeafThread setup to its own generic function.
This is for threading base-vis.
2013-03-18 21:11:46 +09:00
Bill Currie
cb096c601d Use a per-portal rwlock for portal updates.
This should make qfvis scale a little better with cpu count.
2013-03-18 15:03:11 +09:00
Bill Currie
c824e668ed Rework some of the pthread stuff.
Init/uninit is now separate from portal vising.
The global lock has a better name and is now a rwlock.
Use a separate lock for the stats.
2013-03-18 14:26:52 +09:00
Bill Currie
134381f79b Reduce the locking in the portal completion code.
It doesn't seem to make much difference, but the less room for contention,
the better.
2013-03-18 13:45:19 +09:00
Bill Currie
ffb6d628bd Simplify the pthreads detection macros.
2013-03-18 13:31:35 +09:00
Bill Currie
1c20a49dba Use the recursive set allocator for mightsee.
This completely removes the lock used to protect the set allocation code
while keeping the use of the set api clean.
2013-03-18 13:30:50 +09:00
Bill Currie
a28ec8aa82 Revert "Allocate stack blocks and mightsee in one block."
This reverts commit 1ea79e8626.

Conflicts:
	tools/qfvis/include/vis.h
	tools/qfvis/source/flow.c

I've decided to do reentrant versions of the set allocators and I didn't
particularly like the invasiveness of allocating sets this way.
2013-03-18 12:47:59 +09:00
Bill Currie
ad247fa12d Rename some variables and remove some comments.
The old variable names were confusing ("target" winding comes from
"portal"?), and the comments were from when I really didn't understand
concepts like separating planes. While they weren't wrong, they were quite
inadequate and I want to write new ones.
2013-03-17 21:52:08 +09:00
Bill Currie
ccc432a7ea Give the fields of pstack_t clearer names.
And some comments.
2013-03-17 19:18:38 +09:00
Bill Currie
1ea79e8626 Allocate stack blocks and mightsee in one block.
This bypasses set_new, but completely removes the use of the global lock
from within RecursiveClusterFlow. This seems to give a small speedup: 203
seconds threaded.
2013-03-17 16:37:27 +09:00
Bill Currie
1d262f7dea Clean up FindSeparators a little bit.
This was testing an idea I had to remove the plane flips. It seems to have
been good for the initial plane orientation, but was a slight slowdown for
the pass-portal test. However, this makes the code a little easier to work
with for my idea on improving the algorithm itself.
2013-03-17 10:16:47 +09:00
Bill Currie
5dba419233 Cache stack blocks and working mightsee sets.
Since the stack structure in the thread data is a linked list, move the
stack blocks off the program stack and into malloced memory. More
importantly, when the stack block is allocated, the mightsee working set is
allocated too, and as neither are freed, this greatly reduces contention
for the lock. Also, because the memory is kept, single threaded time for
gmsp3v2 dropped from 695s to 670s. Threaded is now about 207s (down from
350).
2013-03-16 22:58:59 +09:00
Bill Currie
2ea143283c Rewrite mightsee_more to manipulate the sets directly.
While using set operators was clearer, it was rather expensive (about 25s
for gmsp3v2). qfvis now completes the map in about 695s (single threaded).
About 15s faster than tyr for the same conditions (1 thread, level 4).
2013-03-16 21:51:41 +09:00
Bill Currie
195bdcb92f Rework FindSeparators to make use of the winding direction.
This is the second part of the separator search optimization from tyrutils
vis. With this, qfvis is getting close to tyrutils vis when
running single threaded (qfvis is suffering some nasty thread contention
and thus can't get below about 350 seconds with 4 threads). 808s vs 707s.
2013-03-15 22:05:01 +09:00
Bill Currie
9b10304c2f Make CopyWinding const-correct.
2013-03-15 19:25:24 +09:00
Bill Currie
5a2ee06787 Reverse the winding for backside portals.
This is part 1 of another optimization from tyrutils vis. It seems that
just reversing the winding gives a tiny speedup.
2013-03-15 19:22:57 +09:00
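Reversing a winding amounts to reversing the order of its points, which flips the direction the polygon faces. The sketch below does this in place. `vec3_t` and `reverse_winding` are hypothetical names, and the real code may well build a reversed copy instead.

```c
#include <stddef.h>

// Hypothetical point type for the sketch.
typedef struct { float x, y, z; } vec3_t;

// Reverse the point order of a winding in place.
static void
reverse_winding (vec3_t *points, size_t numpoints)
{
    if (numpoints < 2)
        return;
    for (size_t i = 0, j = numpoints - 1; i < j; i++, j--) {
        vec3_t tmp = points[i];
        points[i] = points[j];
        points[j] = tmp;
    }
}
```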
Bill Currie
46d41ad9ac Split up separator finding and winding clipping.
Interestingly, finding all the separators makes very little difference (it
may even be faster) for levels 3 and 4. This might be due to the higher
levels using most of the planes to fully clip source away. Anyway, it makes
the code a little clearer (one function, one task).
2013-03-15 16:00:39 +09:00
Bill Currie
f80ae52828 Make vis's ClipWinding const-correct.
2013-03-15 15:28:25 +09:00
Bill Currie
77c858060d Add a bunch more statistics.
Now I know why sphere culling was a loss: 78% of all tested target portals
were trimmed by ClipToSeparators (50% eventually clipped away entirely).
2013-03-14 19:43:46 +09:00
Bill Currie
8032d1d4d1 Split out the mightsee intersection/subset tests.
Having the code in separate functions makes the flow in the main loop a
little easier to follow.
2013-03-14 19:43:46 +09:00
Bill Currie
eec87bd61b Remove thread from stack_t.
It really wasn't gaining anything and made reading the code a little
harder.
2013-03-14 19:43:46 +09:00
Bill Currie
057a5cc624 Make BasePortalVis another 17% faster.
I had forgotten to skip the refined tests when the sphere was entirely on
the relevant side of the plane. Now BasePortalVis for gmsp3v2 takes 11s on
my machine (it was 13 with the previous optimization and 15.9 before that).

Also, write some comments describing how BasePortalVis works.
2013-03-14 14:01:26 +09:00
Bill Currie
5d6df082f2 Move the vis stats vars into thread data.
This should make the stats more reliable when running multi-threaded
(chains is still random, but it seems there are set access issues).
2013-03-14 12:52:40 +09:00