qsort is used to sort the queue by nummightsee. At ~4ms for 20k portals, I
think it's affordable. Using a queue rather than scanning the portal list
each time loses the dynamic sorting when mightsee gets updated, but it
seemed to shave off 4s anyway (~207s to ~203s (maybe, yay random times)).
Another step towards threaded base-vis.
This reverts commit 1ea79e8626.
Conflicts:
tools/qfvis/include/vis.h
tools/qfvis/source/flow.c
I've decided to do reentrant versions of the set allocators and I didn't
particularly like the invasiveness of allocating sets this way.
The old variable names were confusing ("target" winding comes from
"portal"?), and the comments were from when I really didn't understand
concepts like separating planes. While they weren't wrong, they were quite
inadequate and I want to write new ones.
This bypasses set_new, but completely removes the use of the global lock
from within RecursiveClusterFlow. This seems to give a small speedup: 203
seconds threaded.
This was testing an idea I had to remove the plane flips. It seems to have
been good for the initial plane orientation, but was a slight slowdown for
the pass-portal test. However, this makes the code a little easier to work
with for my idea on improving the algorithm itself.
Since the stack structure in the thread data is a linked list, move the
stack blocks off the program stack and into malloced memory. More
importantly, when the stack block is allocated, the mightsee working set is
allocated too, and as neither are freed, this greatly reduces contention
for the lock. Also, because the memory is kept, single threaded time for
gmsp3v2 dropped from 695s to 670s. Threaded is now about 207s (down from
350).
While using set operators was clearer, it was rather expensive (about 25s
for gmsp3v2). qfvis now completes the map in about 695s (single threaded).
About 15s faster than tyr for the same conditions (1 thread, level 4).
This is the second part of the separator search optimization from tyrutils
vis. With this, qfvis is getting close to tyrutils vis when
running single threaded (qfvis is suffering some nasty thread contention
and thus can't get below about 350 seconds with 4 threads). 808s vs 707s.
Interesting, it makes very little (maybe faster) difference to find all the
separators for levels 3 and 4. This might be due to the higher levels using
most of the planes to fully clip source away. Anyway, it makes the code a
little clearer (one function, one task).
I had forgotten to skip the refined tests when the sphere was entirely on
the relevant side of the plane. Now BasePortalVis for gmsp3v2 takes 11s on
my machine (it was 13 with the previous optimization and 15.9 before that).
Also, write some comments describing how BasePortalVis works.
Representing the side of the plane on which the sphere lies is much more
useful as more complicated tests can be done using just the one call.
-1: the sphere is entirely on the back side of the plane
0: the sphere is intersecting the plane
1: the sphere is entirely on the front side of the plane
It was supposed to be 2, but for some reason I neglected to set it when I
set up the options parsing. However, level 4 is the standard for production
maps, and it happens to be faster than level 2 (at least for gmsp3v2.bsp)
I think the reason I didn't think of that when I tried to improve qfvis's
performance many years ago is I just simply did not understand
ClipToSeparators. However, the difference caching the separators makes is
phenomenal. Before the change, single threaded qfvis would get stuck on one
particular portal for at least a day (I gave up waiting), but now even a
debug build will complete gmsp3v2.bsp in less than 12 minutes (4 threads on
my quad-core). And that's at level 2! Getting stuck for a day was at level
0.
Rather than prefixing free_ to the supplied name, suffix _freelist to the
supplied name. The biggest advantage of this is it allows the free-list to
be a structure member. It also cleans up the name-space a little.
While noticeably slower than the previous expanded set manipulation code,
this is much easier to read. I can worry about optimizing the set code when
I get qfvis behaving better.
I'd forgotten that ED_ConvertToPlist mangled light into light_lev and
single component angle values into a vector. This fixes much of the
breakage in qflight (but not the light levels)
This removes a lot of redundant code from qflight (though it does become
dependent of libQFgamecode *shrug*). The nice thing is qflight now uses the
exact same code to load entities as does the server.
Something is funny with Ubuntu such that -ldl needs to be specifically
added even though QFutil's .la specifies it. I don't know if it's a libtool
issue or not, but this does work.
More will probably be necessary, but this was sufficient to get prover to
the point where qfcc segged building qwaq (0.7.2).
type_obj_class is no longer a class, so its ivars are not stored in
type_obj_class.t.class->ivars but rather type_obj_class.t.symtab.
This fixes the segfault Spirit and Randy were experiencing.
In passing, correct the unneeded emission of meta class ivars for non-root
classes. This should make for much smaller progs that use classes.
MOVEP's opc itself is always known and used, whether it's a constant
pointer or variable doesn't matter. This fixes the lost pointer calculation
for va_list.list[j] = object_from_plist (item);
Dead nodes are those that generate unused values (unassigned leaf nodes,
expressions or destinationless move(p) nodes). The revoval is done by the
flow analysis code (via the dags code) so that any pre and post removal
flow analysis and manipulation may be done (eg, available expressions).
assign_expr mangles the destination expression for dereferenced
assignments into something that is invalid as an lvalue, so simply use
new_binary_expr with the same opcode.
It turns out expression trees are (mostly?) valid DAGs, so all edges being
constrained works, though the graphs get a little tall (but easier to read).
This fixes the infinite loop in if ((x = self.heat && x))
Really, I think I need to revisit the whole expression tree code. It's
proving to be rather fragile.
The source of the assignment is used as the value to test, and the
assignment itself is inserted into the boolean expressions's block. This
fixes the inernal error for "if ((x = 0))".
Normally, it will happen only as a follow-on error, but I can think of a
way to force it without other errors, so treating it as an internal error
is a bit harsh.
But reset current_symtab to its prior value when done. This fixes a
segfault caused by initializing the class system while parsing a struct
(eg, one of the members is of type id).
The keywords table was rather awkward to edit (and sometimes confusing).
Worse, because the hash table used to look up the keywords was initialized
only once, changing modes in the same execution of qfcc would not work
properly as keywords would not be added or removed as appropriate.
Now there are four categories of keywords:
o "core" Always available. They form the core of QuakeC except for two
extensions.
o "@" In extended and advanced modes, the preceeding @ is optional,
but tranditional mode requires the keywords to be preceeded by
an @. They are the C keywords that QuakeC did not use, but can
be implemented in v6 progs under certain circumstances.
o "QF" These keywords require the QuakeForge VM to be usable.
o "Obj" These keywords form Ruamoko/Objective-QuakeC and require both
advanced mode and the QuakeForge VM.
This fixes the segfault/null pointer access in sendv.r. While I wanted to
use the edge setting code to set the live bit, I didn't expect it to be
this easy. def_visit_all is proving to be worth every bit it consumes :)
It seems dag_set_live_vars still served a purpose after all, but I don't
feel like bringing it as I'd rather implement its param handing in
dagnode_set_edges. I've now got a test case for it, though the test
currently causes the VM to segfault (even with pr_boundscheck 2!).
If the final block ends in a conditional statement, appending return to the
block will hide the conditional statement from the flow analyzer. This may
cause the conditional statement's destination node be become unreachable
according to the analyzer and thus eliminated. The label for the branch
then loses its target sblock and thus the code generator will produce a
zero-distance jump resulting in an infinite loop.
Thus, if the final block ends in a conditional statement (or, for
completeness, a call statement), append a new empty block before adding the
return statement.
If MOVEP's destination is variable, then the actual destination isn't (at
this stage) knowable, so it can't be attached to the dagnode and thus must
be a child.
Getting the operands directly from the statement was missing the
destination operand of movep when movep's op_c was a constant pointer and
thus the flowvar wasn't being counted/created early enough. This led to a
segfault in the set code when attempting to add -1 to the set.
It turns out the recent dead-block code "broke" vector component access
from objects. The breakage is really highlighting a problem with temporary
operands and aliasing. The problem was hiding behind a basic-block split
that the recent dead-block work mended and thus exposed the bug.
type_id is implemented as a pointer to "struct obj_object" (ie, not really
a class), so the correct check is to ensure the type is:
1 a pointer
2 to a struct
3 using the same symbol table as type_obj_object
Empty structs are now (correctly) invalid. The hack of using an empty
struct to represent a handle returned from a builtin has been unnecessary
since opaque structs were implemented: now a pointer to an opaque struct
can be used. This is mostly safe as handles are aways negative and thus
attempting to dereference such a pointer should result in a VM error. It
will be even safer once const is implemented and the pointers can be made
constant (eg, typedef struct handle * const handle;)
void foo (int); is fine for a prototype (or, presumably, a qc function
variable), but not for an actual function body. This fixes the segmentation
fault when the parameter name is omitted.
This is needed to allow compile-time protocol conformance checks, though
nothing along those lines has been implemented yet.
id has been changed from TYPE to OBJECT, required to allow id <proto> to be
parsed. OBJECT uses symbol, allowing id to be redefined once suitable work
has been done on the parser.
It uses the new block merge code. Now forgotten return statements are
detected properly (naive dead block removal) and all unreachable code is
eliminated (flow analysis unreachable node removal).
This reverts commit 83ead0842f.
Note: does not compile.
It turns out basic dead block removal is needed for the "control reaches
end of non-void function" warning to work correctly.
Empty sblocks are removed (unless it's the only sblock), and blocks that
are split unnecessarily are merged.
This mostly fixes bogus "no return" warnings.
Unreachable nodes will cause the first elements of the array to remain
unwritten by df_search. This fixes the segfaults caused by unreachable
nodes (the reason they were an internal error before).
The current implementation probably needs more work, but for the case where
I needed it, it does the job.
grid.r💯 vector size = {range, range, 0};
0115 store.f range, size
0116 store.f range, [$2ac]
0117 store.f .zero, [$2ad]
After all that effort getting the class def initialized early enough for
type encodings to work, it proved to be a problem: just including a header
with an interface in it would cause linker errors if there was no
implementation available (even if the class is never used).
I got MXE to build (took only an envvar and a couple packages, yay doc
reading), so I thought it time to update the scripts to use it (they assume
/opt/mxe).
qfcc now does local common subexpression elimination. It seems to work, but
is optional (default off): use -O to enable. Also, uninitialized variable
detection is finally back :)
The progs engine now has very basic valgrind-like functionality for
checking pointer accesses. Enable with pr_boundscheck 2
Temps aren't supported yet :P
The alias defs themselves aren't killed (still want any assignments to
occur) but rather, their nodes are. Also, edges to the alias defs' nodes
are added to the assigning node. Fixes structlive.r :)
I got fed up with using "int" types, but the members being "integer"
(hold-over from before the int rename).
Also, correct the names of those types and @va_list (error reporting was
chopping off part of the name).
MOVE (static move) and MOVEP to a pointer constant know exactly where their
data is going, so treat them similarly to assignments: save their
distination operands (the addressed def for MOVEP) and mark them as
defined.
The live var flow analysis doesn't check for aliases. Rather than changing
it to check for aliases (which might break uninitialized var analysis, as
it uses "use" from the live var analysis), make dag_remove_dead_vars do the
check. Fixes the misplaced text in the menus.
Nifty: if you pass a struct via reference to a function, and a field of
that struct may be both set and not set (eg, set only in an if statement),
gcc will report that field assuming that fields that are never set will be
set by the function (my interpretation).
* taniwha ponders the flow analysis for that
Nifty: if you pass a struct via reference to a function, and a field of
that struct may be both set and not set (eg, set only in an if statement),
gcc will report that field assuming that fields that are never set will be
set by the function (my interpretation).
* taniwha ponders the flow analysis for that
At the statement level, all pointer types are the same, so just return the
op obtained from the sub-expression when the low-level type of the alias
expression matches the low-level type of the type of type sub-expression
operand.
With this, the alias of a value code can be removed (I always thought it
was wrong), which is what broke calling obj_msgSend_super (type &.super
param lost the &).
Now I have to deal with pointer values in the optimizer :/
When an alais def (or aliased def) is used, any overlapping aliases that
have previously been assigned need to be marked as live, and edges to the
aliases added to the new node. However, when assigned to, live-forcing
needs to be turned off.
This fixes the lost assignments to .super.
This fixes the bogus temps for "*to = *from++;", but qfcc ices due to the
operand types being lost. It seems alias operands need to be resurrected,
if only for code output by dags.
I forgot to add func->num_statements :P. Fixes the weirdness where only
some alias temps were being (bogusly) detected as uninitialized. Now they
all are.
When the naive uninitialized variable detection finds a node with possible
uses of uninitialized variables, the statements in the node are scanned one
at a time checking each usage and removing uninitialized definitions as
appropriate. vectest.r now compiles without warnings. As an added bonus,
accurate line number information is reported for uninitialized variables.
Unfortunately, there is still a problem with uninitialized temps in
switch.r, but that might just be poor handling of temp op aliases.
Only definitions for the def used in the current statement (whether an
alias or not) are suitable for killing. Doing otherwise defeats the purpose
of this work :P
Fixes the false negatives found in a modified quattest.r (commented out the
"tq.s = 0;" line).
Nicely, the use sets from live_variable analysis can be used too, though
there are some problems with the naive implementation. For:
vector foo (float x, float y, float z)
{
vector v;
v.x = x;
v.y = y;
v.z = z;
return v;
}
qfcc thinks v is uninitialized, but if "if (x) return nil;" (or any other
basic-block splitter) is put just before the return v; qfcc correctly
detects that v is initialized. The reason is that the inits are in the same
basic block as the return, and thus aren't affecting the reaching
definitions, which are stored per-block.
The naive implementation should be good for a fast-cull before doing a
per-statement check.
The exit dummy block is setup to provide dummy uses of global variables to
the live variable analysis doesn't miss global variables. Much cleaner than
the previous code :) There may be some issues with aliases, though.
The entry dummy block is setup to provide dummy definitions of local
variables so the reaching definitions analysis can be used to detect
uninitialized variables (not implemented yet). Fake statement numbers
(func->num_statements + X) are used to represent the definitions. Local
variables (ie, not temp ops) use their offsets (ie, the offset range they
cover) for X. Temp ops use their flowvar number + the size of the
function's defspace for X. flow_kill_aliases() should take care of temp op
aliasing, while the use of the actual offsets spanned by the variable's def
should take care of any wild aliasing so structures and unions should
become a non-issue.
The dummy nodes are for detectining uninitialized variables (entry dummy)
and making globals live at function exit (exit dummy). The reaching defs
and live vars code currently seg because neither node has had its sets
initialized.
Fixed aliases are those that will never change through the life of the
code. They are generated from structure accesses and thus what they alias
is always known.
Also move the ALLOC/FREE macros from qfcc.h to QF/alloc.h (needed to for
set.c).
Both modules are more generally useful than just for qfcc (eg, set
builtins for ruamoko).
Set of everything is implemented by inverting the meaning of bits in the
bitmap: 1 becomes non-member, 0 member. This means that set_size and
set_first/set_next become inverted and represent non-members as counting
members becomes impossible :)
Aliasing the jump table to an integer broke statement_get_targetlist with
the new alias def handling, and was really wrong anyway. I probably did
that due to being fed up with things and wanting to get qfcc working again
rather than spending time getting jumpb right.
With the need to handle aliasing in the optimizer, it has become apparent
that having the flow data attached to symbols is not nearly as useful as
having it attached to defs (which are views of the actual variables).
This also involves a bit of a cleanup of operand types: op_pointer and
op_alias are gone (this seems to greatly simplify the optimizer)
There is a bit of a problem with enums in switch statements, but this might
actually be a sign that something is not quite right in the switch code
(other than enums not being recognized as ints for jump table
optimization).