The idea is to find th def that contains the address. Had to write my
own bsearch (well... lifted from wikipedia) because libc's is exact. The
defs are assumed to be sorted (which qfcc now ensures when it writes
progs and sym files).
Type encodings are used whenever they are available. For now, if they
are not, then everything is treated as void (which prints <void>, not
very useful). Most return statements and references to .return are now
very readable (excluding structs), and only params going through "..."
are a messy union.
The memset instructions now match the move* instructions other than the
first operand (always int). Probably breaks much, but fixed in next few
commits.
This "pushes" a temp string onto the callee's stack frame after removing
it from the caller's stack frame. This is so builtins can pass
auto-freed memory to called progs code. No checking is done, but mayhem
is likely to ensue if a string is pushed that was allocated in an
earlier frame.
PR_AllocTempBlock() works the same way as PR_SetTempString(), except
that it takes a size parameter and always allocates (never tries to
merge). This is, in a way, abusing the string system, but I needed a way
to allocate a block of progs memory that would be automatically freed
when the current frame ended. The biggest abuse is the need to cast away
the const of PR_GetString()'s return value.
This should speed up ruamoko code somewhat as hash table lookups have
been replaced with direct array indexing. As a bonus, support for
message forwarding has been added (though not tested).
As well as $prefix/include, of course. This fixes the problem with
external ruamoko builds failing due to keys.h and qfcc's "lockdown" on
system headers.
It proved to be too fragile in its current implementation. It broke
pointers to incomplete structs and switch enum checking, and getting it
to work for other things was overly invasive. I still want the encoding,
but need to come up with something more robust.a
Other than consistency with printf(), I'm not sure why we went with the
printed size as the return value; returning the resultant strings makes
much more sense as dsprintf() (etc) can then be used as a safe va()
Other than its blocking of access to certain files, it really wasn't
that useful compared to the functions in qfs, and pointless with access
to qfs anyway.
The progs execution code will call a breakpoint handler just before
executing an instruction with the flag set. This means there's no need
for the breakpoint handler to mess with execution state or even the
instruction in order to continue past the breakpoint.
The flag being set in a progs file is invalid.
The debug subsystem now uses the resources system to ensure it cleans
up, and its data is now semi-private. Unfortunately, PR_LoadDebug had to
remain public for qfprogs because using PR_RunLoadFuncs would cause
builtin resolution to complain.
It is now set to 0 when progs are loaded and every time
PR_ExecuteProgram() returns. This takes care of the default case, but
when setting parameters, pr_argc needs to be set correctly in case a
vararg function is called.
PR_SaveParams() is required for implementing the +initialize diversion
used by Objective-QuakeC because builtins do not have local def spaces
(of course, a normal stack calling convention would help). However, it
is entirely possible for a call to +initialize to trigger another call
to +initialize, thus the need for stacking parameter stashes. As a
bonus, this implementation cleans up some fields in progs_t.
I was originally going to put it in the debug syms file, but I realized
that the data persistence code would need access to both def type and
certainly correct def offsets for defs in far data.
This far better reflects the actual meaning. It is very likely that
ty_none is a holdover from long before there was full type encoding and
it meant that the union in qfcc's type_t had no data. This is still
true for basic types, but only if not a function, field or pointer type.
If the type was function, field or pointer, it was not true, so it was
misnamed pretty much from the start.
The engine now requires non-v6 progs to store the log2 alignment for the
param struct in .param_alignment.
PR_EnterFunction is clearer and possibly more efficient.
The encoding is 3:5 giving 3 bits for alignment (log2) and 5 bits for
size, with alignment in the 3 most significant bits. This keeps the
format backwards compatible as until doubles were added, all types were
aligned to 1 word which gets encoded as 0, and the size is unaffected.
Only as scalars, I still need to think about what to do for vectors and
quaternions due to param size issues. Also, doubles are not yet
guaranteed to be correctly aligned.
It turned out I needed access to the physical device from a buffer
object, so rather than storing the vulkan logical device directly in
buffer (and other) objects, store the qfv logical device.
I added Sys_RegisterShutdown years ago and never really did anything
with it: now any system that needs to be shutdown can ensure it gets
shutdown on program exit, and in the correct order (ie, reverse to init
order).
This paves the way for clean initialization of the Vulkan renderer, and
very much cleans up the older renderer initialization code as gl and sw
are no longer intertwined.
This fixes the segfault and pushes things very much in the desired
direction of proper system independence for rendering and presentation
separation (though things were headed in the right direction before).
Things are still a mess, but a proper cleanup will be a lot of work and
will, really, involve properly splitting quake-specific code* out from
the rest of the renderer.
* data loading and format specific stuff
A single graphics-capable queue should be enough for now. However, I'm
not sure I'm happy with a lot of the code: it's a bit difficult to write
flexibly configured code for Vulkan (or seems to be at this stage),
especially in C.
After messing with SIMD stuff for a little, I think I now understand why
the industry went with xyzw instead of the mathematical wxyz. Anyway, this
will make for less pain in the future (assuming I got everything).
I've decided that setting pr.max_edicts and pr.zone_size as part of the
local progs initialization rather than in PR_LoadProgsFile makes more
sense. For one, it is unlikely for the limits to change every time progs is
reloaded. Also, they seem to be a property of the VM rather than the progs.
However, there is nothing stopping the caller from updating max_edicts and
zone_size every call.
Also fix a bug where despite supporting 32 buttons, only 18 were actually
supported, and a similar issue for the number of axes.
My saitek x52 has 34 buttons and 10 axes. Whee.
_QFS_VOpenFile is actually _QFS_FOpenFile reimplemented to take vpath start
and end parameters so the search can be limited. QFS_VOpenFile,
_QFS_FOpenFile, and QFS_FOpenFile are all wrappers for _QFS_VOpenFile.
Again, based on The OpenGL Shader Wrangler. The wrangling part is not used
yet, but the shader compiler has been modified to take the built up shader.
Just to keep things compiling, a wrapper has been temporarily created.
The idea comes from The OpenGL Shader Wrangler
(http://prideout.net/blog/?p=11). Text files are broken up into chunks via
lines beginning with -- (^-- in regex). The chunks are optionally named
with tags of the form: [0-9A-Za-z._]+. Unnamed chunks cannot be found.
Searching for chunks looks for the longest tag that matches the beginning
of the search tag (eg, a chunk named "Vertex" will be found with a search
tag of "Vertex.foo"). Note that '.' forms the units for the searc
("Vertex.foo" will not find "Vertex.f").
Unlike glsw, this implementation does not have the concept of effects keys
as that will be separate. Also, this implementation takes strings rather
than file names (thus is more generally useful).
set_bits_t is now 64 bits for x86_64 machines (in linux, anyway). This gave
qfvis a huge speed boost: from ~815s to ~720s.
Also, expose some of the set internals so custom set operators can be
created.
Now we can get tight (<1e-6 * radius_squared error) bounding spheres. More
importantly (for qfvis, anyway) very quickly: 1.7Mspheres/second for a 5
point cloud on my 2.33GHz Core 2 :)
It "works" for lines, triangles and tetrahedrons. For lines and triangles,
it gives the barycentric coordinates of the perpendicular projection of the
point onto to features. Only tetrahedrons are guaranteed to reproduce the
original point.
Rather than prefixing free_ to the supplied name, suffix _freelist to the
supplied name. The biggest advantage of this is it allows the free-list to
be a structure member. It also cleans up the name-space a little.
I'd forgotten that ED_ConvertToPlist mangled light into light_lev and
single component angle values into a vector. This fixes much of the
breakage in qflight (but not the light levels)
Sys_LongTime returns time in microseconds as a 64-bit int. Sys_DoubleTime
uses Sys_LongTime, converts to double and offsets 0 time by 4G (2**32).
This gives us consistent sub-microsecond precision for a very long time.
See http://randomascii.wordpress.com/2012/02/13/dont-store-that-in-a-float/
First, this completely smashes joystick input: it will not work (though it
doesn't crash). This is because there is, as of yet, no means to configure
the system.
Each joystick axis has:
- per-axis amplification (both pre and post).
- per-axis offset (offset applied after pre-amp but before post amp)
- selectable destination:
- linear delta: position and angles (as before)
- axis button: if the value crosses the threshold, the given key is
pressed or released as appropriate.
The axis amplification still uses joy_amp and joy_pre_amp (and
in_amp/in_pre_amp), but now also has the per-axis settings.
The per-axis offset is most useful for axis buttons. For example, the xbox
360 controller triggers are analong but go "all the way to negative on 0
state". Offsetting the input keeps axis button thresholds simple.
Amplification and offset is applied before anything is done with the axis
value. The formula is:
joy_amp * in_amp * axis-amp *
(offset + value * joy_pre_amp * in_pre_amp * axis-pre_amp)
Axis button thresholds are very simple: if the sign of the value is the
same as the sign of the threshold and abs(value) >= abs(threshold), the
button is pressed. While multiple thresholds and keys can be placed on an
axis, only one can be pressed at a time. The threshold furthest from 0
wins.
This gives QF a consistent qualilty PRNG on all platforms. The
implementation is slightly different from the standard, but gives the same
results for the same speed (details in mersenne.c).
Now the user can create and destroy IMTs at will, though currently
destroying IMTs is currently all or nothing (imt_drop_all).
An IMT is created via imt_create which takes the keydest name (key_game
etc), the name of the IMT (must be unique for all IMTs) and optionally the
name of the IMT to which the key binding search will fall back if there is
no binding in the current IMT, but must be already defined and on the same
keydest. This means that IMTs now have user determined fallback paths. The
requirements for the fallback IMT prevent loops and other weird behaviour.
Actual key binding via in_bind is unaffected. This is why the IMT name must
be unique across all IMTs.
The "imt" command works with the key_game keydest, but imt_keydest is
provided for specifying the active IMT for a specific keydest.
At startup, default IMTs are setup to emulate the previous static IMTs so
old configs will continue to work (mostly). New config files will be
written with commands to drop all of the current IMTs and build new ones,
with the bindings and active IMT set as well.
This fixes the status bar refresh issues in sw. The problem was that with
two viddef's hanging around, things got a little confused and recalc_refdef
wasn't getting into the renderer.
It turns out gcc on little endian machines didn't guarantee the type of
ShortNoSwap due to it being a macro that just returned its parameter. At
the same time, LongNoSwap and FloatNoSwap have been fixed.
qfcc now does local common subexpression elimination. It seems to work, but
is optional (default off): use -O to enable. Also, uninitialized variable
detection is finally back :)
The progs engine now has very basic valgrind-like functionality for
checking pointer accesses. Enable with pr_boundscheck 2
Getting everything right with an enum proved to be too difficult if not
impossible. Also use better tests for equivalence and intersection.
Many more tests have been added. All pass :)
Also move the ALLOC/FREE macros from qfcc.h to QF/alloc.h (needed to for
set.c).
Both modules are more generally useful than just for qfcc (eg, set
builtins for ruamoko).
The depth limits in the gl and glsl renderers and in the trace code really
bothered me, but then the fix hit me: at load-time, recurse the trees
normally and record the depth in the appropriate place. The node stacks can
then be allocated as necessary (I chose to add a paranoia buffer of 2, but
I expect the maximum depth will rarely be used).
The attached patch (against quakeforge git) changes the [con]width,
[con]height, and most importantly the rowbytes members of viddef_t
from unsigned to signed int, like in q2. This allows for a properly
negative vid.rowbytes which may be needed in, e.g. a DIB sections
windows driver if needed. Along with it, I changed a few places
where unsigned int is used along with comparisons against the relevant
vid.* members.
One thing I am not 100% sure is the signedness requirements of
d_zrowbytes and d_zwidth: q2 has them as unsigned but I am not sure
whether that is because they are needed as unsigned or it was just an
oversight of the id developers. They do look like they should be OK
as signed int to me, though: comments?
==
Note from Bill Currie: I had to do some extra changes as many
signed/unsigned comparisons were somehow missed.
All of the nastiness is hidden in bspfile.c (including the old bsp29
specific data types). However, the conversions between bsp29 and bsp2 are
implemented but not yet hooked up properly. This commit just gets the data
structures in place and the obvious changes necessary to the rest of the
engine to get it to compile, plus a few obvious "make it work" changes.
This should make maintaining them a little easier.
The copyright block in most of the new headers (execpt vector.h) reflect
when the functions in the relevant header were first created.
Really, when cl_nodelta is in effect (eg, .qwd demo recording and thus
playback). QW now uses the new shared entity state block as I'd intended.
Thanks to the cleanup of ghost entities (ie, entities that have been
removed but continue to be rendered), glsl overkill has gone from 157 to
163 fps :)
It turns out glsl, sw and sw32 weren't getting any benefit from R_CullBox
because the frustum wasn't setup :P. Get another 8% out of bigass1
(174->184fps). bigass1 now runs 2x as fast as it did before I started this
optimisation run :)
This severely reduces the calles to BindTexture, and more importantly,
glUseProgram, EnableVertexAttribArray etc. The biggest changes are:
o icons and text are all in the one giant texture
o icons and text are mixed in the one queue
This gave ~9% speedup for bigass1 (159->174fps).
For certain values of "fix" ;). Both are brought back to life but
idealpitch is never set (always 0) and veiwheight is set in V_RenderView().
However, this brings the rest of the code in cl_view.c just that little bit
closer to merged :)
I didn't like the way client/server code was poking around at the
implementation. Instead, provide a couple of accessor functions for the
same information.
gl, sw and sw32 use blend palettes, so share the code. This also abandons
the optimization for transforming verts in sw (had all sorts of problems
anyway). sw still doesn't work, though.
There are still many issues to sort out, but the basics are working.
Problems:
rendered fullbright (no lighting done)
normals are ignored
extra textures (glow etc) not used/loaded
4 models on the screen don't seem to be a problem.
Since iqm vertex arrays are variable, and I don't want to calculate the
stride every time I render a model, cached the value used when building the
arrays.
VectorUnshear uses the exact same shear vector to remove shear from a
sheared vector. ie with:
VectorShear (shear, v, w);
VectorUnshear (shear, w, x);
x == v within fp math limits.
And the tests really exercised VectorShear (first attempt had things
messed up when more than one shear value was non-zero). Also,
Mat4Decompose wasn't orthogonalizing the z axis row. Oops. Anyway,
Mat4Decompose is now known to work well, and the usage of its output is
understood :)
I'd gotten the norm and magnitude mixed up (partly because the document I
was following got the names mixed up), and then munged the formulas
together.
Now it doesn't matter if you get 22 fps or 72, you jump the same height,
which actually happens to be slightly higher than the previous 72fps jump.
Effectively, you jump the height you would if you got infinite fps ;)
I got the idea from blender when I discovered by accident that quat * vect
produces the same result as quat * qvect * quat* and looked up the code to
check what was going on. While matrix/vector multiplication still beats the
pants off quaternion/vector multiplication, QuatMultVec is a slight
optimization over quat * qvect * quat* (17+,24* vs 24+,32*, plus no need to
to generate quat*).
This avoids sending invalid pose data to the renderer. The symptom was a
vertex array offset higher than the vertex array size. Discovered by calim
of nouveau while he was debugging a driver problem found by QF. Many
thanks.
This allows the vid module to load the render module and access render
specific functions before the renderer initializes, which happens to need
an initialized vid module...
The renderer now gets initialized and things sort of work (qw-client will
idle, though nothing is displayed). However, as the viddef stuff is broken,
it segs on trying to run the overkill demo.
Still, nothing will work: no plugins are loaded and they're all broken
anyway.
glx, sgl, glslx etc are going away, just the basics will be built: fbdev
(probably go away eventually), sdl, x11 and hopefully someday win. That's
actually the only reason anything links.
Where possible, symbols have been made static, prefixed with glsl_/GLSL_ or
moved into the code shared by all renderers. This will make doing plugins
easier but done now for link testing. The moving was done via the gl
commit.
Where possible, symbols have been made static, prefixed with gl_/GL_ or
moved into the code shared by all renderers. This will make doing plugins
easier but done now for link testing.
The api hides all the gory details of message buffer setup and usage
(particularly the differences between writing and reading). Most
importantly, the api provides a safe way to read and write binary data
(always little endian).
Most subsystems that depend on other subsystems now call the init functions
themselves. This makes for much cleaner client initialization (more work
needs to be done for the server).
The renderer should now be free of any direct access to client code. Even
3d rendering is now done via a function pointer.
The cshift code is done as a 2d screen function.
Unfortunately, the maximum point size on Intel hardwar seems to be 1, so I
can't tell if the colors are right.
This is largely just a hacked version of GL's particle code.
For now, only the glsl loader disables caching, but it stores the frame
vertices in GL memory, so its hunk usage is relatively lower (and will be
lower still when I get skins sorted out).