I was checking for GCC >= 4.7, while Clangs pretends to be GCC 4.2. Use a
feature test macro instead. The comment I made in r4161 regarding GCC vs.
Clang code was wrong. Now, Clang generates slightly faster code for these cases
(solid and masked 4-pixel wide vlines).
git-svn-id: https://svn.eduke32.com/eduke32@4182 1a8010ca-5511-0410-912e-c29ae57300e0
Compiling a 32-bit NOASM build resulted in code containing a MOVAPS instruction
that accessed a memory location not aligned to 16 bytes (MinGW, GCC 4.8).
git-svn-id: https://svn.eduke32.com/eduke32@4162 1a8010ca-5511-0410-912e-c29ae57300e0
The functions mvlineasm1, mvlineasm4 and tvlineasm2 can now be set to clamp
the vertical texture coordinate (vplc), preventing the unsightly stray lines
on the bottom of non-y-flipped sprites. (The first part of this effort, r3483,
handled their top).
However, this is only enabled for the mvlineasm ones: the vectorized variants
suffered almost no slowdown (even though a PADDUSD SSE instruction would be a
nice thing to have), while it was pretty significant for the sequential
translucent ones.
Summarizing, this leaves two cases where stray lines may appear in the non-ASM
build (the saturation is NYI for a.nasm):
- at the bottom of y-flipped sprites
- at the bottom of translucent sprites (can be toggled by #define)
Another observation is that recent GCC generates much faster code for this
stuff than Clang from SVN.
git-svn-id: https://svn.eduke32.com/eduke32@4161 1a8010ca-5511-0410-912e-c29ae57300e0
For a full 1680x1050 screen drawing a solid/masked wall, the FPS increases
from 118 to 133 and from 114 to 116 (respectively) for me.
Guarded by the macro USE_VECTOR_EXT in the source.
git-svn-id: https://svn.eduke32.com/eduke32@4160 1a8010ca-5511-0410-912e-c29ae57300e0
- Currently: only tvlineasm1 and tvlineasm2, but incomplete (no reverse
translucency, nonpow2 textures will crash)
- For System V AMD64 calling conventions; requires YASM
git-svn-id: https://svn.eduke32.com/eduke32@4066 1a8010ca-5511-0410-912e-c29ae57300e0
Note the type change of vplce[] in engine.c: int32_t -> uint32_t.
git-svn-id: https://svn.eduke32.com/eduke32@3172 1a8010ca-5511-0410-912e-c29ae57300e0
This reverts r3159..r3161.
Conflicts:
eduke32/build/include/compat.h
(Handled so that r3163's changes are kept applied.)
git-svn-id: https://svn.eduke32.com/eduke32@3165 1a8010ca-5511-0410-912e-c29ae57300e0
I think there's also a fix for the CON precache system breakage in here (lost it in my local tree when I started getting the C++ build working in MSVC, sorry!)
git-svn-id: https://svn.eduke32.com/eduke32@3159 1a8010ca-5511-0410-912e-c29ae57300e0
Related to that, it looks like out-of-bounds accesses when drawing such walls/
maskwalls or *sprites* are fixed, too. Sprites still show a stray lines on some
occasions, but Valgrind doesn't complain then.
git-svn-id: https://svn.eduke32.com/eduke32@2805 1a8010ca-5511-0410-912e-c29ae57300e0
That is, "clang -ftrapv" builds don't abort almost immediately after entering
a level.
There are various classes of overflow bugs, needing different handling:
- Some texture mapping code was written with signed integer wrapping semantics
in mind. In some places, we're able to get away with unsigned casts.
- sometimes, we really need a wider range, like when calculating distances or
dot products
- negating INT_MIN. Here, we cast to int64_t temporarily. Note that if the
result is 32-bit wide, no 64-bit code may actually need to be generated.
- shifting into a signed integer's sign bit. We cast to uint32 here.
- in hitscan(), at the "abyss crash prevention code" comment, it's clearly
the other code that is better...
This is not merely done for pedantry, but rather makes it easier to track down
overflow bugs having a real impact on the game.
git-svn-id: https://svn.eduke32.com/eduke32@2784 1a8010ca-5511-0410-912e-c29ae57300e0
The original patch was communicated to me by Hendricks, but since it didn't
apply cleanly (it's based on r2182) I took the liberty of slightly messing
with it for inclusion into EDuke32.
Info: http://wiibrew.org/wiki/User:Tueidj/Duke3D
This first part (which wasn't changed from the original patch) implements
scaling arithmetic and miscellaneous pragmas, some in PPC assembly and a part
of them in C. Of some interest is the fact that the Wii processor apparently
lacks support for 64-bit integers, so divscale() uses floating-point math.
git-svn-id: https://svn.eduke32.com/eduke32@2621 1a8010ca-5511-0410-912e-c29ae57300e0
With the same setup as before, a screen-filling translucent wall (with nothing
drawn behind it) renders at about 7 fps faster (from 60-something fps initially)
git-svn-id: https://svn.eduke32.com/eduke32@2498 1a8010ca-5511-0410-912e-c29ae57300e0
These two functions draw a vertical line 4 neighboring pixels at a time.
This gives a significant speed boost for a full screen solid and masked wall
scene for x86_64 (where we have plenty of registers), about 60 --> 76 fps.
git-svn-id: https://svn.eduke32.com/eduke32@2497 1a8010ca-5511-0410-912e-c29ae57300e0
- forgot a glogy --> logy in a-c.c
- comment out stretchhline and slopevlin2 in a.nasm, the former also in a-c.c
- make transmaskvline2 use a uintptr_t where appropriate
git-svn-id: https://svn.eduke32.com/eduke32@2448 1a8010ca-5511-0410-912e-c29ae57300e0
Hlines for masked and translucent masked ceiling/floor (sprites).
- apply the --> 'do { ... } while (--cnt)' transformation, making these
functions iterate cnt+1 times like the asm version. This also fixes an
off-by-one issue where sprites or masked ceilings/floors had a one-pixel
non-drawn line to the right.
- This time, only declare-as-local two 'extern' globals (asm1 and asm2).
It seems that I was too eager with "localing" all file-scoped vars earlier.
GCC is able to remove the loads from memory inside the loop by itself, whereas
clang is not. This is not trivial, since it has to prove that the 'screen'
pointer passed to the functions will never alias these globals.
git-svn-id: https://svn.eduke32.com/eduke32@2424 1a8010ca-5511-0410-912e-c29ae57300e0
Affected functions: hlineasm4, vlineasm1, mvlineasm1, tvlineasm1.
Optimizations:
- declare all used variables as possibly const-qualified locals in each
function. This removes unnecessary loads from memory in the loops.
- rewrite "for (; cnt>=0; cnt--) {...}" to "cnt++; do {...} while (--cnt);"
in the three last ones (yes, these function iterate cnt+1 times). This
makes them functionally equivalent to the asm versions (madness ensues for
cnt < 0) and allows the compiler to remove one 'test' instruction at the
end of each loop.
- in the translucence function, replace addition by ORing
Observations (system: Core2 Duo Linux x86_64):
With a 1680x1050 window fully covered by the respective type of wall (simple,
masked, trans. masked), fps increases by 3-4 from the baseline of approx. 60.
git-svn-id: https://svn.eduke32.com/eduke32@2405 1a8010ca-5511-0410-912e-c29ae57300e0
* Sprite cstat 2048 ('use own shade', [N]) now works more or less. (Issues may arise when combined with sector light effects.)
* Begin work on 'smart' tag labeling system for Mapster32. Right now, it only displays a '+' after tags with linking semantics.
*
git-svn-id: https://svn.eduke32.com/eduke32@1866 1a8010ca-5511-0410-912e-c29ae57300e0