registers AMD64 provides, this routine still needs to be written as self-
modifying code for maximum performance. The additional registers do allow
for further optimization over the x86 version by allowing all four pixels
to be in flight at the same time. The end result is that AMD64 ASM is about
2.18 times faster than AMD64 C and about 1.06 times faster than x86 ASM.
(For further comparison, AMD64 C and x86 C are practically the same for
this function.) Should I port any more assembly to AMD64, mvlineasm4 is the
most likely candidate, but it's not used enough at this point to bother.
Also, this may or may not work with Linux at the moment, since it doesn't
have the eh_handler metadata. Win64 is easier, since I just need to
structure the function prologue and epilogue properly and use some
assembler directives/macros to automatically generate the metadata. And
that brings up another point: You need YASM to assemble the AMD64 code,
because NASM doesn't support the Win64 metadata directives.
- Added an SSE version of DoBlending. This is strictly C intrinsics.
VC++ still throws around unneccessary register moves. GCC seems to be
pretty close to optimal, requiring only about 2 cycles/color. They're
both faster than my hand-written MMX routine, so I don't need to feel
bad about not hand-optimizing this for x64 builds.
- Removed an extra instruction from DoBlending_MMX, transposed two
instructions, and unrolled it once, shaving off about 80 cycles from the
time required to blend 256 palette entries. Why? Because I tried writing
a C version of the routine using compiler intrinsics and was appalled by
all the extra movq's VC++ added to the code. GCC was better, but still
generated extra instructions. I only wanted a C version because I can't
use inline assembly with VC++'s x64 compiler, and x64 assembly is a bit
of a pain. (It's a pain because Linux and Windows have different calling
conventions, and you need to maintain extra metadata for functions.) So,
the assembly version stays and the C version stays out.
- Removed all the pixel doubling r_detail modes, since the one platform they
were intended to assist (486) actually sees very little benefit from them.
- Rewrote CheckMMX in C and renamed it to CheckCPU.
- Fixed: CPUID function 0x80000005 is specified to return detailed L1 cache
only for AMD processors, so we must not use it on other architectures, or
we end up overwriting the L1 cache line size with 0 or some other number
we don't actually understand.
SVN r1134 (trunk)
arbitrary point. It has been replaced with a variant that takes a polyobject
as a source, since that was the only use that couldn't be rewritten with the
other variants. This also fixes the bug that polyobject sounds were not
successfully saved and caused a crash when reloading the game. Note that
this is a significant change to how equality of sound sources is determined,
so some things may not behave quite the same as before. (Which would be a
bug, but hopefully everything still sounds the same.)
SVN r1059 (trunk)
- Changed savegame versioning so that the written version is never lower
than the minmum one reported as compatible.
- Added mirrored movement modes for linked sectors.
- Added Eternity-style initialization for linked sectors as a new subtype
of Static_Init.
- Added linked sectors. The control sector determines how they move but if
any one of the linked sectors is blocked, movement for all linked sectors
will be affected. This will allow lifts consisting out of more than one
sector without the risk of breaking them if only one of the sectors is
blocked.
- Fixed: A_Mushroom created an actor on the stack.
SVN r825 (trunk)
palettes smaller than 256 entries with the shader I wrote for it. Is there
a list of gotchas like this listed some where? I'd really like to see it.
Well, when compiled with SM2.0, the PalTex shader seems to be every-so-
slightly faster on my GF7950GT than the SM1.4 version, so I guess it's a
minor win for cards that support it.
- Fixed: ST_Endoom() failed to free the bitmap it used.
- Added the DTA_ColorOverlay attribute to blend a color with the texture
being drawn. For software, this (currently) only works with black. For
hardware, it works with any color. The motiviation for this was so I could
rewrite the status bar calls that passed DIM_MAP to DTA_Translation to
draw darker icons into something that didn't require making a whole new
remap table.
- After having an "OMG! How could I have been so stupid?" moment, I have
removed the off-by-one check from D3DFB. I had thought the off-by-one error
was caused by rounding errors by the shader hardware. Not so. Rather, I
wasn't sampling what I thought I was sampling. A texture that uses palette
index 255 passes the value 1.0 to the shader. The shader needs to adjust the
range of its palette indexes, or it will end up trying to read color 256
from the palette texture when it should be reading color 255. Doh!
- The TranslationToTable() function has been added to map from translation
numbers used by actors to the tables those numbers represent. This function
performs validation for the input and returns NULL if the input value
is invalid.
- Major changes to the way translation tables work: No longer are they each a
256-byte array. Instead, the FRemapTable structure is used to represent each
one. It includes a remap array for the software renderer, a palette array
for a hardware renderer, and a native texture pointer for D3DFB. The
translationtables array itself is now an array of TArrays that point to the
real tables. The DTA_Translation attribute must also be passed a pointer
to a FRemapTable, not a byte array as previously.
- Modified DFrameBuffer::DrawRateStuff() so that it can do its thing properly
for D3DFB's 2D mode. Before, any fullscreen graphics (like help images)
covered it up.
SVN r640 (trunk)