mirror of
https://github.com/ZDoom/gzdoom.git
synced 2025-01-18 15:42:34 +00:00
dda5ddd3c2
registers AMD64 provides, this routine still needs to be written as self- modifying code for maximum performance. The additional registers do allow for further optimization over the x86 version by allowing all four pixels to be in flight at the same time. The end result is that AMD64 ASM is about 2.18 times faster than AMD64 C and about 1.06 times faster than x86 ASM. (For further comparison, AMD64 C and x86 C are practically the same for this function.) Should I port any more assembly to AMD64, mvlineasm4 is the most likely candidate, but it's not used enough at this point to bother. Also, this may or may not work with Linux at the moment, since it doesn't have the eh_handler metadata. Win64 is easier, since I just need to structure the function prologue and epilogue properly and use some assembler directives/macros to automatically generate the metadata. And that brings up another point: You need YASM to assemble the AMD64 code, because NASM doesn't support the Win64 metadata directives. - Added an SSE version of DoBlending. This is strictly C intrinsics. VC++ still throws around unneccessary register moves. GCC seems to be pretty close to optimal, requiring only about 2 cycles/color. They're both faster than my hand-written MMX routine, so I don't need to feel bad about not hand-optimizing this for x64 builds. - Removed an extra instruction from DoBlending_MMX, transposed two instructions, and unrolled it once, shaving off about 80 cycles from the time required to blend 256 palette entries. Why? Because I tried writing a C version of the routine using compiler intrinsics and was appalled by all the extra movq's VC++ added to the code. GCC was better, but still generated extra instructions. I only wanted a C version because I can't use inline assembly with VC++'s x64 compiler, and x64 assembly is a bit of a pain. (It's a pain because Linux and Windows have different calling conventions, and you need to maintain extra metadata for functions.) So, the assembly version stays and the C version stays out. - Removed all the pixel doubling r_detail modes, since the one platform they were intended to assist (486) actually sees very little benefit from them. - Rewrote CheckMMX in C and renamed it to CheckCPU. - Fixed: CPUID function 0x80000005 is specified to return detailed L1 cache only for AMD processors, so we must not use it on other architectures, or we end up overwriting the L1 cache line size with 0 or some other number we don't actually understand. SVN r1134 (trunk) |
||
---|---|---|
.. | ||
BUILDLIC.TXT | ||
classes.txt | ||
colors.txt | ||
commands.txt | ||
console.css | ||
console.html | ||
console.txt | ||
console2.html | ||
doomlic.txt | ||
foo.txt | ||
history.txt | ||
notes.txt | ||
README.asm | ||
README.Carmack | ||
README.gl | ||
rh-log.txt | ||
zdoom.txt |
README: glDOOM I never got around to do anything with respect to a Linux glDOOM port except for assembling a Linux3Dfx HOWTO (which, at that time, was a prerequisite to get permission to publicly distribute the already finished LinuxGlide port by Daryll Strauss). Linux q2test (and soon LinuxQuake2) demonstrate that Mesa with the MesaVoodoo driver is quite up to the requirements for a glDOOM port. If anybody wants to get into Linux glDOOM, please drop me a line. There is a Win32 GLDOOM port in the works, by Jim Dose. Quoting a recent posting by him: "I haven't had as much time lately to really work on the conversion. I currently have the renderer drawing the walls and floors as texture spans as the are in the software renderer. There's lighting on the walls, but not the floors, and sprites are being drawn, but not with the right texture. I figure that this is one nights work to get the game looking "normal". I haven't tested the game on less than a p200, so I'm not sure how it will perform under the average machine, but I don't expect it to be blindingly fast because of the number of spans that have to be drawn each frame. Rendering as polys is definitely the way to go. The reason I chose to do spans first was because it left the base renderer intact and I could concentrate on ironing out any Windows compatibility problems. Actually, the first version I had running was simply a blit of the 320x200 game screen through Open GL. Surprisingly, this actually was very playable, but certainly wasn't taking any advantage of 3D acceleration. Once the game was running, I started converting all the span routines over." Comment: for merging Linuxdoom with Win32, this is probably the best source for getting the Win32 environment done - before more significant changes occur. "One problem with drawing spans is that the engine doesn't calculate the texture coordinates with fractional accuracy, so the bilinear filtering works vertically, but not horizontally on the walls. I may try to fix this, but since I plan to use polys for the final version, it's not really high priority. Also, spans don't really allow for looking up and down." Comment: true looking up/down vs. Heretic-style y-shearing seems to require either a strange kind of transofrmation matrix (he probably does not use the OpenGL transformation at all), or rendering all the spans as textured rectangular slices instead of using glDrawBitmap. No, polys are the way to go. "When I tackle the conversion to polys, one big problem I'll encounter is drawing floors. Since the world is stored in a 2D bsp tree, there is no information on the shape of the floors. In fact the floors can be concave and may include holes (typically, most renderers break concave polys down into a collection of convex polys or triangles). In software, the floors are actually drawn using an algorithm that's similar to a flood fill (except that a list of open spans is kept instead of a buffer of pixels). This makes drawing the floors as polys fairly difficult." A polygon based approach will require significant changes to the data structures used in the refresh module. I recommend either separating a libref_soft.so first (a Quake2 like approach), and creating libref_gl afterwards, or abandoning the software rendering entirely. John Carmack wrote once upon a time: "... the U64 DOOM engine is much more what I would consider The Right Thing now -- it turns the subsector boundaries into polygons for the floors and ceilings ahead of time, then for rendering it walks the BSP front to back, doing visibility determination of subsectors by the one dimensional occlusion buffer and clipping sprites into subsectors, then it goes backwards through the visible subsectors, drawing floors, ceilings, walls, then sorted internal sprite fragments. It's a ton simpler and a ton faster, although it does suffer some overdraw when a high subsector overlooks a low one (but that is more than made up for by the simplicity of everything else)." Well, IMO compiling a separate list of floor/ceiling polygons after having read the WAD file, and thus introducing this as a completely separate data structure to the current code base might be the easiest thing to do. Jim Dose writes: "One method I may use to draw the floors as polys was suggested by Billy Zelsnack of Rebel Boat Rocker when we were working at 3D Realms together a few years ago. Basically, Billy was designing an engine that dealt with the world in a 2D portal format similar to the one that Build used, except that it had true looking up and down (no shearing). Since floors were basically implicit and could be concave, Billy drew them as if the walls extended downwards to infinity, but fixed the texture coordinates to appear that they were on the plane of the floor. The effect was that you could look up and down and there were no gaps or overdraw. It's a fairly clever method and allows you to store the world in a simpler data format. Had perspective texture mapping been fast enough back then, both Build and Doom could have done this in software." Perhaps the above is sufficient to get you started. Other Issues: 1. Occlusion DOOM uses a per-column lookup (top/bottom index) to do HLHSR. This works fine with span based rendering (well, getting horizontal spans of floors/ceilings into the picture is a separate story). It isn't really mindboggling with polygon based rendering. GLDOOM should abandon that. 2. Precalculated Visibility DOOM has the data used by Quake's PVS - in REJECT. During Quake development, lots of replacements for the occlusion buffer were tried, and PVS turned out to be best. I suggest usind the REJECT as PVS. There have been special effects using a utility named RMB. REJECT is a lump meant for enemy AI LoS calculation - a nonstandard REJECT will not work as a PVS, and introduce rendering errors. I suggest looking for a PVS lump in the WAD, and using REJECT if none is found. That way, it might be feasible to eat the cake and keep it. 3. Mipmaps DOOM does not have mipmapping. As we have 8bit palettized textures, OpenGL mipmapping might not give the desired results. Plus, composing textures from patches at runtime would require runtime mipmapping. Precalculated mipmaps in the WAD? 4. Sprites Partly transparent textures and sprites impose another problem related to mipmapping. Without alpha channel, this could give strange results. Precalculated, valid sprite mipmaps (w/o alpha)?