    Fixed bug #15 · 3c69f946
    Sam Lantinga authored
    SDL_blit_A.mmx-speed.patch.txt --
            Speed improvements and a bugfix for the current GCC inline MMX
            asm code:
            - Changed some ops and removed the ones that became useless as a
            result.
            - Added some instruction parallelism (some gain).
            The resulting speed on my Xeon improved by up to 35%, depending on
            the function (measured in fps).
            - Fixed a bug where BlitRGBtoRGBSurfaceAlphaMMX() was
            setting the alpha component on the destination surfaces (to
            opaque-alpha) even when the surface had none.
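
The alpha bugfix above can be illustrated with a scalar sketch. This is not the
MMX code from the patch; the names `mix8` and `blend_surface_alpha` are
hypothetical, and the sketch assumes color in the low 24 bits (XRGB8888 or
ARGB8888) with `dst_amask` set to 0 for a destination without an alpha channel.

```c
#include <assert.h>
#include <stdint.h>

/* Blend one 8-bit channel: s weighted by alpha, d by (255 - alpha), rounded.
   Exact division is used for clarity; it is not the MMX arithmetic. */
static uint32_t mix8(uint32_t s, uint32_t d, uint32_t alpha)
{
    return (s * alpha + d * (255u - alpha) + 127u) / 255u;
}

/* Per-surface alpha blend of one pixel (hypothetical helper).
   dst_amask is the destination's alpha mask, or 0 if it has no alpha. */
static uint32_t blend_surface_alpha(uint32_t src, uint32_t dst,
                                    uint8_t alpha, uint32_t dst_amask)
{
    uint32_t out = 0;
    unsigned shift;

    /* Assumption of this sketch: alpha, if present, is the top byte. */
    assert((dst_amask & 0x00FFFFFFu) == 0);

    for (shift = 0; shift < 24; shift += 8) {
        uint32_t s = (src >> shift) & 0xFFu;
        uint32_t d = (dst >> shift) & 0xFFu;
        out |= mix8(s, d, alpha) << shift;
    }
    /* OR in opaque alpha only where the destination actually has alpha bits.
       With dst_amask == 0 the top byte stays clear. */
    return out | dst_amask;
}
```

Passing the destination mask instead of a constant 0xFF000000 keeps the loop
body identical for both kinds of surface.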
    
    SDL_blit_A.mmx-msvc.patch.txt --
            MSVC MMX intrinsics version of the same GCC asm code.
            The MSVC compiler tries to parallelize the code and to avoid
            register stalls, but does not always do a very good job.
            The per-surface blending MSVC functions run quite a bit faster
            than their pure-asm counterparts (up to 55% faster for the 16-bit
            ones), but per-pixel blending runs somewhat slower than the asm.
    
    - BlitRGBtoRGBSurfaceAlphaMMX and BlitRGBtoRGBPixelAlphaMMX (and all
    variants) can now also handle formats other than (A)RGB8888. Formats
    like RGBA8888 and some quite exotic ones are allowed -- like
    RAGB8888, or actually anything having channels aligned on an 8-bit
    boundary and a full 8-bit alpha (for per-pixel alpha blending).
    The performance cost of this change is virtually 0 for per-surface
    alpha blending (no extra ops inside the loop) and a single non-MMX
    op inside the loop for per-pixel blending. In testing, the per-pixel
    alpha blending takes a ~2% performance hit, but it still runs much
    faster than the current code in CVS. If necessary, a separate function
    with this functionality can be made.
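
As a rough scalar illustration of the format handling described above (not the
MMX code itself), the sketch below blends one pixel for any 32-bit layout whose
alpha is a full byte on an 8-bit boundary. The names `mask_shift`, `mix8x` and
`blend_pixel_alpha` are hypothetical. It assumes source and destination share
the same layout and that the destination's alpha byte is left unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Bit position of the lowest set bit in a channel mask,
   e.g. 0xFF000000 -> 24. Returns 0 for an empty mask. */
static unsigned mask_shift(uint32_t mask)
{
    unsigned shift = 0;
    if (mask == 0)
        return 0;
    while ((mask & 1u) == 0) {
        mask >>= 1;
        ++shift;
    }
    return shift;
}

/* Blend one 8-bit channel with rounding (exact division, for clarity). */
static uint32_t mix8x(uint32_t s, uint32_t d, uint32_t alpha)
{
    return (s * alpha + d * (255u - alpha) + 127u) / 255u;
}

/* Per-pixel alpha blend; amask locates the 8-bit alpha channel. */
static uint32_t blend_pixel_alpha(uint32_t src, uint32_t dst, uint32_t amask)
{
    unsigned ashift = mask_shift(amask);
    unsigned shift;
    uint32_t alpha, out;

    /* Requirement from the text: full 8-bit alpha on a byte boundary. */
    assert((amask >> ashift) == 0xFFu && (ashift % 8u) == 0);

    /* The only layout-dependent step: fetch alpha with a variable shift. */
    alpha = (src >> ashift) & 0xFFu;

    out = dst & amask; /* assumption: keep the destination's alpha byte */
    for (shift = 0; shift < 32; shift += 8) {
        if (shift == ashift)
            continue; /* skip the alpha byte, blend the three color bytes */
        out |= mix8x((src >> shift) & 0xFFu, (dst >> shift) & 0xFFu, alpha)
               << shift;
    }
    return out;
}
```

Because the color bytes are treated uniformly, the blend itself does not need to
know which byte is red, green or blue; only the alpha position matters. Called
with `amask` set to 0xFF000000 this handles ARGB8888, with 0x000000FF RGBA8888,
and with 0x00FF0000 a layout such as RAGB8888.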
    
    This code requires Processor Pack for VC6.
    
    --HG--
    extra : convert_revision : svn%3Ac70aab31-4412-0410-b14c-859654838e24/trunk%401546
SDL_blit_A.c 88.4 KB