Fixed bug 1128
Patrick Baggett 2011-02-16 22:58:33 PST

This enhancement is for Windows on both x86 and x64. The SDL implementation of mutexes uses the Win32 interprocess synchronization primitive called a "Mutex". This implementation is subpar because it has much higher overhead than an intraprocess mutex. The exact technical details are below, but my tests have shown that under reasonably high contention (10 threads on 4 physical cores), it has 13x higher overhead than the Win32 CriticalSection API. If this enhancement is accepted, I will write a patch to implement SDL mutexes using the critical section API, which should dramatically reduce overhead and improve scalability.

-- Tech details --

Normally, Win32 Mutexes are used across process boundaries to synchronize separate processes. Locking or unlocking one requires a user->kernel space transition, even in the uncontended case on a single-CPU machine.

Win32 CriticalSection objects can only be used within the same process virtual address space, so locking one does not require a user->kernel space transition in the uncontended case. Additionally, a critical section may spin for a short while before going into a kernel wait. This short spin lets a thread obtain the lock if it is released shortly after the thread starts spinning, bypassing the user->kernel space transition, which costs more than the spinning itself.
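The spin-then-block behavior described in the tech details can be illustrated with a small portable sketch. This is not SDL's code and not the Win32 API; on Windows, a patch would more likely call `InitializeCriticalSectionAndSpinCount`, `EnterCriticalSection` and `LeaveCriticalSection`. To stay runnable outside Windows, the sketch assumes POSIX threads: `pthread_mutex_trylock` serves as the user-space fast path that is retried a bounded number of times, and `pthread_mutex_lock` is the blocking fallback. The names `adaptive_lock`, `run_counter_demo` and the spin count of 4000 are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical adaptive lock: spin on trylock, then block. */
typedef struct {
    pthread_mutex_t mutex;
    unsigned spin_count; /* assumption: arbitrary tuning value, not from SDL or Win32 */
} adaptive_lock;

int adaptive_lock_init(adaptive_lock *l, unsigned spin_count) {
    l->spin_count = spin_count;
    return pthread_mutex_init(&l->mutex, NULL);
}

void adaptive_lock_acquire(adaptive_lock *l) {
    /* Fast path: retry a non-blocking acquire a bounded number of times,
       hoping the current owner releases the lock very soon. */
    for (unsigned i = 0; i < l->spin_count; ++i) {
        if (pthread_mutex_trylock(&l->mutex) == 0)
            return;
    }
    /* Slow path: stop spinning and block; if the lock is still held,
       the thread sleeps until it is released. */
    pthread_mutex_lock(&l->mutex);
}

bool adaptive_lock_try(adaptive_lock *l) {
    return pthread_mutex_trylock(&l->mutex) == 0;
}

void adaptive_lock_release(adaptive_lock *l) {
    pthread_mutex_unlock(&l->mutex);
}

void adaptive_lock_destroy(adaptive_lock *l) {
    pthread_mutex_destroy(&l->mutex);
}

/* Usage sketch: several threads increment a shared counter under the lock. */
#define MAX_THREADS 16

typedef struct {
    adaptive_lock *lock;
    long *counter;
    int iterations;
} worker_args;

static void *worker(void *p) {
    worker_args *a = p;
    for (int i = 0; i < a->iterations; ++i) {
        adaptive_lock_acquire(a->lock);
        ++*a->counter;
        adaptive_lock_release(a->lock);
    }
    return NULL;
}

/* Returns the final counter value (nthreads is clamped to MAX_THREADS). */
long run_counter_demo(int nthreads, int iterations) {
    adaptive_lock lock;
    long counter = 0;
    pthread_t threads[MAX_THREADS];
    worker_args args = { &lock, &counter, iterations };
    int created = 0;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    if (adaptive_lock_init(&lock, 4000) != 0)
        return -1;
    for (; created < nthreads; ++created) {
        if (pthread_create(&threads[created], NULL, worker, &args) != 0)
            break; /* join only the threads that actually started */
    }
    for (int t = 0; t < created; ++t)
        pthread_join(threads[t], NULL);
    adaptive_lock_destroy(&lock);
    return counter;
}
```

With all threads started, `run_counter_demo(10, 10000)` returns 100000, loosely mirroring the 10-thread contention case from the report; the sketch makes no timing claims. Production spin locks usually also issue a CPU pause hint inside the spin loop, and Microsoft documents that the critical section spin count is ignored on single-processor systems, where spinning cannot help.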