While tracing a program using multiprocessing queues, I noticed that there were many calls to gettimeofday.
It turns out that acquire_timed, used by lock_PyThread_acquire_lock and rlock_acquire, always calls gettimeofday, even if no timeout argument is given.
Here's an example of the performance impact (I know it's a contrived example :-):

$ cat /tmp/ 
import threading

lock = threading.Lock()

i = 0

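def do_loop():
    global i
    for j in range(500000):
        # Each iteration takes and releases the lock, which goes through acquire_timed.
        lock.acquire()
        i += 1
        lock.release()

t1 = threading.Thread(target=do_loop)
t2 = threading.Thread(target=do_loop)
t1.start()
t2.start()
t1.join()
t2.join()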

With current code:
$ time ./python /tmp/ 

real    0m5.200s
user    0m3.288s
sys     0m1.896s

Without useless calls to gettimeofday:
$ time ./python /tmp/ 

real    0m3.091s
user    0m3.056s
sys     0m0.020s

Note that the actual gain depends on the kernel, hardware and clocksource in use (the above measurements are on a Linux 2.6.32 kernel, using acpi_pm as clocksource).
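To compare numbers across machines, it helps to know which clocksource the kernel is using. Here is a minimal sketch that reads it from sysfs on Linux. The helper name read_clocksource is made up for illustration, and the sysfs directory is assumed to be the usual one on Linux; on other systems the function simply returns None.

```python
from pathlib import Path

# Usual sysfs location on Linux (assumption: absent on other platforms).
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksource(name="current_clocksource", base=CLOCKSOURCE_DIR):
    """Return the content of a clocksource sysfs file, or None if unreadable."""
    try:
        return (base / name).read_text().strip()
    except OSError:
        # File missing or not readable (e.g. not running on Linux).
        return None


if __name__ == "__main__":
    print("current:  ", read_clocksource())
    print("available:", read_clocksource("available_clocksource"))
```

On the machine used for the timings above, the first line would be expected to report acpi_pm.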

Attached is a patch removing useless calls to gettimeofday.
Note that I also removed the check for an expired timeout following trylock in the PY_LOCK_INTR case, since it seems that only sem_wait is interruptible, not sem_trywait (e.g. on Linux, sem_trywait is implemented using futexes, which handle the non-contended case in user space). Windows locking primitives can't return PY_LOCK_INTR. Anyway, even if it happened once in a blue moon, we would just retry a trylock, which kind of makes sense.
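To make the intended control flow easier to follow, here is a rough Python model of the idea. It is not the C patch itself: try_acquire is a hypothetical stand-in for the low-level acquire primitive, the clock parameter stands in for gettimeofday, and the numeric values of the PY_LOCK_* constants are chosen only for illustration. The point is simply that the clock is read only when a positive timeout was given.

```python
import time

# Illustrative status codes; the names mirror the C constants, the values are arbitrary.
PY_LOCK_FAILURE, PY_LOCK_ACQUIRED, PY_LOCK_INTR = 0, 1, 2


def acquire_timed(try_acquire, timeout, clock=time.monotonic):
    """Model of the patched logic.

    try_acquire(remaining) is a hypothetical low-level acquire returning a
    PY_LOCK_* code. timeout < 0 blocks forever, 0 is a trylock, > 0 is a
    timeout in seconds.
    """
    # Read the clock up front only if a deadline is actually needed.
    deadline = clock() + timeout if timeout > 0 else None
    remaining = timeout
    while True:
        result = try_acquire(remaining)
        if result != PY_LOCK_INTR:
            return result
        # Interrupted: recompute the remaining time for timed waits only.
        if deadline is not None:
            remaining = deadline - clock()
            if remaining <= 0:
                return PY_LOCK_FAILURE
        # Blocking and trylock calls just retry without touching the clock.
```

With a counting fake clock passed as clock, a blocking call (timeout < 0) performs zero clock reads, whereas a call with a positive timeout performs one read up front plus one per interruption.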
Antoine Pitrou (pitrou) Date: 2011-03-06 07:41
Pushed in 6ba9ba58499e, thank you.
