Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RLock's are SLOW #47251

Closed
jcea opened this issue May 29, 2008 · 20 comments
Closed

RLock's are SLOW #47251

jcea opened this issue May 29, 2008 · 20 comments
Assignees
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@jcea
Copy link
Member

jcea commented May 29, 2008

BPO 3001
Nosy @gpshead, @jcea, @pitrou, @vstinner, @giampaolo
Files
  • rlock-v2.patch: Fixed C implementation of RLock
  • rlock2.patch
  • rlock3.patch
  • rlock4.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/pitrou'
    closed_at = <Date 2009-11-10.18:52:30.470>
    created_at = <Date 2008-05-29.15:28:10.050>
    labels = ['library', 'performance']
    title = "RLock's are SLOW"
    updated_at = <Date 2009-11-10.18:52:30.468>
    user = 'https://github.com/jcea'

    bugs.python.org fields:

    activity = <Date 2009-11-10.18:52:30.468>
    actor = 'pitrou'
    assignee = 'pitrou'
    closed = True
    closed_date = <Date 2009-11-10.18:52:30.470>
    closer = 'pitrou'
    components = ['Library (Lib)']
    creation = <Date 2008-05-29.15:28:10.050>
    creator = 'jcea'
    dependencies = []
    files = ['11175', '15280', '15284', '15285']
    hgrepos = []
    issue_num = 3001
    keywords = ['patch']
    message_count = 20.0
    messages = ['67497', '67703', '68512', '68536', '71540', '71544', '71546', '71564', '71568', '74555', '95007', '95012', '95014', '95018', '95019', '95022', '95023', '95024', '95042', '95127']
    nosy_count = 9.0
    nosy_names = ['gregory.p.smith', 'jcea', 'Rhamphoryncus', 'pitrou', 'hgibson50', 'vstinner', 'giampaolo.rodola', 'kevinwatters', 'sserrano']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'patch review'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue3001'
    versions = ['Python 3.2']

    @jcea
    Copy link
    Member Author

    jcea commented May 29, 2008

    threading.RLock acquire/release is very slow. A order of magnitude
    higher than no reentrant threading.Lock:

    """
    def RLockSpeed() :
    import time, threading
    t=time.time()
    result={}
    for i in xrange(1000000) :
    pass
    result["empty loop"]=time.time()-t
    l=threading.Lock()
    t=time.time()
    for i in xrange(1000000) :
    l.acquire()
    l.release()
    result["Lock"]=time.time()-t
    l=threading.RLock()
    t=time.time()
    for i in xrange(1000000) :
    l.acquire()
    l.release()
    result["RLock"]=time.time()-t
    return result

    if __name__=="__main__" :
      print RLockSpeed()
    """

    Result:
    {'empty loop': 0.074212074279785156, 'RLock': 10.144084215164185,
    'Lock': 1.2489800453186035}

    @jcea jcea added the performance Performance or resource usage label May 29, 2008
    @pitrou
    Copy link
    Member

    pitrou commented Jun 4, 2008

    You should investigate and try to diagnose where the speed difference
    comes from. ISTM the RLock class is implemented in Python while the Lock
    class is simply an alias to the builtin native lock type, which could
    explain most of the discrepancy.

    @sserrano
    Copy link
    Mannequin

    sserrano mannequin commented Jun 21, 2008

    Running with python -O the timing gets a little closer between Lock and
    RLock. This code won't be easy to improve in performance.
    The heaviest call is current_thread(), used at lines:
    117: me = current_thread()
    137: if self.__owner is not current_thread():

    and only consist on:
    788: def current_thread():
    789: try:
    790: return _active[_get_ident()]
    791: except KeyError:
    792: ##print "current_thread(): no current thread for", _get_ident()
    793: return _DummyThread()

    Simple profiler dump:
    $../python-trunk/python -O rlock.py
    time Lock 0.720541000366
    time RLock 4.90231084824
    400004 function calls in 0.982 CPU seconds

    Ordered by: internal time, call count

    ncalls tottime percall cumtime percall filename:lineno(function)
    100000 0.304 0.000 0.390 0.000 threading.py:116(acquire)
    100000 0.278 0.000 0.360 0.000 threading.py:136(release)
    1 0.232 0.232 0.982 0.982 rlock.py:27(testRLock)
    200000 0.168 0.000 0.168 0.000
    threading.py:788(current_thread)
    1 0.000 0.000 0.000 0.000 threading.py:103(init)
    1 0.000 0.000 0.000 0.000 threading.py:98(RLock)
    1 0.000 0.000 0.000 0.000 threading.py:76(init)
    0 0.000 0.000 profile:0(profiler)

    @pitrou
    Copy link
    Member

    pitrou commented Jun 21, 2008

    Le samedi 21 juin 2008 à 16:40 +0000, sebastian serrano a écrit :

    sebastian serrano <sebastian@devsar.com> added the comment:

    Running with python -O the timing gets a little closer between Lock and
    RLock. This code won't be easy to improve in performance.
    The heaviest call is current_thread(), used at lines:
    117: me = current_thread()
    137: if self.__owner is not current_thread():

    One could always try to rewrite RLock by replacing calls to
    threading.current_thread() with thread.get_ident().

    However, given the profile table you have appended, it will only save at
    most 30% of the time. If someone needs a more important speed-up, he
    should reimplement the RLock type in C (and contribute it back :-)).

    @gpshead gpshead added interpreter-core (Objects, Python, Grammar, and Parser dirs) stdlib Python modules in the Lib dir labels Jul 7, 2008
    @vstinner
    Copy link
    Member

    As suggested by pitrou, I wrote an implementation of RLock in C.
    Changes to Python version:

    • no debug message: i leave the message in #if 0 ... #endif
    • acquire() method argument "blocking" is not a keyword

    Notes:

    • RLock has no docstring!
    • In the Python version, RLock._release_save() replaces owner and
      counter attributes before release the lock. But releasing the lock may
      fails and no the object is in an inconsistent state

    @vstinner
    Copy link
    Member

    Oops, I forgot to update PyInit__Thread() with my new time:

    • Add PyType_Ready()
    • Register RLockType to threading dict

    Here is the new patch.

    @pitrou
    Copy link
    Member

    pitrou commented Aug 20, 2008

    Wow, that was quick. Did you try to replace threading.RLock with your
    implementation, and run the tests?

    By the way:

    • acquire() method argument "blocking" is not a keyword
    • In the Python version, RLock._release_save() replaces owner and
      counter attributes before release the lock. But releasing the lock may
      fails and no the object is in an inconsistent state

    Removing the debugging statements is fine, but apart from that the C
    implementation should mimick the current behaviour. Even if this
    behaviour has potential pitfalls.

    Speaking of which, it would be nice if RLock was subclassable. I don't
    know whether any software relies on this though.

    @gpshead
    Copy link
    Member

    gpshead commented Aug 20, 2008

    I doubt subclassability of RLock matters but who knows, people do code
    things.

    Regardless, using a C version wrapped in a simple python container class
    that calls the underlying C implementation's methods should be
    sufficient to allow sub-classing.

    Given the final 2.6 beta is scheduled for today, this won't make it into
    2.6/3.0 so we've got some time to polish up what we want.

    @pitrou
    Copy link
    Member

    pitrou commented Aug 20, 2008

    Gregory, would you have an advice on bpo-3618?

    @hgibson50
    Copy link
    Mannequin

    hgibson50 mannequin commented Oct 9, 2008

    I doubt subclassability of RLock matters but who knows, people do code
    things.

    I've recently done this to implement potential deadlock detection. I
    keep a record of the sequences of acquired locks, find unique
    sequences, then check for conflicts between each sequence. There's not
    much overhead and it highlighted some potential deadlocks where lock A
    and B were acquired AB in one route through code and BA in another
    route. The algorithm is a simplified version of that used in Linux -
    see http://www.mjmwired.net/kernel/Documentation/lockdep-design.txt

    Hugh

    @pitrou pitrou removed the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Nov 6, 2009
    @pitrou pitrou self-assigned this Nov 6, 2009
    @pitrou
    Copy link
    Member

    pitrou commented Nov 7, 2009

    Here is a new patch against py3k. It passes all tests and is generally
    10-15x faster than the pure Python version.

    @gpshead
    Copy link
    Member

    gpshead commented Nov 7, 2009

    Reviewers: ,

    http://codereview.appspot.com/150055/diff/1/4
    File Modules/_threadmodule.c (right):

    http://codereview.appspot.com/150055/diff/1/4#newcode221
    Modules/_threadmodule.c:221: return PyBool_FromLong((long) r);
    This explicit (long) cast is unnecessary.

    http://codereview.appspot.com/150055/diff/1/4#newcode246
    Modules/_threadmodule.c:246: PyThread_release_lock(self->rlock_lock);
    reset self->rlock_owner to 0 before releasing the lock.

    Description:
    code review for http://bugs.python.org/issue3001

    Please review this at http://codereview.appspot.com/150055

    Affected files:
    M Lib/test/test_threading.py
    M Lib/threading.py
    M Modules/_threadmodule.c

    @vstinner
    Copy link
    Member

    vstinner commented Nov 7, 2009

    rlock_acquire_doc: "(...) return None once the lock is acquired".
    That's wrong, acquire() always return a boolean (True or False).

    rlock_release(): Is the assert(self->rlock_count > 0); really required?
    You're checking its value few lines before.

    rlock_acquire_restore(): raise an error if the lock acquire failed,
    whereas rlock_acquire() doesn't. Why not raising an error in
    rlock_acquire() (especially if blocking is True)? Or if the error can
    never occur, remove the error checking in rlock_acquire_restore().

    rlock_acquire_restore(): (maybe) set owner to 0 if count is 0.

    rlock_release_save(): may also reset self->rlock_owner to 0, as
    rlock_release().

    rlock_new(): why not reusing type instead of Py_TYPE(self) in
    "Py_TYPE(self)->tp_free(self)"?

    __exit__: rlock_release() is defined as __exit__() with METH_VARARGS,
    but it has no argument. Can it be a problem?

    Anything, thanks for the faster RLock!

    If your patch is commited, you can reconsider bpo-3618 (possible deadlock
    in python IO implementation).

    @pitrou
    Copy link
    Member

    pitrou commented Nov 7, 2009

    Thanks for the review. I will make the suggested modifications.

    http://codereview.appspot.com/150055/diff/1/4
    File Modules/_threadmodule.c (right):

    http://codereview.appspot.com/150055/diff/1/4#newcode221
    Modules/_threadmodule.c:221: return PyBool_FromLong((long) r);
    On 2009/11/07 07:48:05, gregory.p.smith wrote:

    This explicit (long) cast is unnecessary.

    Right.

    http://codereview.appspot.com/150055/diff/1/4#newcode246
    Modules/_threadmodule.c:246: PyThread_release_lock(self->rlock_lock);
    On 2009/11/07 07:48:05, gregory.p.smith wrote:

    reset self->rlock_owner to 0 before releasing the lock.

    We always check rlock_count before rlock_owner anyway but, yes, I could
    reset rlock_owner out of safety.

    http://codereview.appspot.com/150055

    @pitrou
    Copy link
    Member

    pitrou commented Nov 7, 2009

    rlock_acquire_doc: "(...) return None once the lock is acquired".
    That's wrong, acquire() always return a boolean (True or False).

    You're right, I should fix that docstring.

    rlock_release(): Is the assert(self->rlock_count > 0); really required?
    You're checking its value few lines before.

    Right again :) I forgot this was unsigned.

    rlock_acquire_restore(): raise an error if the lock acquire failed,
    whereas rlock_acquire() doesn't. Why not raising an error in
    rlock_acquire() (especially if blocking is True)?

    For rlock_acquire(), I mimicked what the Python code (_PyRLock.acquire)
    does: if locking fails, it returns False instead. It is part of the API.

    (and I agree this is not necessarily right, because failing to lock if
    blocking is True would probably indicate a low-level system error, but
    the purpose of the patch is not to change the API)

    But you're right that the Python version of rlock_acquire_restore()
    doesn't check the return code either, so I may remove this check from
    the C code, although the rest of the method clearly assumes the lock has
    been taken.

    rlock_acquire_restore(): (maybe) set owner to 0 if count is 0.

    rlock_release_save(): may also reset self->rlock_owner to 0, as
    rlock_release().

    As I said to Gregory, the current code doesn't rely on rlock_owner to be
    reset but, yes, we could still add it out of safety.

    rlock_new(): why not reusing type instead of Py_TYPE(self) in
    "Py_TYPE(self)->tp_free(self)"?

    Good point.

    __exit__: rlock_release() is defined as __exit__() with METH_VARARGS,
    but it has no argument. Can it be a problem?

    I just mimicked the corresponding declarations in the non-recursive lock
    object. Apparently it's safe on all architectures Python has been
    running on for years, although I agree it looks strange.

    If your patch is commited, you can reconsider bpo-3618 (possible deadlock
    in python IO implementation).

    Indeed.

    Thanks !

    @pitrou
    Copy link
    Member

    pitrou commented Nov 7, 2009

    Here is an updated patch. I addressed all review comments, except the
    one about acquire_restore() checking the return result of acquire(),
    because I think it's really a weakness in the Python implementation.

    @gpshead
    Copy link
    Member

    gpshead commented Nov 7, 2009

    Can you make the C implementation's repr() show something similar to the
    Python implementation?

    @pitrou
    Copy link
    Member

    pitrou commented Nov 7, 2009

    Yes, here is a new patch adding tp_repr.

    @vstinner
    Copy link
    Member

    vstinner commented Nov 8, 2009

    rlock4.patch looks correct and pass test_threading.py tests.

    @pitrou
    Copy link
    Member

    pitrou commented Nov 10, 2009

    I've committed the latest patch in r76189. Thanks for the reviews, everyone.

    @pitrou pitrou closed this as completed Nov 10, 2009
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    performance Performance or resource usage stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants