Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Acquiring locks not interrupted by signals on musl libc #78185

Closed
josephcsible mannequin opened this issue Jun 29, 2018 · 8 comments
Closed

Acquiring locks not interrupted by signals on musl libc #78185

josephcsible mannequin opened this issue Jun 29, 2018 · 8 comments
Labels
extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error

Comments

@josephcsible
Copy link
Mannequin

josephcsible mannequin commented Jun 29, 2018

BPO 34004
Nosy @benjaminp, @miss-islington, @josephcsible
PRs
  • closes bpo-34004: Skip lock interruption tests on musl. #9224
  • [3.7] closes bpo-34004: Skip lock interruption tests on musl. (GH-9224) #9226
  • [3.6] closes bpo-34004: Skip lock interruption tests on musl. (GH-9224) #9227
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2018-09-12.20:48:07.591>
    created_at = <Date 2018-06-29.21:43:37.453>
    labels = ['extension-modules', 'type-bug']
    title = 'Acquiring locks not interrupted by signals on musl libc'
    updated_at = <Date 2018-09-12.23:33:39.964>
    user = 'https://github.com/josephcsible'

    bugs.python.org fields:

    activity = <Date 2018-09-12.23:33:39.964>
    actor = 'benjamin.peterson'
    assignee = 'none'
    closed = True
    closed_date = <Date 2018-09-12.20:48:07.591>
    closer = 'benjamin.peterson'
    components = ['Extension Modules']
    creation = <Date 2018-06-29.21:43:37.453>
    creator = 'Joseph Sible'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 34004
    keywords = ['patch']
    message_count = 8.0
    messages = ['320742', '325178', '325183', '325184', '325200', '325202', '325208', '325211']
    nosy_count = 3.0
    nosy_names = ['benjamin.peterson', 'miss-islington', 'Joseph Sible']
    pr_nums = ['9224', '9226', '9227']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue34004'
    versions = ['Python 3.6']

    @josephcsible
    Copy link
    Mannequin Author

    josephcsible mannequin commented Jun 29, 2018

    When Python is built on Alpine Linux or in any other configuration that uses musl libc, calls to Lock.acquire() can't be interrupted by signals. This bug is caught by test_lock_acquire_interruption and test_rlock_acquire_interruption in 3.6/Lib/test/test_threadsignals.py, both of which fail when Python is built on musl.

    POSIX explicitly says that sem_timedwait ever failing with EINTR is optional:
    http://pubs.opengroup.org/onlinepubs/9699919799/functions/sem_timedwait.html#tag_16_508_05

    And musl deliberately chooses not to support it:
    http://www.openwall.com/lists/musl/2018/02/24/3
    http://git.musl-libc.org/cgit/musl/commit/?id=c0ed5a201b2bdb6d1896064bec0020c9973db0a1

    However, we incorrectly rely on it for correct operation. Our documentation https://docs.python.org/3.6/library/threading.html#threading.Lock.acquire just says that "Lock acquires can now be interrupted by signals on POSIX", not that any optional POSIX features are required for it.

    A similar bug was bpo-11223, which was the same problem but with pthread_cond_timedwait, which is used when semaphores aren't available.

    @josephcsible josephcsible mannequin added extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error labels Jun 29, 2018
    @benjaminp
    Copy link
    Contributor

    New changeset 5b10d51 by Benjamin Peterson in branch 'master':
    closes bpo-34004: Skip lock interruption tests on musl. (GH-9224)
    5b10d51

    @miss-islington
    Copy link
    Contributor

    New changeset b608fcd by Miss Islington (bot) in branch '3.7':
    closes bpo-34004: Skip lock interruption tests on musl. (GH-9224)
    b608fcd

    @miss-islington
    Copy link
    Contributor

    New changeset 5a435ea by Miss Islington (bot) in branch '3.6':
    closes bpo-34004: Skip lock interruption tests on musl. (GH-9224)
    5a435ea

    @josephcsible
    Copy link
    Mannequin Author

    josephcsible mannequin commented Sep 12, 2018

    How is this considered "fixed"? Why couldn't this be actually fixed by using eventfd instead of semaphores when they're available, for example?

    @benjaminp
    Copy link
    Contributor

    Reimplementing locks with eventfd can be another issue. Such a change can only land in a new Python version, though. We'll just have to consider musl unsupported for interrupting locks in our current maintenance releases as I have done.

    How likely is it that musl will change to allow returning EINTR from sem_timedwait given that the only stated reason not to is very old kernel versions?

    @josephcsible
    Copy link
    Mannequin Author

    josephcsible mannequin commented Sep 12, 2018

    Re musl changing their behavior, see https://www.openwall.com/lists/musl/2018/09/07/1 and the resulting thread.

    In addition to the old kernel version issue, two other issues were raised:

    1. EINTR makes programming mistakes more likely, as people won't think to handle it. I don't give much weight to this point.
    2. Most of the time, counting on receiving an EINTR results in race conditions. Our code seems to be affected by this too. Even on glibc, a signal at just the "right" time could result in it not being interrupted. This is why I think moving to an eventfd or something would be better, since we could then use pselect/ppoll/etc. to avoid the signal blocking race.

    @benjaminp
    Copy link
    Contributor

    On Wed, Sep 12, 2018, at 16:03, Joseph Sible wrote:

    1. Most of the time, counting on receiving an EINTR results in race
      conditions. Our code seems to be affected by this too. Even on glibc, a
      signal at just the "right" time could result in it not being
      interrupted. This is why I think moving to an eventfd or something would
      be better, since we could then use pselect/ppoll/etc. to avoid the
      signal blocking race.

    The race is a good point.

    However, switching to eventfd is not trivial. It would be the third locking implementation in python just for pthreads. Using eventfd would require reimplementing all the userspace parts of locking (atomics, appropriate memory barriers, and spinning) ourselves. futex is carefully optimized for userspace locking, whereas I've never heard of normal program locking being done with eventfd before.

    If the expected usecase for interrupting locks is user pressing C-c, being 99.99999% correct is probably enough.

    So, it's fine with me if you open another issue for reimplementing locks with eventfd. But, I don't personally have the desire to do it.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants