Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

_Py_atomic_* not actually atomic on Windows with MSVC #74932

Closed
Paxxi mannequin opened this issue Jun 24, 2017 · 16 comments
Closed

_Py_atomic_* not actually atomic on Windows with MSVC #74932

Paxxi mannequin opened this issue Jun 24, 2017 · 16 comments
Labels
3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@Paxxi
Copy link
Mannequin

Paxxi mannequin commented Jun 24, 2017

BPO 30747
Nosy @pfmoore, @mdickinson, @pitrou, @vstinner, @tjguk, @zware, @serhiy-storchaka, @zooba, @ethanhs, @segevfiner, @Paxxi
PRs
  • bpo-30747: Attempt to fix atomic load/store #2383
  • bpo-9566 & bpo-30747: Silence warnings from pyatomic.h macros #3140
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2017-08-20.22:46:23.595>
    created_at = <Date 2017-06-24.19:35:59.920>
    labels = ['interpreter-core', 'type-bug', '3.7']
    title = '_Py_atomic_* not actually atomic on Windows with MSVC'
    updated_at = <Date 2018-06-14.19:54:35.459>
    user = 'https://github.com/Paxxi'

    bugs.python.org fields:

    activity = <Date 2018-06-14.19:54:35.459>
    actor = 'Paxxi'
    assignee = 'none'
    closed = True
    closed_date = <Date 2017-08-20.22:46:23.595>
    closer = 'pitrou'
    components = ['Interpreter Core']
    creation = <Date 2017-06-24.19:35:59.920>
    creator = 'Paxxi'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 30747
    keywords = []
    message_count = 16.0
    messages = ['296786', '296793', '296816', '296820', '296823', '296846', '296856', '296857', '300191', '300310', '300330', '300609', '300611', '319482', '319539', '319541']
    nosy_count = 12.0
    nosy_names = ['paul.moore', 'mark.dickinson', 'pitrou', 'vstinner', 'jyasskin', 'tim.golden', 'zach.ware', 'serhiy.storchaka', 'steve.dower', 'ethan smith', 'Segev Finer', 'Paxxi']
    pr_nums = ['2383', '3140']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue30747'
    versions = ['Python 3.7']

    @Paxxi
    Copy link
    Mannequin Author

    Paxxi mannequin commented Jun 24, 2017

    _Py_atomic_store and _Py_atomic_load are not implemented as atomic operations on Windows when using Visual Studio to build.

    This might cause hard to troubleshoot behaviour, especially in third party hosting applications..

    @Paxxi Paxxi mannequin added type-feature A feature request or enhancement 3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Jun 24, 2017
    @pitrou pitrou added type-bug An unexpected behavior, bug, or error and removed type-feature A feature request or enhancement labels Jun 24, 2017
    @zooba
    Copy link
    Member

    zooba commented Jun 24, 2017

    Unless you've got an example of this causing actual issues, it should only go into 3.7.

    All platforms supported by Windows guarantee atomicity for aligned, pointer-sized reads and writes, but there appear to be memory fencing semantics in here too. I don't understand entirely why we need this code at all, so I'll let Victor or Serhiy comment.

    @pitrou
    Copy link
    Member

    pitrou commented Jun 25, 2017

    The general issue those macros want to prevent is that modern CPUs have a tendency to execute a lot of stuff out-of-order, *including* memory operations. From the perspective of a single hardware core (or thread, really), that's fine since it has a logically consistent view of the machine's state (it knows which operations have been reordered, which values have been committed to cache or not, etc.). But what happens when another hardware core examines in-memory state at the same time? It might find values changing in a different order than the programmer had intended. If it's important that the visible order hasn't been changed, you have a bug.

    Note that C "volatile" is not enough: it only prevents the *compiler* from re-ordering or eliding memory accesses, but not the CPU. As such, "volatile" is only useful if you have a single word-sized piece of state that you need to inspect from several threads at once. But the eval loop uses several of them, and therefore needs to prevent the CPU from writing or reading them in the wrong order (which may produce synchronization bugs such as deadlocks).

    Also note that traditionally, x86 has a "strong" memory ordering model which prevents nasty kinds of reorderings to happen. But other architectures such as ARM or SPARC can have much weaker memory models (IIRC, we had sporadic hangs on an ARM buildbot a long time ago, because of that, but I can't find a reference).

    I admit I'm unable to vouch that the current code is correct. Jeffrey Yasskin did the original change adding the Py_atomic types and using them in the GIL implementation.

    @Paxxi
    Copy link
    Mannequin Author

    Paxxi mannequin commented Jun 25, 2017

    Antoine said it best.

    It's very hard to prove that this code is correct or incorrect as it requires multiple threads accessing the same variable and very specific timings to produce an actual issue.

    My PR only solved half of the issue because I didn't really have a good idea for how to handle the load macro but I think it out today so I'll be updating the PR.

    Would there be any interest of implementing them for MSVC/ARM as well? It's basically the same code so not much work, however I don't have a platform to actually test it.

    @zooba
    Copy link
    Member

    zooba commented Jun 25, 2017

    Would there be any interest of implementing them for MSVC/ARM as well

    Sure, since you're there. It's not easy to test, but I know people who are doing it, so it'll get noticed eventually.

    Maybe there's some sort of stress test we can write that is likely to encounter issues? I've done that before to detect concurrency issues in particularly complex code.

    @jyasskin
    Copy link
    Mannequin

    jyasskin mannequin commented Jun 26, 2017

    Has enough time passed that you can use the C11 atomic types and operations instead of special-casing these for each compiler? (e.g. http://en.cppreference.com/w/c/atomic/atomic_store)

    @pitrou
    Copy link
    Member

    pitrou commented Jun 26, 2017

    Unfortunately not: """Python versions greater than or equal to 3.6 use C89 with several select C99 features: [...]""" (PEP-7).

    @Paxxi
    Copy link
    Mannequin Author

    Paxxi mannequin commented Jun 26, 2017

    Microsoft don't spend much time on the C compiler features, still lacking C99 features so I don't have much hope of getting C11 support anytime soon or at all.

    One could of course implement a cross platform stdatomic library that matches the C11 spec.

    @pitrou
    Copy link
    Member

    pitrou commented Aug 12, 2017

    New changeset e664d7f by Antoine Pitrou (Pär Björklund) in branch 'master':
    bpo-30747: Attempt to fix atomic load/store (bpo-2383)
    e664d7f

    @pitrou pitrou closed this as completed Aug 12, 2017
    @zooba
    Copy link
    Member

    zooba commented Aug 15, 2017

    This commit added *a lot* of warnings - http://buildbot.python.org/all/builders/AMD64%20Windows10%203.x/builds/1056

    Most are of the form:
    warning C4133: 'function': incompatible types - from 'volatile uintptr_t *' to 'volatile int *'

    I can't obviously see where they are coming from, but someone please fix them.

    @zooba zooba reopened this Aug 15, 2017
    @Paxxi
    Copy link
    Mannequin Author

    Paxxi mannequin commented Aug 16, 2017

    My guess would be the cast from uintptr_t to intptr_t in the return type.

    I'll look into this, when possible, should have some time later this week or over the weekend.

    @pitrou
    Copy link
    Member

    pitrou commented Aug 20, 2017

    New changeset 0267128 by Antoine Pitrou (Segev Finer) in branch 'master':
    bpo-9566 & bpo-30747: Silence warnings from pyatomic.h macros (bpo-3140)
    0267128

    @pitrou pitrou closed this as completed Aug 20, 2017
    @pitrou
    Copy link
    Member

    pitrou commented Aug 20, 2017

    Thanks to Segev Finer for submitting the PR which fixed the warnings!

    @ethanhs
    Copy link
    Mannequin

    ethanhs mannequin commented Jun 13, 2018

    When working on clang-cl support, I was advised here https://reviews.llvm.org/D47672#1131325 that we may be using hardware lock elision incorrectly. Copying from there:

    I also spoke to Andi Kleen here at Intel to make sure I got these inline assembly versions correct. And he's not sure CPython should be using these the way it is. It looks like they try to use the HLE versions anytime the memory order is acquire/release. But HLE isn't suitable for every acquire/release.

    I believe if we just use the simple _InterlockedExchange/_InterlockedCompareExchange intrinsics, things should be safer.

    @pitrou
    Copy link
    Member

    pitrou commented Jun 14, 2018

    I would be ok with reverting to the non-HLE variants. Does anyone want to test the performance implications on TSX-enabled hardware?

    @Paxxi
    Copy link
    Mannequin Author

    Paxxi mannequin commented Jun 14, 2018

    The HLE variants were simply chosen to match the semantics on other
    platforms with regard to aquire/release.
    If Intel engineers say the plain versions are better that's good enough for
    me.

    It would be interesting seeing some benchmarks but I don't have any idea on
    how to reliably test the non happy path.

    On Thu, 14 Jun 2018, 21:32 Antoine Pitrou, <report@bugs.python.org> wrote:

    Antoine Pitrou <pitrou@free.fr> added the comment:

    I would be ok with reverting to the non-HLE variants. Does anyone want to
    test the performance implications on TSX-enabled hardware?

    ----------


    Python tracker <report@bugs.python.org>
    <https://bugs.python.org/issue30747\>


    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants