Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash in logging._after_at_fork_child_reinit_locks() #84272

Closed
vstinner opened this issue Mar 27, 2020 · 8 comments
Closed

Crash in logging._after_at_fork_child_reinit_locks() #84272

vstinner opened this issue Mar 27, 2020 · 8 comments
Labels
3.9 only security fixes stdlib Python modules in the Lib dir

Comments

@vstinner
Copy link
Member

BPO 40091
Nosy @gpshead, @vsajip, @vstinner, @melwitt
PRs
  • [WIP] bpo-40091: Fix a crash in logging after fork #19196
  • bpo-40089: Add _at_fork_reinit() method to locks #19195
  • bpo-40091: Fix a hang at fork in the logging module #19416
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2020-04-13.22:29:10.792>
    created_at = <Date 2020-03-27.17:02:43.485>
    labels = ['library', '3.9']
    title = 'Crash in logging._after_at_fork_child_reinit_locks()'
    updated_at = <Date 2020-04-13.22:29:10.792>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2020-04-13.22:29:10.792>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-04-13.22:29:10.792>
    closer = 'vstinner'
    components = ['Library (Lib)']
    creation = <Date 2020-03-27.17:02:43.485>
    creator = 'vstinner'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 40091
    keywords = ['patch']
    message_count = 8.0
    messages = ['365170', '365172', '365197', '365952', '365959', '365968', '366338', '366339']
    nosy_count = 4.0
    nosy_names = ['gregory.p.smith', 'vinay.sajip', 'vstinner', 'melwitt']
    pr_nums = ['19196', '19195', '19416']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue40091'
    versions = ['Python 3.9']

    @vstinner
    Copy link
    Member Author

    test_threading.ThreadJoinOnShutdown.test_reinit_tls_after_fork() does crash randomly on AIX in logging._after_at_fork_child_reinit_locks():
    https://bugs.python.org/issue40068#msg365028

    This function ends by releasing a lock:

        _releaseLock()  # Acquired by os.register_at_fork(before=.

    But it is not safe to use a lock after fork (see bpo-6721 and bpo-40089).

    The purpose of _after_at_fork_child_reinit_locks() is already to fix locks used by logging handles: see bpo-36533 and commit 64aa6d2. But the global logging._lock is not reset after fork.

    Attached PR fix the issue.

    @vstinner vstinner added 3.9 only security fixes stdlib Python modules in the Lib dir labels Mar 27, 2020
    @vstinner
    Copy link
    Member Author

    See also bpo-40089 "Add _at_fork_reinit() method to locks".

    @vstinner
    Copy link
    Member Author

    By the way, I'm not sure about the error handling code path:

        for handler in _at_fork_reinit_lock_weakset:
            try:
                handler.createLock()
            except Exception as err:
                # Similar to what PyErr_WriteUnraisable does.
                print("Ignoring exception from logging atfork", instance,
                      "._reinit_lock() method:", err, file=sys.stderr)

    Is it really a good idea to ignore such error?

    By the way, there is an obvious typo: "instance" name doesn't exist, it should be "handler" :-)

    @melwitt
    Copy link
    Mannequin

    melwitt mannequin commented Apr 7, 2020

    Hi, I've been following the work related to:

    https://bugs.python.org/issue6721
    https://bugs.python.org/issue40089

    because I encountered a problem where a standard library lock was held by a parent process at the time that child processes were forked, so the child processes got stuck behind the inherited held locks.

    But, if I'm understanding correctly, this issue is fixing something in python logging specifically and not all standard library locks in general.

    My question is, will there be a way to reinit standard library locks in general using _at_fork_reinit()? That is, should we expect a future fix in python to do this or is the recommendation to continue to ensure the application reinits locks during process start if we know the process could be a child?

    Thanks for your advice.

    @vstinner
    Copy link
    Member Author

    vstinner commented Apr 7, 2020

    because I encountered a problem where a standard library lock was held by a parent process at the time that child processes were forked, so the child processes got stuck behind the inherited held locks.

    Which lock from which module? You wrote details at:
    #19195 (comment)

    According to your comment #28 at https://bugs.launchpad.net/nova/+bug/1844929 the lock involved in the issue comes from _TransactionFactory of oslo.db:
    https://github.com/openstack/oslo.db/blob/b903d4e1ee07ef2ec454daa5b8418b3039e02774/oslo_db/sqlalchemy/enginefacade.py#L189

    So it's a bug in oslo.db, not in Python. Python doesn't provide any machinery to automatically reinitialize all locks created in Python at fork in the child process. os.register_at_fork() must be used explicitly.

    But, if I'm understanding correctly, this issue is fixing something in python logging specifically and not all standard library locks in general.

    This issue is specific to logging.

    My question is, will there be a way to reinit standard library locks in general using _at_fork_reinit()? That is, should we expect a future fix in python to do this or is the recommendation to continue to ensure the application reinits locks during process start if we know the process could be a child?

    Each module has to setup an os.register_at_fork() callback to reinitialize its locks. It's done by threading and logging modules for example. The multiprocessing has its own util.register_after_fork() machinery (see bpo-40221)..

    @melwitt
    Copy link
    Mannequin

    melwitt mannequin commented Apr 8, 2020

    Thank you for explaining those details and pointing me in the right direction about the proper way to solve the problem in oslo.db.

    We still have to support python 3.6, so for now we will still need to do something different (in nova) to handle this (clear our cache of oslo.db _TransactionContextManager during oslo.service start(), is what I have proposed).

    For those running python 3.7 and newer, I will try to fix the situation more generally in oslo.db with os.register_at_fork() using the examples you gave.

    Thank you again for your help, I have learned a lot.

    @vstinner
    Copy link
    Member Author

    New changeset 4c3da78 by Victor Stinner in branch 'master':
    bpo-40091: Fix a hang at fork in the logging module (GH-19416)
    4c3da78

    @vstinner
    Copy link
    Member Author

    I merged my fix into master.

    PR 19196 fix could be backported to 3.8, sadly I had to reject this PR since it introduced new problems :-( This approach (create a new lock at fork) doesn't work :-( So I'm not sure how to fix the issue on 3.8 without backporting the new _at_fork_reinit() method, but I would prefer to not backport it since I see it as a new feature.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    1 participant