Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

intern strings in deepfrozen modules #90588

Closed
kumaraditya303 opened this issue Jan 19, 2022 · 26 comments
Closed

intern strings in deepfrozen modules #90588

kumaraditya303 opened this issue Jan 19, 2022 · 26 comments
Labels
3.11 only security fixes type-bug An unexpected behavior, bug, or error

Comments

@kumaraditya303
Copy link
Contributor

BPO 46430
Nosy @gvanrossum, @vstinner, @tiran, @corona10, @miss-islington, @sobolevn, @kumaraditya303
PRs
  • bpo-46430: Intern strings in deep-frozen modules  #30683
  • Fix sphinx-lint after #31097 and b878b3a #31248
  • bpo-46430: fix memory leak in interned strings of deep-frozen modules #31549
  • bpo-46430: Add test for checking block leak is zero #31556
  • bpo-46430: fix error-handling in _Py_Deepfreeze_Init #31596
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2022-02-26.16:37:30.193>
    created_at = <Date 2022-01-19.11:06:14.395>
    labels = ['type-bug', '3.11']
    title = 'intern strings in deepfrozen modules'
    updated_at = <Date 2022-03-04.15:06:51.121>
    user = 'https://github.com/kumaraditya303'

    bugs.python.org fields:

    activity = <Date 2022-03-04.15:06:51.121>
    actor = 'corona10'
    assignee = 'none'
    closed = True
    closed_date = <Date 2022-02-26.16:37:30.193>
    closer = 'gvanrossum'
    components = []
    creation = <Date 2022-01-19.11:06:14.395>
    creator = 'kumaraditya'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 46430
    keywords = ['patch']
    message_count = 26.0
    messages = ['410936', '411003', '412920', '412922', '412937', '412983', '413002', '413004', '413822', '413823', '413824', '413832', '413837', '413839', '413847', '413851', '413852', '413855', '413893', '413895', '413928', '413930', '413931', '414119', '414120', '414127']
    nosy_count = 7.0
    nosy_names = ['gvanrossum', 'vstinner', 'christian.heimes', 'corona10', 'miss-islington', 'sobolevn', 'kumaraditya']
    pr_nums = ['30683', '31248', '31549', '31556', '31596']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue46430'
    versions = ['Python 3.11']

    @kumaraditya303 kumaraditya303 added 3.11 only security fixes labels Jan 19, 2022
    @kumaraditya303
    Copy link
    Contributor Author

    Interns strings in deep-frozen modules.

    See faster-cpython/ideas#218

    @kumaraditya303
    Copy link
    Contributor Author

    This speeds up comparison of strings by just comparing their pointers so it is much faster.

    @gvanrossum
    Copy link
    Member

    New changeset c0a5ebe by Kumar Aditya in branch 'main':
    bpo-46430: Intern strings in deep-frozen modules (GH-30683)
    c0a5ebe

    @tiran
    Copy link
    Member

    tiran commented Feb 9, 2022

    I noticed that the new interning code lacks proper error reporting. There are only asserts. _PyCode_New() checks the return value of intern_strings.

    @gvanrossum
    Copy link
    Member

    We discussed that and found that a lot of errors are ignored during interning anyway.

    But it's not too late to change if you want to (sending a PR would probably be quicker than arguing :-).

    @miss-islington
    Copy link
    Contributor

    New changeset dee020a by Nikita Sobolev in branch 'main':
    Fix sphinx-lint after bpo-31097 and b878b3a (GH-31248)
    dee020a

    @kumaraditya303
    Copy link
    Contributor Author

    I consider this done so closing it as improving the error handling of interning it out of scope of this issue.

    @tiran
    Copy link
    Member

    tiran commented Feb 10, 2022

    Please leave the ticket open until we have an agreement how to handle the missing error checks.

    @tiran tiran reopened this Feb 10, 2022
    @tiran tiran added the type-bug An unexpected behavior, bug, or error label Feb 10, 2022
    @tiran tiran reopened this Feb 10, 2022
    @tiran tiran added the type-bug An unexpected behavior, bug, or error label Feb 10, 2022
    @vstinner
    Copy link
    Member

    New changeset c0a5ebe by Kumar Aditya in branch 'main':
    bpo-46430: Intern strings in deep-frozen modules (GH-30683)

    This change introduced a memory leak at Python exit.

    Before:

    $ ./python -X showrefcount -c pass
    [0 refs, 0 blocks]

    After:

    $ ./python -X showrefcount -c pass
    [0 refs, 344 blocks]

    @vstinner
    Copy link
    Member

    This change introduced a memory leak at Python exit.

    It's regression related to bpo-1635741 which I fixed recently:
    https://mail.python.org/archives/list/python-dev@python.org/thread/E4C6TDNVDPDNNP73HTGHN5W42LGAE22F/

    @vstinner
    Copy link
    Member

    _PyStaticCode_InternStrings() error handling is based on assert(): that's really bad. It can crash Python (exit with abort()) at the first memory allocation failure. Why not returning -1 on error?

    @gvanrossum
    Copy link
    Member

    Okay, let's change the error handling. @Kumar, can you handle that?

    @victor, the refleak is unrelated to the error handling right? Presumably the leak is imaginary -- the deep-frozen interned strings should be accounted for somehow. @Kumar do you need help investigating this?

    @vstinner
    Copy link
    Member

    Presumably the leak is imaginary

    Well, you can check with your favorite memory debugger like Valgrind if you don't trust Python internal memory debugger :-)

    Not leaking memory at exit matters when Python is embedded in an application.

    I don't think that it's related to the error handling which is stripped in release mode.

    @gvanrossum
    Copy link
    Member

    Not leaking memory at exit matters when Python is embedded
    in an application.

    Sure, but "leaks" caused by deep-freezing cannot be solved by freeing up the deep-frozen memory -- the solution must be to update the accounting somewhere.

    Where is the existence of Py_None accounted for (since it's statically allocated, or at least used to be)? That's likely where we'd have to do something about the deep-frozen objects.

    @vstinner
    Copy link
    Member

    Sure, but "leaks" caused by deep-freezing cannot be solved by freeing up the deep-frozen memory -- the solution must be to update the accounting somewhere.

    Python allocates memory (ex: with PyObject_Malloc()) which is not released at exit. Two examples from Valgrind:

    ==196803== 50 bytes in 1 blocks are still reachable in loss record 1 of 87
    ==196803== at 0x484486F: malloc (vg_replace_malloc.c:381)
    ==196803== by 0x544E29: _PyMem_RawMalloc (obmalloc.c:101)
    ==196803== by 0x545E0E: PyObject_Malloc (obmalloc.c:700)
    ==196803== by 0x587159: PyUnicode_New (unicodeobject.c:1448)
    ==196803== by 0x58C4CF: get_latin1_char (unicodeobject.c:2148)
    ==196803== by 0x58D7F7: _PyUnicode_FromUCS1 (unicodeobject.c:2450)
    ==196803== by 0x58E374: PyUnicode_FromKindAndData (unicodeobject.c:2525)
    ==196803== by 0x69402B: r_object (marshal.c:1150)
    ==196803== by 0x694295: r_object (marshal.c:1212)
    ==196803== by 0x694AE0: r_object (marshal.c:1393)
    ==196803== by 0x694295: r_object (marshal.c:1212)
    ==196803== by 0x694AA4: r_object (marshal.c:1387)
    ==196803== by 0x6950F3: read_object (marshal.c:1524)
    ==196803== by 0x6953E7: PyMarshal_ReadObjectFromString (marshal.c:1641)
    ==196803== by 0x763223: _Py_Get_Getpath_CodeObject (getpath.c:792)
    ==196803== by 0x76337F: _PyConfig_InitPathConfig (getpath.c:847)
    ==196803== by 0x68CD04: config_init_import (initconfig.c:1967)
    ==196803== by 0x68CE4E: _PyConfig_InitImportConfig (initconfig.c:2000)
    ==196803== by 0x69E594: init_interp_main (pylifecycle.c:1103)
    ==196803== by 0x69EBF8: pyinit_main (pylifecycle.c:1216)
    ==196803== by 0x69EDCE: Py_InitializeFromConfig (pylifecycle.c:1247)
    ==196803== by 0x6D5346: pymain_init (main.c:67)
    ==196803== by 0x6D64F8: pymain_main (main.c:692)
    ==196803== by 0x6D65A1: Py_BytesMain (main.c:725)
    ==196803== by 0x41D7B5: main (python.c:15)

    and

    ==196803== 3,336 bytes in 60 blocks are still reachable in loss record 87 of 87
    ==196803== at 0x484486F: malloc (vg_replace_malloc.c:381)
    ==196803== by 0x544E29: _PyMem_RawMalloc (obmalloc.c:101)
    ==196803== by 0x545E0E: PyObject_Malloc (obmalloc.c:700)
    ==196803== by 0x587159: PyUnicode_New (unicodeobject.c:1448)
    ==196803== by 0x59AEB9: unicode_decode_utf8 (unicodeobject.c:5162)
    ==196803== by 0x59B5B0: PyUnicode_DecodeUTF8Stateful (unicodeobject.c:5292)
    ==196803== by 0x58D296: PyUnicode_FromString (unicodeobject.c:2322)
    ==196803== by 0x5CFFAE: PyUnicode_InternFromString (unicodeobject.c:15650)
    ==196803== by 0x4E8B61: descr_new (descrobject.c:885)
    ==196803== by 0x4E8C9B: PyDescr_NewMethod (descrobject.c:934)
    ==196803== by 0x56694C: type_add_method (typeobject.c:5643)
    ==196803== by 0x566A92: type_add_methods (typeobject.c:5689)
    ==196803== by 0x569166: type_ready_fill_dict (typeobject.c:6165)
    ==196803== by 0x569A02: type_ready (typeobject.c:6421)
    ==196803== by 0x569B59: PyType_Ready (typeobject.c:6457)
    ==196803== by 0x544363: _PyTypes_InitTypes (object.c:1952)
    ==196803== by 0x69D12D: pycore_init_types (pylifecycle.c:704)
    ==196803== by 0x69D8FC: pycore_interp_init (pylifecycle.c:831)
    ==196803== by 0x69DC31: pyinit_config (pylifecycle.c:887)
    ==196803== by 0x69E355: pyinit_core (pylifecycle.c:1050)
    ==196803== by 0x69ED32: Py_InitializeFromConfig (pylifecycle.c:1240)
    ==196803== by 0x6D5346: pymain_init (main.c:67)
    ==196803== by 0x6D64F8: pymain_main (main.c:692)
    ==196803== by 0x6D65A1: Py_BytesMain (main.c:725)
    ==196803== by 0x41D7B5: main (python.c:15)

    Before the commit c0a5ebe, Python didn't leak this memory.

    @gvanrossum
    Copy link
    Member

    That's pretty mysterious. The deep-freeze code isn't on the stack for either of those, but allocation of new unicode string objects is. I'm guessing these are somehow leaked by the interning, but I don't follow yet how.

    @vstinner
    Copy link
    Member

    PyUnicode_InternInPlace() in intern_strings() can convert an immortal string to a regular Python strong reference, whereas _PyStaticCode_Dealloc() doesn't bother with clearing co_names, co_consts and co_localsplusnames.

    @vstinner
    Copy link
    Member

    I tried to hack _PyStaticCode_Dealloc() to free strings, but since immortal objects are half-baken today, calling Py_DECREF() does crash.

    Py_SETREF() should increase _Py_RefTotal if the old object is immortal and he new object is not.

    Maybe this change should be reverted until Eric's PEP-683 "Immortal Objects, Using a Fixed Refcount" is implemented.

    @kumaraditya303
    Copy link
    Contributor Author

    I have created a PR to fix the memory leak, See #31549

    @kumaraditya303
    Copy link
    Contributor Author

    Okay, let's change the error handling. @Kumar, can you handle that?

    How it should be handled? Currently PyUnicode_InternInPlace ignores any errors and does not return it. It would be backwards-incompatible to change that, moreover as I explained in #30683 (comment) intern_strings only check if all the items are strings which will be always true in case of deep-freeze so I don't think anything needs to be changed here. I would be interested to know if I am missing something though.

    @vstinner
    Copy link
    Member

    New changeset 4dc7463 by Kumar Aditya in branch 'main':
    bpo-46430: Fix memory leak in interned strings of deep-frozen modules (GH-31549)
    4dc7463

    @vstinner
    Copy link
    Member

    I wrote #31555 to make sure that Python doesn't leak at Python exit.

    @gvanrossum
    Copy link
    Member

    How it should be handled? Currently PyUnicode_InternInPlace ignores any errors and does not return it. It would be backwards-incompatible to change that, moreover as I explained in #30683 (comment) intern_strings only check if all the items are strings which will be always true in case of deep-freeze so I don't think anything needs to be changed here. I would be interested to know if I am missing something though.

    The other functions you are calling *do* return errors. You should not ignore those. If any errors are reported the caller can decide what to do (e.g. call Py_FatalError().

    @vstinner
    Copy link
    Member

    commit 0d9b565
    Author: Kumar Aditya <59607654+kumaraditya303@users.noreply.github.com>
    Date: Sat Feb 26 22:05:03 2022 +0530

    Propagate errors (however unlikely) from _Py_Deepfreeze_Init() (GH-31596)
    

    @vstinner
    Copy link
    Member

    The other functions you are calling *do* return errors. You should not ignore those. If any errors are reported the caller can decide what to do (e.g. call Py_FatalError().

    PEP-587 introduced PyStatus to Python startup code which let the Py_Initialize() caller to decide how to handle errors ;-) For example, you can open a graphical popup rather than killing the process with SIGABRT (Py_FatalError() behavior) which might be more user friendly :-D Or just log the error.

    @gvanrossum
    Copy link
    Member

    PEP-587 introduced PyStatus to Python startup code which let the Py_Initialize() caller to decide how to handle errors ;-) For example, you can open a graphical popup rather than killing the process with SIGABRT (Py_FatalError() behavior) which might be more user friendly :-D Or just log the error.

    That's up to pycore_interp_init() in pylifecycle.c now. Kumar called _PyStatus_ERR() there, so I think nothing more needs to be done.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.11 only security fixes type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants