Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OS X: malloc(): set default diagnostics to DEBUG_WRITE_ON_CRASH #85100

Closed
remilapeyre mannequin opened this issue Jun 9, 2020 · 30 comments
Closed

OS X: malloc(): set default diagnostics to DEBUG_WRITE_ON_CRASH #85100

remilapeyre mannequin opened this issue Jun 9, 2020 · 30 comments
Labels
3.9 only security fixes 3.10 only security fixes 3.11 only security fixes OS-mac tests Tests in the Lib/test dir

Comments

@remilapeyre
Copy link
Mannequin

remilapeyre mannequin commented Jun 9, 2020

BPO 40928
Nosy @ronaldoussoren, @ned-deily, @skrah, @ambv, @miss-islington, @remilapeyre, @jdevries3133
PRs
  • bpo-40928: notify users of osx malloc warnings #26783
  • bpo-40928 osx malloc real fix #26786
  • [3.10] bpo-40928: notify users running test_decimal on macOS of malloc warnings (GH-26783) #27628
  • [3.9] bpo-40928: notify users running test_decimal on macOS of malloc warnings (GH-26783) #27629
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2021-08-06.17:13:42.011>
    created_at = <Date 2020-06-09.14:23:41.706>
    labels = ['OS-mac', 'tests', '3.9', '3.10', '3.11']
    title = 'OS X: malloc(): set default diagnostics to DEBUG_WRITE_ON_CRASH'
    updated_at = <Date 2021-08-06.17:13:42.010>
    user = 'https://github.com/remilapeyre'

    bugs.python.org fields:

    activity = <Date 2021-08-06.17:13:42.010>
    actor = 'lukasz.langa'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-08-06.17:13:42.011>
    closer = 'lukasz.langa'
    components = ['macOS', 'Tests']
    creation = <Date 2020-06-09.14:23:41.706>
    creator = 'remi.lapeyre'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 40928
    keywords = ['patch']
    message_count = 29.0
    messages = ['371107', '371112', '371116', '371118', '371122', '371123', '371142', '371172', '371185', '371186', '371189', '371190', '371191', '371192', '371197', '371198', '371199', '371201', '371203', '371204', '371205', '371209', '371213', '396053', '396066', '399083', '399086', '399106', '399109']
    nosy_count = 7.0
    nosy_names = ['ronaldoussoren', 'ned.deily', 'skrah', 'lukasz.langa', 'miss-islington', 'remi.lapeyre', 'jack__d']
    pr_nums = ['26783', '26786', '27628', '27629']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue40928'
    versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']

    Linked PRs

    @remilapeyre
    Copy link
    Mannequin Author

    remilapeyre mannequin commented Jun 9, 2020

    Here's the result of "./python -m test test_decimal -m test_maxcontext_exact_arith":

    0:00:00 load avg: 1.33 Run tests sequentially
    0:00:00 load avg: 1.33 [1/1] test_decimal
    python(7835,0x11a218dc0) malloc: can't allocate region
    :*** mach_vm_map(size=842105263157895168, flags: 100) failed (error code=3)
    python(7835,0x11a218dc0) malloc: *** set a breakpoint in malloc_error_break to debug
    python(7835,0x11a218dc0) malloc: can't allocate region
    :*** mach_vm_map(size=842105263157895168, flags: 100) failed (error code=3)
    python(7835,0x11a218dc0) malloc: *** set a breakpoint in malloc_error_break to debug
    python(7835,0x11a218dc0) malloc: can't allocate region
    :*** mach_vm_map(size=421052631578947584, flags: 100) failed (error code=3)
    python(7835,0x11a218dc0) malloc: *** set a breakpoint in malloc_error_break to debug
    python(7835,0x11a218dc0) malloc: can't allocate region
    :*** mach_vm_map(size=421052631578947584, flags: 100) failed (error code=3)
    python(7835,0x11a218dc0) malloc: *** set a breakpoint in malloc_error_break to debug

    == Tests result: SUCCESS ==

    1 test OK.

    Total duration: 553 ms
    Tests result: SUCCESS

    I spent quite a time to find where this error was coming from and it's actually not an error but a dubious message from OSX. Others will surely lose time on this too, I think the best way to handle it is to use bigmemtest() with a small value on MacOS so that it's only run when '-M' is given on this platform, but always on the others.

    @remilapeyre remilapeyre mannequin added 3.10 only security fixes tests Tests in the Lib/test dir labels Jun 9, 2020
    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jun 9, 2020

    The annoying "error" looks the same as bpo-5614, which was closed as "won't fix".

    Is this the only place in the test suite now? If it is, I'm okay with
    squashing the remark.

    @skrah skrah mannequin added 3.9 only security fixes labels Jun 9, 2020
    @remilapeyre
    Copy link
    Mannequin Author

    remilapeyre mannequin commented Jun 9, 2020

    The annoying "error" looks the same as bpo-5614, which was closed as "won't fix".

    Yes, it's a similar message, test_io does not display it on Catalina anymore thought.

    Is this the only place in the test suite now?

    As far as I know yes.

    If it is, I'm okay with squashing the remark.

    I'm not sure to what this means.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jun 9, 2020

    "squashing the remark" means suppressing the message from malloc().
    I would really like it if the test could be executed by default though.

    Looking at:

    https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man3/malloc.3.html

    Could you try what happens if you set this environment variable?

    export MallocLogFile=/dev/null

    If it works, we could build a solution around that.

    .

    @remilapeyre
    Copy link
    Mannequin Author

    remilapeyre mannequin commented Jun 9, 2020

    Thanks, I found "MallocDebugReport" in man malloc (I could have thought to look into there...) that seem to do the trick:

    $ MallocDebugReport=none ./python -m test test_decimal -m test_maxcontext_exact_arith

    0:00:00 load avg: 1.27 Run tests sequentially
    0:00:00 load avg: 1.27 [1/1] test_decimal

    == Tests result: SUCCESS ==

    1 test OK.

    Total duration: 694 ms
    Tests result: SUCCESS

    There is also MallocDebugReport=crash that will "write messages to standard error only for a condition that is about to cause a crash." and effectively suppress this one.

    Setting it in the test with os.putenv() does not seem to work.

    @remilapeyre
    Copy link
    Mannequin Author

    remilapeyre mannequin commented Jun 9, 2020

    I forgot to say that MallocLogFile=/dev/null did not work, that's why I tried MallocDebugReport.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jun 9, 2020

    Thanks for checking, it's a pity that os.putenv() does not work.

    There's a global variable malloc_logger that is set from the
    environment on startup:

    https://opensource.apple.com/source/libmalloc/libmalloc-53.1.1/src/malloc.c

    It seems to be always checked for NULL, but I can't find any
    documentation that says it is guaranteed.

    A nuclear option would be to set malloc_logger to NULL on Python
    startup. Or write a custom malloc_logger, which *does* seem to
    be supported.

    @remilapeyre
    Copy link
    Mannequin Author

    remilapeyre mannequin commented Jun 10, 2020

    Would it be possible to wrap malloc_print_configure() (https://github.com/PureDarwin/libmalloc/blob/e37056265821cd6e014ab911d9fe3b9d0da88e22/src/malloc_printf.c#L59) in a context manager that we put in test.support? or to override https://github.com/PureDarwin/libmalloc/blob/e37056265821cd6e014ab911d9fe3b9d0da88e22/src/malloc_printf.c#L36?

    This would let us change this behaviour locally without masking all other errors.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jun 10, 2020

    I have two observations:

    First, I think that the default DEBUG_WRITE_ALWAYS, which is the cause
    of this issue, is not C standard conforming -- The standard does not
    permit implementation defined diagnostics (global side effects!) on a
    completely normal allocation failure.

    So I would feel justified to make DEBUG_WRITE_ON_CRASH the default
    on Python startup, even if that means that we force that on extension
    modules.

    That would also address annoyances like in msg327446. Speaking of
    which, do you also get that diagnostic in cases like these?

    >>> [0] * 10000000000000000
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    MemoryError

    FreeBSD has a similar reporting mechanism, but the default is not
    DEBUG_WRITE_ALWAYS. I suspect DEBUG_WRITE_ALWAYS is there for the
    convenience of Apple developers dealing with user reports.

    Second, we *could* write such a context manager and it would probably
    be okay *for the test suite*, but in general changing a static variable
    that is not thread safe at runtime in a library should be avoided.

    I don't have OS X at all though. Ronald, what do you think?

    @remilapeyre
    Copy link
    Mannequin Author

    remilapeyre mannequin commented Jun 10, 2020

    I have two observations:

    First, I think that the default DEBUG_WRITE_ALWAYS, which is the cause
    of this issue, is not C standard conforming -- The standard does not
    permit implementation defined diagnostics (global side effects!) on a
    completely normal allocation failure.

    That makes sense

    > So I would feel justified to make DEBUG_WRITE_ON_CRASH the default
    > on Python startup, even if that means that we force that on extension
    > modules.
    > 
    > That would also address annoyances like in msg327446. Speaking of
    > which, do you also get that diagnostic in cases like these?
    > 
    > 
    > >>> [0] * 10000000000000000
    > Traceback (most recent call last):
    >   File "<stdin>", line 1, in <module>
    > MemoryError
    >

    It gives:

    >>> [0] * 10000000000000000
    python3(36633,0x110c08dc0) malloc: can't allocate region
    :*** mach_vm_map(size=80000000000004096, flags: 100) failed (error code=3)
    python3(36633,0x110c08dc0) malloc: *** set a breakpoint in malloc_error_break to debug
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    MemoryError

    Not very nice.

    FreeBSD has a similar reporting mechanism, but the default is not
    DEBUG_WRITE_ALWAYS. I suspect DEBUG_WRITE_ALWAYS is there for the
    convenience of Apple developers dealing with user reports.

    Second, we *could* write such a context manager and it would probably
    be okay *for the test suite*, but in general changing a static variable
    that is not thread safe at runtime in a library should be avoided.

    What about overriding the default to DEBUG_WRITE_ON_CRASH if MallocDebugReport
    is not set? This would be more standard compliant and still let the user
    override it if he whishes so.

    I don't have OS X at all though. Ronald, what do you think?

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jun 10, 2020

    What about overriding the default to DEBUG_WRITE_ON_CRASH if MallocDebugReport is not set?

    That sounds like a very good compromise!

    @remilapeyre
    Copy link
    Mannequin Author

    remilapeyre mannequin commented Jun 10, 2020

    Ok, I will try to do this in the coming days. Is adding this kind of super ugly, super specific code to pymain_init() ok?

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jun 10, 2020

    I would probably put all that code in a separate function like
    init_darwin_libc(), and call init_darwin_libc() in pymain_init().

    The real fun will start when we figure out which OS X versions
    support this feature. I hope we have a buildbot for each. :-)

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jun 10, 2020

    Changing the title to get more input from others.

    @skrah skrah mannequin changed the title test_decimal.CWhitebox.test_maxcontext_exact_arith() shows "malloc: can't allocate region" on MacOS OS X: malloc(): set default diagnostics to DEBUG_WRITE_ON_CRASH Jun 10, 2020
    @skrah skrah mannequin changed the title test_decimal.CWhitebox.test_maxcontext_exact_arith() shows "malloc: can't allocate region" on MacOS OS X: malloc(): set default diagnostics to DEBUG_WRITE_ON_CRASH Jun 10, 2020
    @vstinner
    Copy link
    Member

    Ok, I will try to do this in the coming days. Is adding this kind of super ugly, super specific code to pymain_init() ok?

    Maybe _PyPreConfig_Write() would be a better place.

    There is a very little risk that memory allocation done before _PyPreConfig_Write() (i.e. in _PyPreConfig_Read()) fail with out of memory.

    @vstinner
    Copy link
    Member

    Would it be possible to wrap malloc_print_configure() (https://github.com/PureDarwin/libmalloc/blob/e37056265821cd6e014ab911d9fe3b9d0da88e22/src/malloc_printf.c#L59) in a context manager that we put in test.support? or to override https://github.com/PureDarwin/libmalloc/blob/e37056265821cd6e014ab911d9fe3b9d0da88e22/src/malloc_printf.c#L36?

    I don't see much benefit of having a feature which only works on macOS.

    You can put a hook on memory allocations using PyMem_SetAllocator() to trigger any action you want when any memory allocation done by Python fails.

    But is it really an use case to log any memory allocation failure? Python is very likely to raise a MemoryError exception in this case. It's well defined, no?

    @remilapeyre
    Copy link
    Mannequin Author

    remilapeyre mannequin commented Jun 10, 2020

    But is it really an use case to log any memory allocation failure? Python is very likely to raise a MemoryError exception in this case. It's well defined, no?

    Yes, we are not trying to log the allocation failure, we are just trying to suppress a dubious error message that occurs specifically on MacOS.

    @ronaldoussoren
    Copy link
    Contributor

    I'm not sure if it is worthwhile to tweak CPython to work around this misfeature of libc on macOS (and likely iOS). Is this something that causes issues in real code, or just in the test suite? Looking at this particular test is seems to use a context that is not likely to occur in real code (due to using a lot of memory)

    Note that malloc_print_configure is not a public API, using that can cause problems for folks that embed Python in apps that are shipped in the app store.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jun 10, 2020

    No, it does not cause real issues. Adding this feature would just suppress chatty diagnostics like this one, which are a bit unfriendly:

    >>> [0] * 10000000000000000
    python3(36633,0x110c08dc0) malloc: can't allocate region
    :*** mach_vm_map(size=80000000000004096, flags: 100) failed (error code=3)
    python3(36633,0x110c08dc0) malloc: *** set a breakpoint in malloc_error_break to debug
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    MemoryError

    So yes, users do occasionally see it outside the test suite.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jun 10, 2020

    But if malloc_print_configure() is not a public API, it should probably not be used.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jun 10, 2020

    And indeed, with the new decimal MAX_PREC feature, you should see
    the misguided diagnostic as well on OS X:

    >>> from decimal import *
    >>> 
    >>> c = getcontext()
    >>> c.prec = MAX_PREC
    >>> Decimal(4).sqrt()
    Decimal('2')
    >>> 

    So it is real code, but extremely rarely used.

    @ronaldoussoren
    Copy link
    Contributor

    I've filed a report with Apple about this (FB7731971), even though I expect that this behaviour will not change. That's something we should have done years ago.

    BTW. The link to malloc_print_configure that Victor shared seems to indicate that the default for MallocDebugReport should be "none", instead of "stderr". The code on opensource.apple.com conforms this (https://opensource.apple.com/source/libmalloc/libmalloc-283.60.1/src/malloc_printf.c.auto.html). Apparently Apple uses a slightly different code when building macOS :-(

    Note that this annoying message is only printed when the proces runs out of address space, which requires allocating an insane amount of memory (the first warning line in the initial message of this bug report tries to allocate 872 petabyte).

    If this is something we want to avoid its probably easier to switch to a allocator that won't avoids calling malloc(3) on large enough allocations (for example return NULL for any allocation attempt above 4TB, which is comfortably above the max amount of memory in a Mac Pro). This theoretically limits memory allocations, but in practice the system will be unusable with far smaller allocations (assuming the allocated memory is actually used).

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jun 10, 2020

    The first warning line in the initial message of this bug report tries to allocate 872 petabyte.

    Yes, that's for systems like Linux where you never know what overallocation is going to permit and subsequently freeze the system.

    Special casing malloc() for OS X/large allocations sounds like a good plan.

    @jdevries3133
    Copy link
    Mannequin

    jdevries3133 mannequin commented Jun 18, 2021

    From my pr, "This is a bandaid to help people save time debugging this non-bug. I don't think this solution is very strong; I'm just hoping to move the discussion forward on the bpo, and hoping for a hint on how to suppress these warnings all together or come up with some other optimal solution.

    I applied the function in test_decimal because that test consistently generates these malloc warnings on my system."

    I'm still trying to find a true solution. Can anyone help to point me in a better direction?

    @jdevries3133
    Copy link
    Mannequin

    jdevries3133 mannequin commented Jun 18, 2021

    Alternatively, my latest PR implements what @ronaldoussoren suggested: capping OS X memory allocations at 4TB.

    @ambv
    Copy link
    Contributor

    ambv commented Aug 6, 2021

    New changeset 15d3c14 by Jack DeVries in branch 'main':
    bpo-40928: notify users running test_decimal on macOS of malloc warnings (GH-26783)
    15d3c14

    @ambv
    Copy link
    Contributor

    ambv commented Aug 6, 2021

    New changeset 693a661 by Łukasz Langa in branch '3.9':
    [3.9] bpo-40928: notify users running test_decimal on macOS of malloc warnings (GH-26783) (GH-27629)
    693a661

    @miss-islington
    Copy link
    Contributor

    New changeset a5d9963 by Miss Islington (bot) in branch '3.10':
    bpo-40928: notify users running test_decimal on macOS of malloc warnings (GH-26783)
    a5d9963

    @ambv
    Copy link
    Contributor

    ambv commented Aug 6, 2021

    Given that:

    • this is only surfacing in test_decimal on macOS;
    • the real solution to modify malloc has potential performance implications; and
    • silencing the spew isn't possible from within the test module

    I elected to accept Jack's patch to notify macOS users running regression tests of the harmless malloc spew. I don't think there's anything else to do here.

    Thanks all for your effort in investigating this!

    @ambv ambv added the 3.11 only security fixes label Aug 6, 2021
    @ambv ambv closed this as completed Aug 6, 2021
    @ambv ambv added the 3.11 only security fixes label Aug 6, 2021
    @ambv ambv closed this as completed Aug 6, 2021
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @skrah
    Copy link
    Contributor

    skrah commented Jan 28, 2024

    the real solution to modify malloc has potential performance implications

    No, it does not. This has been in obmalloc.c for at least 8 years:

        if (size > (size_t)PY_SSIZE_T_MAX)
            return NULL;
    

    PY_SSIZE_T_MAX could be changed to a lower value for MacOS, using a zero cost ifdef.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes 3.10 only security fixes 3.11 only security fixes OS-mac tests Tests in the Lib/test dir
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants