classification
Title: OS X: malloc(): set default diagnostics to DEBUG_WRITE_ON_CRASH
Type: Stage: resolved
Components: macOS, Tests Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: jack__d, lukasz.langa, miss-islington, ned.deily, remi.lapeyre, ronaldoussoren, skrah
Priority: normal Keywords: patch

Created on 2020-06-09 14:23 by remi.lapeyre, last changed 2021-08-06 17:13 by lukasz.langa. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 26783 merged jack__d, 2021-06-18 12:10
PR 26786 closed jack__d, 2021-06-18 13:57
PR 27628 merged miss-islington, 2021-08-06 12:51
PR 27629 merged lukasz.langa, 2021-08-06 13:03
Messages (29)
msg371107 - (view) Author: Rémi Lapeyre (remi.lapeyre) * Date: 2020-06-09 14:23
Here's the result of "./python -m test test_decimal -m test_maxcontext_exact_arith":

0:00:00 load avg: 1.33 Run tests sequentially
0:00:00 load avg: 1.33 [1/1] test_decimal
python(7835,0x11a218dc0) malloc: can't allocate region
:*** mach_vm_map(size=842105263157895168, flags: 100) failed (error code=3)
python(7835,0x11a218dc0) malloc: *** set a breakpoint in malloc_error_break to debug
python(7835,0x11a218dc0) malloc: can't allocate region
:*** mach_vm_map(size=842105263157895168, flags: 100) failed (error code=3)
python(7835,0x11a218dc0) malloc: *** set a breakpoint in malloc_error_break to debug
python(7835,0x11a218dc0) malloc: can't allocate region
:*** mach_vm_map(size=421052631578947584, flags: 100) failed (error code=3)
python(7835,0x11a218dc0) malloc: *** set a breakpoint in malloc_error_break to debug
python(7835,0x11a218dc0) malloc: can't allocate region
:*** mach_vm_map(size=421052631578947584, flags: 100) failed (error code=3)
python(7835,0x11a218dc0) malloc: *** set a breakpoint in malloc_error_break to debug

== Tests result: SUCCESS ==

1 test OK.

Total duration: 553 ms
Tests result: SUCCESS




I spent quite a time to find where this error was coming from and it's actually not an error but a dubious message from OSX. Others will surely lose time on this too, I think the best way to handle it is to use bigmemtest() with a small value on MacOS so that it's only run when '-M' is given on this platform, but always on the others.
msg371112 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2020-06-09 14:50
The annoying "error" looks the same as #5614, which was closed as "won't fix".

Is this the only place in the test suite now? If it is, I'm okay with
squashing the remark.
msg371116 - (view) Author: Rémi Lapeyre (remi.lapeyre) * Date: 2020-06-09 15:40
> The annoying "error" looks the same as #5614, which was closed as "won't fix".

Yes, it's a similar message, test_io does not display it on Catalina anymore thought.

> Is this the only place in the test suite now? 

As far as I know yes.


> If it is, I'm okay with squashing the remark.

I'm not sure to what this means.
msg371118 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2020-06-09 16:10
"squashing the remark" means suppressing the message from malloc().
I would really like it if the test could be executed by default though.


Looking at:

https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man3/malloc.3.html


Could you try what happens if you set this environment variable?

   export MallocLogFile=/dev/null


If it works, we could build a solution around that.






.
msg371122 - (view) Author: Rémi Lapeyre (remi.lapeyre) * Date: 2020-06-09 16:42
Thanks, I found "MallocDebugReport" in `man malloc` (I could have thought to look into there...) that seem to do the trick:

$ MallocDebugReport=none ./python -m test test_decimal -m test_maxcontext_exact_arith

0:00:00 load avg: 1.27 Run tests sequentially
0:00:00 load avg: 1.27 [1/1] test_decimal

== Tests result: SUCCESS ==

1 test OK.

Total duration: 694 ms
Tests result: SUCCESS


There is also MallocDebugReport=crash that will "write messages to standard error only for a condition that is about to cause a crash." and effectively suppress this one.


Setting it in the test with os.putenv() does not seem to work.
msg371123 - (view) Author: Rémi Lapeyre (remi.lapeyre) * Date: 2020-06-09 16:44
I forgot to say that MallocLogFile=/dev/null did not work, that's why I tried MallocDebugReport.
msg371142 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2020-06-09 21:20
Thanks for checking, it's a pity that os.putenv() does not work.

There's a global variable malloc_logger that is set from the
environment on startup:

https://opensource.apple.com/source/libmalloc/libmalloc-53.1.1/src/malloc.c


It seems to be always checked for NULL, but I can't find any
documentation that says it is guaranteed.

A nuclear option would be to set malloc_logger to NULL on Python
startup.  Or write a custom malloc_logger, which *does* seem to
be supported.
msg371172 - (view) Author: Rémi Lapeyre (remi.lapeyre) * Date: 2020-06-10 08:21
Would it be possible to wrap malloc_print_configure() (https://github.com/PureDarwin/libmalloc/blob/e37056265821cd6e014ab911d9fe3b9d0da88e22/src/malloc_printf.c#L59) in a context manager that we put in test.support? or to override https://github.com/PureDarwin/libmalloc/blob/e37056265821cd6e014ab911d9fe3b9d0da88e22/src/malloc_printf.c#L36?


This would let us change this behaviour locally without masking all other errors.
msg371185 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2020-06-10 12:14
I have two observations:

First, I think that the default DEBUG_WRITE_ALWAYS, which is the cause
of this issue, is not C standard conforming -- The standard does not
permit implementation defined diagnostics (global side effects!) on a
completely normal allocation failure.

So I would feel justified to make DEBUG_WRITE_ON_CRASH the default
on Python startup, even if that means that we force that on extension
modules.

That would also address annoyances like in msg327446. Speaking of
which, do you also get that diagnostic in cases like these?


>>> [0] * 10000000000000000
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError


FreeBSD has a similar reporting mechanism, but the default is not
DEBUG_WRITE_ALWAYS.  I suspect DEBUG_WRITE_ALWAYS is there for the
convenience of Apple developers dealing with user reports.


Second, we *could* write such a context manager and it would probably
be okay *for the test suite*, but in general changing a static variable
that is not thread safe at runtime in a library should be avoided.


I don't have OS X at all though. Ronald, what do you think?
msg371186 - (view) Author: Rémi Lapeyre (remi.lapeyre) * Date: 2020-06-10 12:23
> I have two observations:
> 
> First, I think that the default DEBUG_WRITE_ALWAYS, which is the cause
> of this issue, is not C standard conforming -- The standard does not
> permit implementation defined diagnostics (global side effects!) on a
> completely normal allocation failure.

That makes sense

> So I would feel justified to make DEBUG_WRITE_ON_CRASH the default
> on Python startup, even if that means that we force that on extension
> modules.
> 
> That would also address annoyances like in msg327446. Speaking of
> which, do you also get that diagnostic in cases like these?
> 
> 
> >>> [0] * 10000000000000000
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> MemoryError
>

It gives:

>>> [0] * 10000000000000000
python3(36633,0x110c08dc0) malloc: can't allocate region
:*** mach_vm_map(size=80000000000004096, flags: 100) failed (error code=3)
python3(36633,0x110c08dc0) malloc: *** set a breakpoint in malloc_error_break to debug
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError

Not very nice.

> 
> FreeBSD has a similar reporting mechanism, but the default is not
> DEBUG_WRITE_ALWAYS.  I suspect DEBUG_WRITE_ALWAYS is there for the
> convenience of Apple developers dealing with user reports.
> 
> 
> Second, we *could* write such a context manager and it would probably
> be okay *for the test suite*, but in general changing a static variable
> that is not thread safe at runtime in a library should be avoided.
> 

What about overriding the default to DEBUG_WRITE_ON_CRASH if MallocDebugReport
is not set? This would be more standard compliant and still let the user 
override it if he whishes so.

> 
> I don't have OS X at all though. Ronald, what do you think?
msg371189 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2020-06-10 12:34
> What about overriding the default to DEBUG_WRITE_ON_CRASH if MallocDebugReport is not set?


That sounds like a very good compromise!
msg371190 - (view) Author: Rémi Lapeyre (remi.lapeyre) * Date: 2020-06-10 12:37
Ok, I will try to do this in the coming days. Is adding this kind of super ugly, super specific code to pymain_init() ok?
msg371191 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2020-06-10 13:02
I would probably put all that code in a separate function like
init_darwin_libc(), and call init_darwin_libc() in pymain_init().

The real fun will start when we figure out which OS X versions
support this feature. I hope we have a buildbot for each. :-)
msg371192 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2020-06-10 13:14
Changing the title to get more input from others.
msg371197 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-10 13:36
> Ok, I will try to do this in the coming days. Is adding this kind of super ugly, super specific code to pymain_init() ok?

Maybe _PyPreConfig_Write() would be a better place.

There is a very little risk that memory allocation done before _PyPreConfig_Write() (i.e. in _PyPreConfig_Read()) fail with out of memory.
msg371198 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-10 13:40
> Would it be possible to wrap malloc_print_configure() (https://github.com/PureDarwin/libmalloc/blob/e37056265821cd6e014ab911d9fe3b9d0da88e22/src/malloc_printf.c#L59) in a context manager that we put in test.support? or to override https://github.com/PureDarwin/libmalloc/blob/e37056265821cd6e014ab911d9fe3b9d0da88e22/src/malloc_printf.c#L36?

I don't see much benefit of having a feature which only works on macOS.

You can put a hook on memory allocations using PyMem_SetAllocator() to trigger any action you want when any memory allocation done by Python fails.

But is it really an use case to log any memory allocation failure? Python is very likely to raise a MemoryError exception in this case. It's well defined, no?
msg371199 - (view) Author: Rémi Lapeyre (remi.lapeyre) * Date: 2020-06-10 13:51
> But is it really an use case to log any memory allocation failure? Python is very likely to raise a MemoryError exception in this case. It's well defined, no?

Yes, we are not trying to log the allocation failure, we are just trying to suppress a dubious error message that occurs specifically on MacOS.
msg371201 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2020-06-10 13:55
I'm not sure if it is worthwhile to tweak CPython to work around this misfeature of libc on macOS (and likely iOS).  Is this something that causes issues in real code, or just in the test suite?  Looking at this particular test is seems to use a context that is not likely to occur in real code (due to using a lot of memory)


Note that malloc_print_configure is not a public API, using that can cause problems for folks that embed Python in apps that are shipped in the app store.
msg371203 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2020-06-10 14:00
No, it does not cause real issues. Adding this feature would just suppress chatty diagnostics like this one, which are a bit unfriendly: 

>>> [0] * 10000000000000000
python3(36633,0x110c08dc0) malloc: can't allocate region
:*** mach_vm_map(size=80000000000004096, flags: 100) failed (error code=3)
python3(36633,0x110c08dc0) malloc: *** set a breakpoint in malloc_error_break to debug
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError


So yes, users do occasionally see it outside the test suite.
msg371204 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2020-06-10 14:03
But if malloc_print_configure() is not a public API, it should probably not be used.
msg371205 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2020-06-10 14:15
And indeed, with the new decimal MAX_PREC feature, you should see
the misguided diagnostic as well on OS X:

>>> from decimal import *
>>> 
>>> c = getcontext()
>>> c.prec = MAX_PREC
>>> Decimal(4).sqrt()
Decimal('2')
>>> 


So it is real code, but extremely rarely used.
msg371209 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2020-06-10 15:13
I've filed a report with Apple about this (FB7731971), even though I expect that this behaviour will not change. That's something we should have done years ago.

BTW. The link to malloc_print_configure that Victor shared seems to indicate that the default for MallocDebugReport should be "none", instead of "stderr". The code on opensource.apple.com conforms this (https://opensource.apple.com/source/libmalloc/libmalloc-283.60.1/src/malloc_printf.c.auto.html).  Apparently Apple uses a slightly different code when building macOS :-(

Note that this annoying message is only printed when the proces runs out of address space, which requires allocating an insane amount of memory (the first warning line in the initial message of this bug report tries to allocate 872 petabyte).  

If this is something we want to avoid its probably easier to switch to a allocator that won't avoids calling malloc(3) on large enough allocations (for example return NULL for any allocation attempt above 4TB, which is comfortably above the max amount of memory in a Mac Pro).  This theoretically limits memory allocations, but in practice the system will be unusable with far smaller allocations (assuming the allocated memory is actually used).
msg371213 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2020-06-10 16:03
> The first warning line in the initial message of this bug report tries to allocate 872 petabyte.

Yes, that's for systems like Linux where you never know what overallocation is going to permit and subsequently freeze the system.


Special casing malloc() for OS X/large allocations sounds like a good plan.
msg396053 - (view) Author: Jack DeVries (jack__d) * Date: 2021-06-18 12:15
From my pr, "This is a bandaid to help people save time debugging this non-bug. I don't think this solution is very strong; I'm just hoping to move the discussion forward on the bpo, and hoping for a hint on how to suppress these warnings all together or come up with some other optimal solution.

I applied the function in test_decimal because that test consistently generates these malloc warnings on my system."

I'm still trying to find a true solution. Can anyone help to point me in a better direction?
msg396066 - (view) Author: Jack DeVries (jack__d) * Date: 2021-06-18 13:59
Alternatively, my latest PR implements what @ronaldoussoren suggested: capping OS X memory allocations at 4TB.
msg399083 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2021-08-06 12:50
New changeset 15d3c14df32a35ac69898a7852115722e30d7857 by Jack DeVries in branch 'main':
bpo-40928: notify users running test_decimal on macOS of malloc warnings (GH-26783)
https://github.com/python/cpython/commit/15d3c14df32a35ac69898a7852115722e30d7857
msg399086 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2021-08-06 13:35
New changeset 693a661478036dc2c5b48e138dd09320a972c25a by Łukasz Langa in branch '3.9':
[3.9] bpo-40928: notify users running test_decimal on macOS of malloc warnings (GH-26783) (GH-27629)
https://github.com/python/cpython/commit/693a661478036dc2c5b48e138dd09320a972c25a
msg399106 - (view) Author: miss-islington (miss-islington) Date: 2021-08-06 17:09
New changeset a5d99632766b458b42f327e8bd0f82b0345c9a63 by Miss Islington (bot) in branch '3.10':
bpo-40928: notify users running test_decimal on macOS of malloc warnings (GH-26783)
https://github.com/python/cpython/commit/a5d99632766b458b42f327e8bd0f82b0345c9a63
msg399109 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2021-08-06 17:13
Given that:
- this is only surfacing in test_decimal on macOS;
- the real solution to modify malloc has potential performance implications; and
- silencing the spew isn't possible from within the test module

I elected to accept Jack's patch to notify macOS users running regression tests of the harmless malloc spew. I don't think there's anything else to do here.

Thanks all for your effort in investigating this!
History
Date User Action Args
2021-08-06 17:13:42lukasz.langasetstatus: open -> closed
versions: + Python 3.11
messages: + msg399109

resolution: fixed
stage: patch review -> resolved
2021-08-06 17:09:10miss-islingtonsetmessages: + msg399106
2021-08-06 13:35:45lukasz.langasetmessages: + msg399086
2021-08-06 13:03:46lukasz.langasetpull_requests: + pull_request26123
2021-08-06 12:51:15miss-islingtonsetnosy: + miss-islington
pull_requests: + pull_request26122
2021-08-06 12:50:59lukasz.langasetnosy: + lukasz.langa
messages: + msg399083
2021-06-18 13:59:18jack__dsetmessages: + msg396066
2021-06-18 13:57:46jack__dsetpull_requests: + pull_request25371
2021-06-18 12:15:49jack__dsetmessages: + msg396053
2021-06-18 12:10:33jack__dsetkeywords: + patch
nosy: + jack__d

pull_requests: + pull_request25368
stage: patch review
2020-11-16 21:18:01vstinnersetnosy: - vstinner
2020-11-16 20:06:53ronaldoussorensetnosy: + ned.deily
components: + macOS
2020-11-07 11:38:01ronaldoussorenlinkissue41126 superseder
2020-06-10 16:03:53skrahsetmessages: + msg371213
2020-06-10 15:13:47ronaldoussorensetmessages: + msg371209
2020-06-10 14:15:07skrahsetmessages: + msg371205
2020-06-10 14:03:14skrahsetmessages: + msg371204
2020-06-10 14:00:25skrahsetmessages: + msg371203
2020-06-10 13:55:58ronaldoussorensetmessages: + msg371201
2020-06-10 13:51:22remi.lapeyresetmessages: + msg371199
2020-06-10 13:40:53vstinnersetmessages: + msg371198
2020-06-10 13:36:44vstinnersetnosy: + vstinner
messages: + msg371197
2020-06-10 13:14:17skrahsetmessages: + msg371192
title: test_decimal.CWhitebox.test_maxcontext_exact_arith() shows "malloc: can't allocate region" on MacOS -> OS X: malloc(): set default diagnostics to DEBUG_WRITE_ON_CRASH
2020-06-10 13:02:40skrahsetmessages: + msg371191
2020-06-10 12:37:57remi.lapeyresetmessages: + msg371190
2020-06-10 12:34:44skrahsetmessages: + msg371189
2020-06-10 12:23:48remi.lapeyresetmessages: + msg371186
2020-06-10 12:14:59skrahsetmessages: + msg371185
2020-06-10 08:21:53remi.lapeyresetmessages: + msg371172
2020-06-09 21:20:05skrahsetmessages: + msg371142
2020-06-09 16:44:10remi.lapeyresetmessages: + msg371123
2020-06-09 16:42:10remi.lapeyresetmessages: + msg371122
2020-06-09 16:22:17ned.deilysetnosy: + ronaldoussoren
2020-06-09 16:10:24skrahsetmessages: + msg371118
2020-06-09 15:40:10remi.lapeyresetmessages: + msg371116
2020-06-09 14:50:52skrahsetnosy: + skrah

messages: + msg371112
versions: + Python 3.9
2020-06-09 14:23:41remi.lapeyrecreate