
classification
Title: Add mimalloc memory allocator
Type: enhancement Stage: patch review
Components: Interpreter Core Versions: Python 3.11
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: christian.heimes, corona10, erlendaasland, h-vetinari, nascheme
Priority: normal Keywords: patch

Created on 2022-02-06 14:49 by christian.heimes, last changed 2022-04-11 14:59 by admin.

Pull Requests
URL Status Linked Edit
PR 31164 open christian.heimes, 2022-02-06 14:52
Messages (12)
msg412639 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2022-02-06 14:49
From https://github.com/microsoft/mimalloc

> mimalloc (pronounced "me-malloc") is a general purpose allocator with excellent performance characteristics. Initially developed by Daan Leijen for the run-time systems of the Koka and Lean languages.

mimalloc has several interesting properties that make it useful for CPython. Among other things, it is fast, thread-safe, and NUMA-aware. It has built-in free lists with multi-sharding and allocation heaps. While Python's obmalloc requires the GIL to protect its data structures, mimalloc mostly relies on thread-local data and atomic instructions (compare-and-swap) for efficiency. Sam Gross' nogil relies on mimalloc's thread safety and uses first-class heaps for heap walking GC.

mimalloc works on the majority of platforms and CPU architectures. However, it requires a compiler with C11 atomics support. CentOS 7's default GCC is slightly too old; a more recent GCC from Developer Toolset is required.

For 3.11 I plan to integrate mimalloc as an optional drop-in replacement for obmalloc. Users will be able to compile CPython without mimalloc or disable mimalloc with the PYTHONMALLOC env var. Since mimalloc will be optional in 3.11, Python won't depend on or expose any of the advanced features yet. This approach enables the community to test and give feedback with minimal risk of breakage.

mimalloc sources will be vendored without any option to use system libraries. Python's mimalloc requires several non-standard compile-time flags. In the future Python may extend or modify mimalloc for heap walking and nogil, too.

(This is a tracking bug until I find time to finish a PEP.)
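
The PYTHONMALLOC switch described above could be exercised roughly as follows. This is a minimal sketch: `run_with_allocator` is a hypothetical helper, and the value `mimalloc` is an assumption that only holds for an interpreter built with the proposed patch. Other builds are expected to reject an unknown allocator name at startup.

```python
import os
import subprocess
import sys


def run_with_allocator(allocator, code="print('ok')"):
    """Run `code` in a child interpreter with PYTHONMALLOC set to `allocator`.

    Hypothetical helper for illustration; returns the CompletedProcess.
    """
    env = dict(os.environ, PYTHONMALLOC=allocator)
    return subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # "malloc" is a long-standing PYTHONMALLOC value; "mimalloc" is an
    # assumption that only holds for a build with the patch applied.
    for name in ("malloc", "mimalloc"):
        result = run_with_allocator(name)
        status = "ok" if result.returncode == 0 else "rejected"
        print(f"PYTHONMALLOC={name}: {status}")
```

Running the check in a child process keeps the allocator choice isolated, since PYTHONMALLOC is read once at interpreter startup.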
msg412641 - (view) Author: Dong-hee Na (corona10) * (Python committer) Date: 2022-02-06 14:59
I added Neil to the nosy list since he is one of the people who kicked off this amazing work :)
msg412645 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2022-02-06 16:12
New features:

- vendored mimalloc 2.0.3 + two patches from mimalloc dev branch. Mimalloc is embedded in obmalloc.o. Symbols are either hidden or names are mangled to have a _Py_ prefix.
- ./configure --with[out]-mimalloc (default: yes), fails if atomics are not available.
- PYTHONMALLOC=mimalloc, PYTHONMALLOC=mimalloc-debug env var settings
- PYMEM_ALLOCATOR_MIMALLOC, PYMEM_ALLOCATOR_MIMALLOC_DEBUG
- sys.debugmallocstats() and _PyObject_DebugMallocStats() print mimalloc stats
- sys._malloc_info struct, which contains information about the available and current allocators
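
A minimal sketch of how code might read the proposed sys._malloc_info struct without breaking on interpreters that lack it. `describe_allocator` is a hypothetical helper; the field names are taken from the repr shown later in this thread and may differ in the final patch.

```python
import sys

# Field names as they appear in the sys._malloc_info repr quoted in this
# thread; treat them as an assumption about the patch, not a stable API.
FIELDS = (
    "allocator",
    "with_pymalloc",
    "with_mimalloc",
    "mimalloc_secure",
    "mimalloc_debug",
)


def describe_allocator(source=sys):
    """Return the known fields of `source._malloc_info` as a dict.

    Returns None when the attribute is missing, which is the case on any
    interpreter built without the patch.
    """
    info = getattr(source, "_malloc_info", None)
    if info is None:
        return None
    # Missing fields map to None instead of raising AttributeError.
    return {name: getattr(info, name, None) for name in FIELDS}


if __name__ == "__main__":
    print(describe_allocator())
```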
msg412654 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2022-02-06 18:58
Buildbots "PPC64 Fedora PR" and all RHEL 7 buildbots provided by David Edelsohn are failing because the compiler is missing support for stdatomic.h.
msg412669 - (view) Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2022-02-06 21:16
Thanks, I'm indeed interested.  Most credit goes to Christian for advancing this.

For the missing stdatomic.h, would it be appropriate to have an autoconf check for it?  We could just disable mimalloc if it doesn't exist.
msg412679 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2022-02-06 23:08
We have an autoconf check for stdatomic.h. The test even verifies that a program with atomic_load_explicit() compiles and links.

How do we want to use mimalloc in the future? Is it going to stay optional in 3.12? Then the default setting for --with-mimalloc should depend on the presence of stdatomic.h. Do we want to make it mandatory for GC heap walking and nogil? Then --with-mimalloc should default to "yes" and configure should abort when stdatomic.h is missing.

I'm leaning towards --with-mimalloc=yes. It will make users aware that they need a compiler with atomics:

configure: error: --with-mimalloc requires stdatomic.h. Update your compiler or rebuild with --without-mimalloc. Python 3.12 will require stdatomic.
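
For illustration, the kind of compile-and-link probe described above can be approximated outside of configure. This is a sketch, not the actual check in configure.ac: `has_stdatomic`, the probe source, and the default compiler name `cc` are all assumptions.

```python
import os
import subprocess
import tempfile

# Small C program comparable to what a configure check would compile and
# link: it only succeeds if stdatomic.h and atomic_load_explicit() work.
PROBE_SOURCE = """\
#include <stdatomic.h>

atomic_int value = 1;

int main(void) {
    return atomic_load_explicit(&value, memory_order_relaxed) == 1 ? 0 : 1;
}
"""


def has_stdatomic(cc="cc"):
    """Return True if `cc` compiles and links the stdatomic.h probe."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "probe.c")
        exe = os.path.join(tmp, "probe")
        with open(src, "w") as f:
            f.write(PROBE_SOURCE)
        try:
            result = subprocess.run([cc, src, "-o", exe], capture_output=True)
        except OSError:
            # Compiler not found or not executable.
            return False
        return result.returncode == 0


if __name__ == "__main__":
    if has_stdatomic():
        print("stdatomic.h is usable")
    else:
        print("stdatomic.h is missing or incomplete")
```

Linking, and not just compiling, matters here because an incomplete toolchain can ship the header while lacking the pieces needed to produce a working binary.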
msg412741 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2022-02-07 13:30
ICC might be a problem. Apparently some versions have an incomplete stdatomic.h, see bpo-37415.
msg412749 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2022-02-07 14:24
References:

- C11 stdatomic.h https://en.cppreference.com/w/c/atomic
- mimalloc-atomic.h https://github.com/microsoft/mimalloc/blob/master/include/mimalloc-atomic.h
- MSVC Interlocked Variable Access https://docs.microsoft.com/de-de/windows/win32/sync/interlocked-variable-access
msg412796 - (view) Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2022-02-07 22:35
My preference would be for --with-mimalloc=yes in an upcoming release. For platforms without the required stdatomic.h stuff, they can manually specify --with-mimalloc=no.  That will make them aware that a future release of Python might no longer build (if mimalloc is no longer optional).

A soft-landing for merging nogil is not a good enough reason to merge mimalloc, IMHO.  nogil may never be merged.  There should be some concrete and immediate advantage to switch to mimalloc.  The idea of using "heap walking" to improve the cyclic GC is not concrete enough.  It's just an idea at this point.

I think the (small) performance win could be enough of a reason to merge.  This seems to be the most recent benchmark:

https://gist.github.com/pablogsal/8027937b71cd30f17aaaa5ef7c885d3e

There is also the long-term maintenance issue.  So far, mimalloc upstream has been responsive.  The mimalloc code is not so huge or complicated that we couldn't maintain it (if for some reason it gets abandoned upstream).  However, I think we would prefer to maintain obmalloc rather than mimalloc, all else being equal.  Abandonment by the upstream seems fairly unlikely.  So, I'm not too concerned about maintenance.
msg412835 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2022-02-08 13:04
New benchmark:

| Benchmark               | 2022-02-08_11-54-master-69e10976b2e7 | 2022-02-08_11-57-master-d6f5f010b586 |
|-------------------------|:------------------------------------:|:------------------------------------:|
| mako                    | 8.85 ms                              | 7.83 ms: 1.13x faster                |
| hexiom                  | 6.04 ms                              | 5.54 ms: 1.09x faster                |
| spectral_norm           | 81.4 ms                              | 75.2 ms: 1.08x faster                |
| pyflate                 | 380 ms                               | 352 ms: 1.08x faster                 |
| scimark_sparse_mat_mult | 4.05 ms                              | 3.76 ms: 1.08x faster                |
| pickle_pure_python      | 312 us                               | 290 us: 1.07x faster                 |
| unpickle_pure_python    | 238 us                               | 222 us: 1.07x faster                 |
| float                   | 63.1 ms                              | 59.5 ms: 1.06x faster                |
| tornado_http            | 90.3 ms                              | 86.0 ms: 1.05x faster                |
| html5lib                | 62.8 ms                              | 60.2 ms: 1.04x faster                |
| regex_compile           | 121 ms                               | 116 ms: 1.04x faster                 |
| scimark_lu              | 106 ms                               | 102 ms: 1.04x faster                 |
| nqueens                 | 70.9 ms                              | 68.4 ms: 1.04x faster                |
| crypto_pyaes            | 70.1 ms                              | 67.8 ms: 1.03x faster                |
| logging_silent          | 97.5 ns                              | 94.4 ns: 1.03x faster                |
| sympy_integrate         | 17.2 ms                              | 16.7 ms: 1.03x faster                |
| sympy_str               | 260 ms                               | 252 ms: 1.03x faster                 |
| sympy_expand            | 441 ms                               | 427 ms: 1.03x faster                 |
| pathlib                 | 14.1 ms                              | 13.7 ms: 1.03x faster                |
| regex_dna               | 164 ms                               | 159 ms: 1.03x faster                 |
| regex_v8                | 21.1 ms                              | 20.6 ms: 1.02x faster                |
| sympy_sum               | 138 ms                               | 136 ms: 1.02x faster                 |
| scimark_fft             | 286 ms                               | 281 ms: 1.02x faster                 |
| pickle                  | 9.34 us                              | 9.19 us: 1.02x faster                |
| xml_etree_parse         | 126 ms                               | 124 ms: 1.01x faster                 |
| richards                | 43.0 ms                              | 42.4 ms: 1.01x faster                |
| xml_etree_generate      | 71.2 ms                              | 70.5 ms: 1.01x faster                |
| scimark_monte_carlo     | 58.8 ms                              | 58.3 ms: 1.01x faster                |
| deltablue               | 3.60 ms                              | 3.58 ms: 1.01x faster                |
| chaos                   | 64.6 ms                              | 64.3 ms: 1.01x faster                |
| 2to3                    | 216 ms                               | 215 ms: 1.00x faster                 |
| pidigits                | 155 ms                               | 154 ms: 1.00x faster                 |
| nbody                   | 76.4 ms                              | 77.0 ms: 1.01x slower                |
| python_startup_no_site  | 3.96 ms                              | 3.99 ms: 1.01x slower                |
| xml_etree_iterparse     | 82.5 ms                              | 83.1 ms: 1.01x slower                |
| scimark_sor             | 103 ms                               | 104 ms: 1.01x slower                 |
| unpickle                | 11.3 us                              | 11.4 us: 1.01x slower                |
| telco                   | 5.53 ms                              | 5.58 ms: 1.01x slower                |
| python_startup          | 5.56 ms                              | 5.62 ms: 1.01x slower                |
| json_loads              | 20.6 us                              | 20.8 us: 1.01x slower                |
| json_dumps              | 9.61 ms                              | 9.77 ms: 1.02x slower                |
| dulwich_log             | 60.9 ms                              | 62.1 ms: 1.02x slower                |
| logging_format          | 5.47 us                              | 5.62 us: 1.03x slower                |
| pickle_list             | 3.06 us                              | 3.15 us: 1.03x slower                |
| django_template         | 30.2 ms                              | 31.2 ms: 1.03x slower                |
| meteor_contest          | 80.7 ms                              | 84.1 ms: 1.04x slower                |
| pickle_dict             | 21.9 us                              | 23.4 us: 1.07x slower                |
| logging_simple          | 4.84 us                              | 5.20 us: 1.07x slower                |
| Geometric mean          | (ref)                                | 1.01x faster                         |

Benchmark hidden because not significant (9): unpack_sequence, go, raytrace, chameleon, xml_etree_process, fannkuch, sqlite_synth, regex_effbot, unpickle_list
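
As a reading aid for the table, the per-row ratio and the geometric mean can be recomputed along these lines. This is a sketch under stated assumptions: `speedup` and `summarize` are hypothetical helpers, and `SAMPLE` holds only a handful of rows copied from the table, so its result will not reproduce the "Geometric mean" row, which the benchmark tool computes over its full set.

```python
from statistics import geometric_mean

# (reference, patched) timings for a few rows copied from the table above.
# Units cancel in the ratio, so ms and us rows can be mixed.
SAMPLE = {
    "mako": (8.85, 7.83),
    "hexiom": (6.04, 5.54),
    "nbody": (76.4, 77.0),
    "pickle_dict": (21.9, 23.4),
    "logging_simple": (4.84, 5.20),
}


def speedup(reference, patched):
    """Return reference / patched; a value above 1 means patched is faster."""
    return reference / patched


def summarize(timings):
    """Return the geometric mean of the per-benchmark speedups."""
    ratios = [speedup(ref, new) for ref, new in timings.values()]
    return geometric_mean(ratios)


if __name__ == "__main__":
    for name, (ref, new) in SAMPLE.items():
        print(f"{name}: {speedup(ref, new):.2f}x")
    print(f"geometric mean of sample: {summarize(SAMPLE):.2f}x")
```

The geometric mean is the usual choice for ratios because a 2x speedup and a 2x slowdown cancel out to exactly 1x.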
msg412867 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2022-02-08 20:36
I re-ran the benchmark of d6f5f010b586:

| Benchmark               | 2022-02-08_11-54-master-69e10976b2e7 | 2022-02-08_11-57-master-d6f5f010b586 |
|-------------------------|:------------------------------------:|:------------------------------------:|
| pickle_pure_python      | 312 us                               | 281 us: 1.11x faster                 |
| unpickle_pure_python    | 238 us                               | 216 us: 1.10x faster                 |
| pyflate                 | 380 ms                               | 349 ms: 1.09x faster                 |
| hexiom                  | 6.04 ms                              | 5.55 ms: 1.09x faster                |
| logging_silent          | 97.5 ns                              | 89.8 ns: 1.09x faster                |
| float                   | 63.1 ms                              | 59.3 ms: 1.07x faster                |
| html5lib                | 62.8 ms                              | 59.1 ms: 1.06x faster                |
| crypto_pyaes            | 70.1 ms                              | 66.1 ms: 1.06x faster                |
| json_loads              | 20.6 us                              | 19.4 us: 1.06x faster                |
| tornado_http            | 90.3 ms                              | 86.1 ms: 1.05x faster                |
| mako                    | 8.85 ms                              | 8.45 ms: 1.05x faster                |
| richards                | 43.0 ms                              | 41.1 ms: 1.05x faster                |
| xml_etree_parse         | 126 ms                               | 120 ms: 1.05x faster                 |
| logging_format          | 5.47 us                              | 5.25 us: 1.04x faster                |
| sympy_integrate         | 17.2 ms                              | 16.5 ms: 1.04x faster                |
| sympy_str               | 260 ms                               | 251 ms: 1.04x faster                 |
| fannkuch                | 325 ms                               | 314 ms: 1.04x faster                 |
| regex_v8                | 21.1 ms                              | 20.4 ms: 1.04x faster                |
| sympy_expand            | 441 ms                               | 425 ms: 1.04x faster                 |
| regex_compile           | 121 ms                               | 117 ms: 1.03x faster                 |
| sympy_sum               | 138 ms                               | 134 ms: 1.03x faster                 |
| scimark_lu              | 106 ms                               | 103 ms: 1.03x faster                 |
| go                      | 128 ms                               | 125 ms: 1.03x faster                 |
| pathlib                 | 14.1 ms                              | 13.7 ms: 1.02x faster                |
| scimark_monte_carlo     | 58.8 ms                              | 57.9 ms: 1.02x faster                |
| nqueens                 | 70.9 ms                              | 69.9 ms: 1.02x faster                |
| pidigits                | 155 ms                               | 153 ms: 1.01x faster                 |
| pickle                  | 9.34 us                              | 9.22 us: 1.01x faster                |
| raytrace                | 278 ms                               | 275 ms: 1.01x faster                 |
| 2to3                    | 216 ms                               | 213 ms: 1.01x faster                 |
| deltablue               | 3.60 ms                              | 3.56 ms: 1.01x faster                |
| logging_simple          | 4.84 us                              | 4.78 us: 1.01x faster                |
| xml_etree_iterparse     | 82.5 ms                              | 81.7 ms: 1.01x faster                |
| regex_dna               | 164 ms                               | 162 ms: 1.01x faster                 |
| unpack_sequence         | 32.7 ns                              | 32.4 ns: 1.01x faster                |
| telco                   | 5.53 ms                              | 5.48 ms: 1.01x faster                |
| python_startup          | 5.56 ms                              | 5.58 ms: 1.00x slower                |
| xml_etree_generate      | 71.2 ms                              | 71.6 ms: 1.01x slower                |
| unpickle_list           | 4.08 us                              | 4.12 us: 1.01x slower                |
| chameleon               | 6.07 ms                              | 6.14 ms: 1.01x slower                |
| chaos                   | 64.6 ms                              | 65.3 ms: 1.01x slower                |
| json_dumps              | 9.61 ms                              | 9.75 ms: 1.01x slower                |
| xml_etree_process       | 49.9 ms                              | 50.7 ms: 1.01x slower                |
| meteor_contest          | 80.7 ms                              | 82.0 ms: 1.02x slower                |
| scimark_sparse_mat_mult | 4.05 ms                              | 4.12 ms: 1.02x slower                |
| unpickle                | 11.3 us                              | 11.5 us: 1.02x slower                |
| django_template         | 30.2 ms                              | 31.0 ms: 1.02x slower                |
| scimark_sor             | 103 ms                               | 106 ms: 1.02x slower                 |
| spectral_norm           | 81.4 ms                              | 84.9 ms: 1.04x slower                |
| pickle_dict             | 21.9 us                              | 23.5 us: 1.08x slower                |
| Geometric mean          | (ref)                                | 1.02x faster                         |

Benchmark hidden because not significant (7): scimark_fft, dulwich_log, python_startup_no_site, regex_effbot, sqlite_synth, nbody, pickle_list
msg412986 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2022-02-10 09:27
ICC 2021 has full support for stdatomic.h and compiles mimalloc just fine:

$ CC="icc" ./configure -C --with-pydebug
$ make
$ ./python
Python 3.11.0a5+ (main, Feb  9 2022, 15:57:40) [GCC Intel(R) C++ gcc 7.5 mode] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys._malloc_info
sys._malloc_info(allocator='mimalloc_debug', with_pymalloc=True, with_mimalloc=True, mimalloc_secure=4, mimalloc_debug=2)


AIX xlc is still a problem. It does not support C11 stdatomic.h, but it comes with the older GCC-style __sync family of atomic memory access functions, https://www.ibm.com/docs/en/xl-c-and-cpp-aix/13.1.3?topic=cbif-gcc-atomic-memory-access-built-in-functions-extension . It might be possible to re-implement mimalloc's atomics with __sync functions (e.g. https://gist.github.com/nhatminhle/5181506). The implementation would be less efficient, though: the __sync functions don't take a memory order, atomic_load_explicit(v) becomes __sync_fetch_and_add(v, 0), and atomic_store_explicit() requires two full memory barriers.
History
Date User Action Args
2022-04-11 14:59:55  admin  set  github: 90815
2022-03-24 02:10:23  h-vetinari  set  nosy: + h-vetinari
2022-02-10 09:27:44  christian.heimes  set  messages: + msg412986
2022-02-08 20:36:37  christian.heimes  set  messages: + msg412867
2022-02-08 13:04:17  christian.heimes  set  messages: + msg412835
2022-02-07 22:35:15  nascheme  set  messages: + msg412796
2022-02-07 14:24:45  christian.heimes  set  messages: + msg412749
2022-02-07 13:30:04  christian.heimes  set  messages: + msg412741
2022-02-07 08:41:30  erlendaasland  set  nosy: + erlendaasland
2022-02-06 23:08:03  christian.heimes  set  messages: + msg412679
2022-02-06 21:16:36  nascheme  set  messages: + msg412669
2022-02-06 18:58:56  christian.heimes  set  messages: + msg412654
2022-02-06 16:12:39  christian.heimes  set  messages: + msg412645
2022-02-06 14:59:52  corona10  set  messages: + msg412641
2022-02-06 14:57:03  corona10  set  nosy: + nascheme, corona10
2022-02-06 14:52:03  christian.heimes  set  keywords: + patch; stage: patch review; pull_requests: + pull_request29337
2022-02-06 14:49:12  christian.heimes  create