classification
Title: obmalloc: radix tree for tracking arena address ranges
Type: enhancement Stage: resolved
Components: Interpreter Core Versions:
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: methane, nascheme, tim.peters, vstinner
Priority: normal Keywords: patch

Created on 2019-06-29 23:08 by nascheme, last changed 2021-03-30 02:53 by nascheme. This issue is now closed.

Files
File name Uploaded Description Edit
pyperf_radix_compare.txt nascheme, 2019-06-29 23:12
obmalloc_overhead.py nascheme, 2019-06-29 23:12
perf_compare_noradix.txt nascheme, 2021-02-24 21:40 Compare base obmalloc with radix-tree version
perf_compare_radix4x.txt nascheme, 2021-02-24 21:41 Compare base obmalloc with radix-tree version, 4x sizes
perf_compare_arm64.txt nascheme, 2021-03-01 23:46 Compare base obmalloc with radix-tree version, ARM64
Pull Requests
URL Status Linked Edit
PR 14474 closed nascheme, 2019-06-29 23:13
Messages (5)
msg346903 - (view) Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2019-06-29 23:08
This patch implements an alternative version of obmalloc's address_in_range().  It uses a radix tree to map the areas of memory covered by obmalloc arenas.  pymalloc_free() uses address_in_range() to determine if a block of memory is controlled by obmalloc or has been allocated by the system allocator.  The address_in_range() function must be fast.

The current version of address_in_range() uses a slightly memory unsanitary scheme.  I.e. it reads memory that has possibly not been initialized.  In theory that is not allowed and could cause a crash.  In practice, it has worked reliability for many years.  There is some ugly logic in obmalloc.c to disable sanity checking (e.g. ASN, TSAN, MSAN).  One advantage of this radix tree approach is that it doesn't have this unsanitary behavior.

Another small advantage of the radix tree approach is that it is independent from the OS page size.  With the current address_in_range() scheme, the size of obmalloc pools are limited to the size of the OS page size.  If larger, a segmentation fault could occur.  Bug #37211 (obmalloc: eliminate limit on pool size) allows for larger pools but at the cost of some additional code complexity and some additional memory overhead.

This patch has been discussed quite a bit on the python-dev list.  Thread subjects are:
    obmalloc (was Have a big machine and spare time? Here's a possible Python bug.)
    radix tree arena map for obmalloc

That discussion focuses quite a bit on the value of increasing the obmalloc arena and pool sizes.  This proposed patch keeps the sizes the same.  We can evaluate changing the sizes as a different issue.

I have run the pyperformance benchmark suite to compare performance of the radix tree with the status quo.  I think there is no significant difference for the programs that are part of the suite.  The pyperformance comparision report is attached.

If we do decide to go with larger pool and arena sizes, the radix tree approach actually uses less memory than the "big pools PR" (Bug #37211).  Attached is Tim's program to compare memory overheads of the different approaches.

I think it is worth considering using the radix tree by default on 64-bit platforms.  It seems to be not any slower than the status quo, does not have the memory unsanitary behavior and gives us the ability to easily increase pool sizes if we decide we want to.
msg387886 - (view) Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2021-03-01 23:46
I was curious about the performance impact on a non-AMD64 platform.  I run pyperformance using an AWS t4g.micro instance.  That's an ARM64 CPU.  I did four sets of runs in total and the results seem repeatable, even given the vitalization.  I don't know why xml_etree_iterparse is significantly slower.  Seems like a spurious anomaly.

I was going to try on a m6g.metal instance (bare metal, no virtualization) by my AWS account doesn't allow that many vCPUs it seems.
msg388744 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-03-15 14:38
See also bpo-43313: "feature: support pymalloc for subinterpreters. each subinterpreter has pymalloc_state".
msg389694 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-03-29 12:29
See also bpo-43593 "pymalloc is not aware of Memory Tagging Extension (MTE) and crashes".
msg389782 - (view) Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2021-03-30 02:51
New changeset 85b6b70589c187639aeebc560d67e9cc04abb4d8 by Neil Schemenauer in branch 'master':
bpo-37448: Use radix tree for pymalloc address_in_range(). (GH-14474)
https://github.com/python/cpython/commit/85b6b70589c187639aeebc560d67e9cc04abb4d8
History
Date User Action Args
2021-03-30 02:53:53naschemesetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2021-03-30 02:51:39naschemesetmessages: + msg389782
2021-03-29 12:29:05vstinnersetmessages: + msg389694
2021-03-15 14:38:25vstinnersetnosy: + vstinner
messages: + msg388744
2021-03-01 23:46:14naschemesetfiles: + perf_compare_arm64.txt

messages: + msg387886
2021-02-24 21:41:03naschemesetfiles: + perf_compare_radix4x.txt
2021-02-24 21:40:31naschemesetfiles: + perf_compare_noradix.txt
2019-06-30 02:58:30xtreaksetnosy: + methane
2019-06-30 02:27:26tim.peterssetnosy: + tim.peters
2019-06-29 23:13:56naschemesetkeywords: + patch
pull_requests: + pull_request14291
2019-06-29 23:12:46naschemesetfiles: + obmalloc_overhead.py
2019-06-29 23:12:21naschemesetfiles: + pyperf_radix_compare.txt
2019-06-29 23:08:22naschemecreate