Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance tracemalloc to trace properly free lists #79234

Closed
vstinner opened this issue Oct 23, 2018 · 12 comments
Closed

Enhance tracemalloc to trace properly free lists #79234

vstinner opened this issue Oct 23, 2018 · 12 comments
Labels
3.8 only security fixes stdlib Python modules in the Lib dir

Comments

@vstinner
Copy link
Member

BPO 35053
Nosy @vstinner, @methane, @serhiy-storchaka
PRs
  • bpo-35053: Enhance tracemalloc to trace free lists #10063
  • bpo-35053: Add Include/tracemalloc.h #10091
  • bpo-35053: Define _PyTraceMalloc_NewReference in object.h #10107
  • Files
  • dict_wrong_traceback.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2018-10-25.14:16:18.914>
    created_at = <Date 2018-10-23.16:42:14.413>
    labels = ['3.8', 'library']
    title = 'Enhance tracemalloc to trace properly free lists'
    updated_at = <Date 2018-10-25.22:03:41.289>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2018-10-25.22:03:41.289>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2018-10-25.14:16:18.914>
    closer = 'vstinner'
    components = ['Library (Lib)']
    creation = <Date 2018-10-23.16:42:14.413>
    creator = 'vstinner'
    dependencies = []
    files = ['47889']
    hgrepos = []
    issue_num = 35053
    keywords = ['patch']
    message_count = 12.0
    messages = ['328326', '328351', '328361', '328363', '328364', '328374', '328375', '328428', '328436', '328441', '328500', '328501']
    nosy_count = 3.0
    nosy_names = ['vstinner', 'methane', 'serhiy.storchaka']
    pr_nums = ['10063', '10091', '10107']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue35053'
    versions = ['Python 3.8']

    @vstinner
    Copy link
    Member Author

    CPython uses many "free lists": list of "deallocated" objects which are kept alive to optimize allocation of new objects. For example, the builtin list type has a free list.

    Problem: tracemalloc only traces the memory allocation when the object is created, but it doesn't update the traceback when the "free object" is reused to create "a new object".

    Attached PR modifies _Py_NewReference() to update the Python traceback in the tracemalloc trace.

    @vstinner vstinner added 3.8 only security fixes stdlib Python modules in the Lib dir labels Oct 23, 2018
    @methane
    Copy link
    Member

    methane commented Oct 24, 2018

    Is performance overhead negligible?

    @vstinner
    Copy link
    Member Author

    Is performance overhead negligible?

    Thank you for asking the most important question :-)

    I ran this microbenchmark:

    make distclean
    ./configure --enable-lto
    make
    ./python -m venv env
    env/bin/python -m pip install perf
    sudo env/bin/python -m perf system tune
    env/bin/python -m perf timeit -o FILE.json -v '[]'

    My first attempt:

    $ env/bin/python -m perf compare_to ref.json patch.json 
    Mean +- std dev: [ref] 20.6 ns +- 0.1 ns -> [patch] 22.4 ns +- 0.1 ns: 1.09x slower (+9%)

    The addition of the _PyTraceMalloc_NewReference() call which does nothing (check tracing flag, return) adds 1.7 ns: it's not negligible on such micro-benchmark, and I would prefer to avoid it whenever possible since _Py_NewReference() is the root of the free list optimization.

    New attempt: expose tracemalloc_config and add _Py_unlikely() macro (instruct the compiler that tracing is false most of the time):

    Mean +- std dev: [ref] 20.6 ns +- 0.1 ns -> [unlikely] 20.4 ns +- 0.3 ns: 1.01x faster (-1%)

    Good! The overhead is now negligible!

    But... is the hardcore low-level _Py_unlikely() optimization really needed?...

    $ env/bin/python -m perf compare_to ref.json if_tracing.json 
    Benchmark hidden because not significant (1): timeit

    => no, the macro is useless, so I removed it!

    New benchmark to double-check on my laptop.

    git checkout master
    make clean; make
    rm -rf env; ./python -m venv env; env/bin/python -m pip install perf
    sudo env/bin/python -m perf system tune
    env/bin/python -m perf timeit -o ref.json -v '[]' --rigorous

    git checkout tracemalloc_newref
    make clean; make
    rm -rf env; ./python -m venv env; env/bin/python -m pip install perf
    env/bin/python -m perf timeit -o patch.json -v '[]' --rigorous

    $ env/bin/python -m perf compare_to ref.json patch.json 
    Mean +- std dev: [ref] 20.8 ns +- 0.7 ns -> [patch] 20.5 ns +- 0.3 ns: 1.01x faster (-1%)

    The std dev is a little bit high. I didn't use CPU isolation and Hexchat + Firefox was running in the background, *but* it seems like the mean is very close, and so that my PR has no significant overhead.

    @vstinner
    Copy link
    Member Author

    Python 3.8 uses many free lists:
    https://pythondev.readthedocs.io/cpython_impl_optim.html#free-lists

    Attached dict_wrong_traceback.py shows the bug on the dictionary of an object:

    $ ./python ~/dict_wrong_traceback.py 
      File "/home/vstinner/dict_wrong_traceback.py", line 13
        p = Point() # first object (dead!)
      File "/home/vstinner/dict_wrong_traceback.py", line 8
        self.x = 1

    tracemalloc shows the traceback of the first object... which has been destroyed!

    With the fix:

    $ ./python ~/dict_wrong_traceback.py 
      File "/home/vstinner/dict_wrong_traceback.py", line 16
        p = Point() # second object (alive)
      File "/home/vstinner/dict_wrong_traceback.py", line 8
        self.x = 1

    It's much better: it doesn't show dead objects anymore :-)

    @vstinner
    Copy link
    Member Author

    A little bit of history.

    I opened a bug 2 years ago but I closed it (lack of interest):
    vstinner/pytracemalloc#2

    I rewrote tracemalloc between version 0.9 and 1.0. In tracemalloc 0.9, there was an API to track free lists. Here is the code to handle "alloc" and "free" of an object inside a freelist:

    https://github.com/vstinner/pytracemalloc/blob/a2b2616fc73cd5ce0ea45d1b68a490e0fc52ccc8/_tracemalloc.c#L291-L337

    My PR 10063 has a more correct and efficient implementation:

    • It keeps the trace alive when the object moves into the free list to report the real memory usage of Python: the free lists consumes memory
    • It writes directly into the hash table entry rather than remove/add frequently the trace, it should be more efficient

    @vstinner
    Copy link
    Member Author

    Even if Python 3.6 and 3.7 are impacted by the bug, I propose to only fix Python 3.8 since the change modifies a _Py_NewReference() function which is critical for performance.

    @vstinner
    Copy link
    Member Author

    While writing PR 10063, I was unhappy with _Py_NewReference() macro, and so I wrote bpo-35059 to convert it to a static inline function.

    @vstinner
    Copy link
    Member Author

    New changeset 9e00e80 by Victor Stinner in branch 'master':
    bpo-35053: Enhance tracemalloc to trace free lists (GH-10063)
    9e00e80

    @vstinner
    Copy link
    Member Author

    New changeset 6279c1c by Victor Stinner in branch 'master':
    bpo-35053: Add Include/tracemalloc.h (GH-10091)
    6279c1c

    @vstinner
    Copy link
    Member Author

    This change modifies _Py_NewReference() which is a very important function and it impacts Python performance. I prefer to keep the bug in Python 3.6 and 3.7 to not risk to introduce a regression.

    The bug exists since Python 3.4 and I'm the first one to spot it. It's not like there is huge pressure to backport the fix.

    Thanks again INADA-san for the review!

    @vstinner
    Copy link
    Member Author

    New changeset c89a932 by Victor Stinner in branch 'master':
    bpo-35053: Define _PyTraceMalloc_NewReference in object.h (GH-10107)
    c89a932

    @vstinner
    Copy link
    Member Author

    Oh :-( I didn't expect that I would have to declare PyTraceMalloc_NewReference() in object.h even if Py_LIMITED_API is defined...

    Python currently leaks too much things even if Py_LIMITED_API is defined. It's time to break the C API! :-) https://pythoncapi.readthedocs.io/ (just kidding, or not)

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants