
Dynamically generate the _source attribute of namedtuple to save memory) #63839

Closed
vstinner opened this issue Nov 18, 2013 · 13 comments
Labels
performance Performance or resource usage

Comments

@vstinner
Member

BPO 19640
Nosy @rhettinger, @vstinner, @ericvsmith, @giampaolo, @tiran, @merwok, @ericsnowcurrently
Files
  • namedtuple_source.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/rhettinger'
    closed_at = <Date 2014-04-02.20:28:04.621>
    created_at = <Date 2013-11-18.09:42:10.388>
    labels = ['performance']
    title = 'Dynamically generate the _source attribute of namedtuple to save memory)'
    updated_at = <Date 2017-07-17.13:31:32.088>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2017-07-17.13:31:32.088>
    actor = 'giampaolo.rodola'
    assignee = 'rhettinger'
    closed = True
    closed_date = <Date 2014-04-02.20:28:04.621>
    closer = 'rhettinger'
    components = []
    creation = <Date 2013-11-18.09:42:10.388>
    creator = 'vstinner'
    dependencies = []
    files = ['34484']
    hgrepos = []
    issue_num = 19640
    keywords = ['patch']
    message_count = 13.0
    messages = ['203270', '203271', '203304', '203707', '203870', '203878', '213904', '213946', '213949', '214003', '214212', '214219', '215398']
    nosy_count = 7.0
    nosy_names = ['rhettinger', 'vstinner', 'eric.smith', 'giampaolo.rodola', 'christian.heimes', 'eric.araujo', 'eric.snow']
    pr_nums = []
    priority = 'low'
    resolution = 'rejected'
    stage = None
    status = 'closed'
    superseder = None
    type = 'resource usage'
    url = 'https://bugs.python.org/issue19640'
    versions = ['Python 3.5']

    @vstinner
    Member Author

    The definition of a new namedtuple creates a large Python script to create the new type. The code stores the generated source in a private attribute:

        namespace = dict(__name__='namedtuple_%s' % typename)
        exec(class_definition, namespace)
        result = namespace[typename]
        result._source = class_definition
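To illustrate the mechanism quoted above, here is a minimal sketch of exec-based class generation that keeps the generated text alive on the class, as `_source` does. The template and the `make_record` helper are hypothetical simplifications, not the real `collections` template, which defines many more methods.

```python
# Hypothetical, heavily simplified class template (not the real one).
_TEMPLATE = '''\
class {typename}(tuple):
    __slots__ = ()
    _fields = {field_names!r}

    def __new__(cls, {arg_list}):
        return tuple.__new__(cls, [{arg_list}])

{accessors}'''

# One read-only property per field, indented to sit inside the class body.
_ACCESSOR = '''\
    {name} = property(lambda self: self[{index}])
'''


def make_record(typename, field_names):
    """Build a tuple subclass by exec'ing generated source (illustrative only)."""
    field_names = tuple(field_names)
    class_definition = _TEMPLATE.format(
        typename=typename,
        field_names=field_names,
        arg_list=', '.join(field_names),
        accessors=''.join(_ACCESSOR.format(name=name, index=index)
                          for index, name in enumerate(field_names)),
    )
    namespace = dict(__name__='namedtuple_%s' % typename)
    exec(class_definition, namespace)
    result = namespace[typename]
    # Storing the text keeps one source string alive per generated class.
    result._source = class_definition
    return result


if __name__ == '__main__':
    Point = make_record('Point', ['x', 'y'])
    p = Point(1, 2)
    print(p.x, p.y, len(Point._source))
```

The memory cost discussed in this issue comes from that last assignment: every generated class holds on to its full source string for its whole lifetime.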

    This attribute wastes memory, and I don't understand its purpose. It was not discussed in an issue, so I guess there is no real use case:

    changeset: 68879:bffdd7e9265c
    user: Raymond Hettinger <python@rcn.com>
    date: Wed Mar 23 12:52:23 2011 -0700
    files: Doc/library/collections.rst Lib/collections/__init__.py Lib/test/test_collections.py
    description:
    Expose the namedtuple source with a _source attribute.

    Can we just drop this attribute to reduce the Python memory footprint?

    @vstinner
    Member Author

    I found this issue while using my tracemalloc module to analyze the memory consumption of Python. On the Python test suite, the _source attribute is the 5th line allocating the memory memory:

    /usr/lib/python3.4/collections/__init__.py: 676.2 kB
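For readers who want to reproduce this kind of per-line ranking, a minimal sketch using the standard `tracemalloc` API follows. The helper name `top_allocating_lines` is hypothetical, and the actual ranking depends entirely on what the traced program allocates.

```python
import tracemalloc


def top_allocating_lines(snapshot, limit=5):
    """Return (filename, lineno, size_in_bytes) for the largest allocation sites."""
    # statistics('lineno') groups traces by source line, largest total first.
    stats = snapshot.statistics('lineno')
    result = []
    for stat in stats[:limit]:
        frame = stat.traceback[0]  # 'lineno' grouping keeps a single frame
        result.append((frame.filename, frame.lineno, stat.size))
    return result


if __name__ == '__main__':
    tracemalloc.start()
    data = [bytes(1000) for _ in range(1000)]  # about 1 MB from a single line
    for filename, lineno, size in top_allocating_lines(tracemalloc.take_snapshot()):
        print(f'{filename}:{lineno}: {size / 1024:.1f} KiB')
```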

    @vstinner
    Member Author

    > the 5th line allocating the memory memory

    oops, the 5th line allocating the *most* memory

    @ericsnowcurrently
    Member

    As an alternative, how about turning _source into a property?

    @merwok
    Member

    merwok commented Nov 22, 2013

    In its first version, namedtuple had an argument (named echo or verbose) that caused the source code to be printed, for use at the interactive prompt. Raymond later changed it to a _source attribute, which is easier to work with than printed output.

    About the other question you asked on the ML (why isn’t there a base NamedTuple class to inherit): this has been discussed on python-ideas IIRC, and people have written ActiveState recipes for that idea. It should be easy to find the ML archive links from the ActiveState posts.

    @ericsnowcurrently
    Member

    A while back, because of those python-ideas discussions, Raymond added a link at the bottom of the namedtuple section of the docs at http://docs.python.org/3.4/library/collections.html#namedtuple-factory-function-for-tuples-with-named-fields. The link points to a nice recipe by Jan Kaliszewski.

    @vstinner
    Member Author

    > As an alternative, how about turning _source into a property?

    A class or an instance property? A class property requires a metaclass. I guess that each namedtuple type requires its own metaclass, right?

    @ericsnowcurrently
    Member

    It does not necessarily require a metaclass. You can accomplish it using a custom descriptor:

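    class classattr:
        """Descriptor that computes a class-level value on each lookup."""
        def __init__(self, getter):
            self.getter = getter
        def __get__(self, obj, cls):
            # obj is None for Cls.attr and the instance for inst.attr;
            # either way, the owner class is passed to the getter.
            return self.getter(cls)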

    FWIW, this is a descriptor that may be worth adding somewhere regardless.
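A minimal sketch of Eric's suggestion, assuming the descriptor is placed on a subclass of a namedtuple. `_render_source` is a hypothetical stand-in for re-running the real class template: only the small descriptor object lives on the class, and the text is rebuilt on every access, which is the trade-off discussed below.

```python
from collections import namedtuple


class classattr:
    """Non-data descriptor: calls getter(cls) each time the attribute is looked up."""
    def __init__(self, getter):
        self.getter = getter

    def __get__(self, obj, cls):
        return self.getter(cls)


def _render_source(cls):
    # Hypothetical placeholder: a real version would re-run the full class
    # template; this only shows that the text is rebuilt from _fields on demand.
    return 'class %s(tuple):  # fields: %s\n' % (cls.__name__, ', '.join(cls._fields))


class Point(namedtuple('Point', ['x', 'y'])):
    __slots__ = ()
    # Only the descriptor is stored on the class, not a source string.
    _source = classattr(_render_source)


if __name__ == '__main__':
    print(Point._source, end='')        # class-level lookup
    print(Point(1, 2)._source, end='')  # instance lookup, same text
```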

    @vstinner
    Member Author

    namedtuple_source.patch: Replace the _source attribute, which wastes memory, with a property that generates the source on demand. The patch also adds a unit test for the verbose parameter (which is public and documented, even though it is said to be "outdated").

    The patch also removes the repr_fmt and num_fields parameters of the class definition template and computes these values from the list of fields instead.
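To illustrate that last change: both template values can be recomputed from the field names alone. The helper name and the `'name=%r'` format are assumptions modeled on how a namedtuple repr looks (e.g. `Point(x=1, y=2)`), not code taken from the patch.

```python
def derive_template_values(field_names):
    """Derive repr_fmt and num_fields from the field names (hypothetical helper)."""
    num_fields = len(field_names)
    # One 'name=%r' placeholder per field, e.g. 'x=%r, y=%r'
    repr_fmt = ', '.join('%s=%%r' % name for name in field_names)
    return repr_fmt, num_fields


if __name__ == '__main__':
    repr_fmt, num_fields = derive_template_values(('x', 'y'))
    print(num_fields)
    # Parenthesize the concatenation: % binds tighter than +.
    print(('Point(%s)' % repr_fmt) % (1, 2))
```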

    I suggest applying the change to Python 3.4.1 and 3.5.

    Test script:
    ---

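    # Assumes tracing was enabled at startup, e.g. "python3 -X tracemalloc script.py":
    # take_snapshot() raises RuntimeError when tracemalloc is not tracing, and
    # tracing must already be on for the imports below to be recorded.
    import email
    import http.client
    import pickle
    import test.regrtest
    import test.test_os
    import tracemalloc
    import xmlrpc.server

    snap = tracemalloc.take_snapshot()
    with open("dump.pickle", "wb") as fp:
        pickle.dump(snap, fp, 2)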

    With the patch, the memory footprint is reduced by 176 kB.
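One way to turn two such dumps (say, one taken without the patch and one with it) into a single number is `Snapshot.compare_to()`. This is a sketch of a possible comparison, not necessarily how the 176 kB figure was computed; `total_size_diff` is a hypothetical helper. A negative result means the second snapshot traced less memory.

```python
import tempfile
import tracemalloc
from pathlib import Path


def total_size_diff(before_path, after_path):
    """Return traced bytes in `after` minus traced bytes in `before`."""
    # Snapshot.load() unpickles a snapshot, e.g. one written with pickle.dump().
    before = tracemalloc.Snapshot.load(before_path)
    after = tracemalloc.Snapshot.load(after_path)
    # Group by file; the per-group size_diff values sum to the overall change.
    stats = after.compare_to(before, 'filename')
    return sum(stat.size_diff for stat in stats)


if __name__ == '__main__':
    tracemalloc.start()
    with tempfile.TemporaryDirectory() as tmp:
        before_path = str(Path(tmp) / 'before.pickle')
        after_path = str(Path(tmp) / 'after.pickle')
        tracemalloc.take_snapshot().dump(before_path)
        data = [bytes(1000) for _ in range(1000)]  # roughly 1 MB of new allocations
        tracemalloc.take_snapshot().dump(after_path)
        print(total_size_diff(before_path, after_path))
```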

    @vstinner vstinner changed the title from "Drop _source attribute of namedtuple" to "Drop _source attribute of namedtuple (waste memory)" Mar 18, 2014
    @vstinner vstinner added the performance Performance or resource usage label Mar 18, 2014
    @ericsnowcurrently
    Member

    Also be sure to have Raymond's sign-off before committing anything for this. :)

    @rhettinger rhettinger self-assigned this Mar 20, 2014
    @rhettinger
    Contributor

    FWIW, the "verbose" option is mentioned as outdated because the "_source" attribute was added.

    Also, there are real use cases: people write _source to a .py file so that the dynamic namedtuple generation step can be skipped on subsequent imports. This is useful when people want to avoid the use of eval or want to run Cython on the code.

    The attribute can't be "dropped". It is part of the API.

    Sorry that the memory use bugs you. It is bigger than typical docstrings but is not a significant memory consumer in most applications.

    I like the idea of dynamically generating the source upon lookup, but want to think about whether there are any unintended consequences to that space saving hack.
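A minimal sketch of the workflow Raymond describes: persist generated class source to a .py file once, then import that file like any other module so later runs skip the exec step. The helper names are hypothetical. On Python 3.4 through 3.6 the string would come from a namedtuple's `_source` (the attribute was later removed in Python 3.7), so the example uses a small hand-written class instead.

```python
import importlib.util
import tempfile
from pathlib import Path


def write_source_module(source, path):
    """Persist generated class source as a regular .py file."""
    path = Path(path)
    path.write_text(source, encoding='utf-8')
    return path


def load_module_from(path, module_name):
    """Import a .py file by path; a plain import works too if it is on sys.path."""
    spec = importlib.util.spec_from_file_location(module_name, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Hypothetical stand-in for Response._source (much shorter than the real text).
EXAMPLE_SOURCE = '''\
class Response(tuple):
    __slots__ = ()
    _fields = ('code', 'msg')
'''

if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        path = write_source_module(EXAMPLE_SOURCE, Path(tmp) / 'response_nt.py')
        module = load_module_from(path, 'response_nt')
        print(module.Response._fields)
```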

    @rhettinger rhettinger changed the title from "Drop _source attribute of namedtuple (waste memory)" to "Dynamically generate the _source attribute of namedtuple to save memory)" Mar 20, 2014
    @rhettinger
    Contributor

    The size of the _source attribute is about 2k per namedtuple class:

    >>> from collections import namedtuple
    >>> Response = namedtuple('Response', ['code', 'msg', 'compressed', 'written'])
    >>> len(Response._source)
    2174

    @rhettinger
    Contributor

    Victor, I don't think the added complexity is worth 2k per named tuple class. Every time I've gone down the path of lazy evaluation, I've paid an unexpected price for it down the road. If the savings were huge, it might be worth it, but that isn't the case here. This isn't really different from proposing that all docstrings be in a separate module to be lazily loaded only when people look at help.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022

    4 participants