Lazily create dictionaries for plain Python objects #89503

markshannon · 2021-10-01T11:21:18Z

BPO	45340
Nosy	@methane, @markshannon, @MojoVampire, @corona10, @iritkatriel
PRs	bpo-45340: Don't create object dictionaries unless actually needed #28802

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/markshannon'
closed_at = <Date 2021-10-20.10:54:07.424>
created_at = <Date 2021-10-01.11:21:17.552>
labels = ['interpreter-core', 'type-feature', '3.11']
title = 'Lazily create dictionaries for plain Python objects'
updated_at = <Date 2021-11-30.23:14:06.431>
user = 'https://github.com/markshannon'

bugs.python.org fields:

activity = <Date 2021-11-30.23:14:06.431>
actor = 'iritkatriel'
assignee = 'Mark.Shannon'
closed = True
closed_date = <Date 2021-10-20.10:54:07.424>
closer = 'Mark.Shannon'
components = ['Interpreter Core']
creation = <Date 2021-10-01.11:21:17.552>
creator = 'Mark.Shannon'
dependencies = []
files = []
hgrepos = []
issue_num = 45340
keywords = ['patch']
message_count = 7.0
messages = ['403010', '403492', '403494', '403737', '403830', '404424', '407407']
nosy_count = 5.0
nosy_names = ['methane', 'Mark.Shannon', 'josh.r', 'corona10', 'iritkatriel']
pr_nums = ['28802']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue45340'
versions = ['Python 3.11']

markshannon · 2021-10-01T11:21:17Z

A "Normal" Python objects is conceptually just a pair of pointers, one to the class, and one to the dictionary.

With shared keys, the dictionary is redundant as it is no more than a pair of pointers, one to the keys and one to the values.

By adding a pointer to the values to the object, or embedding the values in the object, and fetching the keys via the class, we can avoid creating a dictionary for many objects.

See faster-cpython/ideas#72 for more details.

MojoVampire · 2021-10-08T19:03:23Z

Hmm... Key-sharing dictionaries were accepted largely without question because they didn't harm code that broke them (said code gained nothing, but lost nothing either), and provided a significant benefit. Specifically:

They imposed no penalty on code that violated the code-style recommendation to initialize all variables consistently in __init__ (code that always ended up using a non-sharing dict). Such classes don't benefit, but neither do they get penalized (just a minor CPU cost to unshare when it realized sharing wouldn't work).
It imposes no penalty for using vars(object)/object.__dict__ when you don't modify the set of keys (so reading or changing values of existing attributes caused no problems).

The initial version of this worsens case #2; you'd have to convert to key-sharing dicts, and possibly to unshared dicts a moment later, if the set of attributes is changed. And when it happens, you'd be paying the cost of the now defunct values pointer storage for the life of each instance (admittedly a small cost).

But the final proposal compounds this, because the penalty for lazy attribute creation (directly, or dynamically by modifying via vars()/dict) is now a per-instance cost of n pointers (one for each value).

The CPython codebase rarely uses lazy attribute creation, but AFAIK there is no official recommendation to avoid it (not in PEP-8, not in the official tutorial, not even in PEP-412 which introduced Key-Sharing Dictionaries). Imposing a fairly significant penalty on people who aren't even violating language recommendations, let alone language rules, seems harsh.

I'm not against this initial version (one pointer wasted isn't so bad), but the additional waste in the final version worries me greatly.

Beyond the waste, I'm worried how you'd handle the creation of the first instance of such a class; you'd need to allocate and initialize an instance before you know how many values to tack on to the object. Would the first instance use a real dict during the first __init__ call that it would use to realloc the instance (and size all future instances) at the end of __init__? Or would it be realloc-ing for each and every attribute creation? In either case, threading issues seem like a problem.

Seems like:

Even in the ideal case, this only slightly improves memory locality, and only provides a fixed reduction in memory usage per-instance (the dict header and a little allocator round-off waste), not one that scales with number of attributes.
Classes that would benefit from this would typically do better to use __slots__ (now that dataclasses.dataclass supports slots=True, encouraging that as a default use case adds little work for class writers to use them)

If the gains are really impressive, might still be worth it. But I'm just worried that we'll make the language penalize people who don't know to avoid lazy attribute creation. And the complexity of this layered:

Not-a-dict
Key-sharing-dict
Regular dict

approach makes me worry it will allow subtle bugs in key-sharing dicts to go unnoticed (because so little code would still use them).

MojoVampire · 2021-10-08T19:09:21Z

Hmm... And there's one other issue (that wouldn't affect people until they actually start worrying about memory overhead). Right now, if you want to determine the overhead of an instance, the options are:

Has __dict__: sys.getsizeof(obj) + sys.getsizeof(obj.__dict__)
Lacks __dict__ (built-ins, slotted classes): sys.getsizeof(obj)

This change would mean even checking if something using this setup has a __dict__ creates one. Without additional introspection support, there's no way to tell the real memory usage of the instance without changing the memory usage (for the worse).

markshannon · 2021-10-12T14:35:58Z

Josh,

I'm not really following the details of what you are saying.

You claim "Key-sharing dictionaries were accepted largely without question because they didn't harm code that broke them".
Is that true? I don't remember it that way. They were accepted because they saved memory and didn't slow things down.

This issue, proposes the same thing: less memory used, no slower or a bit faster.

If you are curious about how the first few instances of a class are handled, it is described here:
faster-cpython/ideas#72 (comment)

Lazy attribute is not an issue here. How well keys are shared across instances depends on the dictionary implementation and was improved by #28520

It would be helpful if you could give specific examples where you think this change would use more memory or be slower.

markshannon · 2021-10-13T13:19:43Z

New changeset a8b9350 by Mark Shannon in branch 'main':
bpo-45340: Don't create object dictionaries unless actually needed (GH-28802)
a8b9350

markshannon · 2021-10-20T10:54:07Z

Josh, please reopen if you have more to add.

iritkatriel · 2021-11-30T23:14:06Z

I believe this may have caused the regression in bpo-45941.

erlend-aasland · 2022-11-29T21:53:43Z

This seems to have caused the 3.11 regression in gh-99886. According to the reporter, the issue was fixed in main with gh-95278.

This PR fixes some test failures that were due to an optimization in Python 3.11: the tests expected objects in a cycle to have a `__dict__`, but Python 3.11 introduces an optimization that creates `__dict__` lazily when needed. See python/cpython#89503.

markshannon added the 3.11 only security fixes label Oct 1, 2021

markshannon self-assigned this Oct 1, 2021

markshannon added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement 3.11 only security fixes labels Oct 1, 2021

markshannon self-assigned this Oct 1, 2021

markshannon added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels Oct 1, 2021

markshannon closed this as completed Oct 20, 2021

ezio-melotti transferred this issue from another repository Apr 10, 2022

mdickinson mentioned this issue Oct 7, 2023

Fix a test failure to lazy __dict__ creation on Python >= 3.11 mdickinson/refcycle#88

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Lazily create dictionaries for plain Python objects #89503

Lazily create dictionaries for plain Python objects #89503

markshannon commented Oct 1, 2021

markshannon commented Oct 1, 2021

MojoVampire mannequin commented Oct 8, 2021

MojoVampire mannequin commented Oct 8, 2021

markshannon commented Oct 12, 2021

markshannon commented Oct 13, 2021

markshannon commented Oct 20, 2021

iritkatriel commented Nov 30, 2021

erlend-aasland commented Nov 29, 2022

Lazily create dictionaries for plain Python objects #89503

Lazily create dictionaries for plain Python objects #89503

Comments

markshannon commented Oct 1, 2021

markshannon commented Oct 1, 2021

MojoVampire mannequin commented Oct 8, 2021

MojoVampire mannequin commented Oct 8, 2021

markshannon commented Oct 12, 2021

markshannon commented Oct 13, 2021

markshannon commented Oct 20, 2021

iritkatriel commented Nov 30, 2021

erlend-aasland commented Nov 29, 2022