Message 403492 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	josh.r
Recipients	Mark.Shannon, corona10, josh.r, methane
Date	2021-10-08.19:03:22
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1633719803.37.0.989542155304.issue45340@roundup.psfhosted.org>
In-reply-to

Content
Hmm... Key-sharing dictionaries were accepted largely without question because they didn't harm code that broke them (said code gained nothing, but lost nothing either), and provided a significant benefit. Specifically: 1. They imposed no penalty on code that violated the code-style recommendation to initialize all variables consistently in __init__ (code that always ended up using a non-sharing dict). Such classes don't benefit, but neither do they get penalized (just a minor CPU cost to unshare when it realized sharing wouldn't work). 2. It imposes no penalty for using vars(object)/object.__dict__ when you don't modify the set of keys (so reading or changing values of existing attributes caused no problems). The initial version of this worsens case #2; you'd have to convert to key-sharing dicts, and possibly to unshared dicts a moment later, if the set of attributes is changed. And when it happens, you'd be paying the cost of the now defunct values pointer storage for the life of each instance (admittedly a small cost). But the final proposal compounds this, because the penalty for lazy attribute creation (directly, or dynamically by modifying via vars()/__dict__) is now a per-instance cost of n pointers (one for each value). The CPython codebase rarely uses lazy attribute creation, but AFAIK there is no official recommendation to avoid it (not in PEP 8, not in the official tutorial, not even in PEP 412 which introduced Key-Sharing Dictionaries). Imposing a fairly significant penalty on people who aren't even violating language recommendations, let alone language rules, seems harsh. I'm not against this initial version (one pointer wasted isn't so bad), but the additional waste in the final version worries me greatly. Beyond the waste, I'm worried how you'd handle the creation of the first instance of such a class; you'd need to allocate and initialize an instance before you know how many values to tack on to the object. Would the first instance use a real dict during the first __init__ call that it would use to realloc the instance (and size all future instances) at the end of __init__? Or would it be realloc-ing for each and every attribute creation? In either case, threading issues seem like a problem. Seems like: 1. Even in the ideal case, this only slightly improves memory locality, and only provides a fixed reduction in memory usage per-instance (the dict header and a little allocator round-off waste), not one that scales with number of attributes. 2. Classes that would benefit from this would typically do better to use __slots__ (now that dataclasses.dataclass supports slots=True, encouraging that as a default use case adds little work for class writers to use them) If the gains are really impressive, might still be worth it. But I'm just worried that we'll make the language penalize people who don't know to avoid lazy attribute creation. And the complexity of this layered: 1. Not-a-dict 2. Key-sharing-dict 3. Regular dict approach makes me worry it will allow subtle bugs in key-sharing dicts to go unnoticed (because so little code would still use them).

Hmm... Key-sharing dictionaries were accepted largely without question because they didn't harm code that broke them (said code gained nothing, but lost nothing either), and provided a significant benefit. Specifically:

1. They imposed no penalty on code that violated the code-style recommendation to initialize all variables consistently in __init__ (code that always ended up using a non-sharing dict). Such classes don't benefit, but neither do they get penalized (just a minor CPU cost to unshare when it realized sharing wouldn't work).

2. It imposes no penalty for using vars(object)/object.__dict__ when you don't modify the set of keys (so reading or changing values of existing attributes caused no problems).

The initial version of this worsens case #2; you'd have to convert to key-sharing dicts, and possibly to unshared dicts a moment later, if the set of attributes is changed. And when it happens, you'd be paying the cost of the now defunct values pointer storage for the life of each instance (admittedly a small cost).

But the final proposal compounds this, because the penalty for lazy attribute creation (directly, or dynamically by modifying via vars()/__dict__) is now a per-instance cost of n pointers (one for each value).

The CPython codebase rarely uses lazy attribute creation, but AFAIK there is no official recommendation to avoid it (not in PEP 8, not in the official tutorial, not even in PEP 412 which introduced Key-Sharing Dictionaries). Imposing a fairly significant penalty on people who aren't even violating language recommendations, let alone language rules, seems harsh.

I'm not against this initial version (one pointer wasted isn't so bad), but the additional waste in the final version worries me greatly.

Beyond the waste, I'm worried how you'd handle the creation of the first instance of such a class; you'd need to allocate and initialize an instance before you know how many values to tack on to the object. Would the first instance use a real dict during the first __init__ call that it would use to realloc the instance (and size all future instances) at the end of __init__? Or would it be realloc-ing for each and every attribute creation? In either case, threading issues seem like a problem.

Seems like:

1. Even in the ideal case, this only slightly improves memory locality, and only provides a fixed reduction in memory usage per-instance (the dict header and a little allocator round-off waste), not one that scales with number of attributes.

2. Classes that would benefit from this would typically do better to use __slots__ (now that dataclasses.dataclass supports slots=True, encouraging that as a default use case adds little work for class writers to use them)

If the gains are really impressive, might still be worth it. But I'm just worried that we'll make the language penalize people who don't know to avoid lazy attribute creation. And the complexity of this layered:

1. Not-a-dict
2. Key-sharing-dict
3. Regular dict

approach makes me worry it will allow subtle bugs in key-sharing dicts to go unnoticed (because so little code would still use them).

History
Date	User	Action	Args
2021-10-08 19:03:23	josh.r	set	recipients: + josh.r, methane, Mark.Shannon, corona10
2021-10-08 19:03:23	josh.r	set	messageid: <1633719803.37.0.989542155304.issue45340@roundup.psfhosted.org>
2021-10-08 19:03:23	josh.r	link	issue45340 messages
2021-10-08 19:03:22	josh.r	create