Issue 30020: Make attrgetter use namedtuple

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/74206

classification

Title:	Make attrgetter use namedtuple
Type:	enhancement	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 3.7

process

Status:	closed	Resolution:	rejected
Dependencies:		Superseder:
Assigned To:		Nosy List:	Isaac Morland, christian.heimes, josh.r, r.david.murray, rhettinger, serhiy.storchaka, steven.daprano
Priority:	normal	Keywords:

Created on 2017-04-08 03:04 by Isaac Morland, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
test_attrgetter.py	Isaac Morland, 2017-04-08 15:19

Messages (9)
msg291314 - (view)	Author: Isaac Morland (Isaac Morland)	Date: 2017-04-08 03:04
I would find it useful if the tuples returned by attrgetter functions were namedtuples. An initial look at the code for attrgetter suggests that this would be an easy change and should make little difference to performance. Giving a namedtuple where previously a tuple was returned seems unlikely to trigger bugs in existing code so I propose to simply change attrgetter rather than providing a parameter to specify whether or not to use the new behaviour. Patch will be forthcoming but comments appreciated.
msg291320 - (view)	Author: Steven D'Aprano (steven.daprano) *	Date: 2017-04-08 06:35
Before writing a patch that may be rejected, can you explain in detail what change you propose? Example(s) will be good. For example: py> from operator import attrgetter py> f = attrgetter('keys') py> f({}) <built-in method keys of dict object at 0xb7c47ecc> I don't see a tuple here, so what (if anything) are you planning to change? How about the example from help(attrgetter)? \| After h = attrgetter('name.first', 'name.last'), the call h(r) returns \| (r.name.first, r.name.last).
msg291333 - (view)	Author: Isaac Morland (Isaac Morland)	Date: 2017-04-08 15:19
I've attached a file which illustrates what I'm proposing to happen with the examples from the help. Note that attrgetter (attr) is not affected, only attrgetter (attrs) for more than one attribute. The idea is that tuples resulting from attrgetter functions will retain the attribute names from the original object. In some work I have done recently this would have been very handy with groupby. I had some initial confusion because changing the Python attrgetter implementation didn't make any difference. Once I realized I needed to turn off import of the C implementation, I figured the rest out fairly quickly. Here is the diff: diff --git a/Lib/operator.py b/Lib/operator.py index 0e2e53e..9b2a8fa 100644 --- a/Lib/operator.py +++ b/Lib/operator.py @@ -247,8 +247,12 @@ class attrgetter: else: self._attrs = (attr,) + attrs getters = tuple(map(attrgetter, self._attrs)) + + from collections import namedtuple + nt = namedtuple ('attrgetter', self._attrs, rename=True) + def func(obj): - return tuple(getter(obj) for getter in getters) + return nt._make (getter(obj) for getter in getters) self._call = func def __call__(self, obj): @@ -409,7 +413,7 @@ def ixor(a, b): try: - from _operator import + pass except ImportError: pass else: There are some issues that still need to be addressed. The biggest is that I've turned off the C implementation. I assume that we'll need a C implementation the new version. In addition to this: 1) I just call the namedtuple type "attrgetter". I'm thinking something obtained by mashing together the field names or something similar might be more appropriate. However, I would prefer not to repeat the logic within namedtuple that deals with field names that aren't identifiers. So I'm wondering if maybe I should also modify namedtuple to allow None as the type name, in which case it would use an appropriate default type name based on the field names. 2) I import from collections inside the function. It didn't seem to work at the top-level, I'm guessing because I'm in the library and collections isn't ready when operator is initialized. This may be fine I just point it out as something on which I could use advice. I'm hoping this provides enough detail for people to understand what I'm proposing and evaluate whether this is a desireable enhancement. If so, I'll dig into the C implementation next, although I may need assistance with that.
msg291337 - (view)	Author: Josh Rosenberg (josh.r) *	Date: 2017-04-08 17:02
Aside from other issues, namedtuples can't have fields beginning with underscores, attrgetter can get attributes beginning with underscores (and dotted attributes for that matter). There isn't going to be an obvious or intuitive mapping to apply here.
msg291338 - (view)	Author: Isaac Morland (Isaac Morland)	Date: 2017-04-08 17:38
What are the "other issues"? As to the issue you raise here, that's why I use rename=True. First create a type with an underscore attribute: >>> t = namedtuple ('t', ['a', '1234'], rename=True) (just an easy way of creating such a type; used of namedtuple specifically is admittedly a bit of a red herring) Now create an object and illustrate its attributes: >>> tt = t ('c', 'd') >>> tt.a 'c' >>> tt._1 'd' Now use my modified attrgetter to get the attributes as a namedtuple: >>> attrgetter ('a', '_1') (tt) attrgetter(a='c', _1='d') >>> And the example from the help, used in the test file I've already attached, illustrates that the dotted attribute case also works. Essentially, my patch provides no benefit for attrgetter specified attributes that aren't valid namedtuple attribute names, but because of rename=True it still works and doesn't break anything. So if you give "a" as an attribute name, the output will have an "a" attribute; if you give "_b" as an attribute name, the output will have an "_1" (or whatever number) attribute. Similarly, it doesn't help with dotted attributes, but it doesn't hurt either.
msg291341 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2017-04-08 20:13
This is a clever idea, but I vote -1 for this proposal. I think it makes attrgetter more complex for little purpose. The fact that only some attribute names work and the others get mangled makes the API very ugly and not, IMO, desirable. Finally, if you want this, you can pretty easily write a function that wraps attrgetter to do it. I don't think it has enough utility to justify being in the stdlib.
msg291342 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2017-04-08 20:45
I'm with RDM and vote -1, too. A namedtuple is a bit more costly than a normal tuple. That's especially try for dynamic attrgetter() that are defined ad-hoc, e.g. as key function for sorted(). The creation of type classes doesn't come for free.
msg291344 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2017-04-08 21:07
I vote -1 too. 1. As was said, this doesn't work with all attribute names. 2. This adds circular dependency between operator and collections modules. 3. This increases memory consumption and startup time for small programs that don't use the collections module. 4. Performance. Creating a namedtuple is slower than creating a tuple, and many code is optimized for raw tuples. 5. This increases the complexity of attrgetter() implementation and can introduce new bugs.
msg291350 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2017-04-09 05:53
I concur with the others. This is unnecessary feature creep that doesn't make sense for typical use cases.

History
Date	User	Action	Args
2022-04-11 14:58:45	admin	set	github: 74206
2017-04-09 05:53:39	rhettinger	set	status: open -> closed nosy: + rhettinger messages: + msg291350 resolution: rejected stage: resolved
2017-04-08 21:07:13	serhiy.storchaka	set	nosy: + serhiy.storchaka messages: + msg291344
2017-04-08 20:45:06	christian.heimes	set	nosy: + christian.heimes messages: + msg291342
2017-04-08 20:13:07	r.david.murray	set	nosy: + r.david.murray messages: + msg291341
2017-04-08 17:38:12	Isaac Morland	set	messages: + msg291338
2017-04-08 17:02:14	josh.r	set	nosy: + josh.r messages: + msg291337
2017-04-08 15:19:34	Isaac Morland	set	files: + test_attrgetter.py messages: + msg291333
2017-04-08 06:35:29	steven.daprano	set	nosy: + steven.daprano messages: + msg291320
2017-04-08 03:04:51	Isaac Morland	create