This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Make attrgetter use namedtuple
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.7
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: Isaac Morland, christian.heimes, josh.r, r.david.murray, rhettinger, serhiy.storchaka, steven.daprano
Priority: normal Keywords:

Created on 2017-04-08 03:04 by Isaac Morland, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
test_attrgetter.py Isaac Morland, 2017-04-08 15:19
Messages (9)
msg291314 - (view) Author: Isaac Morland (Isaac Morland) Date: 2017-04-08 03:04
I would find it useful if the tuples returned by attrgetter functions were namedtuples.  An initial look at the code for attrgetter suggests that this would be an easy change and should make little difference to performance.  Giving a namedtuple where previously a tuple was returned seems unlikely to trigger bugs in existing code so I propose to simply change attrgetter rather than providing a parameter to specify whether or not to use the new behaviour.

Patch will be forthcoming but comments appreciated.
msg291320 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2017-04-08 06:35
Before writing a patch that may be rejected, can you explain in detail what change you propose? Example(s) will be good.

For example:

py> from operator import attrgetter
py> f = attrgetter('keys')
py> f({})
<built-in method keys of dict object at 0xb7c47ecc>

I don't see a tuple here, so what (if anything) are you planning to change?


How about the example from help(attrgetter)?

 |  After h = attrgetter('name.first', 'name.last'), the call h(r) returns
 |  (r.name.first, r.name.last).
msg291333 - (view) Author: Isaac Morland (Isaac Morland) Date: 2017-04-08 15:19
I've attached a file which illustrates what I'm proposing to happen with the examples from the help.  Note that attrgetter (attr) is not affected, only attrgetter (*attrs) for more than one attribute.  The idea is that tuples resulting from attrgetter functions will retain the attribute names from the original object.  In some work I have done recently this would have been very handy with groupby.

I had some initial confusion because changing the Python attrgetter implementation didn't make any difference.  Once I realized I needed to turn off import of the C implementation, I figured the rest out fairly quickly.  Here is the diff:

diff --git a/Lib/operator.py b/Lib/operator.py
index 0e2e53e..9b2a8fa 100644
--- a/Lib/operator.py
+++ b/Lib/operator.py
@@ -247,8 +247,12 @@ class attrgetter:
         else:
             self._attrs = (attr,) + attrs
             getters = tuple(map(attrgetter, self._attrs))
+
+            from collections import namedtuple
+            nt = namedtuple ('attrgetter', self._attrs, rename=True)
+
             def func(obj):
-                return tuple(getter(obj) for getter in getters)
+                return nt._make (getter(obj) for getter in getters)
             self._call = func
 
     def __call__(self, obj):
@@ -409,7 +413,7 @@ def ixor(a, b):
 
 
 try:
-    from _operator import *
+    pass
 except ImportError:
     pass
 else:

There are some issues that still need to be addressed.  The biggest is that I've turned off the C implementation.  I assume that we'll need a C implementation the new version.  In addition to this:

1) I just call the namedtuple type "attrgetter".  I'm thinking something obtained by mashing together the field names or something similar might be more appropriate.  However, I would prefer not to repeat the logic within namedtuple that deals with field names that aren't identifiers.  So I'm wondering if maybe I should also modify namedtuple to allow None as the type name, in which case it would use an appropriate default type name based on the field names.

2) I import from collections inside the function.  It didn't seem to work at the top-level, I'm guessing because I'm in the library and collections isn't ready when operator is initialized.  This may be fine I just point it out as something on which I could use advice.

I'm hoping this provides enough detail for people to understand what I'm proposing and evaluate whether this is a desireable enhancement.  If so, I'll dig into the C implementation next, although I may need assistance with that.
msg291337 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2017-04-08 17:02
Aside from other issues, namedtuples can't have fields beginning with underscores, attrgetter can get attributes beginning with underscores (and dotted attributes for that matter). There isn't going to be an obvious or intuitive mapping to apply here.
msg291338 - (view) Author: Isaac Morland (Isaac Morland) Date: 2017-04-08 17:38
What are the "other issues"?

As to the issue you raise here, that's why I use rename=True.

First create a type with an underscore attribute:

>>> t = namedtuple ('t', ['a', '1234'], rename=True)

(just an easy way of creating such a type; used of namedtuple specifically is admittedly a bit of a red herring)

Now create an object and illustrate its attributes:

>>> tt = t ('c', 'd')
>>> tt.a
'c'
>>> tt._1
'd'

Now use my modified attrgetter to get the attributes as a namedtuple:

>>> attrgetter ('a', '_1') (tt)
attrgetter(a='c', _1='d')
>>> 

And the example from the help, used in the test file I've already attached, illustrates that the dotted attribute case also works.

Essentially, my patch provides no benefit for attrgetter specified attributes that aren't valid namedtuple attribute names, but because of rename=True it still works and doesn't break anything.  So if you give "a" as an attribute name, the output will have an "a" attribute; if you give "_b" as an attribute name, the output will have an "_1" (or whatever number) attribute.  Similarly, it doesn't help with dotted attributes, but it doesn't hurt either.
msg291341 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-04-08 20:13
This is a clever idea, but I vote -1 for this proposal.  I think it makes attrgetter more complex for little purpose.  The fact that only some attribute names work and the others get mangled makes the API very ugly and not, IMO, desirable.

Finally, if you want this, you can pretty easily write a function that wraps attrgetter to do it.  I don't think it has enough utility to justify being in the stdlib.
msg291342 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2017-04-08 20:45
I'm with RDM and vote -1, too.

A namedtuple is a bit more costly than a normal tuple. That's especially try for dynamic attrgetter() that are defined ad-hoc, e.g. as key function for sorted(). The creation of type classes doesn't come for free.
msg291344 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-04-08 21:07
I vote -1 too.

1. As was said, this doesn't work with all attribute names.
2. This adds circular dependency between operator and collections modules.
3. This increases memory consumption and startup time for small programs that don't use the collections module.
4. Performance. Creating a namedtuple is slower than creating a tuple, and many code is optimized for raw tuples.
5. This increases the complexity of attrgetter() implementation and can introduce new bugs.
msg291350 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-04-09 05:53
I concur with the others.  This is unnecessary feature creep that doesn't make sense for typical use cases.
History
Date User Action Args
2022-04-11 14:58:45adminsetgithub: 74206
2017-04-09 05:53:39rhettingersetstatus: open -> closed

nosy: + rhettinger
messages: + msg291350

resolution: rejected
stage: resolved
2017-04-08 21:07:13serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg291344
2017-04-08 20:45:06christian.heimessetnosy: + christian.heimes
messages: + msg291342
2017-04-08 20:13:07r.david.murraysetnosy: + r.david.murray
messages: + msg291341
2017-04-08 17:38:12Isaac Morlandsetmessages: + msg291338
2017-04-08 17:02:14josh.rsetnosy: + josh.r
messages: + msg291337
2017-04-08 15:19:34Isaac Morlandsetfiles: + test_attrgetter.py

messages: + msg291333
2017-04-08 06:35:29steven.dapranosetnosy: + steven.daprano
messages: + msg291320
2017-04-08 03:04:51Isaac Morlandcreate