operator.attrgetter slower than lambda after adding dotted names ability #54369

tzot · 2010-10-21T00:10:19Z

BPO	10160
Nosy	@rhettinger, @terryjreedy, @pitrou, @alex
Files	so3940518.py: sample benchmark to verify speed of operator.attrgetter vs attribute getting using a lambda issue10160.diff: (documentation, source, test) changes issue10160-2.diff: newer version of (source, documentation, testing) issue10160.diff: after fixing issues noted by Antoine Pitrou

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/pitrou'
closed_at = <Date 2010-10-31.15:27:40.648>
created_at = <Date 2010-10-21.00:10:18.520>
labels = ['tests', 'library', 'performance']
title = 'operator.attrgetter slower than lambda after adding dotted names ability'
updated_at = <Date 2011-02-21.11:33:45.837>
user = 'https://bugs.python.org/tzot'

bugs.python.org fields:

activity = <Date 2011-02-21.11:33:45.837>
actor = 'tzot'
assignee = 'pitrou'
closed = True
closed_date = <Date 2010-10-31.15:27:40.648>
closer = 'pitrou'
components = ['Library (Lib)', 'Tests']
creation = <Date 2010-10-21.00:10:18.520>
creator = 'tzot'
dependencies = []
files = ['19310', '19313', '19339', '19440']
hgrepos = []
issue_num = 10160
keywords = ['patch']
message_count = 14.0
messages = ['119247', '119249', '119250', '119251', '119258', '119261', '119264', '119417', '119419', '120006', '120023', '120035', '120059', '128957']
nosy_count = 5.0
nosy_names = ['rhettinger', 'terry.reedy', 'tzot', 'pitrou', 'alex']
pr_nums = []
priority = 'low'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'performance'
url = 'https://bugs.python.org/issue10160'
versions = ['Python 3.2']

tzot · 2010-10-21T00:10:12Z

(Discovered in that StackOverflow answer: http://stackoverflow.com/questions/3940518/3942509#3942509 ; check the comments too)

operator.attrgetter in its simplest form (i.e. with a single non-dotted name) needs more time to execute than an equivalent lambda expression.

Attached file so3940518.py runs a simple benchmark comparing: a list comprehension of plain attribute access; attrgetter; and lambda. I will append sample benchmark times at the end of the comment.

Browsing Modules/operator.c, I noticed that the dotted_getattr function was using PyUnicode_Check and (possibly) splitting on dots on *every* call of the attrgetter, which I thought to be most inefficient.

I changed the py3k-daily-snapshot source to make the PyUnicode_Check calls in the attrgetter_new function; also, I modified the algorithm to pre-parse the operator.attrgetter functions for possible dots in the names, in order for the dotted_getattr function to become speedier.

The only “drawback” is that now operator.attrgetter raises a TypeError on creation, not on subsequent calls of the attrgetter object; this shouldn't be a compatibility problem. However, I obviously had to update both Doc/library/operator.rst and Lib/test/test_operator.py .

I am not sure whether I should attach a zip/tar file with both the attachments (the sample benchmark and the diff); so I'll attach the diff in a further comment.

On the Ubuntu server 9.10 where I made the changes, I ran the so3940518.py sample benchmark before and after the changes.

Run before the changes (third column is seconds, less is better):

list comp 0.40999999999999925 1000000
map attrgetter 1.3899999999999997 1000000
map lambda 1.0099999999999998 1000000

Run after the changes:

list comp 0.40000000000000036 1000000
map attrgetter 0.5199999999999996 1000000
map lambda 0.96 1000000

tzot · 2010-10-21T00:22:34Z

Here comes the diff to Modules/operator.c, Doc/library/operator.rst and Lib/test/test_operator.py . As far as I could check, there are no leaks, but a more experienced eye in core development could not hurt. Also, obviously test_operatory.py passes all tests.

Should this be accepted, I believe it should be backported to 2.7 (at least). I can do that, just let me know.

tzot · 2010-10-21T00:28:46Z

Newer version of the diff, since I forgot some "if(0) fprintf" debug calls that shouldn't be there.

tzot · 2010-10-21T00:40:51Z

An explanation to the changes.

The old code kept the operator.itemgetter arguments in the ag->attr member. If the argument count (ag->nattrs) was 1, the single argument was kept; if more than 1, a tuple of the original arguments was kept.

On every attrgetter_call call, if ag->nattrs was 1, dotted_getattr was called with the plain ag->attr as attribute name; if > 2, dotted_getattr was called for every one of the original arguments.

Now, ag->attr is always a tuple, containing either dotless strings or tuples of dotless strings:

operator.attrgetter("name1", "name2.name3", "name4")

stores ("name1", ("name2", "name3"), "name4") in ag->attr.

dotted_getattr accordingly chooses based on type (either str or tuple, ensured by attrgetter_new) whether to do a single access or a recursive one.

rhettinger · 2010-10-21T02:03:57Z

Thanks for noticing this. I wasn't aware that it had slowed down after dotted name support had been added.

Since it is a mild performance issue, I'm lowering the priority. Will take a further look when I get the chance. A first look at the patch shows that it is bigger than I expected. Isn't it possible to check for a dot on construction and just record the boolean so that a fast, non-splitting path can be used.

I'm reluctant to make the code volume grow much for this function. It is already a bit voluminous for the minor convenience it offers.

alex · 2010-10-21T05:33:24Z

Voice of ignorance here: why can't this be implemented in the "naive" way one might in Python, use the existing string splitting algorithms of stringlib, but just leave it in __new__.

tzot · 2010-10-21T06:56:00Z

Modules/operator.c grows by ~70 lines, most of it the setup code for ag->attr; also I loop twice over the args of attrgetter_new, choosing fast code that runs once per attrgetter creation than temporary data.

Alex's suggestion to make use of Python-level functions to shorten the code of attrgetter_new could obviously work to decrease the source lines. I don't know how fast I would produce such a version if requested, though.

Whatever the way attrgetter_new sets up the data, I would suggest that you keep the logic changes in general, i.e. set-up in attrgetter_new and keep a thinner dotted_getattr , since it avoids running the same checks and splitting over and over again for every attrgetter_call invocation.

terryjreedy · 2010-10-23T04:17:40Z

I am not sure whether I should attach a zip/tar file with both the attachments (the sample benchmark and the diff); so I'll attach the diff in a further comment.

Uploading separate files with .py, .diff extensions that can be viewed in a browser is indeed the right thing to do.\

tzot · 2010-10-23T06:41:12Z

A newer version of the patch with the following changes:

single loop in the ag->attr setup phase of attrgetter_new; interning of the stored attribute names
added two more tests of invalid attrgetter parameters (".attr", "attr.")

pitrou · 2010-10-30T18:11:04Z

Some comments about the patch:

this seems to be some dead code:

     if (nattrs <= 1) {
         if (!PyArg_UnpackTuple(args, "attrgetter", 1, 1, &attr))
             return NULL;

you can't call PyUnicode_GET_SIZE or PyUnicode_AS_UNICODE before you have called PyUnicode_Check. If the object is not an unicode object, it triggers an assertion in debug mode:
python: ./Modules/operator.c:404: attrgetter_new: Assertion `((((((PyObject*)(item))->ob_type))->tp_flags & ((1L<<28))) != 0)' failed.
you should also call PyUnicode_InternInPlace() on the last dotless name (after the for loop)
this comment looks false:
/* attrgetter_new code should ensure we never come here */
in dotted_getattr, when attr is a tuple, you leak references to intermediate objects (because you don't decref obj before doing "newobj = obj")

Other than that, looks quite worthwhile.

tzot · 2010-10-30T21:31:05Z

Thank you very much, Antoine, for your review. My comments in reply:

the dead code: it's not dead, IIRC it ensures that at least one argument is given, otherwise it raises an exception.
PyUnicode_GET_SIZE: you're right. The previous patch didn't have this problem, because there were two loops: the first one made sure in advance that all arguments are PyUnicode.
the false comment: right again. A remain from the first patch.
dotted_getattr and references: right! I should have noted better what Raymond's initial loop did.

Attached a corrected version of the patch according to Antoine's comments.

rhettinger · 2010-10-31T04:19:24Z

The patch looks fine.

You're overview of the process presented here in the tracker would make a nice comment in the code.

If Antoine is happy with the latest patch, then it's ready to apply.

pitrou · 2010-10-31T15:27:40Z

The PyUnicode_GET_SIZE issue was still there, but I've fixed it and committed in r86036. Thanks for your contribution!

tzot · 2011-02-21T11:33:46Z

This is not the proper place for it, but in the 3.2 and 2.7 news it is reported that “The multi-argument form of operator.attrgetter() function now runs slightly faster” while it should be “The multi-argument form of operator.attrgetter() function now runs slightly faster and the single-argument form runs much faster”.

tzot mannequin assigned docspython Oct 21, 2010

tzot mannequin added docs Documentation in the Doc dir stdlib Python modules in the Lib dir tests Tests in the Lib/test dir labels Oct 21, 2010

benjaminp assigned rhettinger and unassigned docspython Oct 21, 2010

rhettinger added the performance Performance or resource usage label Oct 21, 2010

merwok removed the docs Documentation in the Doc dir label Oct 21, 2010

rhettinger assigned pitrou and unassigned rhettinger Oct 31, 2010

pitrou closed this as completed Oct 31, 2010

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

operator.attrgetter slower than lambda after adding dotted names ability #54369

operator.attrgetter slower than lambda after adding dotted names ability #54369

tzot mannequin commented Oct 21, 2010

tzot mannequin commented Oct 21, 2010

tzot mannequin commented Oct 21, 2010

tzot mannequin commented Oct 21, 2010

tzot mannequin commented Oct 21, 2010

rhettinger commented Oct 21, 2010

alex commented Oct 21, 2010

tzot mannequin commented Oct 21, 2010

terryjreedy commented Oct 23, 2010

tzot mannequin commented Oct 23, 2010

pitrou commented Oct 30, 2010

tzot mannequin commented Oct 30, 2010

rhettinger commented Oct 31, 2010

pitrou commented Oct 31, 2010

tzot mannequin commented Feb 21, 2011

operator.attrgetter slower than lambda after adding dotted names ability #54369

operator.attrgetter slower than lambda after adding dotted names ability #54369

Comments

tzot mannequin commented Oct 21, 2010

tzot mannequin commented Oct 21, 2010

tzot mannequin commented Oct 21, 2010

tzot mannequin commented Oct 21, 2010

tzot mannequin commented Oct 21, 2010

rhettinger commented Oct 21, 2010

alex commented Oct 21, 2010

tzot mannequin commented Oct 21, 2010

terryjreedy commented Oct 23, 2010

tzot mannequin commented Oct 23, 2010

pitrou commented Oct 30, 2010

tzot mannequin commented Oct 30, 2010

rhettinger commented Oct 31, 2010

pitrou commented Oct 31, 2010

tzot mannequin commented Feb 21, 2011