property_descr_get reuse argument tuple #68098

llllllllll · 2015-04-10T20:10:37Z

BPO	23910
Nosy	@warsaw, @rhettinger, @vstinner, @ericvsmith, @ericsnowcurrently, @serhiy-storchaka, @llllllllll
Files	namedtuple.patch: nameduple in C. namedtuple-indexer.patch namedtuple-indexer-update.patch namedtuple-indexer-with-news.patch property.patch with-static-tuple.patch

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/rhettinger'
closed_at = <Date 2015-04-30.15:20:53.514>
created_at = <Date 2015-04-10.20:10:37.345>
labels = ['interpreter-core', 'type-feature']
title = 'property_descr_get reuse argument tuple'
updated_at = <Date 2018-10-01.10:11:02.588>
user = 'https://github.com/llllllllll'

bugs.python.org fields:

activity = <Date 2018-10-01.10:11:02.588>
actor = 'vstinner'
assignee = 'rhettinger'
closed = True
closed_date = <Date 2015-04-30.15:20:53.514>
closer = 'rhettinger'
components = ['Interpreter Core']
creation = <Date 2015-04-10.20:10:37.345>
creator = 'llllllllll'
dependencies = []
files = ['38892', '38896', '38905', '38994', '39210', '39216']
hgrepos = []
issue_num = 23910
keywords = ['patch']
message_count = 26.0
messages = ['240447', '240448', '240449', '240450', '240451', '240452', '240465', '240535', '240565', '240568', '240934', '242082', '242085', '242089', '242091', '242093', '242103', '242112', '242114', '242123', '242124', '242234', '242235', '242273', '243981', '326783']
nosy_count = 8.0
nosy_names = ['barry', 'rhettinger', 'vstinner', 'eric.smith', 'python-dev', 'eric.snow', 'serhiy.storchaka', 'llllllllll']
pr_nums = []
priority = 'low'
resolution = 'fixed'
stage = None
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue23910'
versions = ['Python 3.5']

llllllllll · 2015-04-10T20:10:34Z

I am working on implementing nameduple in C; I am almost there; however, on the path of moving to full compatibility, I ran into a refcount issue somewhere. Hopefully someone can help me work this out.

To describe the issue, When I run the collections tests I most frequently segfault in a non namedtuple test; however, sometimes it runs (and passes) however that probably means I am using an unowned obect somewhere. I am new to the CPython API so I would love to go through this with someone to learn more about how it all works.

I am currently at PyCon and would absolutly love to speak to someone in person. I will be at CPython sprints for at least one day.

ericvsmith · 2015-04-10T20:44:11Z

What's the motivating use case for this?

llllllllll · 2015-04-10T21:20:33Z

Ideally, namedtuple is used to make your code cleaner, using "magic" indecies is less clear than using a named index in a lot of cases. Because namedtuple is mainly to make the code more readable, I don't think that it should have an impact on the runtime performance of the code. This means that namedtuple should be a very thin wrapper around tuple. Currently, namedtuple named item access is much slower than elementwise access. I have this as a standalone package (there are some changes in the diff I posted to acheive full backwards compat) here https://pypi.python.org/pypi/cnamedtuple/0.1.5 that show some profiling runs of this code. The notable one to me is named item access being around twice as fast.

Another issue we ran into at work is that we found ways to get the exec call in nametuple to execute arbitrary code; while this would not be a concern for most people by the nature of the way we got this to happen, we wanted to look at other ways of writing namedtuple. I looked through some older discussion and found that non-exec solutions were rejected in the past for perfomance reasons.

ericvsmith · 2015-04-10T21:29:37Z

Have you thought of just exposing Object/structseq.c?

Before you put much time into this, I'd get Raymond's acceptance of whatever approach you want to take. It might be best to raise it on python-ideas.

llllllllll · 2015-04-10T21:54:44Z

would the idea be to deprecate namedtuple in favor of a public structseq that is exposed through collections, or change structseq to fit the namedtuple API?

ericvsmith · 2015-04-10T22:25:34Z

I haven't seen thought it through, just that it seems very similar to a C namedtuple.

llllllllll · 2015-04-11T07:11:10Z

I stripped down the patch to only the descriptor like we had discussed.

rhettinger · 2015-04-12T04:25:55Z

Can you post before and afters timings of the patch?

llllllllll · 2015-04-12T18:48:04Z

# Original version / new python implementation
./python -m timeit -s "from collections import namedtuple;a = namedtuple('a', 'a b c')(1, 2, 3)" "a.a"
10000000 loops, best of 3: 0.07 usec per loop

# C implementation
./python -m timeit -s "from collections import namedtuple;a = namedtuple('a', 'a b c')(1, 2, 3)" "a.a"
10000000 loops, best of 3: 0.028 usec per loop

The fallback is the same implementation that is currently used so this should have no affect on pypi.

llllllllll · 2015-04-12T20:56:32Z

sorry, I meant pypy

llllllllll · 2015-04-14T16:25:09Z

I am updating the patch to include an entry in Misc/NEWS.

rhettinger · 2015-04-26T21:33:37Z

FWIW, the current property(itemgetter(index)) code has a Python creation step, but the actual attribute lookup and dispatch is done entirely in C (no pure python steps around the eval lookup).

Rather than making a user visible C hack directly to namedtuple, any optimization effort should be directly at improving the speed of property() and itemgetter().

Here are some quick notes to get you started:

* The overhead of property() is almost nothing.
* The code for property_descr_get() in Objects/descrobject.c
* It has two nearly zero cost predictable branches
* So the the only real work is a call to 
  PyObject_CallFunctionObjArgs(gs->prop_get, obj, NULL);
* which then calls both
       objargs_mktuple(vargs) 
  and 
       PyObject_Call(callable, args, NULL);
* That can be sped-up by using 
       PyTuple_New(1)
  and a direct call to  PyObject_Call()
* The logic in PyObject_Call can potentially be tightened
  in the context of a property(itemgetter(index)) call.
  Look to see whether recursion check is necessary
  (itemgetter on a tuple subclass that doesn't extend __getitem__
   is non-recursive)
* If so, then entire call to PyObject_Call() in property 
  can potentially be simplified to:
   result = (*call)(func, arg, kw);

I haven't looked too closely at this, but I think you get the general idea. If the speed of property(itemgetter(index)) is the bottleneck in your code, the starting point is to unwind the exact series of C steps performed to see if any of them can be simplified. For the most part, the code in property() and itemgetter() were both implemented using the simplest C parts of the C API rather than the fastest. The chain of calls isn't specialized for the common use case (i.e. property_get() needing exactly 1 argument rather than a variable length arg tuple and itemgetter doing a known integer offset on a list or tuple rather than the less common case of generic types and a possible tuple of indices).

We should start by optimizing what we've already got. That will have a benefit beyond named tuples (faster itemgetters for sorting and faster property gets for the entire language).

It also helps us avoid making the NT code less familiar (using custom private APIs rather than generic, well-known components).

It also reduces the risk of breaking code that relies on the published implementation of named tuple attribute lookups (for example, I've seen deployed code that customizes the attribute docstrings like this):
Point = namedtuple('Point', ['x', 'y'])
Point.x = property(Point.x.fget, doc='abscissa')
Point.y = property(Point.y.fget, doc='ordinate')
coordinate = Point(x=250, y=43)

rhettinger · 2015-04-26T23:04:43Z

One other thought: the itemgetter.__call__ method is already pretty thin:

if (!PyArg_UnpackTuple(args, "itemgetter", 1, 1, &obj))
return NULL;
if (nitems == 1)
    return PyObject_GetItem(obj, ig-\>item);

But you could add a special case for single integer index being applied to a known sequence. Extract the Py_ssize_t index in itemgetter_new and store it in the itemgetterobject. Then add a fast path in itemgetter.__call__. Roughly something like this:

  if (ig->index != -1 &&
      PyTuple_Check(obj) && 
      nitems == 1 &&
      PyTuple_GET_SIZE(obj) > ig->index) {
       result = PySequence_Fast_GET_ITEM(obj, ig->index);
       Py_INCREF(result);
       return result;
  }
 
Perhaps also add a check to make sure the tuple subclass hasn't overridden the __getitem__ method (see an example of how to do this in the code for Modules/_collectionsmodule.c::_count_elements()).

Something along these lines would allow all the steps to be inlined and would eliminate all the usual indirections inherent in the abstract API.

Another alternative is to use the PySequence API instead of the PyTuple API. That trades away a little of the speed-up in return for speeding-up itemgetter on lists as well as tuples.

llllllllll · 2015-04-27T00:44:07Z

I was unable to see a performance increase by playing with the itemgetter.__call__ code; however, updating the propery code seemed to show a small improvement. I think that for simple indexing the cost of checking if it is a sequence outways the faster dispatch (when using PySequence_GetItem). I can play with this further.

default
[joejev@Sheila cpython]$ ./python -m timeit -s "from collections import namedtuple as n;a = n('n', 'a b c')(1, 2, 3)" "a.a"
10000000 loops, best of 3: 0.101 usec per loop
patch
[joejev@Sheila cpython]$ ./python -m timeit -s "from collections import namedtuple as n;a = n('n', 'a b c')(1, 2, 3)" "a.a"
10000000 loops, best of 3: 0.0942 usec per loop

rhettinger · 2015-04-27T01:30:57Z

If you have a chance, run a C profiler so we can see whether most of the time is being spent in an attribute lookup for the current property(itemgetter) approach versus your nt-indexer approach. Without a profile, I have only my instincts that the overhead is a mix of indirections and function call overhead (both solveable by in-lining), and over-generalization for all PyObject_GetItem() (solvable by specialization to a tuple subclass), and variable length argument lists (solveable by using of PyTuple_New(1)).

Ideally, I would like something that speeds-up broader swaths of the language and doesn't obfuscate the otherwise clean generated code. ISTM that the C code for both property() and itemgetter() still have room to optimize the common case.

llllllllll · 2015-04-27T02:19:02Z

This was very exciting, I have never run gprof before; so just to make sure I did this correctly, I will list my steps:

compile with the -pg flag set
run the test with ./python -m timeit ...
$ gprof python gmon.out > profile.out

Here is default:

Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls Ts/call Ts/call name
22.48 0.87 0.87 PyEval_EvalFrameEx
9.82 1.25 0.38 PyObject_CallFunctionObjArgs
6.85 1.52 0.27 PyObject_GenericGetAttr
6.46 1.77 0.25 tupledealloc
5.56 1.98 0.22 PyArg_UnpackTuple
5.17 2.18 0.20 PyNumber_AsSsize_t
5.17 2.38 0.20 tuplesubscript
5.04 2.58 0.20 PyObject_Call
4.91 2.77 0.19 _PyType_Lookup
4.65 2.95 0.18 PyTuple_New
4.65 3.13 0.18 PyObject_GetItem
4.39 3.30 0.17 itemgetter_call
1.94 3.37 0.08 PyObject_GetAttr
1.81 3.44 0.07 PyObject_GC_UnTrack
1.81 3.51 0.07 _PyTuple_DebugMallocStats
1.03 3.55 0.04 PyErr_Occurred
1.03 3.59 0.04 property_descr_set
1.03 3.63 0.04 tuplerepr
0.78 3.66 0.03 PyLong_AsSsize_t
0.78 3.69 0.03 PyObject_GC_Track
0.52 3.71 0.02 property_descr_get
0.52 3.73 0.02 repeat_next
0.52 3.75 0.02 repeat_traverse
0.39 3.77 0.02 PyArg_ValidateKeywordArguments
0.39 3.78 0.02 _Py_CheckFunctionResult
0.26 3.79 0.01 PyCFunction_NewEx
0.26 3.80 0.01 PyCode_New
0.26 3.81 0.01 PyErr_Restore
0.26 3.82 0.01 PyType_GetSlot
0.26 3.83 0.01 _PyObject_CallMethodIdObjArgs
0.26 3.84 0.01 attrgetter_new
0.26 3.85 0.01 update_one_slot
0.13 3.86 0.01 _PyObject_GenericGetAttrWithDict
0.13 3.86 0.01 _PyObject_SetAttrId
0.13 3.87 0.01 gc_isenabled
0.13 3.87 0.01 visit_decref

Here is my patch:

Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls Ts/call Ts/call name
26.63 1.02 1.02 PyEval_EvalFrameEx
8.88 1.36 0.34 PyObject_GenericGetAttr
7.83 1.66 0.30 tupledealloc
6.27 1.90 0.24 PyObject_Call
5.74 2.12 0.22 PyTuple_New
5.48 2.33 0.21 property_descr_get
5.22 2.53 0.20 _PyType_Lookup
4.44 2.70 0.17 tuplesubscript
4.18 2.86 0.16 PyArg_UnpackTuple
3.92 3.01 0.15 PyNumber_AsSsize_t
3.66 3.15 0.14 PyObject_GetItem
2.87 3.26 0.11 itemgetter_call
2.61 3.36 0.10 PyObject_GC_UnTrack
1.70 3.43 0.07 PyObject_GetAttr
1.31 3.48 0.05 repeat_next
1.31 3.53 0.05 repeat_traverse
1.04 3.57 0.04 attrgetter_new
1.04 3.61 0.04 property_descr_set
0.78 3.64 0.03 PyErr_Occurred
0.78 3.67 0.03 PyErr_Restore
0.78 3.70 0.03 PyLong_AsSsize_t
0.78 3.73 0.03 PyType_GetSlot
0.52 3.75 0.02 PyObject_GC_Track
0.26 3.76 0.01 PyArg_ValidateKeywordArguments
0.26 3.77 0.01 PyDict_GetItem
0.26 3.78 0.01 _PyObject_GenericGetAttrWithDict
0.26 3.79 0.01 _PyTuple_DebugMallocStats
0.26 3.80 0.01 _Py_CheckFunctionResult
0.26 3.81 0.01 convertitem
0.26 3.82 0.01 r_object
0.26 3.83 0.01 tuplerepr
0.13 3.83 0.01 _PyObject_SetAttrId

It looks like you were correct that PyObject_CallFunctionObjArgs was eating up a lot of time.

rhettinger · 2015-04-27T07:26:48Z

Hmm, the presense of _PyTuple_DebugMallocStats, repeat_traverse, and visit_decref suggests that the profile may have been run with debugging code enabled and GC enabled.

The property patch looks good. Depending on how far you want to go with this, you could save your tuple of length-1 statically and reuse it on successive calls if its refcnt doesn't grow (see the code for zip() for an example of how to do this). That would save the PyTuple_New and tupledealloc calls.

Going further, potentially you could in-line some of the code it PyObject_Call, caching the callsite and its NULL check, and looking at the other steps to see if they are all necessary in the context of a property_desc_get().

llllllllll · 2015-04-27T12:38:17Z

I switched to the static tuple.

llllllllll · 2015-04-27T13:13:24Z

I don't think that I can cache the __call__ of the fget object because it might be an instance of a heaptype, and if someone changed the __class__ of the object in between calls to another heaptype that had a different __call__, you would still get the __call__ from the first type. I also don't know if this is supported behavior or just something that works by accident.

I read through PyObject_Call, and all the code is needed assuming we are not caching the tp_call value.

rhettinger · 2015-04-27T15:42:33Z

What kind of speed improvement have you gotten?

llllllllll · 2015-04-27T15:53:13Z

I am currently on a different machine so these numbers are not relative to the others posted earlier.

default
./python -m timeit -s "from collections import namedtuple as n;a = n('n', 'a b c')(1, 2, 3)" "a.a"
10000000 loops, best of 3: 0.0699 usec per loop
patch
./python -m timeit -s "from collections import namedtuple as n;a = n('n', 'a b c')(1, 2, 3)" "a.a"
10000000 loops, best of 3: 0.0468 usec per loop

rhettinger · 2015-04-29T16:26:07Z

0.0699 usec per loop --> 0.0468

That's pretty good for a small patch :-)

For the pre-computed 1-tuple, I think you need to check for a refcnt of 1 and fallback to PyTuple_New if the tuple is in use (i.e. a property that calls another property). See Objects/enumobject.c::105 for an example.

Also, consider providing a way to clean-up that tuple on shutdown. For example, look at what is done with the various freelists. An easier way is to make the premade tuple part of the property object struct so that it gets freed when the property is deallocated.

Adding Serhiy to the nosy list, he can help with cleaning-up the patch so that it is ready to apply.

llllllllll · 2015-04-29T17:18:54Z

I don't think that we need to worry about reusing the single argument tuple in a recursive situation because we never need the value after we start the call. We also just write our new value and then clean up with a NULL to make sure that we don't blow up when we dealloc the tuple. For example:

>>> class C(object):
...     @property
...     def f(self):
...         return D().f
... 
>>> class D(object):
...     @property
...     def f(self):
...         return 1
... 
>>> C().f
1

This works with recursive properties.
I also think that is is getting cleaned up on shutdown, if I put a pointer to garbage in the tuple, the interpreter blows up on shutdown; this makes me think that tuple_dealloc is being called somewhere. About putting the tuple on the property instance, that would nice for memory management; however, that increases the memory overhead of each property. This also means that we would only get the faster lookup after the property has been accessed once; this is fine but the current implementation makes it so that all properties are faster after any of them are looked up once. I could be wrong about the cleanup though.

I am also updating the title and headers because this issue is no longer about namedtuple.

python-dev · 2015-04-30T15:08:23Z

New changeset 661cdbd617b8 by Raymond Hettinger in branch 'default':
Issue bpo-23910: Optimize property() getter calls. Patch by Joe Jevnik
https://hg.python.org/cpython/rev/661cdbd617b8

serhiy-storchaka · 2015-05-24T13:12:40Z

This patch caused crash with C implementation of lru_cache (bpo-14373). Opened bpo-24276 for fixing this.

vstinner · 2018-10-01T10:11:03Z

This optimization caused multiple crashes, so it has been decided to remove it :-( See bpo-30156.

llllllllll mannequin added extension-modules C modules in the Modules dir type-feature A feature request or enhancement labels Apr 10, 2015

rhettinger self-assigned this Apr 11, 2015

llllllllll mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) and removed extension-modules C modules in the Modules dir labels Apr 29, 2015

llllllllll mannequin changed the title ~~C implementation of namedtuple (WIP)~~ property_descr_get reuse argument tuple Apr 29, 2015

rhettinger closed this as completed Apr 30, 2015

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

property_descr_get reuse argument tuple #68098

property_descr_get reuse argument tuple #68098

llllllllll mannequin commented Apr 10, 2015

llllllllll mannequin commented Apr 10, 2015

ericvsmith commented Apr 10, 2015

llllllllll mannequin commented Apr 10, 2015

ericvsmith commented Apr 10, 2015

llllllllll mannequin commented Apr 10, 2015

ericvsmith commented Apr 10, 2015

llllllllll mannequin commented Apr 11, 2015

rhettinger commented Apr 12, 2015

llllllllll mannequin commented Apr 12, 2015

llllllllll mannequin commented Apr 12, 2015

llllllllll mannequin commented Apr 14, 2015

rhettinger commented Apr 26, 2015

rhettinger commented Apr 26, 2015

llllllllll mannequin commented Apr 27, 2015

rhettinger commented Apr 27, 2015

llllllllll mannequin commented Apr 27, 2015

rhettinger commented Apr 27, 2015

llllllllll mannequin commented Apr 27, 2015

llllllllll mannequin commented Apr 27, 2015

rhettinger commented Apr 27, 2015

llllllllll mannequin commented Apr 27, 2015

rhettinger commented Apr 29, 2015

llllllllll mannequin commented Apr 29, 2015

python-dev mannequin commented Apr 30, 2015

serhiy-storchaka commented May 24, 2015

vstinner commented Oct 1, 2018

property_descr_get reuse argument tuple #68098

property_descr_get reuse argument tuple #68098

Comments

llllllllll mannequin commented Apr 10, 2015

llllllllll mannequin commented Apr 10, 2015

ericvsmith commented Apr 10, 2015

llllllllll mannequin commented Apr 10, 2015

ericvsmith commented Apr 10, 2015

llllllllll mannequin commented Apr 10, 2015

ericvsmith commented Apr 10, 2015

llllllllll mannequin commented Apr 11, 2015

rhettinger commented Apr 12, 2015

llllllllll mannequin commented Apr 12, 2015

llllllllll mannequin commented Apr 12, 2015

llllllllll mannequin commented Apr 14, 2015

rhettinger commented Apr 26, 2015

rhettinger commented Apr 26, 2015

llllllllll mannequin commented Apr 27, 2015

rhettinger commented Apr 27, 2015

llllllllll mannequin commented Apr 27, 2015

rhettinger commented Apr 27, 2015

llllllllll mannequin commented Apr 27, 2015

llllllllll mannequin commented Apr 27, 2015

rhettinger commented Apr 27, 2015

llllllllll mannequin commented Apr 27, 2015

rhettinger commented Apr 29, 2015

llllllllll mannequin commented Apr 29, 2015

python-dev mannequin commented Apr 30, 2015

serhiy-storchaka commented May 24, 2015

vstinner commented Oct 1, 2018