classification
Title: Optimize namedtuple creation
Type: performance
Stage: resolved
Components: Library (Lib)
Versions: Python 3.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: rhettinger
Nosy List: Ethan Smith, Jelle Zijlstra, eric.smith, giampaolo.rodola, gvanrossum, inada.naoki, josh.r, lazka, llllllllll, ncoghlan, pitrou, rhettinger, serhiy.storchaka, vstinner, xiang.zhang
Priority: normal
Keywords: patch

Created on 2016-11-08 04:07 by inada.naoki, last changed 2017-09-10 17:25 by rhettinger. This issue is now closed.

Files
File name Uploaded Description Edit
28638-functools-no-namedtuple.patch inada.naoki, 2016-11-08 04:21 review
namedtuple-no-compile.patch serhiy.storchaka, 2016-11-08 12:36 review
namedtuple1.py eric.smith, 2016-11-08 22:17
namedtuple-clinic.diff inada.naoki, 2016-11-21 09:10 review
namedtuple-clinic2.diff inada.naoki, 2016-11-21 09:35 review
functools-CacheInfo-Makefile.patch serhiy.storchaka, 2016-12-01 14:09 review
namedtuple-clinic3.patch inada.naoki, 2016-12-03 10:04 review
Pull Requests
URL Status Linked Edit
PR 2736 closed Jelle Zijlstra, 2017-07-16 22:10
PR 3454 merged rhettinger, 2017-09-08 07:16
Messages (62)
msg280277 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2016-11-08 04:07
I was surprised by how much functools slows down import time.
And I found that namedtuple is what makes it slow.

When I replaced

_CacheInfo = namedtuple("CacheInfo", ["hits", "misses", "maxsize", "currsize"])

this line with the contents of `_CacheInfo._source`:

(before)
$ ~/local/py37/bin/python3 -m perf timeit -s 'import importlib, functools' -- 'importlib.reload(functools)'
.....................
Median +- std dev: 1.21 ms +- 0.01 ms

(replaced)
$ ~/local/py37/bin/python3 -m perf timeit -s 'import importlib, functools' -- 'importlib.reload(functools)'
.....................
Median +- std dev: 615 us +- 12 us
msg280279 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2016-11-08 04:21
I feel this patch is safe enough to be landed in 3.6.
msg280282 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-11-08 05:45
I doubt this deserves a change. The slow import is the case only the first time functools is imported. Later imports will just use the cache (sys.modules). And if this is gonna change, maybe we don't have to copy the entire namedtuple structure?
msg280283 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2016-11-08 06:09
> The slow import is the case only the first time functools is imported. Later imports will just use the cache (sys.modules).

Yes. But first import time is also important for CLI applications.
That's why Mercurial and Bazaar have lazy import systems.

Since many stdlib modules use functools, many applications may suffer from
a slow functools import even if we remove it from site.py.

>  maybe we don't have to copy the entire namedtuple structure?

https://docs.python.org/3.5/library/functools.html#functools.lru_cache

The doc says it's a namedtuple.  So it should be namedtuple compatible.
msg280284 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-11-08 06:41
> Yes. But first import time is also important for CLI applications.
That's why Mercurial and Bazaar have lazy import systems.

The lazy import system could benefit many libs so the result could be impressive. But here only functools is enhanced, half a millisecond is reduced.

Performance of course is important, but replicating code sounds not good. It means you have to maintain two pieces.
msg280285 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2016-11-08 07:16
> The lazy import system could benefit many libs so the result could be impressive. But here only functools is enhanced, half a millisecond is reduced.

On the other hand, implementing lazy import makes applications more complex.

This patch only enhances functools, but it is a very important module.
Even if we remove functools from site.py, most applications rely on it,
especially for functools.wraps().
This patch can optimize their startup time.

Half a millisecond is small, but it isn't negligible in some situations.
Some people love tools that start quickly.  For example, many
people maintain their vimrc to keep startup time under 50~100ms.
And Python is a common language for implementing vim plugins.

Additionally, it adds noise when profiling startup time.
I was very confused when I saw PyParse_AddToken() in a profile.
Less noise makes it easier to optimize startup time.


> Performance of course is important, but replicating code sounds not good. It means you have to maintain two pieces.

Yes. Balance is important.
I want to hear more opinions from more other devs.
msg280288 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-08 09:01
What is the main culprit, importing the collections module or compiling a named tuple?
msg280291 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-08 11:17
Using namedtuple is not new in 3.6, thus this is not a regression that can be fixed at beta stage.

Inlining the source of a named tuple class looks like an ugly solution. It would be better to write the source in a separate file and import it. The Makefile can have a rule for recreating this source file if collections.py is changed.

More general solution would be to make namedtuple() using cached precompiled class and update the cache if it doesn't match namedtuple arguments.

Yet one solution is to make namedtuple() not using compiling, but return patched local class. But Raymond is against this.
msg280297 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2016-11-08 12:16
> What is the main culprit, importing the collections module or compiling a named tuple?

In this case, the latter.
But the collections module takes 1+ ms to import too.
I'll try to optimize it.

> Using namedtuple is not new in 3.6, thus this is not a regression that can be fixed at beta stage.

Makes sense.

> More general solution would be to make namedtuple() using cached precompiled class and update the cache if it doesn't match namedtuple arguments.

What does "precompiled class" mean? A pyc file, or a source string to be
executed?

> Yet one solution is to make namedtuple() not using compiling, but return patched local class. But Raymond is against this.

I'll search the discussion.
I think another solution is to reimplement namedtuple in C.
As far as I remember, attrs [1] does it.

[1] https://pypi.python.org/pypi/attrs
msg280298 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-08 12:36
Here is a sample patch that makes namedtuple() not use dynamic compilation. It has roughly the same performance effect as inlining the named tuple source, but affects all named tuples.
msg280300 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2016-11-08 12:47
(tip)
$ ~/local/py37/bin/python3 -m perf timeit -s 'import importlib, functools' -- 'importlib.reload(functools)'
.....................
Median +- std dev: 1.21 ms +- 0.01 ms

(namedtuple-no-compile.patch)
$ ~/local/py37/bin/python3 -m perf timeit -s 'import importlib, functools' -- 'importlib.reload(functools)'
.....................
Median +- std dev: 677 us +- 8 us

Nice!
msg280303 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-08 12:55
One of the problems with this patch is that it makes instantiating a namedtuple much slower (because the arguments are parsed by Python code). This can be solved by using eval() to create only the __new__ method (see the commented-out line "result.__new__ = eval(...)"). This increases the time needed to create a named tuple class, but it is still faster than the current code.
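As a rough illustration of the idea described in this message (and not the contents of namedtuple-no-compile.patch), the class can be assembled with type() while only __new__ is produced from generated source. The name `simple_namedtuple` is hypothetical, and the sketch omits most of what the real namedtuple provides (`_replace`, `_asdict`, `_make`, pickling support, and so on).

```python
import keyword
from operator import itemgetter


def simple_namedtuple(typename, field_names):
    """Hypothetical, simplified stand-in for collections.namedtuple.

    The class body is built with type() instead of exec(); only the
    __new__ method is created from a generated source string.
    """
    if isinstance(field_names, str):
        field_names = field_names.replace(",", " ").split()
    field_names = tuple(field_names)

    # Minimal validation, loosely modelled on what namedtuple enforces.
    for name in (typename,) + field_names:
        if not name.isidentifier() or keyword.iskeyword(name):
            raise ValueError(f"invalid name: {name!r}")
    for name in field_names:
        if name.startswith("_"):
            raise ValueError(f"field names cannot start with an underscore: {name!r}")
    if len(set(field_names)) != len(field_names):
        raise ValueError("duplicate field names")

    # eval() is used for this one function only, so that positional and
    # keyword arguments are parsed by the interpreter rather than by Python code.
    params = ", ".join(("_cls",) + field_names)
    values = "(" + "".join(f"{name}, " for name in field_names) + ")"
    new = eval(f"lambda {params}: _tuple_new(_cls, {values})",
               {"_tuple_new": tuple.__new__})
    new.__name__ = "__new__"

    def __repr__(self):
        pairs = ", ".join(f"{name}={value!r}"
                          for name, value in zip(self._fields, self))
        return f"{type(self).__name__}({pairs})"

    namespace = {
        "__slots__": (),
        "_fields": field_names,
        "__new__": new,
        "__repr__": __repr__,
    }
    # Each field becomes a read-only property that indexes into the tuple.
    for index, name in enumerate(field_names):
        namespace[name] = property(itemgetter(index))
    return type(typename, (tuple,), namespace)


Point = simple_namedtuple("Point", "x y")
p = Point(1, y=2)
print(p, p.x, p.y)
```

Because __new__ has a real signature, a call such as `Point(1, y=2)` is handled by the normal argument-passing machinery, which is the instantiation cost the message is concerned about.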
msg280356 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2016-11-08 22:17
This file is derived from my namedlist project on PyPI.

I've stripped out the namedlist stuff, and just left namedtuple. I've also removed the Python 2 support, and I've removed support for default parameters. After that surgery, I have not tested it very well.

Those are my excuses for why the code is more complex than it would be if I were writing it from scratch.

Anyway, instead of using eval() of a string to create the new() function, I use ast manipulations to generate a function that does all of the correct type checking. It calls eval() too, of course, but with a code object.

I originally wrote this as an exercise in learning how to generate AST's. I can't say it's the best way to solve this problem, and I haven't benchmarked it ever. So just consider it as a proof of concept, or ignore it if you're not interested.
msg280543 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-11-10 21:31
> half a millisecond is reduced.

I would like to caution against any significant changes to save microscopic amounts of time.  Twisting the code into knots for minor time savings is rarely worth it and is not what Python is all about.

> Half a millisecond is small, but it isn't negligible in some situations.

I would say that it is almost always negligible and reflects a need for a better sense of proportion and perspective.

Also, in the past we've found that efforts to speed start-up time were measurable only in trivial cases.  Tools like mercurial end-up importing and using a substantial chunk of the standard library anyway, so those tools got zero benefit from the contortions we did to move _collections_abc.py from underneath the collections module.

In the case of functools, if there was a real need (and I don't believe there is), I would be willing to accept INADA's original patch replacing the namedtuple call with its source.

That said, I don't think half a millisecond is worth the increase in code volume and the double maintenance problem of keeping it in sync with any future changes to namedtuple.   In my opinion, accumulating technical debt in this fashion is a poor software design practice.
msg280560 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2016-11-11 04:40
I'll echo Raymond's concerns here, as we simply don't have the collective maintenance capacity to sustain a plethora of special case micro-optimisations aimed at avoiding importing common standard library modules.

I will note however, that there has been relatively little work done on optimising CPython's code generator, as the use of pyc files and the fact namedtuples are typically only defined at start-up already keeps it out of the critical path in most applications.

While work invested there would technically still be a micro-optimisation at the language level, it would benefit more cases than just avoiding the use of namedtuple in functools would.

Alternatively, rather than manually duplicating the namedtuple code and having to keep it in sync by hand, you could investigate the work Larry Hastings has already done for Argument Clinic in Python's C files: https://docs.python.org/3/howto/clinic.html

Argument Clinic already includes the machinery necessary to assist with automated maintenance of generated code (at least in C), and hence could potentially be adapted to the task of "named tuple inlining". If Victor's AST transformation pipeline and function guard proposals in PEP's 511 and 510 are accepted at some point in the future, then such inlining could potentially even be performed implicitly some day.

Caring about start-up performance is certainly a good thing, but when considering potential ways to improve the situation, structural enhancements to the underlying systems are preferable to ad hoc special cases that complicate future development efforts.
msg280561 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-11-11 05:51
Thanks Nick.  I'm going to mark this as closed, as the proposal is too microscopic to warrant incurring technical debt.

If someone comes forward with a more fully formed idea for code generation or overall structural enhancement, that can be put in another tracker item.
msg281336 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2016-11-21 09:10
> If someone comes forward with a more fully formed idea for code generation or overall structural enhancement, that can be put in another tracker item.

I noticed that Argument Clinic supports Python [1]. So there is already one way to do code generation.
The attached patch uses Argument Clinic and the Makefile to generate the source.

[1]: https://docs.python.org/3.5/howto/clinic.html#using-argument-clinic-in-python-files
msg281339 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2016-11-21 09:35
Updated patch: fixed a small issue in Argument Clinic, and added
a comment explaining why we use code generation.
msg281356 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2016-11-21 13:57
Ah, I had forgotten that Larry had already included Python support in Argument Clinic.

With the inline code auto-generated from the pure Python implementation, that addresses the main maintenance concerns I had. I did briefly wonder about the difficulties of bootstrapping Argument Clinic (since it uses functools), but that's already accounted for in the comment-based design of Argument Clinic itself (i.e. since the generated code is checked in, the previous iteration can be used to generate the updated one when the namedtuple template changes).

Raymond, how does this variant look to you?
msg282172 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2016-12-01 13:04
(Reopening the issue to discuss using Argument Clinic.)
msg282178 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-12-01 14:09
Argument Clinic is not needed, since we can use Makefile.
msg282182 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2016-12-01 14:30
The concern with using the "generate a private module that can be cached" approach is that it doesn't generalise well - any time you want to micro-optimise a new module that way, you have to add a custom Makefile rule.

By contrast, Argument Clinic is a general purpose tool - adopting it for micro-optimisation in another file would just be a matter of adding that file to the list of files that trigger a clinic run. functools.py would be somewhat notable as the first Python file we do that for, but it isn't a novel concept overall.

That leads into my main comment on the AC patch: the files that are explicitly listed as triggering a new clinic run should be factored out into a named variable and that list commented accordingly.
msg282278 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2016-12-03 10:04
> That leads into my main comment on the AC patch: the files that are explicitly listed as triggering a new clinic run should be factored out into a named variable and that list commented accordingly.

done.
msg282279 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-12-03 10:37
Argument Clinic is used just for running the generating code and inlining the result. This is the simplest part of Argument Clinic, and using it looks like unnecessary overhead. Argument Clinic has other disadvantages:

* In any case you need a rule in the Makefile, otherwise the generated code can become outdated.

* Generated code depends not just on the generator code, but on the code of the collections module.

* Even a tiny change in the generating code, the namedtuple implementation, or the Argument Clinic code could require regenerating the generated code with different checksums.

My second idea, a more general solution, was to make namedtuple itself use external caching. This would benefit all users of namedtuple with no changes, or only minimal changes, to user code.

namedtuple itself can save bytecode and source in files (like Java creates additional .class files for internal classes) and use the bytecode if it is not outdated. Generalized import machinery could be used to keep the generated code in sync.
msg282412 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2016-12-05 10:48
I think an external cache system introduces more complexity and startup overhead than AC.

I think functools is the only "very common" module using namedtuple, because
`functools.wraps()` is used to create decorator functions.

But if a general solution for all namedtuples is necessary to reach agreement,
I think a namedtuple implemented in C may be better.
structseq is faster than namedtuple, not only when building the type, but also
when using instances.


$ ./python -m perf timeit -s 'import sys; vi = sys.version_info' -- 'vi.major, vi.minor, vi.micro'
.....................
Median +- std dev: 130 ns +- 2 ns

$ ./python -m perf timeit -s 'from collections import namedtuple; VersionInfo=namedtuple("VersionInfo", "major minor micro releaselevel serial"); vi=VersionInfo(3, 7, 0, "alpha", 0)' -- 'vi.major, vi.minor, vi.micro'
.....................
Median +- std dev: 212 ns +- 4 ns
msg285615 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-01-17 06:38
Sorry INADA but I think this is all a foolish and inconsequential optimization that complicates the code in a way that isn't worth it (saving only 1/2 millisecond in a single import). Also, we don't want the argument clinic code to start invading the pure python code which is used by other Python implementations.
msg298400 - (view) Author: Jelle Zijlstra (Jelle Zijlstra) * Date: 2017-07-15 17:58
I'm also concerned that the slowness of namedtuple creation is causing people to avoid using it. I can see why we wouldn't want a complicated solution like using Argument Clinic, but it's not clear to me why Serhiy's approach in namedtuple-no-compile.patch was rejected. This approach could provide a speedup for all namedtuple instantiations without complicating the implementation. I wrote a similar implementation in https://github.com/JelleZijlstra/cpython/commit/5634af4ccfd06a2fabc2cc2cfcc9c014caf6f389 and found that it speeds up namedtuple creation, uses less code, and creates only one necessary backwards compatibility break (we no longer have _source).
msg298444 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-07-16 17:52
I like your idea.  Would you make a pull request?
msg298453 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-07-17 00:54
> creates only one necessary backwards compatibility break 
> (we no longer have _source).

IMO, this is an essential feature.  It allows people to easily build their own variants, to divorce the generated code from the generator, and to fully understand what named tuples do (that is in part why we get so few questions about how they work).

You all seem to be in rush to redesign code that has been stable and well served the needs of users for a very long time.  This all seems to be driven by a relentless desire for micro-optimizations regardless of actual need.

BTW, none of the new contributors seem to be aware of named tuple's history.  It was an amalgamation of many separate implementations that had sprung up in the wild (it was being reinvented many times).  It was posted as an ASPN recipe and went through a long period of maturation that incorporated the suggestions of over a dozen engineers based on use in the field.  It went through further refinement when examined and discussed on python-dev, incorporating reviews from Guido, Alex, and Tim.  Since that time, the tool has been broadly deployed and has met the needs of enormous numbers of users. Its use is considered a best practice.  The code and API have been maintained and improved at an intentionally slow and careful pace.

I really, really do not want to significantly revise the stable code and undermine the premise of its implementation so that you can save a few micro-seconds in the load of some module.  That is contrary to our optimization philosophy for CPython.

As is, the code is very understandable, easy to maintain, easy to understand, easy to create variants, easy to verify that it is bug free. It works great for CPython, IronPython, PyPy, and Jython without modification.
msg298457 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-07-17 05:21
I agree with Raymond here - the standard library's startup benchmarks are *NOT* normal code execution paths, since normal code execution is dominated by the actual operation being performed, and hence startup micro-optimizations vanish into the noise.

Accordingly, we should *not* be redesigning existing standard interfaces simply for the sake of allowing them to be used during startup without significantly slowing down the interpreter startup benchmark.

By contrast, it *is* entirely OK to introduce specialised types specifically for internal use (including during startup), and only making them available at the Python level through the types module (e.g. types.MappingProxyType, types.SimpleNamespace).

At the moment, the internal PyStructSequence type used to define sys.flags, sys.version_info, etc *isn't* exposed that way, so efforts to allow the use of namedtuple-style interfaces in modules that don't want to use namedtuple itself would likely be better directed towards making that established type available and usable through the types module, rather than towards altering namedtuple.

That approach would have the potential to solve both the interpreter startup optimisation problem (as the "types" module mainly just exposes things defined by the interpreter implementation, not new Python level classes), *and* provide an alternate option for folks that have pre-emptively decided that namedtuple is going to be "too slow" for their purposes without actually measuring the relative performance in the context of their application.
msg298482 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-07-17 11:41
I disagree with the rejection of this request.  The idea that "_source is an essential feature" should be backed by usage statistics instead of being hand-waved as rejection cause.
msg298485 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-07-17 12:14
Folks, you're talking about removing a *public*, *documented* API from the standard library. The onus would thus be on you to prove *lack* of use, *and* provide adequate justification for the compatibility break, not on anyone else to prove that it's "sufficiently popular" to qualify for the standard backwards compatibility guarantees. Those guarantees apply by default and are only broken for compelling reasons - that's why we call them guarantees.

Don't be fooled by the leading underscore - that's an artifact of how namedtuple avoids colliding with arbitrary field names, not an indicator that this is a private API: https://docs.python.org/3/library/collections.html#collections.somenamedtuple._source
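A small sketch of the collision argument, using only documented namedtuple members. The `Record` class and its field names are made up for illustration and were chosen to clash with the helper names if those helpers had no underscore.

```python
from collections import namedtuple

# Hypothetical record whose field names match the un-prefixed helper names.
Record = namedtuple("Record", ["fields", "replace", "asdict"])
record = Record(fields=1, replace=2, asdict=3)

# The user-defined fields and the underscore-prefixed API coexist.
print(record.fields)            # the user's own "fields" value
print(Record._fields)           # the tuple of field names
updated = record._replace(replace=20)
as_mapping = record._asdict()
print(updated, dict(as_mapping))
```

namedtuple rejects field names that start with an underscore, which is what keeps this namespace free for public helpers such as `_fields`, `_replace`, and `_asdict`.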

"It would be faster" isn't adequate justification, since speed increases only matter in code that has been identified as a bottleneck, and startup time in general (let alone namedtuple definitions in particular) is rarely the bottleneck.

So please, just stop, and find a more productive way of expending your energy (such as by making PyStructSequence available via the "types" module, since that also allows for C level micro-optimizations when *used*, not just at definition time).
msg298486 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-07-17 12:19
Nick, can you stop closing an issue where the discussion hasn't been settled?  This isn't civil.
msg298487 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-07-17 12:21
There's a path for escalation when you disagree with the decision of a module/API maintainer (in this case, Raymond): bringing the issue closure up on python-dev for wider discussion.

It *isn't* repeatedly reopening the issue after they have already made their decision and attempting to pester them into changing their mind.
msg298488 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-07-17 12:22
So unless and until he gets overruled by Guido, Raymond's decision to reject the proposed change stands.
msg298489 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-07-17 12:23
Just because I disagree with you doesn't mean I'm pestering anyone.  Can you stop being so obnoxious?
msg298490 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-07-17 12:28
Check the issue history - the issue has been rejected by Raymond, and then reopened for further debate by other core developers multiple times.

That's not a reasonable approach to requesting reconsideration of a module/API maintainers design decision.

I acknowledge that those earlier reopenings weren't by you, but the issue should still remain closed until *Raymond* agrees to reconsider it (and given the alternative option of instead making the lower overhead PyStructSequence visible at the Python level, I'd be surprised if he does).
msg298491 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-07-17 12:33
Sorry, I don't have much data at this point, but it's not the first time that I noticed that namedtuple is super slow. We have much more efficient code like structseq in C. Why not reuse it, at least in our stdlib modules?

About the _source attribute, honestly, I'm not aware of anyone using it. I don't think that the fact that a *private* attribute is documented should prevent us from making Python faster.

I already noticed the _source attribute when I studied Python's memory usage. See my old issue #19640: "Drop _source attribute of namedtuple (waste memory)", whose title I later changed to "Dynamically generate the _source attribute of namedtuple to save memory".

About "Python startup time doesn't matter", this is just plain wrong. Multiple core developers spent a lot of time on optimizing exactly that. Tell me if you really need a long rationale to work on that.

While I'm not sure about Naoki's exact optimization, I agree about the issue title: "Optimize namedtuple creation", and I like the idea of keeping the issue open to find a solution.
msg298493 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-07-17 12:42
Yes, I'm saying you need a really long justification to explain why you want to break backwards compatibility solely for a speed increase.

For namedtuple instances, the leading underscore does *NOT* indicate a private attribute - it's just there to avoid colliding with field names.

Speed isn't everything, and it certainly isn't adequate justification for breaking public APIs that have been around for years.

Now, you can either escalate that argument to python-dev, and try to convince Guido to overrule Raymond on this point, *or* you can look at working out a Python level API to dynamically define PyStructSequence subclasses. That won't be entirely straightforward (as my recollection is that structseq is designed to build on static C structs), but if you're successful, it will give you something that should be faster than namedtuple in every way, not just at definition time.
msg298499 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-07-17 13:01
Benchmark comparing collections.namedtuple to structseq, to get an attribute:

* Getting an attribute by name (obj.attr):
  Mean +- std dev: [name_structseq] 24.1 ns +- 0.5 ns -> [name_namedtuple] 45.7 ns +- 1.9 ns: 1.90x slower (+90%)
* Getting an attribute by its integer index (obj[0]):
  (not significant)

So structseq is 1.9x faster than namedtuple to get an attribute by name.


haypo@speed-python$ ./bin/python3  -m perf timeit -s "from collections import namedtuple; Point=namedtuple('Point', 'x y'); p=Point(1,2)" "p.x" --duplicate=1024 -o name_namedtuple.json
Mean +- std dev: 45.7 ns +- 1.9 ns
haypo@speed-python$ ./bin/python3  -m perf timeit -s "from collections import namedtuple; Point=namedtuple('Point', 'x y'); p=Point(1,2)" "p[0]" --duplicate=1024 -o int_namedtuple.json
Mean +- std dev: 17.6 ns +- 0.0 ns


haypo@speed-python$ ./bin/python3  -m perf timeit -s "from sys import flags" "flags.debug" --duplicate=1024 -o name_structseq.json
Mean +- std dev: 24.1 ns +- 0.5 ns
haypo@speed-python$ ./bin/python3  -m perf timeit -s "from sys import flags" "flags[0]" --duplicate=1024 -o int_structseq.json
Mean +- std dev: 17.6 ns +- 0.2 ns

---

Getting an attribute by its integer index is as fast as tuple:

haypo@speed-python$ ./bin/python3  -m perf timeit --inherit=PYTHONPATH -s "p=(1,2)" "p[0]" --duplicate=1024 -o int_tuple.json
.....................
Mean +- std dev: 17.6 ns +- 0.0 ns
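For readers without the perf module, a comparable measurement can be sketched with the standard timeit module. This is an approximation of the benchmark above, not a reproduction of it: the helper name `best_ns` is made up, the numbers depend heavily on the interpreter version and machine, and timeit lacks the statistical rigor of perf.

```python
import sys
import timeit
from collections import namedtuple

Point = namedtuple("Point", "x y")
point = Point(1, 2)
flags = sys.flags  # a structseq instance implemented in C


def best_ns(stmt, namespace, number=200_000, repeat=5):
    """Return the best observed time per execution of stmt, in nanoseconds."""
    timings = timeit.repeat(stmt, globals=namespace, number=number, repeat=repeat)
    return min(timings) / number * 1e9


namespace = {"p": point, "flags": flags}
results = {
    "namedtuple by name": best_ns("p.x", namespace),
    "namedtuple by index": best_ns("p[0]", namespace),
    "structseq by name": best_ns("flags.debug", namespace),
    "structseq by index": best_ns("flags[0]", namespace),
}

for label, nanoseconds in results.items():
    print(f"{label:>20}: {nanoseconds:6.1f} ns")
```

The absolute values include the overhead of the timing loop, so only the relative differences between the four rows are meaningful.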
msg298500 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-07-17 13:04
> So structseq is 1.9x faster than namedtuple to get an attribute by name.

Oops, I wrote it backward: So namedtuple is 1.9x slower than structseq to get an attribute by name.

(1.9x slower doesn't mean 1.9x faster, sorry.)
msg298503 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-07-17 13:06
> Speed isn't everything, and it certainly isn't adequate justification for breaking public APIs that have been around for years.

What about the memory usage?

> See my old issue #19640 (...)

msg203271:

"""
I found this issue while using my tracemalloc module to analyze the memory consumption of Python. On the Python test suite, the _source attribute is the 5th line allocating the most memory:

/usr/lib/python3.4/collections/__init__.py: 676.2 kB
"""
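A measurement along these lines can be sketched with tracemalloc. The function name `measure_namedtuple_memory` and the `Record{i}` class names are hypothetical, and the result covers everything allocated while the classes are created, not the `_source` attribute alone (which no longer exists on recent Python versions).

```python
import tracemalloc
from collections import namedtuple


def measure_namedtuple_memory(count=100):
    """Create `count` namedtuple classes and report the traced memory growth."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    classes = [namedtuple(f"Record{i}", ["a", "b", "c"]) for i in range(count)]
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()

    # Group the difference by source file, largest growth first.
    stats = after.compare_to(before, "filename")
    total_bytes = sum(stat.size_diff for stat in stats)
    return classes, total_bytes, stats[:3]


classes, total_bytes, top_stats = measure_namedtuple_memory()
print(f"{len(classes)} classes, about {total_bytes / 1024:.1f} KiB allocated")
for stat in top_stats:
    print(stat)
```

The per-file grouping is what produces lines like the `collections/__init__.py` entry quoted above.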
msg298514 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-07-17 13:30
I respect Raymond's rejection.  But I want to write down why I like Jelle's approach.

Currently, functools is the only very popular module that uses namedtuple.
But leaving this as it is means every new namedtuple makes startup time about 0.6ms slower.

This is also a problem for applications that depend heavily on namedtuple.
Creating a namedtuple is more than 15 times slower than creating a normal class. That is not a predictable or reasonable overhead.
More than once, I have profiled application startup time and found that namedtuple
accounts for a non-negligible percentage.
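The comparison can be approximated with timeit, as a sketch under a few assumptions: the two factory functions below are made up for illustration, the plain class is only one possible baseline, and the ratio will differ between Python versions (the figure quoted above refers to the implementation being discussed in this issue).

```python
import timeit
from collections import namedtuple


def make_namedtuple():
    """Create a fresh two-field namedtuple class."""
    return namedtuple("Point", ["x", "y"])


def make_plain_class():
    """Create a fresh ordinary class with the same two attributes."""
    class Point:
        __slots__ = ("x", "y")

        def __init__(self, x, y):
            self.x = x
            self.y = y

    return Point


def seconds_per_call(func, number=500, repeat=5):
    """Best observed time for one call of func, in seconds."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number


namedtuple_cost = seconds_per_call(make_namedtuple)
plain_cost = seconds_per_call(make_plain_class)
print(f"namedtuple: {namedtuple_cost * 1e6:.1f} us per class")
print(f"plain class: {plain_cost * 1e6:.1f} us per class")
print(f"ratio: {namedtuple_cost / plain_cost:.1f}x")
```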

It's possible to keep `_source` with Jelle's approach. `_source` can be equivalent source rather than the exact source that is eval()ed.
I admit it's not ideal. But all namedtuple users
and all Python implementations can benefit from it.

It's possible to expose StructSeq somewhere.  It can make it faster to
import `functools`.
But it's also ugly if applications and libraries try it first
and fall back to namedtuple.
And when it is widely used, other Python implementations will be forced
to implement it.

That's why I want collections.namedtuple's overhead to be reasonable and predictable.
msg298515 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-07-17 13:33
> It's possible to expose StructSeq somewhere.

Hum, when I mentioned structseq: my idea was more to reimplement
namedtuple using the existing structseq code, since structseq is well
tested and very fast.
msg298566 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2017-07-17 23:44
On python-dev Raymond agreed to reopen the issue and consider Jelle's implementation (https://github.com/python/cpython/pull/2736).
msg298570 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-07-18 03:09
Re-opening per discussion on python-dev.

Goals:

* Extend Jelle's patch to incorporate lazy support for "_source" and "verbose" so that the API is unchanged from the user's point of view.

* Make sure the current test suite still passes and that the current docs remain valid.

* Get better measurements of benefits so we know what is actually being achieved.

* Test to see if there are new positive benefits for PyPy and Jython as well.
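One possible way to make an attribute such as "_source" lazy, sketched here purely as an illustration of the first goal: a non-data descriptor that computes the value on first access and then caches it on the class. `LazyClassAttribute`, `render_source`, and the rendered text are all hypothetical and are not taken from Jelle's pull request.

```python
class LazyClassAttribute:
    """Non-data descriptor that computes a class attribute on first access."""

    def __init__(self, factory):
        self.factory = factory
        self.name = None

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner):
        value = self.factory(owner)
        # Replace the descriptor with the computed value so the factory
        # runs at most once per class.
        setattr(owner, self.name, value)
        return value


render_calls = []


def render_source(cls):
    """Hypothetical renderer: build a source-like description on demand."""
    render_calls.append(cls.__name__)
    fields = ", ".join(repr(name) for name in cls._fields)
    return f"# generated lazily\nclass {cls.__name__}(tuple):\n    _fields = ({fields},)\n"


class Point(tuple):
    __slots__ = ()
    _fields = ("x", "y")
    _source = LazyClassAttribute(render_source)


print(render_calls)      # nothing has been rendered yet
print(Point._source)     # first access triggers rendering
print(render_calls)      # rendered exactly once, even after later accesses
```

With this shape, classes that never touch the attribute pay neither the time nor the memory for the source string.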
msg298571 - (view) Author: Jelle Zijlstra (Jelle Zijlstra) * Date: 2017-07-18 03:35
Should we consider a C-based implementation like https://github.com/llllllllll/cnamedtuple? It could improve speed even more, but would be harder to maintain and test and harder to keep compatible. My sense is that it's not worth it unless benchmarks show a really dramatic difference.

As for Raymond's list of goals, my PR now preserves _source and verbose=True and the test suite passes. I think the only docs change needed is in the description for _source (https://docs.python.org/3/library/collections.html#collections.somenamedtuple._source), which is no longer "used to create the named tuple class". I'll add that to my PR. I haven't done anything towards the last two goals yet.

Should the change be applied to 3.6? It is fully backwards compatible, but perhaps the change is too disruptive to be included in the 3.6 series at this point.
msg298574 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2017-07-18 04:35
Thanks Raymond and Jelle.

The bar for a reimplementation in C is much higher (though we'll have to agree that Jelle's version is fast enough before we reject it).

The bar for backporting this to 3.6 is much higher as well and I think it's not worth disturbing the peace (people depend on the craziest things staying the same between bugfix releases, but for feature releases they have reasons to do thorough testing).
msg298581 - (view) Author: Christoph Reiter (lazka) Date: 2017-07-18 10:53
Why not just do the following:

>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['x', 'y'])
>>> Point._source
"from collections import namedtuple\nPoint = namedtuple('Point', ['x', 'y'])\n"
>>> 

The docs make it seem as if the primary use case of the _source attribute is
to serialize the definition. Returning a source which produces a class with
different performance/memory characteristics goes against that.
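
A small sketch of the serialization use case, assuming `_source` returned exactly the string proposed above: executing that string in a fresh namespace rebuilds the class through the normal namedtuple() call. The `namespace` variable is just a local name for this example.

```python
# The string proposed in this message, copied verbatim.
source = "from collections import namedtuple\nPoint = namedtuple('Point', ['x', 'y'])\n"

# Executing it recreates the class through the regular factory function.
namespace = {}
exec(source, namespace)

Point = namespace["Point"]
p = Point(1, 2)
print(Point._fields, p)
```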
msg298601 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2017-07-18 15:52
> Should we consider a C-based implementation like https://github.com/llllllllll/cnamedtuple? 
> It could improve speed even more, but would be harder to maintain and
> test and harder to keep compatible. My sense is that it's not worth
> it unless benchmarks show a really dramatic difference.

I've just filed a ticket for this: https://github.com/llllllllll/cnamedtuple/issues/7
msg298630 - (view) Author: Joe Jevnik (llllllllll) * Date: 2017-07-19 04:41
I added a benchmark suite (using Victor's perf utility) to cnamedtuple. The results are here: https://github.com/llllllllll/cnamedtuple#benchmarks

To summarize: type creation is much faster; instance creation and named attribute access are a bit faster.
msg298631 - (view) Author: Jelle Zijlstra (Jelle Zijlstra) * Date: 2017-07-19 04:42
I benchmarked some common namedtuple operations with the following script:

#!/bin/bash
echo 'namedtuple creation'
./python -m timeit -s 'from collections import namedtuple' 'x = namedtuple("x", ["a", "b", "c"])'

echo 'namedtuple instantiation'
./python -m timeit -s 'from collections import namedtuple; x = namedtuple("x", ["a", "b", "c"])' 'x(1, 2, 3)'

echo 'namedtuple attribute access'
./python -m timeit -s 'from collections import namedtuple; x = namedtuple("x", ["a", "b", "c"]); i = x(1, 2, 3)' 'i.a'

echo 'namedtuple _make'
./python -m timeit -s 'from collections import namedtuple; x = namedtuple("x", ["a", "b", "c"])' 'x._make((1, 2, 3))'


--------------------------------------
With my patch as it stands now I get:

$ ./ntbenchmark.sh 
namedtuple creation
2000 loops, best of 5: 101 usec per loop
namedtuple instantiation
500000 loops, best of 5: 477 nsec per loop
namedtuple attribute access
5000000 loops, best of 5: 59.9 nsec per loop
namedtuple _make
500000 loops, best of 5: 430 nsec per loop


--------------------------------------
With unpatched CPython master I get:

$ ./ntbenchmark.sh 
namedtuple creation
500 loops, best of 5: 409 usec per loop
namedtuple instantiation
500000 loops, best of 5: 476 nsec per loop
namedtuple attribute access
5000000 loops, best of 5: 60 nsec per loop
namedtuple _make
1000000 loops, best of 5: 389 nsec per loop


So creating a class is about 4x faster (similar to the benchmarks various other people have run) and calling _make() is 10% slower. That's probably because of the line "if len(result) != cls._num_fields:" in my implementation, which would have been something like "if len(result) != 3" in the exec-based implementation.
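
To make the `_make()` difference concrete, here is a rough sketch of the two kinds of length check. Both classes are simplified stand-ins written for this illustration, not the code from either implementation; timings vary by machine, so no numbers are claimed.

```python
import timeit


class DynamicCheck(tuple):
    """Length check via a class attribute lookup, as in the quoted line."""
    _num_fields = 3

    @classmethod
    def _make(cls, iterable):
        result = tuple.__new__(cls, iterable)
        if len(result) != cls._num_fields:
            raise TypeError(f"Expected {cls._num_fields} arguments, got {len(result)}")
        return result


class LiteralCheck(tuple):
    """Length check against a literal, as generated code could do."""

    @classmethod
    def _make(cls, iterable):
        result = tuple.__new__(cls, iterable)
        if len(result) != 3:
            raise TypeError(f"Expected 3 arguments, got {len(result)}")
        return result


for cls in (DynamicCheck, LiteralCheck):
    seconds = timeit.timeit(lambda: cls._make((1, 2, 3)), number=200_000)
    print(f"{cls.__name__}: {seconds:.3f} s for 200000 calls")
```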

I also cProfiled class creation with my patch. These are results for creating 10000 3-element namedtuple classes:

         390005 function calls in 2.793 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10000    0.053    0.000    2.826    0.000 <ipython-input-5-c37fa4922f0a>:1(make_nt)
    10000    1.099    0.000    2.773    0.000 /home/jelle/qython/cpython/Lib/collections/__init__.py:380(namedtuple)
    10000    0.948    0.000    0.981    0.000 {built-in method builtins.exec}
   100000    0.316    0.000    0.316    0.000 {method 'format' of 'str' objects}
    10000    0.069    0.000    0.220    0.000 {method 'join' of 'str' objects}
    40000    0.071    0.000    0.152    0.000 /home/jelle/qython/cpython/Lib/collections/__init__.py:439(<genexpr>)
    10000    0.044    0.000    0.044    0.000 {built-in method builtins.repr}
    30000    0.033    0.000    0.033    0.000 {method 'startswith' of 'str' objects}
    40000    0.031    0.000    0.031    0.000 {method 'isidentifier' of 'str' objects}
    40000    0.025    0.000    0.025    0.000 {method '__contains__' of 'frozenset' objects}
    10000    0.022    0.000    0.022    0.000 {method 'replace' of 'str' objects}
    10000    0.022    0.000    0.022    0.000 {built-in method sys._getframe}
    30000    0.020    0.000    0.020    0.000 {method 'add' of 'set' objects}
    20000    0.018    0.000    0.018    0.000 {built-in method builtins.len}
    10000    0.013    0.000    0.013    0.000 {built-in method builtins.isinstance}
    10000    0.009    0.000    0.009    0.000 {method 'get' of 'dict' objects}

So about 35% of time is still spent in the exec() call to create __new__. Another 10% is in .format() calls, so using f-strings instead of .format() might also be worth it.
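
For anyone who wants to repeat the profile, a sketch along these lines should work. The body of `make_nt` is an assumption (the original was defined in an IPython session and is not shown), and the loop count is reduced so it runs quickly.

```python
import cProfile
import io
import pstats
from collections import namedtuple


def make_nt():
    # Assumed body: create one 3-field namedtuple class.
    return namedtuple("x", ["a", "b", "c"])


profiler = cProfile.Profile()
profiler.enable()
for _ in range(1000):  # the message above used 10000
    make_nt()
profiler.disable()

# Collect the report in a string, ordered by cumulative time.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats()
report = buffer.getvalue()
print(report)
```

The rows will differ between Python versions, since the namedtuple implementation has changed since this message was written.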
msg298637 - (view) Author: Jelle Zijlstra (Jelle Zijlstra) * Date: 2017-07-19 05:39
Thanks Joe! I adapted your benchmark suite to also run my implementation. See https://github.com/JelleZijlstra/cnamedtuple/commit/61b6fbf4de37f8131ab43c619593327004974e52 for the code and results. The results are consistent with what we've seen before.

Joe's cnamedtuple is about 40x faster for class creation than the current implementation, and my PR only speeds class creation up by 4x. That difference is big enough that I think we should seriously consider using the C implementation.
msg298641 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-07-19 08:00
I want to focus on a pure Python implementation in this issue.

While "40x faster" is another 10x faster than "4x faster", a C implementation
can boost only CPython and makes maintenance harder.

And sometimes that extra 10x is not so important.
For example, say application startup takes 1sec and namedtuple
creation takes 0.4sec of that 1sec:

  4x faster: 1sec -> 0.7sec  (-30%)
 40x faster: 1sec -> 0.61sec (-39%)

In this case, "4x faster" saves 0.3sec and the extra 10x saves
only another 0.09sec.
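
The arithmetic above can be checked with a few lines. The helper name `startup_after_speedup` is made up for this sketch; it simply keeps the untouched part of the startup time and divides the namedtuple part by the speedup factor.

```python
def startup_after_speedup(total, part, factor):
    """Return the total time when only `part` of it becomes `factor` times faster."""
    return (total - part) + part / factor


total, nt_part = 1.0, 0.4  # seconds, from the example above
for factor in (4, 40):
    new_total = startup_after_speedup(total, nt_part, factor)
    saved_pct = (total - new_total) / total * 100
    print(f"{factor:>2}x faster: {total:.2f}s -> {new_total:.2f}s (-{saved_pct:.0f}%)")
```

The remaining 0.6sec of startup is untouched either way, which is why the second factor of 10 buys so little.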

Of course, 1.9x faster attribute access (http://bugs.python.org/issue28638#msg298499) is attractive.
But this issue is too long already.
msg298648 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2017-07-19 09:20
> While "40x faster" is another 10x faster than "4x faster", a C
> implementation can boost only CPython and makes maintenance harder.

As a counter argument against "let's not do it because it'll be harder to maintain" I'd like to point out that the namedtuple API is already kind of over-engineered (see: "verbose", "rename", "module" and "_source") and as such it seems likely it will remain pretty much the same in the future. So why not treat namedtuple like any other basic data structure, boost its internal implementation and simply use the existing unit tests to make sure there are no regressions? It seems the same barrier does not apply to tuples, lists and sets.

> Of course, 1.9x faster attribute access (http://bugs.python.org/issue28638#msg298499) is attractive.

It is indeed and it makes a huge difference in situations like busy loops. E.g. in case of asyncio 1.9x faster literally means being able to serve twice the number of reqs/sec:
https://github.com/python/cpython/blob/3e2ad8ec61a322370a6fbdfb2209cf74546f5e08/Lib/asyncio/selector_events.py#L523
msg298653 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-07-19 09:52
I didn't say "let's not do it".
I just want to focus on a pure Python implementation in this issue,
because this thread is too long already.
Feel free to open a new issue about a C implementation.

Even if a C implementation is added later, the pure Python optimization
can still boost PyPy performance. (https://github.com/python/cpython/pull/2736#issuecomment-316014866)
msg298670 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-07-19 11:34
General note about this issue: while the issue title is "Optimize namedtuple creation", it would be *nice* to not only optimize the creation but also attribute access by name:
http://bugs.python.org/issue28638#msg298499

Maybe we can have a very fast C implementation using structseq, and a fast Python implementation (faster than the current Python implementation) as a fallback for non-CPython implementations.
msg298681 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2017-07-19 15:31
Yeah, it looks like the standard `_pickle` and `pickle` solution would work
here.
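
The `_pickle`/`pickle` arrangement referred to here is the usual "try the C accelerator, fall back to pure Python" import. A minimal sketch, where the module name `_namedtuple_accel` is invented for illustration and does not exist:

```python
try:
    # Hypothetical accelerator module name, invented for this sketch.
    from _namedtuple_accel import namedtuple
    implementation = "C accelerator"
except ImportError:
    # Pure Python version, available on every implementation.
    from collections import namedtuple
    implementation = "pure Python fallback"

Point = namedtuple("Point", ["x", "y"])
print(implementation, Point(1, 2))
```

Callers import a single name and never need to know which version they received.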
msg298730 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-07-20 15:33
>  it would be *nice* to not only optimize the creation 
> but also attribute access by name

FWIW, once the property/itemgetter pair are instantiated in the NT class, the actual lookup runs through them at C speed (no pure python steps).  There is not much fluff here.
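
A hand-written sketch of the property/itemgetter pairing described above. The `Point` class is a simplified stand-in, not the generated namedtuple code; both `property` and `operator.itemgetter` are implemented in C in CPython, so reading `p.x` does not execute any Python-level function body.

```python
from operator import itemgetter


class Point(tuple):
    __slots__ = ()

    def __new__(cls, x, y):
        return tuple.__new__(cls, (x, y))

    # Each field is a property whose getter is an itemgetter for its index.
    x = property(itemgetter(0), doc="Alias for field number 0")
    y = property(itemgetter(1), doc="Alias for field number 1")


p = Point(3, 4)
print(p.x, p.y)
```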
msg301700 - (view) Author: Josh Rosenberg (josh.r) * Date: 2017-09-08 16:55
Side-note: Some of the objections to a C level namedtuple implementation appear to be based on the maintenance hurdle, and others have noted that a structseq-based namedtuple might be an option. I have previously attempted to write a C replacement for namedtuple that dynamically created a StructSequence. I ran into a roadblock due to PyStructSequence_NewType (the API that exists to allow creation of runtime defined structseq) being completely broken (#28709).

If the struct sequence API was fixed, it should be a *lot* easier to implement a C level namedtuple with minimal work, removing (some) of the maintenance objections by simply reducing the amount of custom code involved.

The testnewtype.c code attached to #28709 (that demonstrates the bug) is 66 lines of code, and implements a basic C level namedtuple creator function (full support omitted for brevity, but aside from _source, most of it would be easy). I'd expect a finished version to be low three digit lines of custom code, a third or less of what the cnamedtuple project needed to write the whole thing from scratch.
msg301804 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-09-10 07:46
Microbenchmark for caching docstrings:

$ ./python -m perf timeit -s "from collections import namedtuple; names = ['field%d' % i for i in range(1000)]" -- "namedtuple('A', names)"

With sys.intern(): Mean +- std dev: 3.57 ms +- 0.05 ms
With Python-level caching: Mean +- std dev: 3.25 ms +- 0.05 ms
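
The two strategies being compared might look roughly like the sketch below. The exact code that was benchmarked is not shown in this message, so the helper names `doc_interned` and `doc_cached` and the cache layout are assumptions.

```python
import sys


def doc_interned(index):
    # Variant 1: build the string on every call, then intern it so equal
    # docstrings share a single object.
    return sys.intern(f"Alias for field number {index}")


_doc_cache = {}


def doc_cached(index):
    # Variant 2: Python-level cache keyed by field index; the string is
    # only built the first time an index is requested.
    try:
        return _doc_cache[index]
    except KeyError:
        doc = _doc_cache[index] = f"Alias for field number {index}"
        return doc


print(doc_interned(0))
print(doc_cached(0) is doc_cached(0))
```

Either way, namedtuple classes with many fields end up sharing docstring objects instead of allocating a new string per field per class.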
msg301819 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-09-10 17:23
New changeset 8b57d7363916869357848e666d03fa7614c47897 by Raymond Hettinger in branch 'master':
bpo-28638: Optimize namedtuple() creation time by minimizing use of exec() (#3454)
https://github.com/python/cpython/commit/8b57d7363916869357848e666d03fa7614c47897
History
Date User Action Args
2017-09-10 17:25:13 rhettinger set status: open -> closed
    resolution: fixed
    stage: patch review -> resolved
2017-09-10 17:23:38 rhettinger set messages: + msg301819
2017-09-10 07:46:18 serhiy.storchaka set messages: + msg301804
2017-09-08 16:55:52 josh.r set nosy: + josh.r
    messages: + msg301700
2017-09-08 07:16:32 rhettinger set pull_requests: + pull_request3451
2017-08-27 02:15:42 Ethan Smith set nosy: + Ethan Smith
2017-07-20 15:33:23 rhettinger set messages: + msg298730
2017-07-19 15:31:34 gvanrossum set messages: + msg298681
2017-07-19 11:34:15 vstinner set messages: + msg298670
2017-07-19 09:52:07 inada.naoki set messages: + msg298653
2017-07-19 09:20:15 giampaolo.rodola set messages: + msg298648
2017-07-19 08:00:23 inada.naoki set messages: + msg298641
2017-07-19 05:39:47 Jelle Zijlstra set messages: + msg298637
2017-07-19 04:42:27 Jelle Zijlstra set messages: + msg298631
2017-07-19 04:41:07 llllllllll set nosy: + llllllllll
    messages: + msg298630
2017-07-18 15:52:19 giampaolo.rodola set messages: + msg298601
2017-07-18 14:16:17 pitrou set stage: resolved -> patch review
2017-07-18 10:53:41 lazka set nosy: + lazka
    messages: + msg298581
2017-07-18 04:35:03 gvanrossum set messages: + msg298574
2017-07-18 03:36:12 Jelle Zijlstra set resolution: rejected -> (no value)
2017-07-18 03:35:55 Jelle Zijlstra set resolution: rejected
    messages: + msg298571
2017-07-18 03:09:06 rhettinger set status: closed -> open
    resolution: rejected -> (no value)
    messages: + msg298570
2017-07-17 23:44:01 gvanrossum set nosy: + gvanrossum
    messages: + msg298566
2017-07-17 13:33:49 vstinner set messages: + msg298515
2017-07-17 13:31:51 giampaolo.rodola set nosy: + giampaolo.rodola
2017-07-17 13:30:28 inada.naoki set messages: + msg298514
2017-07-17 13:06:02 vstinner set messages: + msg298503
2017-07-17 13:04:28 vstinner set messages: + msg298500
2017-07-17 13:01:03 vstinner set messages: + msg298499
2017-07-17 12:42:48 ncoghlan set messages: + msg298493
2017-07-17 12:33:51 vstinner set nosy: + vstinner
    messages: + msg298491
2017-07-17 12:28:41 ncoghlan set messages: + msg298490
2017-07-17 12:23:04 pitrou set messages: + msg298489
2017-07-17 12:22:42 ncoghlan set status: open -> closed
    resolution: rejected
    messages: + msg298488
    stage: resolved
2017-07-17 12:21:40 ncoghlan set messages: + msg298487
2017-07-17 12:19:03 pitrou set status: closed -> open
    resolution: rejected -> (no value)
    messages: + msg298486
    stage: resolved -> (no value)
2017-07-17 12:14:45 ncoghlan set status: open -> closed
    resolution: rejected
    messages: + msg298485
    stage: resolved
2017-07-17 11:41:39 pitrou set status: closed -> open
    nosy: + pitrou
    messages: + msg298482
    resolution: rejected -> (no value)
    stage: resolved -> (no value)
2017-07-17 06:52:06 rhettinger set status: open -> closed
    resolution: rejected
    stage: resolved
2017-07-17 05:21:39 ncoghlan set messages: + msg298457
2017-07-17 00:54:08 rhettinger set messages: + msg298453
2017-07-16 22:10:56 Jelle Zijlstra set pull_requests: + pull_request2796
2017-07-16 17:52:53 inada.naoki set status: closed -> open
    resolution: rejected -> (no value)
    messages: + msg298444
    title: Creating namedtuple is too slow to be used in common stdlib (e.g. functools) -> Optimize namedtuple creation
2017-07-15 17:58:49 Jelle Zijlstra set nosy: + Jelle Zijlstra
    messages: + msg298400
2017-01-17 06:38:37 rhettinger set status: open -> closed
    resolution: rejected
    messages: + msg285615
2016-12-05 10:48:19 inada.naoki set messages: + msg282412
2016-12-03 10:37:53 serhiy.storchaka set messages: + msg282279
2016-12-03 10:04:48 inada.naoki set files: + namedtuple-clinic3.patch
    messages: + msg282278
2016-12-01 14:30:38 ncoghlan set messages: + msg282182
2016-12-01 14:09:42 serhiy.storchaka set files: + functools-CacheInfo-Makefile.patch
    messages: + msg282178
2016-12-01 13:04:36 inada.naoki set status: closed -> open
    resolution: rejected -> (no value)
    messages: + msg282172
2016-11-21 13:57:26 ncoghlan set messages: + msg281356
2016-11-21 09:35:02 inada.naoki set files: + namedtuple-clinic2.diff
    messages: + msg281339
2016-11-21 09:10:59 inada.naoki set files: + namedtuple-clinic.diff
    messages: + msg281336
2016-11-11 05:51:22 rhettinger set status: open -> closed
    resolution: rejected
    messages: + msg280561
2016-11-11 04:40:20 ncoghlan set messages: + msg280560
2016-11-10 21:31:05 rhettinger set messages: + msg280543
2016-11-08 22:17:14 eric.smith set files: + namedtuple1.py
    nosy: + eric.smith
    messages: + msg280356
2016-11-08 12:55:03 serhiy.storchaka set messages: + msg280303
2016-11-08 12:47:58 inada.naoki set messages: + msg280300
2016-11-08 12:36:59 serhiy.storchaka set files: + namedtuple-no-compile.patch
    messages: + msg280298
2016-11-08 12:16:08 inada.naoki set messages: + msg280297
2016-11-08 11:17:32 serhiy.storchaka set assignee: rhettinger
    messages: + msg280291
    versions: - Python 3.6
2016-11-08 09:01:30 serhiy.storchaka set nosy: + serhiy.storchaka
    messages: + msg280288
2016-11-08 07:16:41 inada.naoki set messages: + msg280285
2016-11-08 06:41:02 xiang.zhang set messages: + msg280284
2016-11-08 06:09:16 inada.naoki set messages: + msg280283
2016-11-08 05:45:50 xiang.zhang set nosy: + rhettinger, ncoghlan
2016-11-08 05:45:07 xiang.zhang set nosy: + xiang.zhang
    messages: + msg280282
2016-11-08 04:21:11 inada.naoki set files: + 28638-functools-no-namedtuple.patch
    keywords: + patch
    messages: + msg280279
2016-11-08 04:08:39 inada.naoki set title: Creating namedtuple is too slow -> Creating namedtuple is too slow to be used in common stdlib (e.g. functools)
    components: + Library (Lib)
    versions: + Python 3.6, Python 3.7
2016-11-08 04:07:24 inada.naoki create