Message 298631 - Python tracker


Author	JelleZijlstra
Recipients	JelleZijlstra, eric.smith, giampaolo.rodola, gvanrossum, lazka, llllllllll, methane, ncoghlan, pitrou, rhettinger, serhiy.storchaka, vstinner, xiang.zhang
Date	2017-07-19.04:42:25
Message-id	<1500439347.48.0.503234161256.issue28638@psf.upfronthosting.co.za>

Content

I benchmarked some common namedtuple operations with the following script:

#!/bin/bash
echo 'namedtuple creation'
./python -m timeit -s 'from collections import namedtuple' 'x = namedtuple("x", ["a", "b", "c"])'

echo 'namedtuple instantiation'
./python -m timeit -s 'from collections import namedtuple; x = namedtuple("x", ["a", "b", "c"])' 'x(1, 2, 3)'

echo 'namedtuple attribute access'
./python -m timeit -s 'from collections import namedtuple; x = namedtuple("x", ["a", "b", "c"]); i = x(1, 2, 3)' 'i.a'

echo 'namedtuple _make'
./python -m timeit -s 'from collections import namedtuple; x = namedtuple("x", ["a", "b", "c"])' 'x._make((1, 2, 3))'
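For anyone who prefers to run the same measurements from Python rather than the shell, here is a rough equivalent using the timeit module. It is only a sketch: the names SETUP, BENCHMARKS and run_benchmarks are made up for illustration, and unlike the script above it uses one shared setup string for all four statements.

```python
import timeit

# Shared setup (an assumption; the shell script uses a separate setup per case).
SETUP = (
    'from collections import namedtuple; '
    'x = namedtuple("x", ["a", "b", "c"]); '
    'i = x(1, 2, 3)'
)

# Same four statements as the shell script, keyed by the labels it echoes.
BENCHMARKS = {
    "namedtuple creation": 'namedtuple("x", ["a", "b", "c"])',
    "namedtuple instantiation": "x(1, 2, 3)",
    "namedtuple attribute access": "i.a",
    "namedtuple _make": "x._make((1, 2, 3))",
}


def run_benchmarks(number=10000, repeat=5):
    """Return the best per-loop time in seconds for each benchmark."""
    results = {}
    for label, stmt in BENCHMARKS.items():
        timings = timeit.repeat(stmt, setup=SETUP, number=number, repeat=repeat)
        results[label] = min(timings) / number
    return results


if __name__ == "__main__":
    for label, seconds in run_benchmarks().items():
        print(f"{label}: {seconds * 1e9:.1f} nsec per loop")
```

Note that this measures whichever interpreter runs the file, so it has to be run with the patched and unpatched ./python builds separately to get a comparison.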


--------------------------------------
With my patch as it stands now I get:

$ ./ntbenchmark.sh 
namedtuple creation
2000 loops, best of 5: 101 usec per loop
namedtuple instantiation
500000 loops, best of 5: 477 nsec per loop
namedtuple attribute access
5000000 loops, best of 5: 59.9 nsec per loop
namedtuple _make
500000 loops, best of 5: 430 nsec per loop


--------------------------------------
With unpatched CPython master I get:

$ ./ntbenchmark.sh 
namedtuple creation
500 loops, best of 5: 409 usec per loop
namedtuple instantiation
500000 loops, best of 5: 476 nsec per loop
namedtuple attribute access
5000000 loops, best of 5: 60 nsec per loop
namedtuple _make
1000000 loops, best of 5: 389 nsec per loop


So creating a class is about 4x faster with the patch (101 usec vs. 409 usec, similar to the benchmarks various other people have run), and calling _make() is about 10% slower (430 nsec vs. 389 nsec). That's probably because of the line "if len(result) != cls._num_fields:" in my implementation, which needs a class attribute lookup on every call; the exec-based implementation would have had the field count baked in as a literal, something like "if len(result) != 3".
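To make the _make() difference concrete, here is a minimal sketch of the two styles of length check. The classes LiteralCheck and AttributeCheck and the helper compare_make are hypothetical and written only for this illustration; they are not the code from the patch or from collections.

```python
import timeit


class LiteralCheck(tuple):
    """Stand-in for the exec-generated class: field count is a literal."""

    @classmethod
    def _make(cls, iterable):
        result = tuple.__new__(cls, iterable)
        if len(result) != 3:
            raise TypeError(f"Expected 3 arguments, got {len(result)}")
        return result


class AttributeCheck(tuple):
    """Stand-in for the patched class: field count is a class attribute."""

    _num_fields = 3

    @classmethod
    def _make(cls, iterable):
        result = tuple.__new__(cls, iterable)
        if len(result) != cls._num_fields:
            raise TypeError(
                f"Expected {cls._num_fields} arguments, got {len(result)}"
            )
        return result


def compare_make(number=1_000_000):
    """Time both _make variants; returns (literal_secs, attribute_secs)."""
    literal = timeit.timeit(
        "c._make((1, 2, 3))", globals={"c": LiteralCheck}, number=number
    )
    attribute = timeit.timeit(
        "c._make((1, 2, 3))", globals={"c": AttributeCheck}, number=number
    )
    return literal, attribute


if __name__ == "__main__":
    print(compare_make())
```

Both variants behave the same; the only difference is the extra lookup of cls._num_fields on each call. How much that costs will vary by interpreter version and machine.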

I also cProfiled class creation with my patch. These are results for creating 10000 3-element namedtuple classes:

         390005 function calls in 2.793 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10000    0.053    0.000    2.826    0.000 <ipython-input-5-c37fa4922f0a>:1(make_nt)
    10000    1.099    0.000    2.773    0.000 /home/jelle/qython/cpython/Lib/collections/__init__.py:380(namedtuple)
    10000    0.948    0.000    0.981    0.000 {built-in method builtins.exec}
   100000    0.316    0.000    0.316    0.000 {method 'format' of 'str' objects}
    10000    0.069    0.000    0.220    0.000 {method 'join' of 'str' objects}
    40000    0.071    0.000    0.152    0.000 /home/jelle/qython/cpython/Lib/collections/__init__.py:439(<genexpr>)
    10000    0.044    0.000    0.044    0.000 {built-in method builtins.repr}
    30000    0.033    0.000    0.033    0.000 {method 'startswith' of 'str' objects}
    40000    0.031    0.000    0.031    0.000 {method 'isidentifier' of 'str' objects}
    40000    0.025    0.000    0.025    0.000 {method '__contains__' of 'frozenset' objects}
    10000    0.022    0.000    0.022    0.000 {method 'replace' of 'str' objects}
    10000    0.022    0.000    0.022    0.000 {built-in method sys._getframe}
    30000    0.020    0.000    0.020    0.000 {method 'add' of 'set' objects}
    20000    0.018    0.000    0.018    0.000 {built-in method builtins.len}
    10000    0.013    0.000    0.013    0.000 {built-in method builtins.isinstance}
    10000    0.009    0.000    0.009    0.000 {method 'get' of 'dict' objects}
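A profile like the one above could be collected roughly as follows. This is a sketch under assumptions: make_nt appears in the output, but its body is not shown in this message, so the version here is a guess, and profile_creation is a helper name invented for the example.

```python
import cProfile
import io
import pstats
from collections import namedtuple


def make_nt():
    # Assumed body: creates one 3-element namedtuple class per call.
    return namedtuple("x", ["a", "b", "c"])


def profile_creation(count=10000, limit=20):
    """Profile `count` class creations and return the report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(count):
        make_nt()
    profiler.disable()

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time, matching the ordering of the table above.
    stats.sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_creation())
```

The line numbers and call counts in the report depend on the version of collections/__init__.py being profiled, so they will not match the table exactly.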

So about 35% of the time (0.981 of 2.793 seconds) is still spent in the exec() call to create __new__. Another 10% or so (0.316 seconds) is in .format() calls, so using f-strings instead of .format() might also be worth it.
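A quick way to check whether f-strings would help is a micro-benchmark like the one below. It is a sketch only: with_format, with_fstring and compare_formatting are hypothetical helpers, and the string being built is a toy example, not how the templates in collections/__init__.py are structured.

```python
import timeit


def with_format(typename, arg_list):
    # str.format(): parses the template and dispatches at run time.
    return "{typename}({arg_list})".format(typename=typename, arg_list=arg_list)


def with_fstring(typename, arg_list):
    # f-string: the template is compiled into the function's bytecode.
    return f"{typename}({arg_list})"


def compare_formatting(number=1_000_000):
    """Time both helpers; returns (format_secs, fstring_secs)."""
    namespace = {"with_format": with_format, "with_fstring": with_fstring}
    format_secs = timeit.timeit(
        'with_format("x", "a, b, c")', globals=namespace, number=number
    )
    fstring_secs = timeit.timeit(
        'with_fstring("x", "a, b, c")', globals=namespace, number=number
    )
    return format_secs, fstring_secs


if __name__ == "__main__":
    print(compare_formatting())
```

Both helpers return the same string, so any difference in the timings comes from the formatting mechanism alone.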
