classification
Title: use map function instead of genexpr in capwords
Type: performance
Stage: resolved
Components: Library (Lib)
Versions:
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: rhettinger, speedrun-program
Priority: normal
Keywords:

Created on 2021-09-16 18:54 by speedrun-program, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL Status Linked
PR 28342 merged speedrun-program, 2021-09-16 18:54
Messages (3)
msg401981 - Author: speedrun-program (speedrun-program) Date: 2021-09-16 18:54
In string.py, the capwords function passes str.join a generator expression, but the map function
could be used instead. This is how capwords is currently written:

--------------------
```py
def capwords(s, sep=None):
    """capwords(s [,sep]) -> string
    
    Split the argument into words using split, capitalize each
    word using capitalize, and join the capitalized words using
    join.  If the optional second argument sep is absent or None,
    runs of whitespace characters are replaced by a single space
    and leading and trailing whitespace are removed, otherwise
    sep is used to split and join the words.
    
    """
    return (sep or ' ').join(x.capitalize() for x in s.split(sep))
```
--------------------

This is how capwords could be written:

--------------------
```py
def capwords(s, sep=None):
    """capwords(s [,sep]) -> string
    
    Split the argument into words using split, capitalize each
    word using capitalize, and join the capitalized words using
    join.  If the optional second argument sep is absent or None,
    runs of whitespace characters are replaced by a single space
    and leading and trailing whitespace are removed, otherwise
    sep is used to split and join the words.
    
    """
    return (sep or ' ').join(map(str.capitalize, s.split(sep)))
```
--------------------
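As a quick sanity check that the two forms return the same strings, something like the following could be run. The function names `capwords_genexpr` and `capwords_map` and the sample inputs are illustrative assumptions, not part of string.py.

```python
def capwords_genexpr(s, sep=None):
    # Generator-expression form, as quoted above.
    return (sep or ' ').join(x.capitalize() for x in s.split(sep))


def capwords_map(s, sep=None):
    # map-based form, as quoted above.
    return (sep or ' ').join(map(str.capitalize, s.split(sep)))


# Hypothetical sample inputs chosen for illustration only.
samples = [
    ("", None),
    ("  hello   wORLD  ", None),
    ("one,two,,three", ","),
    ("tab\tseparated words", None),
]

for s, sep in samples:
    # Both forms should agree on every sample.
    assert capwords_genexpr(s, sep) == capwords_map(s, sep)
```

This only covers plain str inputs; it is a spot check, not a proof of equivalence for every possible argument.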

These are the benefits:

1. Faster performance; the absolute time saved grows with the number of pieces the str is split into (roughly 27-35% less time for tests[1] through tests[8] in the results below).

2. Very slightly smaller .py and .pyc file sizes.

3. Source code is slightly more concise.
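The size claim in point 2 can be checked roughly by compiling both forms and measuring the marshalled code objects, since the body of a .pyc file is a marshalled code object. This is a minimal sketch: the helper `marshalled_size` and the stripped-down sources (docstring omitted) are assumptions for illustration, and the exact byte counts vary between Python versions.

```python
import marshal

# Docstrings omitted so that only the return statements differ.
GENEXPR_SRC = '''
def capwords(s, sep=None):
    return (sep or ' ').join(x.capitalize() for x in s.split(sep))
'''

MAP_SRC = '''
def capwords(s, sep=None):
    return (sep or ' ').join(map(str.capitalize, s.split(sep)))
'''


def marshalled_size(source):
    # Compile to a module code object and serialize it with marshal,
    # which is how the code portion of a .pyc file is stored.
    code = compile(source, "<capwords>", "exec")
    return len(marshal.dumps(code))


size_genexpr = marshalled_size(GENEXPR_SRC)
size_map = marshalled_size(MAP_SRC)
print(f"genexpr: {size_genexpr} bytes, map: {size_map} bytes")
```

The generator expression compiles to its own nested code object, which is one plausible reason the map form serializes to fewer bytes.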

This is the performance test code in ipython:

--------------------
```py
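def capwords_current(s, sep=None):
    # Existing implementation: generator expression passed to str.join.
    return (sep or ' ').join(x.capitalize() for x in s.split(sep))

def capwords_new(s, sep=None):
    # Proposed implementation: map object passed to str.join.
    return (sep or ' ').join(map(str.capitalize, s.split(sep)))

# Inputs of 10**0 through 10**8 repetitions of "a ".
tests = ["a " * 10**n for n in range(9)]
tests.append("a " * (10**9 // 2)) # I only have 16GB of RAM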
```
--------------------
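For anyone without ipython, a comparable measurement could be made with the standard timeit module. This is a sketch under assumptions: the helper `best_time`, the 1000-word input, and the `number` and `repeat` values are illustrative choices, not the settings used for the results in this message.

```python
import timeit


def capwords_current(s, sep=None):
    return (sep or ' ').join(x.capitalize() for x in s.split(sep))


def capwords_new(s, sep=None):
    return (sep or ' ').join(map(str.capitalize, s.split(sep)))


# Assumption: a small input so the sketch finishes quickly.
sample = "a " * 1000


def best_time(func, arg, number=200, repeat=5):
    # Take the minimum over several repeats to reduce noise,
    # then divide by the loop count to get seconds per call.
    timings = timeit.repeat(lambda: func(arg), number=number, repeat=repeat)
    return min(timings) / number


t_current = best_time(capwords_current, sample)
t_new = best_time(capwords_new, sample)
print(f"current: {t_current * 1e6:.1f} us per call")
print(f"new:     {t_new * 1e6:.1f} us per call")
```

The absolute numbers depend on the machine and Python build, so only the relative difference between the two lines is meaningful.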

These are the results of a performance test using %timeit in ipython:

--------------------

%timeit x = capwords_current("")
835 ns ± 15.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%timeit x = capwords_new("")
758 ns ± 35.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
- - - - - - - - - - - - - - - - - - - - 
%timeit x = capwords_current(tests[0])
977 ns ± 16.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%timeit x = capwords_new(tests[0])
822 ns ± 30 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
- - - - - - - - - - - - - - - - - - - - 
%timeit x = capwords_current(tests[1])
3.07 µs ± 88.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit x = capwords_new(tests[1])
2.17 µs ± 194 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
- - - - - - - - - - - - - - - - - - - - 
%timeit x = capwords_current(tests[2])
28 µs ± 896 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit x = capwords_new(tests[2])
19.4 µs ± 352 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
- - - - - - - - - - - - - - - - - - - - 
%timeit x = capwords_current(tests[3])
236 µs ± 14.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit x = capwords_new(tests[3])
153 µs ± 2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
- - - - - - - - - - - - - - - - - - - - 
%timeit x = capwords_current(tests[4])
2.12 ms ± 106 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit x = capwords_new(tests[4])
1.5 ms ± 9.61 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
- - - - - - - - - - - - - - - - - - - - 
%timeit x = capwords_current(tests[5])
23.8 ms ± 1.38 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit x = capwords_new(tests[5])
15.6 ms ± 355 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
- - - - - - - - - - - - - - - - - - - - 
%timeit x = capwords_current(tests[6])
271 ms ± 10.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit x = capwords_new(tests[6])
192 ms ± 807 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
- - - - - - - - - - - - - - - - - - - - 
%timeit x = capwords_current(tests[7])
2.66 s ± 14.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit x = capwords_new(tests[7])
1.95 s ± 26.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
- - - - - - - - - - - - - - - - - - - - 
%timeit x = capwords_current(tests[8])
25.9 s ± 80.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit x = capwords_new(tests[8])
18.4 s ± 123 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
- - - - - - - - - - - - - - - - - - - - 
%timeit x = capwords_current(tests[9])
6min 17s ± 29 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit x = capwords_new(tests[9])
5min 36s ± 24.8 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

--------------------
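The speedup ratios implied by the mean times above can be worked out with a few lines of arithmetic. The dictionary below copies the reported means (converted to seconds); the labels and variable names are illustrative.

```python
# Mean times in seconds, copied from the %timeit output above:
# (capwords_current, capwords_new) for each input.
reported = {
    '""': (835e-9, 758e-9),
    "tests[0]": (977e-9, 822e-9),
    "tests[1]": (3.07e-6, 2.17e-6),
    "tests[2]": (28e-6, 19.4e-6),
    "tests[3]": (236e-6, 153e-6),
    "tests[4]": (2.12e-3, 1.5e-3),
    "tests[5]": (23.8e-3, 15.6e-3),
    "tests[6]": (271e-3, 192e-3),
    "tests[7]": (2.66, 1.95),
    "tests[8]": (25.9, 18.4),
    "tests[9]": (6 * 60 + 17, 5 * 60 + 36),
}

# Ratio of old time to new time; values above 1 mean the map form was faster.
speedups = {label: current / new for label, (current, new) in reported.items()}

for label, ratio in speedups.items():
    print(f"{label}: {ratio:.2f}x")
```

These ratios use the means only and ignore the reported standard deviations, so small differences between rows should not be over-interpreted.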
msg401985 - Author: Raymond Hettinger (rhettinger) (Python committer) Date: 2021-09-16 19:49
New changeset a59ede244714455aa9ee8637608e019a20fa2ca6 by speedrun-program in branch 'main':
bpo-45225: use map function instead of genexpr in capwords (GH-28342)
https://github.com/python/cpython/commit/a59ede244714455aa9ee8637608e019a20fa2ca6
msg401986 - Author: Raymond Hettinger (rhettinger) (Python committer) Date: 2021-09-16 19:50
Thanks for the PR.
History
Date User Action Args
2022-04-11 14:59:50 admin set github: 89388
2021-09-16 19:50:29 rhettinger set status: open -> closed; resolution: fixed; messages: + msg401986; stage: resolved
2021-09-16 19:49:41 rhettinger set nosy: + rhettinger; messages: + msg401985
2021-09-16 18:54:54 speedrun-program create