classification
Title: Remove unnecessary tuples, lists, sets, and dicts from Lib
Type: Stage: resolved
Components: Library (Lib) Versions: Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: jdufresne, miss-islington, rhettinger, serhiy.storchaka, steven.daprano, terry.reedy
Priority: normal Keywords: patch

Created on 2017-05-07 13:08 by jdufresne, last changed 2018-02-26 16:25 by serhiy.storchaka. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 1489 merged jdufresne, 2017-05-07 13:08
PR 2624 merged serhiy.storchaka, 2017-07-07 19:23
PR 5908 merged miss-islington, 2018-02-26 14:51
Messages (16)
msg293190 - (view) Author: Jon Dufresne (jdufresne) * Date: 2017-05-07 13:08
Lib has some patterns that could be easily discovered and cleaned up. Doing so will reduce the number of unnecessary temporary lists in memory and unnecessary function calls. It will also take advantage of Python's own rich features in a way that better dog foods the language. For example:

* Replace list(<generator expression>) with list comprehension
* Replace dict(<generator expression>) with dict comprehension
* Replace set(<generator expression>) with set comprehension
* Replace builtin func(<list comprehension>) with func(<generator expression>) when supported (e.g. any(), all(), tuple(), min(), & max())
msg293193 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2017-05-07 13:42
Is this unnecessary code churn?

That's not a rhetorical question. Fixing code that isn't broken is not always a good idea.

``func(<list comprehension>)`` is not always identical to ``func(<generator expression>)``, there are situations where there is a significant performance difference between the two. E.g. str.join() is significantly faster when passed a list than when passed a generator expression. According to my tests:

    ''.join(['abc' for x in range(1000000)])

is about 40% faster than

    ''.join('abc' for x in range(1000000))
msg293195 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-05-07 13:55
* Some functions that accept an arbitrary iterable first convert it to temporary list and iterate that list. In these cases there is no significant difference in memory usage between list comprehension and generator, but the variant with list comprehension is slightly faster.

* foo(bar(x) for x in a) can be written also as foo(map(bar, a)). Both expressions are idiomatic, different users have different preferences, the variant with map can be faster for larger loops. The variant with map can even be faster than comprehensions.

* Proposed patch changes some files which should be kept compatible with old Python versions. Old Python versions can not support set and dict comprehensions.

* In general we avoid making changes to many files that doesn't have obvious positive effect. Most changes doesn't have measurable effect, they are just style changes. But different users have different preferences and existing code is not obviously bad.

Assigned to Raymond because it usually opposed to such kind of changes but made very similar changes in issue22823.
msg293201 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-05-07 16:20
Jon, we usually try to avoid sweeping changes across the library and instead preference to focus on one module at a time (Guido calls this "holistic refactoring").  There is a risk of introducing bugs and of destabiliizing code that has sat untouched but working for a long time.

Also, as Serhiy and Steven have pointed out, there are sometimes subtle reasons to prevent the existing code.  For example, I find the original to be clearer in this example:

 - return list(set(item.name for item in self.list))		 +        
 + return list({item.name for item in self.list})

and the original to be both clearer and faster in this example:

 - return list(map(_convert, node.elts))
 + return [_convert(v) for v in node.elts]

In some cases, using "dict(...)" or "set(...)" is clearer about the intension or more readable because it is parallel to the surrounding code (such as the parallelism of successive line in ast.py).

In some cases such as _pydecimal we have a aversion to changing code that aspires to be backwards compatible across many versions.

That said, there are a few places where it might be worthwhile for making a change.  Converting to set literals is almost always a win:

 -    interesting = set(("include", "lib", "libpath", "path"))		 
 +    interesting = {"include", "lib", "libpath", "path"}

As Serhiy noted, in some places using map() would be an improvement, so we should look at doing a few of those (where there is a win in term of speed, space, and readability):

 - version_info = tuple([int(x) for x in version.split(".")])
 + version_info = tuple(map(int, version.split(".")))

Overall, I think most of these shouldn't be done, but I will mark a few where some change might be considered an improvement.  Steven or Serhiy are both welcome to veto any one of those.  Likewise, if the original author of the code is still active, they have a veto as well.  The status quo usually wins until we're refactoring a single module and make improvements while thinking about that module.

You picked a difficult first patch because you picked a topic that we usually avoid (wholesale sweeps through the standard library) and an area where deciding "what is best" involves subtleties with respect to clarity, parallelism with surrounding code, speed, clear intention, back compatibility, and how the original author thinks about the code.
msg293571 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2017-05-12 20:55
As I said on review, I want idlelib changes either not made or in a separate patch that can be backported as is.  Partial backport/cherry-picks are a major nuisance.  I don't know if anything else should be treated similarly.
msg293588 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-05-12 22:39
Yes, please do keep any proposed idlelib changes separate.  Also, I don't any of these to be backported (it is stylistic and not about bugs).
msg293684 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-05-15 08:17
Serhiy, the latest version of this PR looks okay to me.  Do you care to veto any of the remaining suggested changes?
msg293923 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-05-18 14:35
New changeset 3972628de3d569c88451a2a176a1c94d8822b8a6 by Raymond Hettinger (Jon Dufresne) in branch 'master':
bpo-30296 Remove unnecessary tuples, lists, sets, and dicts (#1489)
https://github.com/python/cpython/commit/3972628de3d569c88451a2a176a1c94d8822b8a6
msg293926 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-05-18 15:05
Oh, the PR was merged while I made a review.

tuple([i for i in a]) is 2 times faster than tuple(i for i in a). Since there is no significant difference in readability I would keep the variant with a list comprehension if it is initial.
msg293935 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2017-05-18 18:31
I made a note of the idlelib.multicall changes for when I review the module to modernize and test.
msg297906 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-07-07 19:26
Since some changes looks worse than the old code to me (see my comments on GitHub), I have created PR 2624 which reverts changes with which I disagree or applies alternate changes.
msg297911 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-07-07 21:01
Why do you think that adding an unnecessary list comprehension around tuple() call makes them better?

Also, I don't think you should be able to override and revert all the judgments I made when carefully reviewing all the OP's PR.  You are NOT the sole arbiter of what goes into Python.
msg297912 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-07-07 21:29
> Why do you think that adding an unnecessary list comprehension around tuple() call makes them better?

I just restore the original code because I don't think the changes make it better.

> Also, I don't think you should be able to override and revert all the judgments I made when carefully reviewing all the OP's PR.  You are NOT the sole arbiter of what goes into Python.

You asked my review and vetoing, and assigned this issue to me. I started reviewing, but when I finished it I found that the PR already is merged.
msg309022 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-12-25 00:20
What will you say Raymond?
msg312929 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-02-26 14:50
New changeset 3f2e6f15d64d81633b1fc0b308afc0d6e9026b61 by Serhiy Storchaka in branch 'master':
Revert unneccessary changes made in bpo-30296 and apply other improvements. (GH-2624)
https://github.com/python/cpython/commit/3f2e6f15d64d81633b1fc0b308afc0d6e9026b61
msg312936 - (view) Author: miss-islington (miss-islington) Date: 2018-02-26 16:22
New changeset f8a3485dcd41a00c5e99c5be6adc67cb2638b366 by Miss Islington (bot) in branch '3.7':
Revert unneccessary changes made in bpo-30296 and apply other improvements. (GH-2624)
https://github.com/python/cpython/commit/f8a3485dcd41a00c5e99c5be6adc67cb2638b366
History
Date User Action Args
2018-02-26 16:25:24serhiy.storchakasetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2018-02-26 16:22:26miss-islingtonsetnosy: + miss-islington
messages: + msg312936
2018-02-26 14:51:49miss-islingtonsetkeywords: + patch
pull_requests: + pull_request5678
2018-02-26 14:50:13serhiy.storchakasetmessages: + msg312929
2017-12-25 00:20:47serhiy.storchakasetmessages: + msg309022
2017-07-07 21:29:55serhiy.storchakasetmessages: + msg297912
2017-07-07 21:01:54rhettingersetmessages: + msg297911
2017-07-07 19:26:10serhiy.storchakasetstatus: closed -> open
resolution: fixed -> (no value)
messages: + msg297906

stage: resolved -> patch review
2017-07-07 19:23:13serhiy.storchakasetpull_requests: + pull_request2690
2017-05-18 18:31:48terry.reedysetmessages: + msg293935
2017-05-18 15:05:49serhiy.storchakasetmessages: + msg293926
2017-05-18 14:36:36rhettingersetstatus: open -> closed
resolution: fixed
stage: resolved
2017-05-18 14:35:58rhettingersetmessages: + msg293923
2017-05-15 08:17:20rhettingersetassignee: rhettinger -> serhiy.storchaka
messages: + msg293684
2017-05-12 22:39:25rhettingersetmessages: + msg293588
2017-05-12 20:55:32terry.reedysetnosy: + terry.reedy
messages: + msg293571
2017-05-07 16:20:17rhettingersetmessages: + msg293201
2017-05-07 13:55:58serhiy.storchakasetassignee: rhettinger

messages: + msg293195
nosy: + serhiy.storchaka, rhettinger
2017-05-07 13:42:56steven.dapranosetnosy: + steven.daprano
messages: + msg293193
2017-05-07 13:08:04jdufresnecreate