Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use set literals instead of creating a set from a list #67012

Closed
rhettinger opened this issue Nov 9, 2014 · 38 comments
Closed

Use set literals instead of creating a set from a list #67012

rhettinger opened this issue Nov 9, 2014 · 38 comments
Assignees
Labels
easy stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@rhettinger
Copy link
Contributor

BPO 22823
Nosy @rhettinger, @terryjreedy, @vstinner, @larryhastings, @benjaminp, @ezio-melotti, @voidspace, @berkerpeksag, @serhiy-storchaka
Files
  • set_literal.patch: Set literal patch
  • more_set_literals.patch: More set literals
  • set_literal_2.patch
  • set_literal_clinic.patch
  • set_literal_mock.patch
  • issue22823-mock.diff
  • set_literal_2to3.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2014-12-13.19:56:38.138>
    created_at = <Date 2014-11-09.03:04:47.462>
    labels = ['easy', 'type-feature', 'library']
    title = 'Use set literals instead of creating a set from a list'
    updated_at = <Date 2014-12-13.19:56:38.138>
    user = 'https://github.com/rhettinger'

    bugs.python.org fields:

    activity = <Date 2014-12-13.19:56:38.138>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2014-12-13.19:56:38.138>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)']
    creation = <Date 2014-11-09.03:04:47.462>
    creator = 'rhettinger'
    dependencies = []
    files = ['37159', '37160', '37161', '37174', '37201', '37404', '37413']
    hgrepos = []
    issue_num = 22823
    keywords = ['patch', 'easy']
    message_count = 38.0
    messages = ['230879', '230881', '230898', '230900', '230903', '230904', '230905', '230906', '230910', '230911', '230912', '230914', '230915', '230916', '230917', '230918', '230920', '230921', '231002', '231005', '231009', '231142', '231204', '231206', '231223', '231224', '231230', '232406', '232453', '232462', '232463', '232464', '232465', '232468', '232496', '232580', '232618', '232619']
    nosy_count = 10.0
    nosy_names = ['rhettinger', 'terry.reedy', 'vstinner', 'larry', 'benjamin.peterson', 'ezio.melotti', 'michael.foord', 'python-dev', 'berker.peksag', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue22823'
    versions = ['Python 3.5']

    @rhettinger
    Copy link
    Contributor Author

    There are many places where the old-style of creating a set from a list still persists. The literal notation is idiomatic, cleaner looking, and faster.

    Here's a typical change:

    diff --git a/Lib/sre_compile.py b/Lib/sre_compile.py
    --- a/Lib/sre_compile.py
    +++ b/Lib/sre_compile.py
    @@ -22,10 +22,10 @@
    else:
    MAXCODE = 0xFFFFFFFF

    -_LITERAL_CODES = set([LITERAL, NOT_LITERAL])
    -_REPEATING_CODES = set([REPEAT, MIN_REPEAT, MAX_REPEAT])
    -_SUCCESS_CODES = set([SUCCESS, FAILURE])
    -_ASSERT_CODES = set([ASSERT, ASSERT_NOT])
    +_LITERAL_CODES = {LITERAL, NOT_LITERAL}
    +_REPEATING_CODES = {REPEAT, MIN_REPEAT, MAX_REPEAT}
    +_SUCCESS_CODES = {SUCCESS, FAILURE}
    +_ASSERT_CODES = {ASSERT, ASSERT_NOT}

    Here are typical timings:

      $ py34 -m timeit '{10, 20, 30}'
       10000000 loops, best of 3: 0.145 usec per loop
    
      $ py34 -m timeit 'set([10, 20, 30])'
      1000000 loops, best of 3: 0.477 usec per loop

    @rhettinger rhettinger added stdlib Python modules in the Lib dir easy type-feature A feature request or enhancement labels Nov 9, 2014
    @rhettinger
    Copy link
    Contributor Author

    Note, to keep the tests stable, nothing in Lib/tests should be changed. Any update should target the rest of Lib and Doc.

    @terryjreedy
    Copy link
    Member

    I will prepare a 3.5 patch for this. There are not many instances other than those you found (but several times as many in tests). I presume that most non-test instances were converted by the 2to3 fixer.

    How about frozenset([...]) to frozenset({...})? There are 4 occurrences of this. The semantic match between frozenset and {...} is better than with [...], but the visual gain in nearly nil.

    I will leave the one idlelib instance in CodeContext for when I am editing the file anyway (for both 3.4 and 3.5), which should be soon.

    @terryjreedy
    Copy link
    Member

    I did not look at Docs yet.

    I could not repeat the timing results on my machine running from the command line, as I got '0.015 usec per loop' for both, and same for both frozenset variations. Running timeit.repeat interactively and selecting the best reproduced your timing ratio: .16 to .42. For frozenset, I get .36 to .42 in favor of changing to frozenset({...}).

    @serhiy-storchaka
    Copy link
    Member

    Isn't such changes considered code churn?

    If it is not, I have a huge patch which makes Python sources to use more modern idioms, including replacing set constructors with set literals (I have counted three occurrences not in tests). Are you interesting to look on it Raymond?

    @rhettinger
    Copy link
    Contributor Author

    [I will prepare a 3.5 patch for this.]

    Thanks, I will review when you're done.

    [How about frozenset([...]) to frozenset({...})? ]

    Yes, the frozenset() examples should change to match the actual repr:

       >>> frozenset([10, 20, 30])
       frozenset({10, 20, 30})

    @rhettinger
    Copy link
    Contributor Author

    [Isn't such changes considered code churn?]

    This sort of thing is always a judgment call. The patch will affect very few lines of code, give a little speed-up, and make the code easier to read. In the case of the docs, it is almost always worthwhile to update to the current, idiomatic form. Also, the set literal case is special because it has built-in language support, possible peephole optimizations, and there was a repr change as well. That said, it is rarely a good idea to change tests because we don't have tests for tests and because the end-user will never see any value.

    On the balance, I think this one is a reasonable thing to do, but I would show a great deal more hesitancy for a "a huge patch which makes Python sources to use more modern idioms."

    @terryjreedy
    Copy link
    Member

    My timing for set((1,2,3)) is .29, faster than for set([1,2,3]) (.42) but still slower than for {1,2,3} (.16). So I will change such instances also.

    The same timing for frozenset((1,2,3)) (.29) is faster than the best timing for frozenset({1,2,3}), (.36), so I will not change that unless discussed and agreed on.

    @rhettinger
    Copy link
    Contributor Author

    The same timing for frozenset((1,2,3)) (.29) is faster than the best
    timing for frozenset({1,2,3}), (.36),

    I don't see the tuple form used anywhere in the code.
    The timing is a bit quicker for the tuple form because the peephole optimizer constant folds the tuple (use dis to see this).

    so I will not change that
    unless discussed and agreed on.

    Maybe, I should just make the patch. It's becoming harder to talk about than to just fix.

    @rhettinger rhettinger self-assigned this Nov 9, 2014
    @terryjreedy
    Copy link
    Member

    Serhiy, about your 'huge patch' to modernize code:

    I am more positive than some because:

    1. To me, a one-time gentile change is not 'churning'.

    2. As we link to many, most, or even all python-coded stdlib modules (I think there is a proposal for 'all'), there is more benefit to using modern idioms.

    On the other hand, 'huge' patches can be too much to discuss, justify, and review all at once.

    Using {.. } for sets consistently is a nice-sized chunk to consider. We can identify, discuss, and decide on each sub-case (I have identified 4 so far). It has the additional benefit of being a performance enhancement.
    ---

    'set((...' is used in distutils (which I will not change) and in many tests. So that is not an issue. 'frozenset((' is used 5 times in regular module code.

    @rhettinger
    Copy link
    Contributor Author

    Attaching a patch. Doesn't change tests for the reasons mentioned above.
    Leaves idle, 2-to-3, and mocking for their respective module maintainers to deal with holistically (as part of their routine maintenance).

    @rhettinger
    Copy link
    Contributor Author

    Okay, I missed the frozenset(( examples in my search. There are all in one-time set-up code. Attaching a patch for them as well.

    @serhiy-storchaka
    Copy link
    Member

    You have missed Parser/asdl.py and Tools/clinic/clinic.py.

    @terryjreedy
    Copy link
    Member

    Serhiy, as I said before, please omit idlelib/CodeContext.

    You both skipped reprlib.py. Should it be changed to produce the standard repr() result? The existing lines:

    F:\Python\dev\35\lib\reprlib.py: 91: return self._repr_iterable(x, level, 'set([', '])', self.maxset)
    F:\Python\dev\35\lib\reprlib.py: 95: return self._repr_iterable(x, level, 'frozenset([', '])',

    If it is, its tests will have to be changed too.

    @rhettinger
    Copy link
    Contributor Author

    Hmm, didn't look at those parts of the tree. I'll change the one-line in Parser and leave the little atrocities in clinic.py for Larry to fix :-)

    Reprlib was skipped intentionally. There is a separate tracker item for it. http://bugs.python.org/issue22824

    @rhettinger
    Copy link
    Contributor Author

    If there are no objections, I would like to apply my two patches (plus the one-line asdl.py change) and leave the rest to the discretion the module maintainers (mock, code context, clinic, and 2-to-3).

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Nov 9, 2014

    New changeset 4480506137ed by Raymond Hettinger in branch 'default':
    Issue bpo-22823: Use set literals instead of creating a set from a list
    https://hg.python.org/cpython/rev/4480506137ed

    @rhettinger
    Copy link
    Contributor Author

    Larry, would you care to apply or approve Serhiy's updates to clinic.py?

    @larryhastings
    Copy link
    Contributor

    Serhiy: set_literal_2.patch doesn't apply cleanly, so I don't get a "review" link. And apparently Raymond checked in some other changes separately. Could you redo your patch so it has the Clinic changes, and ensure I get a "review" link?

    @serhiy-storchaka
    Copy link
    Member

    Here is updated patch for clinic only.

    @larryhastings
    Copy link
    Contributor

    The patch is totally fine. I wonder why it was like that in the first place!

    @rhettinger
    Copy link
    Contributor Author

    Serhiy, go ahead and apply the clinic.py patch. Can you also make a separate mock patch and assign it to Michael Foord for review?

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Nov 15, 2014

    New changeset f4e75efdc7f1 by Serhiy Storchaka in branch 'default':
    Issue bpo-22823: Use set literals instead of creating a set from a tuple.
    https://hg.python.org/cpython/rev/f4e75efdc7f1

    @serhiy-storchaka
    Copy link
    Member

    Can you also make a separate mock patch and assign it to Michael Foord for review?

    Here is a patch. It also replaces constructing sets from generators with set comprehensions.

    @terryjreedy
    Copy link
    Member

    mock patch LGTM

    @rhettinger
    Copy link
    Contributor Author

    IMO, the _non_defaults set comprehension in mock.py ought to be replaced with a set of internable string constants.

    @terryjreedy
    Copy link
    Member

    OK, someone can copy and paste this.

    non_defaults = {
         '__get__', '__set__', '__delete__', '__reversed__', '__missing__',
         '__reduce__', '__reduce_ex'__, '__getinitargs__', '__getnewargs__',
         '__getstate__', '__setstate__', '__getformat__', '__setformat__',
         '__repr__', '__dir__', '__subclasses__', '__format__',
    )

    @berkerpeksag
    Copy link
    Member

    Updated Serhiy's patch.

    @voidspace
    Copy link
    Contributor

    Patch looks good to me.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Dec 11, 2014

    New changeset b6e6a86a92a7 by Serhiy Storchaka in branch 'default':
    Issue bpo-22823: Use set literals instead of creating a set from a list.
    https://hg.python.org/cpython/rev/b6e6a86a92a7

    New changeset 86a694781bee by Serhiy Storchaka in branch '3.4':
    Issue bpo-22823: Fixed an output of sets in examples.
    https://hg.python.org/cpython/rev/86a694781bee

    @serhiy-storchaka
    Copy link
    Member

    Docs changes were applied to 3.4 too.

    Here is a patch for lib2to3.

    @vstinner
    Copy link
    Member

    Here is a patch for lib2to3.

    In Python 3.5, I still found some "set([" and "frozenset([" in Lib/lib2to3, Lib/test/, Lib/stringrep.py, Lib/unittest/test/ and Lib/idlelib/CodeContext.py if someone is motived to patch them. (Ok, Serhiy wrote a patch for lib2to3.)

    @serhiy-storchaka
    Copy link
    Member

    Tests are intentionally omitted, Lib/stringrep.py is very special case (it's
    code is generated and outdated, see bpo-15239), idlelib is deferred by Terry.
    And there is yet one one-line change to Lib/distutils/msvc9compiler.py in
    set_literal_3.patch.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Dec 11, 2014

    New changeset ce66b65ad8d6 by Terry Jan Reedy in branch '2.7':
    bpo-22823: Use set literal in idlelib.
    https://hg.python.org/cpython/rev/ce66b65ad8d6

    New changeset daec40891d43 by Terry Jan Reedy in branch '3.4':
    bpo-22823: Use set literal in idlelib.
    https://hg.python.org/cpython/rev/daec40891d43

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Dec 11, 2014

    New changeset 7c2811521261 by Victor Stinner in branch 'default':
    Issue bpo-22823: Fix typo in unittest/mock.py
    https://hg.python.org/cpython/rev/7c2811521261

    @benjaminp
    Copy link
    Contributor

    2to3 patch lgtm. Please apply to 3.4, too, though.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Dec 13, 2014

    New changeset c3f960cff3e6 by Serhiy Storchaka in branch '3.4':
    Issue bpo-22823: Use set literals in lib2to3.
    https://hg.python.org/cpython/rev/c3f960cff3e6

    New changeset d3e43f7ecca8 by Serhiy Storchaka in branch 'default':
    Issue bpo-22823: Use set literals in lib2to3.
    https://hg.python.org/cpython/rev/d3e43f7ecca8

    @serhiy-storchaka
    Copy link
    Member

    That's all I think. Distutils is too conservative for such changes.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    easy stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    8 participants