Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enum.Flag should be more set-like #82431

Closed
belm0 mannequin opened this issue Sep 22, 2019 · 19 comments
Closed

enum.Flag should be more set-like #82431

belm0 mannequin opened this issue Sep 22, 2019 · 19 comments
Assignees
Labels
3.10 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@belm0
Copy link
Mannequin

belm0 mannequin commented Sep 22, 2019

BPO 38250
Nosy @belm0, @ethanfurman, @vedgar, @hroncok, @Zheaoli, @belm0, @hauntsaninja
PRs
  • bpo-32218: make Flag and IntFlag members iterable #22221
  • bpo-38250: minor Flag refactor #22734
  • bpo-38250: [Enum] single-bit flags are canonical #24215
  • bpo-38250: [Enum] only include .rst test if file available #24342
  • bpo-38250: add version added for FlagBoundary #25820
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/ethanfurman'
    closed_at = <Date 2021-01-26.22:15:42.201>
    created_at = <Date 2019-09-22.11:14:02.624>
    labels = ['type-feature', 'library', '3.10']
    title = 'enum.Flag should be more set-like'
    updated_at = <Date 2021-05-26.19:32:15.310>
    user = 'https://github.com/belm0'

    bugs.python.org fields:

    activity = <Date 2021-05-26.19:32:15.310>
    actor = 'hroncok'
    assignee = 'ethan.furman'
    closed = True
    closed_date = <Date 2021-01-26.22:15:42.201>
    closer = 'vstinner'
    components = ['Library (Lib)']
    creation = <Date 2019-09-22.11:14:02.624>
    creator = 'John Belmonte'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 38250
    keywords = ['patch']
    message_count = 19.0
    messages = ['352967', '365051', '365072', '378436', '378439', '378455', '378456', '378461', '378552', '378553', '378555', '378606', '385062', '385672', '385677', '385701', '385735', '385737', '394458']
    nosy_count = 7.0
    nosy_names = ['jbelmonte', 'ethan.furman', 'veky', 'hroncok', 'Manjusaka', 'John Belmonte', 'hauntsaninja']
    pr_nums = ['22221', '22734', '24215', '24342', '25820']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue38250'
    versions = ['Python 3.10']

    @belm0
    Copy link
    Mannequin Author

    belm0 mannequin commented Sep 22, 2019

    I would like Flag class instances to have more set-like abilities:

    1. iteration, to walk through each set bit of the value
    2. len corresponding to Support "bpo-" in Misc/NEWS #1
    3. subset operator

    I may be told "implement it yourself as instance methods", or that #3 has an idiom (a & b is b). Ideally though, every Flag user should be able to rely on these being implemented consistently.

    When trying to implement #1 without devolving into bit fiddling, naturally one might try to use the class iterator. Unfortunately the semantics of that enumeration include 0, aliases, and compound values. I've used Flag in several situations and projects, and so far there hasn't been a case where that was the desired semantics. Interesting though, if #1 were implemented in the standard library, then we could enumerate all bits of the Flag via iteration of ~MyFlag(0)... though that's obscuring things behind another idiom.

    Thank you for considering.

    @belm0 belm0 mannequin added 3.9 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Sep 22, 2019
    @ethanfurman ethanfurman self-assigned this Mar 25, 2020
    @vedgar
    Copy link
    Mannequin

    vedgar mannequin commented Mar 26, 2020

    1. +0 _if_ the implementation is easy to explain. If backward compatibility is an issue, we can always add a property:
      for flag in flags.set:
      (though set might imply unorderedness:)
    2. -1. Guido said long ago that all lens should be O(1).
      (Of course, if you do make it O(1), I have no objection.)
    3. +1, absolutely.

    @Zheaoli
    Copy link
    Mannequin

    Zheaoli mannequin commented Mar 26, 2020

    1. not sure I gett the Point

    2. not sure

    3. absolutely yes

    @belm0
    Copy link
    Mannequin Author

    belm0 mannequin commented Oct 11, 2020

    Part of this issue (#1) was intended to be addressed by #22221 which added an __iter__ implementation to Flag and IntFlag. (The PR did not reference this issue, and was already merged last month.)

    However that PR seems problematic on several counts:

    1. __iter__ diverges from the existing __contains__. The latter includes 0 and compound values
    2. the implementation is fairly heavy
    3. len() on an enum instance is going to be O(n)

    I've put post-merge comments on the PR.

    I think it would be safer to have an explicitly named bits() iterator on flag instances, rather than use __iter__().

    @belm0 belm0 mannequin added 3.10 only security fixes and removed 3.9 only security fixes labels Oct 11, 2020
    @vedgar
    Copy link
    Mannequin

    vedgar mannequin commented Oct 11, 2020

    Just a comment, (1) is analogous to str. iter('abc') gives only 'a', 'b' and 'c', while contains accepts '', 'ab', 'bc', and 'abc' too. At least in my mind, it's a pretty strong analogy.

    @belm0
    Copy link
    Mannequin Author

    belm0 mannequin commented Oct 11, 2020

    Just a comment, (1) is analogous to str. iter('abc') gives only 'a', 'b' and 'c', while contains accepts '', 'ab', 'bc', and 'abc' too. At least in my mind, it's a pretty strong analogy.

    I don't agree. The "zero" bit does not exist, so having contains return True on Foo(0) in x is misaligned with the iterator. And having contains return True for specific compound values just because they happen to be explicitly defined, while returning False for others, is arbitrary. contains seems to be of very little use, and moreover a trap for the unwary. Assuming we have to live with that until Python 4, it's better to make an explicit iterator like bits() so that the API doesn't contradict itself.

    @vedgar
    Copy link
    Mannequin

    vedgar mannequin commented Oct 11, 2020

    Of course, if it returns True only on _some_ bits combinations it doesn't make sense. I thought every element of a Boolean span would be _in_ the Foo.

    But about "zero bit", I still think it's perfectly analogous to '' in 'abc'.

    @belm0
    Copy link
    Mannequin Author

    belm0 mannequin commented Oct 11, 2020

    I think #22221 should be reverted (considering the design issue, performance issue, and bugs), and lets have a proper design and review.

    While just reading the code, I found an existing bug in Flag. And the new __iter__ uses the buggy internal function, and so itself has bugs.

    #22221 (comment)

    @belm0
    Copy link
    Mannequin Author

    belm0 mannequin commented Oct 13, 2020

    It's completely undocumented, but today I noticed that Flag.__contains__() is actually a subset operation.

        def __contains__(self, other):
            ...
            return other._value_ & self._value_ == other._value_

    It's an unfortunate departure from the set type, which uses in for membership test and issubset() / <= for subset test.

    For set operations, the Flag individual bits should be considered the members of a set (not Flag compound values, which are themselves equivalent to a set).

    @vedgar
    Copy link
    Mannequin

    vedgar mannequin commented Oct 13, 2020

    Again, I disagree. str used to work like this in Py2.0 (or somewhere around then), only 'x' was in 'xyz', not 'xy'. Then Guido came to his senses. :-)

    This is not set theory, this is mereology. You don't differentiate between a digit and a one-digit number, a char and a one-char string, and in the same way you shouldn't differentiate between a bit and a one-bit flag.

    @belm0
    Copy link
    Mannequin Author

    belm0 mannequin commented Oct 13, 2020

    I agree that a bit and one-bit flag are the same.

    only 'x' was in 'xyz', not 'xy

    I don't understand the comparison, because str 'a in b' tests if 'a' is a subsequence of 'b'. It is not a subset operation ('xz' in 'xyz' is false).

    I can understand the argument that Flag has a subset operator (currently __contains__), and given that one-bit flags can be used freely with the subset operator, there is no reason to add a bit membership operator.

    However, since flag values are arguably a set of enabled bits, I think the use of in for subset is a confusing departure from the set API.

    @vedgar
    Copy link
    Mannequin

    vedgar mannequin commented Oct 14, 2020

    Flag values are _not_ a set of enabled bits.

    At least, it is a weird kind of set where a is the same as {a}. That's why I mentioned mereology... there is no reasonable "membership", just "inclusion". I understand that str is not a perfect analogy, but sets are even more wrong.

    @ethanfurman
    Copy link
    Member

    The code sample:

        class Color(IntFlag):
            BLACK = 0
            RED = 1
            GREEN = 2
            BLUE = 4
            PURPLE = RED | BLUE
            WHITE = RED | GREEN | BLUE

    Here's the summary of the changes:

    • single-bit flags are canonical
    • multi-bit and zero-bit flags are aliases
      + only canonical flags are returned during iteration
        >>> list(Color.WHITE)
        [<Color.RED: 1>, <Color.GREEN: 2>, <Color.BLUE: 4>]
    • negating a flag or flag set returns a new flag/flag set with the
      corresponding positive integer value

      Color.GREEN
      <Color.GREEN: 2>

      ~Color.GREEN
      <Color.PURPLE: 5>

    • names of pseudo-flags are constructed from their members' names

        >>> (Color.RED | Color.GREEN).name
        'RED|GREEN'
    • multi-bit flags, aka aliases, can be returned from operations
        >>> Color.RED | Color.BLUE
        <Color.PURPLE: 5>
    
        >>> Color(7)  # or Color(-1)
        <Color.WHITE: 7>
    • membership / containment checking has changed slightly -- zero valued flags
      are never considered to be contained:
        >>> Color.BLACK in Color.WHITE
        False

    otherwise, if all bits of one flag are in the other flag, True is returned:

        >>> Color.PURPLE in Color.WHITE
        True

    There is a new boundary mechanism that controls how out-of-range / invalid bits are handled: STRICT, CONFORM, EJECT', and KEEP':

    STRICT --> raises an exception when presented with invalid values
    CONFORM --> discards any invalid bits
    EJECT --> lose Flag status and become a normal int with the given value
    KEEP --> keep the extra bits
    - keeps Flag status and extra bits
    - they don't show up in iteration
    - they do show up in repr() and str()

    The default for Flag is STRICT, the default for IntFlag is DISCARD, and the default for _convert_ is KEEP (see ssl.Options for an example of when KEEP is needed).

    @ethanfurman
    Copy link
    Member

    New changeset 7aaeb2a by Ethan Furman in branch 'master':
    bpo-38250: [Enum] single-bit flags are canonical (GH-24215)
    7aaeb2a

    @ethanfurman
    Copy link
    Member

    Thank you to everyone involved. :-)

    To answer the first three points that started this issue:

    1. iteration -> each single-bit flag in the entire flag, or a
      combinations of flags, is returned one at a time -- not the
      empty set, not other multi-bit values

    2. length is implemented -> len(Color.BLUE | Color.RED) == 2

    3. subset is implemented as containment checking:
      Color.BLUE in (Color.RED | Color.BLUE) is True

    @vstinner
    Copy link
    Member

    test_enum fails when Python is installed:

    PPC64LE Fedora Rawhide Clang Installed 3.x:
    https://buildbot.python.org/all/#builders/312/builds/597

    0:01:37 load avg: 8.99 [232/426/1] test_enum failed -- running: test_tokenize (1 min 37 sec), test_unparse (31.7 sec), test_concurrent_futures (37.5 sec)
    Failed to call load_tests:
    Traceback (most recent call last):
      File "/home/buildbot/buildarea/3.x.cstratak-fedora-rawhide-ppc64le.clang-installed/build/target/lib/python3.10/unittest/loader.py", line 130, in loadTestsFromModule
        return load_tests(self, tests, pattern)
      File "/home/buildbot/buildarea/3.x.cstratak-fedora-rawhide-ppc64le.clang-installed/build/target/lib/python3.10/test/test_enum.py", line 20, in load_tests
        tests.addTests(doctest.DocFileSuite(
      File "/home/buildbot/buildarea/3.x.cstratak-fedora-rawhide-ppc64le.clang-installed/build/target/lib/python3.10/doctest.py", line 2511, in DocFileSuite
        suite.addTest(DocFileTest(path, **kw))
      File "/home/buildbot/buildarea/3.x.cstratak-fedora-rawhide-ppc64le.clang-installed/build/target/lib/python3.10/doctest.py", line 2433, in DocFileTest
        doc, path = _load_testfile(path, package, module_relative,
      File "/home/buildbot/buildarea/3.x.cstratak-fedora-rawhide-ppc64le.clang-installed/build/target/lib/python3.10/doctest.py", line 231, in _load_testfile
        file_contents = loader.get_data(filename)
      File "<frozen importlib._bootstrap_external>", line 1023, in get_data
    FileNotFoundError: [Errno 2] No such file or directory: '/home/buildbot/buildarea/3.x.cstratak-fedora-rawhide-ppc64le.clang-installed/build/target/lib/python3.10/test/../../Doc/library/enum.rst'
    
    test test_enum crashed -- Traceback (most recent call last):
      File "/home/buildbot/buildarea/3.x.cstratak-fedora-rawhide-ppc64le.clang-installed/build/target/lib/python3.10/test/libregrtest/runtest.py", line 272, in _runtest_inner
        refleak = _runtest_inner2(ns, test_name)
      File "/home/buildbot/buildarea/3.x.cstratak-fedora-rawhide-ppc64le.clang-installed/build/target/lib/python3.10/test/libregrtest/runtest.py", line 236, in _runtest_inner2
        test_runner()
      File "/home/buildbot/buildarea/3.x.cstratak-fedora-rawhide-ppc64le.clang-installed/build/target/lib/python3.10/test/libregrtest/runtest.py", line 210, in _test_module
        raise Exception("errors while loading tests")
    Exception: errors while loading tests

    Tests are loaded by Lib/test/test_enum.py with:

    def load_tests(loader, tests, ignore):
        tests.addTests(doctest.DocTestSuite(enum))
        tests.addTests(doctest.DocFileSuite(
                '../../Doc/library/enum.rst',
                optionflags=doctest.ELLIPSIS|doctest.NORMALIZE_WHITESPACE,
                ))
        return tests

    @vstinner vstinner reopened this Jan 26, 2021
    @ethanfurman
    Copy link
    Member

    New changeset 01faf45 by Ethan Furman in branch 'master':
    bpo-38250: [Enum] only include .rst test if file available (GH-24342)
    01faf45

    @vstinner
    Copy link
    Member

    PPC64LE Fedora Rawhide Clang Installed 3.x is back to green, thanks for the fix ;-)
    https://buildbot.python.org/all/#/builders/312/builds/600

    @hroncok
    Copy link
    Mannequin

    hroncok mannequin commented May 26, 2021

    I've reported https://bugs.python.org/issue44242 because I believe there is a regression in this change.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.10 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants