Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

index() and count() methods of bytes and bytearray should accept byte ints #56379

Closed
max-alleged mannequin opened this issue May 24, 2011 · 22 comments
Closed

index() and count() methods of bytes and bytearray should accept byte ints #56379

max-alleged mannequin opened this issue May 24, 2011 · 22 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@max-alleged
Copy link
Mannequin

max-alleged mannequin commented May 24, 2011

BPO 12170
Nosy @rhettinger, @terryjreedy, @jcea, @pitrou, @vstinner, @ezio-melotti, @merwok, @florentx, @IIIIllllIIIIllllIIIIllllIIIIllllIIIIll, @akheron
Files
  • issue12170.patch
  • issue12170_v2.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2011-10-20.21:58:46.441>
    created_at = <Date 2011-05-24.19:49:09.812>
    labels = ['interpreter-core', 'type-bug']
    title = 'index() and count() methods of bytes and bytearray should accept byte ints'
    updated_at = <Date 2012-06-26.07:28:04.227>
    user = 'https://bugs.python.org/max-alleged'

    bugs.python.org fields:

    activity = <Date 2012-06-26.07:28:04.227>
    actor = 'python-dev'
    assignee = 'none'
    closed = True
    closed_date = <Date 2011-10-20.21:58:46.441>
    closer = 'pitrou'
    components = ['Interpreter Core']
    creation = <Date 2011-05-24.19:49:09.812>
    creator = 'max-alleged'
    dependencies = []
    files = ['22733', '23465']
    hgrepos = []
    issue_num = 12170
    keywords = ['patch', 'needs review']
    message_count = 22.0
    messages = ['136786', '136787', '136878', '136883', '140903', '141016', '141033', '141216', '141490', '141491', '145797', '145799', '145929', '145931', '146055', '146056', '148232', '148237', '148287', '149722', '149723', '164054']
    nosy_count = 12.0
    nosy_names = ['rhettinger', 'terry.reedy', 'jcea', 'pitrou', 'vstinner', 'ezio.melotti', 'eric.araujo', 'flox', 'xuanji', 'max-alleged', 'python-dev', 'petri.lehtinen']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue12170'
    versions = ['Python 3.3']

    @max-alleged
    Copy link
    Mannequin Author

    max-alleged mannequin commented May 24, 2011

    Bytes objects when indexed provide integers, but do not accept them to many functions, making them inconsistent with other sequences.

    Basic example:
    >>> test = b'012'
    >>> n = test[1]
    >>> n
    49
    >>> n in test
    True
    >>> test.index(n)
    TypeError: expected an object with the buffer interface.

    It is certainly unusual for n to be in the sequence, but not to be able to find it. I would expect the result to be 1. This set of commands with list, strings, tuples, but not bytes objects.

    I suspect, from issue bpo-10616, that all the following functions would be affected:
    "bytes methods: partition, rpartition, find, index, rfind, rindex, count, translate, replace, startswith, endswith"

    It would make more sense to me that instead of only supporting buffer interface objects, they also accept a single integer, and treat it as if it were provided a length-1 bytes object.

    The use case I came across this problem was something like this:

    Given seq1 and seq2, sequences of the same type:
    [seq1.index(x) for x in seq2]

    This works for strings, lists, tuples, but not bytes.

    @max-alleged max-alleged mannequin added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label May 24, 2011
    @max-alleged
    Copy link
    Mannequin Author

    max-alleged mannequin commented May 24, 2011

    "This set of commands with list, strings, tuples, but not bytes objects."
    should read
    "This set of commands works with list, strings, tuples, but not bytes objects."

    @terryjreedy
    Copy link
    Member

    It is certainly unusual for n to be in the sequence, but not to be able to find it.

    Agreed. Doc Lib: 4.6. Sequence Types — str, bytes, bytearray, list, tuple, range says '''
    s.index(i) index of the first occurence of i in s
    s.count(i) total number of occurences of i in s '''
    so everything *in* a bytes should be valid for .index and .count.

    >>> test = b'0120'
    >>> z = b'0'
    >>> zo = ord(z)
    >>> z in test
    True
    >>> zo in test
    True
    >>> test.index(z)
    0
    >>> test.index(zo)
    ...
    TypeError: expected an object with the buffer interface
    >>> test.count(z)
    2
    >>> test.count(zo)
    ...
    TypeError: expected an object with the buffer interface
    # longer subsequences like b'01' also work

    So I think the code for 3.2+ bytes.count() and bytes.index() should do the same branching as the code for bytes.__contains__.

    The other functions you list, including .rindex are not general sequence functions but are string functions defined as taking subsequences as inputs. So they would never be used in generic code like .count and .index can be.

    @terryjreedy terryjreedy changed the title Bytes objects do not accept integers to many functions Bytes.index() and bytes.count() do not accept byte ints May 25, 2011
    @terryjreedy terryjreedy added the type-bug An unexpected behavior, bug, or error label May 25, 2011
    @terryjreedy terryjreedy changed the title Bytes.index() and bytes.count() do not accept byte ints Bytes.index() and bytes.count() should accept byte ints May 25, 2011
    @rhettinger rhettinger self-assigned this May 25, 2011
    @max-alleged
    Copy link
    Mannequin Author

    max-alleged mannequin commented May 25, 2011

    Fair enough.

    I think it would make sense for the string methods to also accept single ints where possible as well:

    For haystack and needles both strings:
    [haystack.find(n) for n in needles]

    For both bytes, it's a bit contortionist:
    [haystack.find(needles[i:i+1]) for i in range(len(needles))]

    One ends up doing a lot of the [i:i+1] bending when using bytes functions.

    @max-alleged max-alleged mannequin added type-bug An unexpected behavior, bug, or error and removed type-bug An unexpected behavior, bug, or error labels May 25, 2011
    @akheron
    Copy link
    Member

    akheron commented Jul 22, 2011

    This affects bytearray as well as bytes.

    What comes to supporting integer argument to str methods, I'm -1 on that. str's "contained items" are strings of length 1.

    @akheron akheron changed the title Bytes.index() and bytes.count() should accept byte ints index() and count() methods of bytes and bytearray should accept byte ints Jul 22, 2011
    @akheron
    Copy link
    Member

    akheron commented Jul 23, 2011

    Attached a patch with the following changes:

    Allow an integer argument in range(0, 256) for the following bytes and
    bytearray methods: count, find, index, rfind, rindex. Initially, only
    count and index were targeted, but as index is implemented in a helper
    function that is also used to implement find, rfind and rindex, these
    functions were affected too.

    The bytes methods were changed to use the new buffer protocol instead
    of the deprecated PyObject_AsCharBuffer, for consistency with the
    bytearray code.

    Tests for all the modified functions were expanded to cover the new
    functionality. While at it, the tests for count, index and rindex were
    also further expanded (to test for slices, for example), as they were
    initially quite minimal.

    A paragraph describing the additional semantics of the five methods
    was added to the documentation.

    The error messages of index and rindex were left untouched
    ("substring not found" and "subsection not found"). In a case where
    the first argument is an integer, the error messages could talk about
    a byte instead of substring/subsection. This would have been a bit
    non-straightforward to implement, so I didn't.

    The docstrings were also left unchanged, as I couldn't find a good
    wording for them. The problem is not that the first argument may now
    be an integer, but as it can now be more than a substring or
    subsection, we might have to specify what a substring or subsection
    really means. And that explanation would be lengthy (because of the buffer protocol, that's not a concept that a regular Python programmer is, or even needs to be, familiar with)...

    And finally, there's one thing that I'm unsure of:

    When an integer out of range(0, 256) is passed as the first argument,
    should we raise a ValueError or a TypeError? Currently, a ValueError
    is raised, but this may be bad for index and rindex, as they raise a
    ValueError when the substring or byte is not found. I made the
    decision to raise a ValueError decision because __contains__ of both
    bytes and bytearray raise a ValueError when passed an integer not in
    range(0, 256).

    @ezio-melotti
    Copy link
    Member

    When an integer out of range(0, 256) is passed as the first argument,
    should we raise a ValueError or a TypeError?

    ValueError = Inappropriate argument value (of correct type).
    TypeError = Inappropriate argument type.

    Currently, a ValueError raised, but this may be bad for index and
    rindex, as they raise a ValueError when the substring or byte is not found.

    Then the users should check if the value is in range(256) before passing it to (r)index.

    @akheron
    Copy link
    Member

    akheron commented Jul 27, 2011

    Ok, so the current raising semantics should be good.

    @rhettinger
    Copy link
    Contributor

    See also bpo-12631 regarding the remove() method for bytearray.

    @akheron
    Copy link
    Member

    akheron commented Aug 1, 2011

    See also bpo-12631 regarding the remove() method for bytearray.

    AFAICS, it's about bytearray.remove() working but bytearray.index() not working as documented, and that's why I marked is as a duplicate of this issue.

    @pitrou
    Copy link
    Member

    pitrou commented Oct 18, 2011

    I made the
    decision to raise a ValueError decision because __contains__ of both
    bytes and bytearray raise a ValueError when passed an integer not in
    range(0, 256).

    That sounds reasonable. OverflowError would have been another choice, but I agree that consistency with __contains__ is sensible.

    @pitrou
    Copy link
    Member

    pitrou commented Oct 18, 2011

    Doc/library/stdtypes.rst needs a "versionadded" tag for the additional semantics.

    Also, the patch doesn't compile fine on current default:

    In file included from Objects/unicodeobject.c:487:0:
    Objects/stringlib/find.h: In function ‘stringlib_parse_args_finds_byte’:
    Objects/stringlib/find.h:158:5: attention : implicit declaration of function ‘stringlib_parse_args_finds’
    In file included from Objects/unicodeobject.c:497:0:
    Objects/stringlib/find.h: Hors de toute fonction :
    Objects/stringlib/find.h:151:1: erreur: redefinition of ‘stringlib_parse_args_finds_byte’
    Objects/stringlib/find.h:151:1: note: previous definition of ‘stringlib_parse_args_finds_byte’ was here

    I'd say you need to either define your function as STRINGLIB(parse_args_finds_byte) (to avoid name collisions), or avoid defining it if STRINGLIB_IS_UNICODE.

    @akheron
    Copy link
    Member

    akheron commented Oct 19, 2011

    Thanks for the review, Antoine. Attached an updated the patch:

    • The function definition now uses STRINGLIB(...) and the function is only defined when !STRINGLIB_IS_UNICODE (so I took both approaches)

    • Added a "versionadded:: 3.3" to the documentation.

    @akheron
    Copy link
    Member

    akheron commented Oct 19, 2011

    Fixed a minor inconsistency.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Oct 20, 2011

    New changeset c1effa2cdd20 by Antoine Pitrou in branch 'default':
    Issue bpo-12170: The count(), find(), rfind(), index() and rindex() methods
    http://hg.python.org/cpython/rev/c1effa2cdd20

    @pitrou
    Copy link
    Member

    pitrou commented Oct 20, 2011

    Patch committed, thank you!

    @akheron
    Copy link
    Member

    akheron commented Nov 24, 2011

    Just a thought: Would this change be worthy for the "What's new in 3.3" list?

    @pitrou
    Copy link
    Member

    pitrou commented Nov 24, 2011

    Just a thought: Would this change be worthy for the "What's new in 3.3"
    list?

    I think so.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Nov 24, 2011

    New changeset 736b0aec412b by Petri Lehtinen in branch 'default':
    Add a "What's New" entry for bpo-12170
    http://hg.python.org/cpython/rev/736b0aec412b

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Dec 18, 2011

    New changeset 75648db1b3f3 by Victor Stinner in branch 'default':
    Issue bpo-13623: Fix a performance regression introduced by issue bpo-12170 in
    http://hg.python.org/cpython/rev/75648db1b3f3

    @vstinner
    Copy link
    Member

    New changeset 75648db1b3f3 by Victor Stinner in branch 'default':
    http://hg.python.org/cpython/rev/75648db1b3f3
    Issue bpo-13623: Fix a performance regression introduced by issue bpo-12170 in bytes.find() and handle correctly OverflowError (raise the same ValueError than the error for -1).

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jun 26, 2012

    New changeset 018fe1dee9b3 by Petri Lehtinen in branch 'default':
    What's new: Add myself as the contributor of bpo-12170
    http://hg.python.org/cpython/rev/018fe1dee9b3

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants