add "buffer protocol" to glossary #60722

Closed
cjerdonek opened this issue Nov 21, 2012 · 34 comments
Labels
docs Documentation in the Doc dir type-feature A feature request or enhancement

Comments

@cjerdonek
Member

BPO 16518
Nosy @birkenfeld, @rhettinger, @terryjreedy, @pitrou, @ezio-melotti, @merwok, @bitdancer, @skrah, @florentx, @cjerdonek, @serhiy-storchaka
Files
  • issue16518.diff
  • issue16518-2.diff: Patch to use "bytes-like object" throughout the docs
  • issue16518-3.diff
  • issue16518-4.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2014-10-05.16:15:14.491>
    created_at = <Date 2012-11-21.06:03:27.806>
    labels = ['type-feature', 'docs']
    title = 'add "buffer protocol" to glossary'
    updated_at = <Date 2014-10-31.19:08:28.740>
    user = 'https://github.com/cjerdonek'

    bugs.python.org fields:

    activity = <Date 2014-10-31.19:08:28.740>
    actor = 'ezio.melotti'
    assignee = 'docs@python'
    closed = True
    closed_date = <Date 2014-10-05.16:15:14.491>
    closer = 'r.david.murray'
    components = ['Documentation']
    creation = <Date 2012-11-21.06:03:27.806>
    creator = 'chris.jerdonek'
    dependencies = []
    files = ['30065', '30089', '30124', '30138']
    hgrepos = []
    issue_num = 16518
    keywords = ['patch']
    message_count = 34.0
    messages = ['176042', '176238', '176242', '176244', '176245', '176247', '176248', '176249', '176251', '176252', '176253', '176254', '176256', '176257', '176262', '176264', '177801', '188078', '188183', '188207', '188208', '188209', '188368', '188369', '188404', '188406', '188453', '188484', '188485', '228585', '228587', '228596', '228694', '230380']
    nosy_count = 13.0
    nosy_names = ['georg.brandl', 'rhettinger', 'terry.reedy', 'pitrou', 'ezio.melotti', 'eric.araujo', 'r.david.murray', 'skrah', 'flox', 'chris.jerdonek', 'docs@python', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue16518'
    versions = ['Python 2.7', 'Python 3.3', 'Python 3.4']

    @cjerdonek
    Member Author

    This issue is to add "buffer protocol" (or perhaps "buffer object") to the glossary. The concept is currently described here:

    http://docs.python.org/dev/c-api/buffer.html#buffer-protocol

    Éric initially suggested doing this in the comments to bpo-13538.

    Such a glossary entry would be useful because the buffer protocol (or buffer object) should likely be cited, for example, wherever a function accepts a bytes object, bytearray object, or any object that supports the buffer protocol. The str() constructor is one example where this is done:

    http://hg.python.org/cpython/file/59acd5cac8b5/Doc/library/functions.rst#l1275

    "Buffer object" might be the more useful term to add to the glossary because it would help to have a briefer way of saying "any object that supports the buffer protocol." (I'm assuming this is what "buffer object" actually means.)

    The patch for this issue should also do a comprehensive review of occurrences of buffer object/protocol throughout the docs and add or update links and index entries where appropriate.
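
As a rough illustration of the situation described above (a sketch only, not taken from the docs or from any patch on this issue): `str()` called with an encoding decodes a `bytes` object, a `bytearray`, or another object that exposes a byte buffer, such as a `memoryview`. The sample data and variable names are made up for the example.

```python
# Hypothetical sample data: the UTF-8 encoding of "café".
raw = b"caf\xc3\xa9"

# Three different types that expose the same bytes through the buffer protocol.
candidates = [raw, bytearray(raw), memoryview(raw)]

# str(obj, encoding) decodes each of them in the same way.
decoded = [str(obj, "utf-8") for obj in candidates]
print(decoded)
```

A single glossary term would let the documentation of such a function name all three cases at once.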

    @cjerdonek cjerdonek added docs Documentation in the Doc dir type-feature A feature request or enhancement labels Nov 21, 2012
    @terryjreedy
    Member

    I would use the term that is currently used in some error messages.

    @pitrou
    Member

    pitrou commented Nov 23, 2012

    "Buffer protocol" is the right term. "Buffer object" doesn't mean anything in Python 3 and, furthermore, it might be mixed up with the Python 2 buffer type.

    As for the error messages, they are generally very bad on this topic, so I would vote to change them :-)

    @cjerdonek
    Member Author

    Do we have a recommended (and preferably briefer) way of saying, "any object that supports the buffer protocol"?

    @cjerdonek
    Member Author

    s/any//

    @ezio-melotti
    Member

    > "Buffer object" doesn't mean anything in Python 3 and, furthermore,
    > it might be mixed up with the Python 2 buffer type.

    Agreed.

    > As for the error messages, they are generally very bad on this topic,
    > so I would vote to change them :-)

    I would say that they are verbose maybe, but not necessarily bad.
    Using "any object that supports the buffer protocol" without explicitly mentioning bytes (and bytearray) might end up being even more confusing (if that's what is being proposed).

    @pitrou
    Member

    pitrou commented Nov 23, 2012

    > Do we have a recommended (and preferably briefer) way of saying, "any
    > object that supports the buffer protocol"?

    It depends where. There's no recommended way yet, but I would vote for
    "bytes-like object" in error messages that are targeted at the average
    developer.

    The docs (glossary?) could explain that "bytes-like object" is the same
    as "buffer-providing object" or "object implementing the buffer
    protocol".

    @ezio-melotti
    Member

    > I would vote for "bytes-like object"

    Sounds like a good compromise between brevity and clarity to me.

    @skrah
    Mannequin

    skrah mannequin commented Nov 23, 2012

    I wouldn't use "bytes-like object". One can certainly argue that *memoryview*
    should be bytes-like as a matter of preference, but the buffer protocol
    specifies strongly (or even statically) typed multi-dimensional arrays.

    PEP-3118 Py_buffer structs are essentially how NumPy works internally.
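
To make the point about typed, multi-dimensional buffers concrete, here is a minimal sketch using only `memoryview` and `array` from the standard library (the variable names are illustrative). A `memoryview` reports the element format, item size, and shape that the exporting object declares, and these need not describe plain bytes.

```python
import array

# An array of C doubles exports a typed, one-dimensional buffer.
doubles = array.array("d", [1.0, 2.0, 3.0])
view = memoryview(doubles)
print(view.format, view.itemsize, view.ndim, view.shape)

# A flat buffer of six bytes can be reinterpreted as a 2 x 3 grid.
grid = memoryview(bytes(6)).cast("B", shape=[2, 3])
print(grid.ndim, grid.shape)
```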

    @pitrou
    Member

    pitrou commented Nov 23, 2012

    > I wouldn't use "bytes-like object". One can certainly argue that *memoryview*
    > should be bytes-like as a matter of preference, but the buffer protocol
    > specifies strongly (or even statically) typed multi-dimensional arrays.

    Ach :-(

    > PEP-3118 Py_buffer structs are essentially how NumPy works internally.

    Well, we should still write a Python documentation, not a NumPy
    documentation (on this tracker anyway). Outside of NumPy, there's little
    use for multi-dimensional objects.

    @cjerdonek
    Member Author

    > I wouldn't use "bytes-like object".

    What about "buffer-like object"?

    @pitrou
    Member

    pitrou commented Nov 23, 2012

    > > I wouldn't use "bytes-like object".
    >
    > What about "buffer-like object"?

    "buffer-like" means "like a buffer" which is wrong on two points:

    • "buffer" is not defined at this point, so the user doesn't understand
      what it means
    • we are not talking about an object which is "like a buffer", but which
      "provides a buffer"

    @skrah
    Mannequin

    skrah mannequin commented Nov 23, 2012

    Antoine Pitrou <report@bugs.python.org> wrote:

    > > PEP-3118 Py_buffer structs are essentially how NumPy works internally.
    >
    > Well, we should still write a Python documentation, not a NumPy
    > documentation (on this tracker anyway). Outside of NumPy, there's little
    > use for multi-dimensional objects.

    Ok, but people should not be surprised if their (Python) array.array() of
    double or their array of ctypes structs is silently accepted by some byte
    consuming function.

    How about "object does not provide a byte buffer" for error messages
    and "(byte) buffer provider" as a shorthand for "any buffer provider
    that exposes its memory as a sequence of unsigned bytes in response
    to a PyBUF_SIMPLE request"?
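
A small sketch of the behaviour being warned about, assuming a current CPython 3 (the names are illustrative and the choice of `binascii.hexlify` as the byte-consuming function is only an example): an `array.array` of doubles is accepted without complaint, and the consumer simply sees the raw memory of the array.

```python
import array
import binascii
import struct

doubles = array.array("d", [1.0])

# bytes() copies the raw memory of the array; no TypeError is raised.
raw = bytes(doubles)

# binascii.hexlify() consumes bytes, and it accepts the array as well.
hexed = binascii.hexlify(doubles)
print(len(raw), hexed)
```

The result is the native in-memory representation of the double, not anything derived from the value 1.0 as a number.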

    @pitrou
    Member

    pitrou commented Nov 23, 2012

    > > Well, we should still write a Python documentation, not a NumPy
    > > documentation (on this tracker anyway). Outside of NumPy, there's little
    > > use for multi-dimensional objects.
    >
    > Ok, but people should not be surprised if their (Python) array.array() of
    > double or their array of ctypes structs is silently accepted by some byte
    > consuming function.

    Probably. My own (humble :-)) opinion is that array.array() is a
    historical artifact, and its use doesn't seem to be warranted in modern
    Python code. ctypes is obviously a very special library, and not for the
    faint of heart.

    > How about "object does not provide a byte buffer" for error messages
    > and "(byte) buffer provider" as a shorthand for "any buffer provider
    > that exposes its memory as a sequence of unsigned bytes in response
    > to a PyBUF_SIMPLE request"?

    It's not too bad, I think. However, what I think is important is that
    the average (non-expert) Python developer understand that the function
    really accepts a bytes object, and other similar types (because, really,
    bytes is the only bytes-like type most developers will ever face).
    That's why I'm proposing "bytes-like object".

    @skrah
    Mannequin

    skrah mannequin commented Nov 23, 2012

    Antoine Pitrou <report@bugs.python.org> wrote:

    > > How about "object does not provide a byte buffer" for error messages
    > > and "(byte) buffer provider" as a shorthand for "any buffer provider
    > > that exposes its memory as a sequence of unsigned bytes in response
    > > to a PyBUF_SIMPLE request"?
    >
    > It's not too bad, I think. However, what I think is important is that
    > the average (non-expert) Python developer understand that the function
    > really accepts a bytes object, and other similar types (because, really,
    > bytes is the only bytes-like type most developers will ever face).
    > That's why I'm proposing "bytes-like object".

    If it is somehow possible to establish the term as a shorthand for the real
    meaning, then I guess it's the most economical option for documenting Python
    methods (I don't think it should be used in the C-API docs though).

    help (b''.join) for example would sound better with "bytes-like object"
    than with "(byte) buffer provider".

    @cjerdonek
    Member Author

    > > That's why I'm proposing "bytes-like object".
    >
    > If it is somehow possible to establish the term as a shorthand for the real
    > meaning,

    This can be established via the glossary. We can still use "buffer provider" for the general case, if we find that it is useful in certain circumstances.

    @cjerdonek
    Member Author

    After this issue is resolved, the binascii docs can be updated as suggested in bpo-16724.

    @ezio-melotti
    Member

    Here's a patch that adds "bytes-like object" to the glossary, links to the buffer protocol docs[0] and provides bytes and bytearray as examples.

    @python-dev
    Mannequin

    python-dev mannequin commented Apr 30, 2013

    New changeset 474f28bf67b3 by Ezio Melotti in branch '3.3':
    bpo-16518: add "bytes-like object" to the glossary.
    http://hg.python.org/cpython/rev/474f28bf67b3

    New changeset 747cede24367 by Ezio Melotti in branch 'default':
    bpo-16518: merge with 3.3.
    http://hg.python.org/cpython/rev/747cede24367

    New changeset 1b92a0112f5d by Ezio Melotti in branch '2.7':
    bpo-16518: add "bytes-like object" to the glossary.
    http://hg.python.org/cpython/rev/1b92a0112f5d

    @python-dev
    Mannequin

    python-dev mannequin commented May 1, 2013

    New changeset d1aa8a9eba44 by Ezio Melotti in branch '2.7':
    bpo-16518: fix links in glossary entry.
    http://hg.python.org/cpython/rev/d1aa8a9eba44

    @ezio-melotti
    Member

    The attached patch replaces things like "object that support the buffer protocol/interface/API" with "bytes-like objects" throughout the docs.
    The patch doesn't change error messages/docstrings.

    I also noticed that on 2.7[0], the section about the buffer protocol in Doc/c-api/buffer.rst is called "Buffers and Memoryview Objects" and it's not as clear as the one on 3.x[1]. Should this section be backported?

    @pitrou
    Member

    pitrou commented May 1, 2013

    > I also noticed that on 2.7[0], the section about the buffer protocol
    > in Doc/c-api/buffer.rst is called "Buffers and Memoryview Objects" and
    > it's not as clear as the one on 3.x[1]. Should this section be
    > backported?

    The "buffer protocol" situation is different on 2.x, please let's
    concentrate on 3.x :-)

    @python-dev
    Mannequin

    python-dev mannequin commented May 4, 2013

    New changeset 003e4eb92683 by Ezio Melotti in branch '3.3':
    bpo-16518: use "bytes-like object" throughout the docs.
    http://hg.python.org/cpython/rev/003e4eb92683

    New changeset d4912244cce6 by Ezio Melotti in branch 'default':
    bpo-16518: merge with 3.3.
    http://hg.python.org/cpython/rev/d4912244cce6

    @ezio-melotti
    Member

    The attached patch uses "bytes-like objects" in the error messages.

    @pitrou
    Member

    pitrou commented May 4, 2013

    > The attached patch uses "bytes-like objects" in the error messages.

    I'm surprised your patch doesn't touch Python/getargs.c.

    @ezio-melotti
    Member

    FWIW I was grepping for buffer protocol/interface/api, and then double-checking for "buffer" in the resulting files. Python/getargs.c doesn't seem to mention the buffer protocol/interface/api at all.

    @ezio-melotti
    Member

    Updated patch to include getargs.c too.

    @rhettinger
    Contributor

    At first reading, it looks like matters were made more confusing with "bytes-like object" as a defined term.

    @ezio-melotti
    Member

    Can you elaborate?

    @python-dev
    Mannequin

    python-dev mannequin commented Oct 5, 2014

    New changeset e7e8a218737a by R David Murray in branch 'default':
    bpo-16518: Bring error messages in harmony with docs ("bytes-like object")
    https://hg.python.org/cpython/rev/e7e8a218737a

    @bitdancer
    Member

    Committed the message changes to 3.5 only, since they will probably cause tests to fail in various projects, despite messages not being a formal part of the Python API.

    Per IRC conversation with Ezio and Antoine, I posted a note to python-dev to let people know we now have a consistent terminology in the docs and error messages, and to provide a last opportunity for objections (it is easy enough to back the patch out if there is an outcry, but I don't expect one).

    @serhiy-storchaka
    Member

    There are other unfixed messages (they may have been introduced after 3.3):

    >>> b''.join([''])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: sequence item 0: expected bytes, bytearray, or an object with the buffer interface, str found
    >>> str(42, 'utf8')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: coercing to str: need bytes, bytearray or buffer-like object, int found
    >>> import array; array.array('B').frombytes(array.array('I'))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: string/buffer of bytes required.
    >>> import socket; print(socket.socket.sendmsg.__doc__)
    sendmsg(buffers[, ancdata[, flags[, address]]]) -> count

    Send normal and ancillary data to the socket, gathering the
    non-ancillary data from a series of buffers and concatenating it into
    a single message. The buffers argument specifies the non-ancillary
    data as an iterable of buffer-compatible objects (e.g. bytes objects).
    The ancdata argument specifies the ancillary data (control messages)
    as an iterable of zero or more tuples (cmsg_level, cmsg_type,
    cmsg_data), where cmsg_level and cmsg_type are integers specifying the
    protocol level and protocol-specific type respectively, and cmsg_data
    is a buffer-compatible object holding the associated data. The flags
    argument defaults to 0 and has the same meaning as for send(). If
    address is supplied and not None, it sets a destination address for
    the message. The return value is the number of bytes of non-ancillary
    data sent.

    And there are several mentions of "buffer-like" or "buffer-compatible" in the documentation.
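
Because the exact wording of these messages differs between Python versions, code that depends on them is fragile. The following sketch (the helper name `type_error_message` is hypothetical) reproduces two of the calls above and relies only on the exception type, printing whatever message the running interpreter produces.

```python
def type_error_message(call):
    """Run call() and return the TypeError message, or None if nothing is raised."""
    try:
        call()
    except TypeError as exc:
        return str(exc)
    return None

# A str item is not accepted where bytes-like items are expected.
join_msg = type_error_message(lambda: b"".join([""]))

# An int cannot be decoded, so str() with an encoding rejects it.
str_msg = type_error_message(lambda: str(42, "utf8"))

print(join_msg)
print(str_msg)
```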

    @birkenfeld
    Member

    Please open a new issue for those.

    @ezio-melotti
    Member

    See bpo-22581.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022