Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BufferedReader.read1(size) signature incompatible with BufferedIOBase.read1(size=-1) #67403

Closed
vadmium opened this issue Jan 10, 2015 · 14 comments
Labels

Comments

@vadmium
Copy link
Member

vadmium commented Jan 10, 2015

BPO 23214
Nosy @pitrou, @vstinner, @benjaminp, @vadmium, @serhiy-storchaka
PRs
  • [Do Not Merge] Convert Misc/NEWS so that it is managed by towncrier #552
  • Files
  • read1-arbitrary.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2016-10-22.00:24:26.409>
    created_at = <Date 2015-01-10.01:46:22.694>
    labels = ['3.7', 'expert-IO']
    title = 'BufferedReader.read1(size) signature incompatible with BufferedIOBase.read1(size=-1)'
    updated_at = <Date 2017-03-31.16:36:34.664>
    user = 'https://github.com/vadmium'

    bugs.python.org fields:

    activity = <Date 2017-03-31.16:36:34.664>
    actor = 'dstufft'
    assignee = 'none'
    closed = True
    closed_date = <Date 2016-10-22.00:24:26.409>
    closer = 'martin.panter'
    components = ['IO']
    creation = <Date 2015-01-10.01:46:22.694>
    creator = 'martin.panter'
    dependencies = []
    files = ['42213']
    hgrepos = []
    issue_num = 23214
    keywords = ['patch']
    message_count = 14.0
    messages = ['233794', '233864', '261672', '261685', '261698', '261703', '262019', '279093', '279097', '279100', '279102', '279113', '279163', '279167']
    nosy_count = 7.0
    nosy_names = ['pitrou', 'vstinner', 'benjamin.peterson', 'stutzbach', 'python-dev', 'martin.panter', 'serhiy.storchaka']
    pr_nums = ['552']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue23214'
    versions = ['Python 3.7']

    @vadmium
    Copy link
    Member Author

    vadmium commented Jan 10, 2015

    I am trying to make LZMAFile (which implements BufferedIOBase) use a BufferedReader in read mode. However this broke test_lzma.FileTestCase.test_read1_multistream(), which calls read1() with the default size argument. This is because BufferedReader.read1() does not accept size=-1:

    >>> stdin = open(0, "rb", closefd=False)
    >>> stdin
    <_io.BufferedReader name=0>
    >>> stdin.read1()  # Parameter is mandatory
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: read1() takes exactly 1 argument (0 given)
    >>> stdin.read1(-1)  # Does not accept the BufferedIOBase default
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: read length must be positive
    >>> stdin.read1(0)  # Technically not positive
    b''

    Also, the BufferedIOBase documentation does not say what the size=-1 value means, only that it reads and returns up to -1 bytes.

    @vadmium
    Copy link
    Member Author

    vadmium commented Jan 11, 2015

    Looking at the test suite:

    • read1() of LZMAFile and GzipFile (both implementing BufferedIOBase) are asserted to return a non-zero result until EOF
    • LZMAFile.read1(0) is asserted to return an empty string
    • BufferedReader.read1(-1) is asserted to raise ValueError
    • There are also tests of read1() methods on HTTPResponse and ZipFile.open() objects, but those methods are undocumented

    It seems the most consistent way forward would be to:

    • Define BufferedIOBase.read1(-1) to read and return an arbitrary number of bytes, more than zero unless none are available due to EOF or non-blocking mode. Maybe suggest that it would return the current buffered data or try to read a full buffer of data (with one call) and return it if applicable.
    • Change the signature to BufferedReader.read1(size=-1) and implement the size=-1 behaviour

    @vadmium
    Copy link
    Member Author

    vadmium commented Mar 13, 2016

    Looking at this again, I think a less intrusive way forward would be to:

    • Document that in 3.6, the required signature is now BufferedIOBase.read1(size). An implementation no longer has to provide a default size, and no longer has to accept negative sizes.

    • Explicitly document the behaviour of each concrete implementation like GzipFile.read1(-1) etc, if this behaviour is intentional

    • Fix the BufferedReader error so that “read length must not be negative”

    Relaxing the read1() signature would allow wider or easier use of BufferedReader, e.g. to implement HTTPResponse as I suggested in bpo-26499. The advantage would be using existing code that is well tested, used, optimized, etc, rather than a custom BufferedIOBase implementation which for the HTTP case is buggy.

    @serhiy-storchaka
    Copy link
    Member

    Shouldn't read1(-1) be the same as read1(sys.maxsize)? I.e. read from a buffer without limit.

    @vadmium
    Copy link
    Member Author

    vadmium commented Mar 13, 2016

    Calling BufferedReader.read1(sys.maxsize) gives me a MemoryError. Making read1(-1) equivalent to read1(sys.maxsize) only makes sense where the return value already has a predetermined size, and only a limited buffer needs to be allocated.

    Another interpretation is to return an arbitrary, modest buffer size. This is what I ended up doing with LZMAFile.read1() in bpo-23529: return no more than 8 KiB. It is not equivalent to sys.maxsize because more than 8 KiB is possible if you ask for it. HTTPResponse (for non-chunked responses) is similar, but uses a default of 16 KiB.

    @pitrou
    Copy link
    Member

    pitrou commented Mar 13, 2016

    Define BufferedIOBase.read1(-1) to read and return an arbitrary number
    of bytes, more than zero unless none are available due to EOF or
    non-blocking mode. Maybe suggest that it would return the current
    buffered data or try to read a full buffer of data (with one call) and
    return it if applicable.

    This sounds reasonable to me.

    @vadmium
    Copy link
    Member Author

    vadmium commented Mar 19, 2016

    Okay here is a patch implementing read1(-1) in BufferedReader and BytesIO (my original proposal). Other changes:

    • Changed read1(size=-1) → read1([size]), because BufferedReader and BytesIO do not accept keyword arguments (see also bpo-23738)
    • Defined size=-1 to mean an arbitrary non-zero size
    • Change BufferedReader.read1() to return a buffer of data
    • Change BytesIO.read1() to read until EOF
    • Add tests to complement existing read1(size) tests for BufferedReader (includes BufferedRandom), BufferedRWPair, and BytesIO

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Oct 21, 2016

    New changeset b6886ac88e28 by Martin Panter in branch 'default':
    Issue bpo-23214: Implement optional BufferedReader, BytesIO read1() argument
    https://hg.python.org/cpython/rev/b6886ac88e28

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Oct 21, 2016

    New changeset d4fce64b1c65 by Martin Panter in branch 'default':
    Issue bpo-23214: Remove BufferedReader.read1(-1) workaround
    https://hg.python.org/cpython/rev/d4fce64b1c65

    @vstinner
    Copy link
    Member

    • .. method:: read1(size=-1)
      + .. method:: read1([size])

    I don't understand this change: the default size is documented as -1. Did I miss something?

    + If *size* is −1 (the default)

    Is the "−" character deliberate in "−1"? I would expected -1.

    @vadmium
    Copy link
    Member Author

    vadmium commented Oct 21, 2016

    The minus sign might have been deliberate at some stage, but I realize it is more accessible etc if it was ASCII -1, so I will change that.

    The idea of changing the signature is to avoid people thinking it accepts a keyword argument. See e.g. bpo-25030 for seek(offset, whence=SEEK_SET), bpo-14586 for truncate(size=None).

    @vadmium vadmium added the 3.7 (EOL) end of life label Oct 21, 2016
    @vstinner
    Copy link
    Member

    2016-10-21 5:17 GMT+02:00 Martin Panter <report@bugs.python.org>:

    The idea of changing the signature is to avoid people thinking it accepts a keyword argument. See e.g. bpo-25030 for seek(offset, whence=SEEK_SET), bpo-14586 for truncate(size=None).

    Ah. Maybe we should modify the code to accept a keyword argument? :-)

    Or we might use the strange "read1(\, size=1)" syntax of Argument Clinic.

    @vadmium
    Copy link
    Member Author

    vadmium commented Oct 21, 2016

    Modifying the API to accept a keyword argument is not worthwhile IMO. That means modifying modules like http.client and zipfile (which use the wrong keyword name), and user-defined implementations may become incompatible.

    The question of documentation is discussed more in bpo-23738. I think one or two people were concerned about using the Arg Clinic slash.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Oct 21, 2016

    New changeset 2af6d94de492 by Martin Panter in branch 'default':
    Issue bpo-23214: Fix formatting of -1
    https://hg.python.org/cpython/rev/2af6d94de492

    @vadmium vadmium closed this as completed Oct 22, 2016
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants