msg233794 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2015-01-10 01:46 |
I am trying to make LZMAFile (which implements BufferedIOBase) use a BufferedReader in read mode. However this broke test_lzma.FileTestCase.test_read1_multistream(), which calls read1() with the default size argument. This is because BufferedReader.read1() does not accept size=-1:
>>> stdin = open(0, "rb", closefd=False)
>>> stdin
<_io.BufferedReader name=0>
>>> stdin.read1() # Parameter is mandatory
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: read1() takes exactly 1 argument (0 given)
>>> stdin.read1(-1) # Does not accept the BufferedIOBase default
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: read length must be positive
>>> stdin.read1(0) # Technically not positive
b''
Also, the BufferedIOBase documentation does not say what the size=-1 value means, only that it reads and returns up to -1 bytes.
|
msg233864 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2015-01-11 23:22 |
Looking at the test suite:
* read1() of LZMAFile and GzipFile (both implementing BufferedIOBase) are asserted to return a non-zero result until EOF
* LZMAFile.read1(0) is asserted to return an empty string
* BufferedReader.read1(-1) is asserted to raise ValueError
* There are also tests of read1() methods on HTTPResponse and ZipFile.open() objects, but those methods are undocumented
It seems the most consistent way forward would be to:
* Define BufferedIOBase.read1(-1) to read and return an arbitrary number of bytes, more than zero unless none are available due to EOF or non-blocking mode. Maybe suggest that it would return the current buffered data or try to read a full buffer of data (with one call) and return it if applicable.
* Change the signature to BufferedReader.read1(size=-1) and implement the size=-1 behaviour
|
msg261672 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2016-03-13 06:02 |
Looking at this again, I think a less intrusive way forward would be to:
* Document that in 3.6, the required signature is now BufferedIOBase.read1(size). An implementation no longer has to provide a default size, and no longer has to accept negative sizes.
* Explicitly document the behaviour of each concrete implementation like GzipFile.read1(-1) etc, if this behaviour is intentional
* Fix the BufferedReader error so that “read length must not be negative”
Relaxing the read1() signature would allow wider or easier use of BufferedReader, e.g. to implement HTTPResponse as I suggested in Issue 26499. The advantage would be using existing code that is well tested, used, optimized, etc, rather than a custom BufferedIOBase implementation which for the HTTP case is buggy.
|
msg261685 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2016-03-13 07:44 |
Shouldn't read1(-1) be the same as read1(sys.maxsize)? I.e. read from a buffer without limit.
|
msg261698 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2016-03-13 12:01 |
Calling BufferedReader.read1(sys.maxsize) gives me a MemoryError. Making read1(-1) equivalent to read1(sys.maxsize) only makes sense where the return value already has a predetermined size, and only a limited buffer needs to be allocated.
Another interpretation is to return an arbitrary, modest buffer size. This is what I ended up doing with LZMAFile.read1() in Issue 23529: return no more than 8 KiB. It is not equivalent to sys.maxsize because more than 8 KiB is possible if you ask for it. HTTPResponse (for non-chunked responses) is similar, but uses a default of 16 KiB.
|
msg261703 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2016-03-13 18:38 |
> Define BufferedIOBase.read1(-1) to read and return an arbitrary number
> of bytes, more than zero unless none are available due to EOF or
> non-blocking mode. Maybe suggest that it would return the current
> buffered data or try to read a full buffer of data (with one call) and
> return it if applicable.
This sounds reasonable to me.
|
msg262019 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2016-03-19 06:32 |
Okay here is a patch implementing read1(-1) in BufferedReader and BytesIO (my original proposal). Other changes:
* Changed read1(size=-1) → read1([size]), because BufferedReader and BytesIO do not accept keyword arguments (see also Issue 23738)
* Defined size=-1 to mean an arbitrary non-zero size
* Change BufferedReader.read1() to return a buffer of data
* Change BytesIO.read1() to read until EOF
* Add tests to complement existing read1(size) tests for BufferedReader (includes BufferedRandom), BufferedRWPair, and BytesIO
|
msg279093 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2016-10-21 00:17 |
New changeset b6886ac88e28 by Martin Panter in branch 'default':
Issue #23214: Implement optional BufferedReader, BytesIO read1() argument
https://hg.python.org/cpython/rev/b6886ac88e28
|
msg279097 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2016-10-21 02:00 |
New changeset d4fce64b1c65 by Martin Panter in branch 'default':
Issue #23214: Remove BufferedReader.read1(-1) workaround
https://hg.python.org/cpython/rev/d4fce64b1c65
|
msg279100 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2016-10-21 02:28 |
- .. method:: read1(size=-1)
+ .. method:: read1([size])
I don't understand this change: the default size is documented as -1. Did I miss something?
+ If *size* is −1 (the default)
Is the "−" character deliberate in "−1"? I would expected ``-1``.
|
msg279102 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2016-10-21 03:17 |
The minus sign might have been deliberate at some stage, but I realize it is more accessible etc if it was ASCII -1, so I will change that.
The idea of changing the signature is to avoid people thinking it accepts a keyword argument. See e.g. Issue 25030 for seek(offset, whence=SEEK_SET), Issue 14586 for truncate(size=None).
|
msg279113 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2016-10-21 08:20 |
2016-10-21 5:17 GMT+02:00 Martin Panter <report@bugs.python.org>:
> The idea of changing the signature is to avoid people thinking it accepts a keyword argument. See e.g. Issue 25030 for seek(offset, whence=SEEK_SET), Issue 14586 for truncate(size=None).
Ah. Maybe we should modify the code to accept a keyword argument? :-)
Or we might use the strange "read1(\, size=1)" syntax of Argument Clinic.
|
msg279163 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2016-10-21 21:51 |
Modifying the API to accept a keyword argument is not worthwhile IMO. That means modifying modules like http.client and zipfile (which use the wrong keyword name), and user-defined implementations may become incompatible.
The question of documentation is discussed more in Issue 23738. I think one or two people were concerned about using the Arg Clinic slash.
|
msg279167 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2016-10-21 23:00 |
New changeset 2af6d94de492 by Martin Panter in branch 'default':
Issue #23214: Fix formatting of -1
https://hg.python.org/cpython/rev/2af6d94de492
|
|
Date |
User |
Action |
Args |
2022-04-11 14:58:11 | admin | set | github: 67403 |
2017-03-31 16:36:34 | dstufft | set | pull_requests:
+ pull_request1074 |
2016-10-22 00:24:26 | martin.panter | set | status: open -> closed resolution: fixed stage: patch review -> resolved |
2016-10-21 23:00:51 | python-dev | set | messages:
+ msg279167 |
2016-10-21 21:51:35 | martin.panter | set | messages:
+ msg279163 |
2016-10-21 08:20:22 | vstinner | set | messages:
+ msg279113 |
2016-10-21 03:17:35 | martin.panter | set | messages:
+ msg279102 versions:
+ Python 3.7, - Python 3.6 |
2016-10-21 02:28:14 | vstinner | set | messages:
+ msg279100 |
2016-10-21 02:00:18 | python-dev | set | messages:
+ msg279097 |
2016-10-21 00:17:07 | python-dev | set | nosy:
+ python-dev messages:
+ msg279093
|
2016-04-14 12:55:26 | vstinner | set | nosy:
+ vstinner
|
2016-03-19 06:32:07 | martin.panter | set | files:
+ read1-arbitrary.patch keywords:
+ patch messages:
+ msg262019
stage: patch review |
2016-03-13 18:38:21 | pitrou | set | messages:
+ msg261703 |
2016-03-13 12:01:15 | martin.panter | set | messages:
+ msg261698 |
2016-03-13 07:44:24 | serhiy.storchaka | set | nosy:
+ stutzbach, serhiy.storchaka, pitrou, benjamin.peterson messages:
+ msg261685
|
2016-03-13 06:02:01 | martin.panter | set | messages:
+ msg261672 versions:
+ Python 3.6, - Python 3.4 |
2015-01-11 23:23:00 | martin.panter | set | messages:
+ msg233864 |
2015-01-10 01:46:22 | martin.panter | create | |