
classification
Title: Inefficient BufferedReader.read(-1)
Type: performance Stage: resolved
Components: IO Versions: Python 3.10
process
Status: closed Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: malin
Priority: normal Keywords: patch

Created on 2020-08-01 03:34 by malin, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
demo.py malin, 2020-08-01 03:34
Pull Requests
URL Status Linked Edit
PR 21698 closed malin, 2020-08-01 05:37
Messages (2)
msg374652 - Author: Ma Lin (malin) Date: 2020-08-01 03:34
BufferedReader's constructor has a `buffer_size` parameter, which sets the size of the internal buffer described here:

    When reading data from BufferedReader object, a larger
    amount of data may be requested from the underlying raw
    stream, and kept in an internal buffer.
    
    From the documentation of BufferedReader [1]


When BufferedReader.read(size) is called:

    1. When `size` is a positive number, it reads up to `buffer_size`
       bytes at a time from the underlying stream. This is the
       expected behavior.

    2. When `size` is -1, it tries to call the underlying stream's
       readall() function [2]. In this case `buffer_size` is not
       respected.
       
       The underlying stream may be a `RawIOBase`, whose readall()
       function reads `DEFAULT_BUFFER_SIZE` bytes in each read [3].
       
       `DEFAULT_BUFFER_SIZE` is currently only 8 KB, which makes
       BufferedReader.read(-1) very inefficient. Reading `buffer_size`
       bytes each time would give the expected performance.

The attached file demonstrates this problem.
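
The attached demo.py is not reproduced on this page. As a rough illustration of the two cases above, here is a minimal sketch, assuming CPython's C implementation of `io`. `RecordingRaw`, `BUF`, and `payload` are hypothetical names made up for this example, and the request sizes observed may differ on other implementations or versions.

```python
import io


class RecordingRaw(io.RawIOBase):
    """Hypothetical raw stream that records the size of every readinto() request."""

    def __init__(self, data):
        self._src = io.BytesIO(data)
        self.requests = []

    def readable(self):
        return True

    def readinto(self, b):
        # len(b) is the number of bytes the caller is asking for.
        self.requests.append(len(b))
        return self._src.readinto(b)


# Pick a buffer size that is clearly different from DEFAULT_BUFFER_SIZE.
BUF = 4 * io.DEFAULT_BUFFER_SIZE
payload = bytes(2 * BUF)

# Case 1: a positive size fills the internal buffer in one request.
raw = RecordingRaw(payload)
reader = io.BufferedReader(raw, buffer_size=BUF)
reader.read(1)
positive_requests = list(raw.requests)

# Case 2: read(-1) delegates to raw.readall(), which ignores buffer_size.
raw2 = RecordingRaw(payload)
reader2 = io.BufferedReader(raw2, buffer_size=BUF)
data = reader2.read(-1)
readall_requests = list(raw2.requests)

print("buffer_size:", BUF)
print("request sizes for read(1): ", sorted(set(positive_requests)))
print("request sizes for read(-1):", sorted(set(readall_requests)))
```

Because `RecordingRaw` inherits `RawIOBase.readall()` unchanged, the second reader should issue many small requests rather than a few `buffer_size`-sized ones.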


[1] doc of BufferedReader:
https://docs.python.org/3/library/io.html#io.BufferedReader

[2] BufferedReader.read(-1) tries to call underlying stream's readall() function:
https://github.com/python/cpython/blob/v3.9.0b5/Modules/_io/bufferedio.c#L1538-L1542

[3] RawIOBase.readall() reads DEFAULT_BUFFER_SIZE bytes each time:
https://github.com/python/cpython/blob/v3.9.0b5/Modules/_io/iobase.c#L968-L969
msg374657 - Author: Ma Lin (malin) Date: 2020-08-01 05:59
Some underlying streams have a fast path for .readall(),
so I am closing this issue.
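
One way to see which raw streams bring their own readall() is to check whether the class defines the method itself instead of inheriting the generic `RawIOBase` loop. The sketch below assumes CPython, where `io.FileIO` is such a class; `PlainRaw` is a hypothetical subclass used only for contrast.

```python
import io


class PlainRaw(io.RawIOBase):
    """Hypothetical raw stream that does not override readall()."""

    def readable(self):
        return True

    def readinto(self, b):
        return 0  # always at EOF


# A class that defines readall() in its own namespace can size its reads
# itself; one that does not falls back to the inherited generic loop.
fileio_defines_readall = "readall" in vars(io.FileIO)
plain_defines_readall = "readall" in vars(PlainRaw)

print("FileIO defines readall():  ", fileio_defines_readall)
print("PlainRaw defines readall():", plain_defines_readall)
```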
History
Date User Action Args
2022-04-11 14:59:34 admin set github: 85624
2020-08-01 05:59:43 malin set status: open -> closed
messages: + msg374657
stage: patch review -> resolved
2020-08-01 05:37:39 malin set keywords: + patch
stage: patch review
pull_requests: + pull_request20842
2020-08-01 03:34:14 malin create