classification
Title: Wrong documentation for GzipFile.peek
Type: behavior Stage: patch review
Components: Documentation Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: 180909, abacabadabacaba, docs@python, martin.panter, xiang.zhang
Priority: normal Keywords: easy, newcomer friendly, patch

Created on 2016-10-14 16:53 by abacabadabacaba, last changed 2021-11-28 10:14 by 180909.

Pull Requests
URL Status Linked Edit
PR 29820 open 180909, 2021-11-28 10:14
Messages (3)
msg278656 - (view) Author: Evgeny Kapun (abacabadabacaba) Date: 2016-10-14 16:53
From the documentation for GzipFile.peek():

    At most one single read on the compressed stream is done to satisfy the call.

If "compressed stream" means the underlying file object, then this is not true. The method tries to return at least one byte, unless the stream is at EOF. It is possible to create an arbitrarily long compressed stream that decompresses to nothing, and in that case the implementation reads the entire stream. Because the length of the stream is not known in advance, several reads may be required for this.

Perhaps the documentation for GzipFile.peek() should be made the same as that for BZ2File.peek() and LZMAFile.peek().
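
The scenario described above can be reproduced with a short script. This is a minimal sketch, not part of the original report: `CountingReader` and `peek_stats` are hypothetical helpers introduced here, and the exact number of read() calls depends on the CPython version and its internal buffer sizes. It builds a stream of empty gzip members and checks how much of the underlying file a single peek() consumes.

```python
import gzip
import io


class CountingReader(io.BytesIO):
    """In-memory file that counts read() calls (hypothetical helper)."""

    def __init__(self, data):
        super().__init__(data)
        self.read_calls = 0

    def read(self, size=-1):
        self.read_calls += 1
        return super().read(size)


def peek_stats(n_members):
    """Call peek() once on a stream of n_members empty gzip members.

    Returns (peeked bytes, read() calls on the file object,
    bytes consumed from the file object, total stream size).
    """
    # Each gzip.compress(b"") is a complete member that decompresses to nothing.
    payload = gzip.compress(b"") * n_members
    raw = CountingReader(payload)
    with gzip.GzipFile(fileobj=raw) as f:
        data = f.peek(1)
        return data, raw.read_calls, raw.tell(), len(payload)


small = peek_stats(1)
large = peek_stats(50_000)
for label, (data, calls, consumed, total) in (("1 member", small), ("50000 members", large)):
    print(f"{label}: peeked={data!r} read_calls={calls} consumed={consumed}/{total}")
```

If the report is right, the longer stream should show more read() calls on the file object, and in both cases the whole stream is consumed by the single peek().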
msg278660 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-10-14 17:46
The "compressed stream" is not the underlying file object but _GzipReader. And actually "at most one single read" is a characteristic of io.BufferedReader.peek; you can see it in its documentation. A single peek may need multiple reads on the file object, but they are all encapsulated in _GzipReader.read. So at the level of GzipFile.peek, it's still a single read.
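
The layering described here can be illustrated with the public io API alone. The sketch below is an illustration under assumptions, not the gzip implementation: `CountingRaw` is a hypothetical raw stream standing in for _GzipReader, and it hands out at most `chunk` bytes per call so that the effect of a single raw read is visible.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical raw stream that counts readinto() calls."""

    def __init__(self, data, chunk):
        super().__init__()
        self._src = io.BytesIO(data)
        self._chunk = chunk
        self.readinto_calls = 0

    def readable(self):
        return True

    def readinto(self, b):
        self.readinto_calls += 1
        # Deliver at most `chunk` bytes per call, like a short read.
        data = self._src.read(min(len(b), self._chunk))
        b[:len(data)] = data
        return len(data)


raw = CountingRaw(b"abcdefgh", chunk=3)
buffered = io.BufferedReader(raw)

# peek() asks for 8 bytes, but BufferedReader performs at most one raw read,
# so only the first 3-byte chunk comes back.
peeked = buffered.peek(8)
print(peeked, raw.readinto_calls)
```

Whatever the raw object does internally to satisfy that one readinto() call (for _GzipReader, possibly several reads on the file object) is invisible to BufferedReader.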
msg278671 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-10-14 22:20
The peek() method was originally added by Issue 9962, where Antoine was trying to imitate the BufferedReader.peek() API. However, because “the number of bytes returned may be more or less than requested”, I never understood what these methods were good for; see also Issue 5811.

I think we could at least remove the claim about “at most one single read”. That is just describing an internal detail.

The documentation for bzip and LZMA is slightly more useful IMO because it says “at least one byte of data will be returned, unless EOF has been reached”. This guarantee is actually missing from the underlying BufferedReader.peek() documentation, though I think both io and _pyio implement it.
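
A small check of the behaviour discussed above, as a sketch only: it assumes current CPython behaviour rather than anything the documentation promises, and the variable names are chosen for illustration. It shows that peek(1) returns at least one byte before EOF (possibly many more than requested), does not advance the position, and returns an empty bytes object at EOF.

```python
import gzip
import io

original = b"hello world" * 100
compressed = gzip.compress(original)

with gzip.GzipFile(fileobj=io.BytesIO(compressed)) as f:
    # Ask for 1 byte; the result may be longer than requested.
    head = f.peek(1)
    # peek() does not advance the position, so read() sees the same bytes.
    same = f.read(len(head))
    # Drain the rest, then peek again at EOF.
    f.read()
    tail = f.peek(1)

print(len(head), same == head, tail)
```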
History
Date User Action Args
2021-11-28 10:14:51 180909 set keywords: + patch
nosy: + 180909
pull_requests: + pull_request28052
stage: patch review
2021-11-19 19:55:53 iritkatriel set keywords: + easy, newcomer friendly
type: behavior
versions: + Python 3.9, Python 3.10, Python 3.11, - Python 3.5
2016-10-14 22:20:07 martin.panter set messages: + msg278671
2016-10-14 17:46:47 xiang.zhang set nosy: + xiang.zhang
messages: + msg278660
2016-10-14 17:24:04 serhiy.storchaka set nosy: + martin.panter
2016-10-14 16:53:42 abacabadabacaba create