Issue 28445: Wrong documentation for GzipFile.peek

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/72631

classification

Title:	Wrong documentation for GzipFile.peek
Type:	behavior	Stage:	patch review
Components:	Documentation	Versions:	Python 3.11, Python 3.10, Python 3.9

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:	docs@python	Nosy List:	abacabadabacaba, docs@python, martin.panter, wangjiahua, xiang.zhang
Priority:	normal	Keywords:	easy, newcomer friendly, patch

Created on 2016-10-14 16:53 by abacabadabacaba, last changed 2022-04-11 14:58 by admin.

Pull Requests
URL	Status	Linked	Edit
PR 29820	open	wangjiahua, 2021-11-28 10:14

Messages (3)
msg278656 - (view)	Author: Evgeny Kapun (abacabadabacaba)	Date: 2016-10-14 16:53
From the documentation for GzipFile.peek(): "At most one single read on the compressed stream is done to satisfy the call." If "compressed stream" means the underlying file object, then this is not true. The method tries to return at least one byte, unless the stream is at EOF. It is possible to create an arbitrarily long compressed stream that decompresses to nothing, and the implementation would read the entire stream in this case. Because the length of the stream is not known in advance, several reads may be required for this. Perhaps the documentation for GzipFile.peek() should be made the same as that for BZ2File.peek() and LZMAFile.peek().
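The scenario described in this message can be sketched as follows. This is a minimal illustration, not part of the original report: `CountingBytesIO` is a hypothetical helper that counts `read()` calls on the underlying file object, and the stream is built from concatenated empty gzip members, which is one assumed way to get a long compressed stream that decompresses to nothing. The exact number of reads depends on the Python version and its buffer sizes, so only the qualitative result is meaningful.

```python
import gzip
import io


class CountingBytesIO(io.BytesIO):
    """Hypothetical helper: an in-memory file that counts read() calls."""

    def __init__(self, data):
        super().__init__(data)
        self.read_calls = 0

    def read(self, size=-1):
        self.read_calls += 1
        return super().read(size)


# Many concatenated empty gzip members: a long compressed stream
# that decompresses to zero bytes.
compressed = gzip.compress(b"") * 5_000

raw = CountingBytesIO(compressed)
with gzip.GzipFile(fileobj=raw, mode="rb") as f:
    peeked = f.peek(1)

print(repr(peeked))        # no decompressed data is available
print(raw.read_calls > 1)  # more than one read reached the file object
```

Here `peek()` has to scan the whole stream before it can conclude that EOF was reached, which is why a single call triggers several reads on the file object.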
msg278660 - (view)	Author: Xiang Zhang (xiang.zhang) *	Date: 2016-10-14 17:46
The "compressed stream" is not the underlying file object but _GzipReader. The "at most one single read" behavior is actually a characteristic of io.BufferedReader.peek, as its documentation states. A single peek may need multiple reads on the file object, but they are all encapsulated in _GzipReader.read. So from the point of view of GzipFile.peek, it is still a single read.
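The io.BufferedReader.peek behavior referred to here can be observed with a small sketch. `ChunkedRaw` is a hypothetical raw stream written for this illustration (it is not part of the gzip module); it returns at most 3 bytes per read and counts how often it is asked for data.

```python
import io


class ChunkedRaw(io.RawIOBase):
    """Hypothetical raw stream that hands out at most 3 bytes per read."""

    def __init__(self, data):
        super().__init__()
        self._data = data
        self._pos = 0
        self.readinto_calls = 0

    def readable(self):
        return True

    def readinto(self, b):
        self.readinto_calls += 1
        chunk = self._data[self._pos:self._pos + min(3, len(b))]
        b[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)


raw = ChunkedRaw(b"abcdefghij")
buffered = io.BufferedReader(raw)

peeked = buffered.peek(10)   # ask for 10 bytes
print(peeked)                # only what one raw read produced
print(raw.readinto_calls)    # number of raw reads performed by peek()
```

In GzipFile, the raw stream behind the BufferedReader plays the role of `ChunkedRaw`, so "one read" is counted at that layer rather than at the file object underneath it.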
msg278671 - (view)	Author: Martin Panter (martin.panter) *	Date: 2016-10-14 22:20
The peek() method was originally added by Issue 9962, where Antoine was trying to imitate the BufferedReader.peek() API. However, because “the number of bytes returned may be more or less than requested”, I never understood what these methods were good for; see also Issue 5811. I think we could at least remove the claim about “at most one single read”. That is just describing an internal detail. The documentation for bzip and LZMA is slightly more useful IMO because it says “at least one byte of data will be returned, unless EOF has been reached”. This guarantee is actually missing from the underlying BufferedReader.peek() documentation, though I think both io and _pyio implement it.
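The guarantee quoted from the bzip and LZMA documentation can be checked against GzipFile with a short sketch. The payload and variable names are made up for illustration, and the number of bytes returned by `peek(1)` is implementation dependent, so the example only checks the lower bound.

```python
import gzip
import io

payload = b"hello, peek" * 100
compressed = gzip.compress(payload)

with gzip.GzipFile(fileobj=io.BytesIO(compressed), mode="rb") as f:
    head = f.peek(1)      # request a single byte
    body = f.read()       # peek() did not consume anything
    at_eof = f.peek(1)    # nothing left to return at EOF

print(len(head) >= 1)     # at least one byte, possibly more than requested
print(body == payload)    # the full payload is still readable after peek()
print(repr(at_eof))       # empty once EOF has been reached
```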

History
Date	User	Action	Args
2022-04-11 14:58:38	admin	set	github: 72631
2021-11-28 10:14:51	wangjiahua	set	keywords: + patch nosy: + wangjiahua pull_requests: + pull_request28052 stage: patch review
2021-11-19 19:55:53	iritkatriel	set	keywords: + easy, newcomer friendly type: behavior versions: + Python 3.9, Python 3.10, Python 3.11, - Python 3.5
2016-10-14 22:20:07	martin.panter	set	messages: + msg278671
2016-10-14 17:46:47	xiang.zhang	set	nosy: + xiang.zhang messages: + msg278660
2016-10-14 17:24:04	serhiy.storchaka	set	nosy: + martin.panter
2016-10-14 16:53:42	abacabadabacaba	create