
classification
Title: bz2.peek always peeks all the remaining bytes ignoring n argument
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 3.4
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: nadeem.vawda, serhiy.storchaka, vajrasky
Priority: normal
Keywords: patch

Created on 2014-03-06 08:56 by vajrasky, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name | Uploaded
use_n_argument_in_peek_bz2.patch | vajrasky, 2014-03-06 08:56
Messages (4)
msg212796 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2014-03-06 08:56
# Bug demo
import bz2

TEXT_LINES = [
    b'cutecat\n',
    b'promiscuousbonobo\n',
]
TEXT = b''.join(TEXT_LINES)
filename = '/tmp/demo.bz2'
with open(filename, 'wb') as f:
    f.write(bz2.compress(TEXT))

with bz2.BZ2File(filename) as bz2f:
    pdata = bz2f.peek(n=7)
    print(pdata)

It outputs b'cutecat\npromiscuousbonobo\n', not b'cutecat'.

Here is the patch to fix the bug.
msg212797 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-03-06 09:13
This is documented behavior.

   .. method:: peek([n])

      Return buffered data without advancing the file position. At least one
      byte of data will be returned (unless at EOF). The exact number of bytes
      returned is unspecified.
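
Since the documentation only promises "at least one byte", a caller who wants at most n bytes can slice the result. The following is a minimal sketch, not part of the attached patch: the helper name peek_at_most is hypothetical, and an in-memory io.BytesIO stands in for the /tmp file used in the demo.

```python
import bz2
import io


def peek_at_most(f, n):
    """Return up to n buffered bytes without advancing the file position.

    Hypothetical helper: peek() may return more than n bytes, so the
    result is sliced. It may still return fewer than n bytes.
    """
    return f.peek(n)[:n]


data = b'cutecat\npromiscuousbonobo\n'
compressed = io.BytesIO(bz2.compress(data))

with bz2.BZ2File(compressed) as bz2f:
    head = peek_at_most(bz2f, 7)
    position_after_peek = bz2f.tell()  # peeking should not move the position
    print(head, position_after_peek)
```

Note that slicing only caps the length. If exactly n bytes are required, read(n) is the call that guarantees it (short of EOF), at the cost of advancing the position.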
msg212802 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2014-03-06 10:05
Just curious, why is the exact number of bytes returned unspecified in bz2 (in other words, why is the n argument ignored)? gzip uses the n argument.
msg212804 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-03-06 10:31
Because it is unspecified in io.BufferedReader.peek() and in many classes that implement the io.BufferedReader interface.

   .. method:: peek([size])

      Return bytes from the stream without advancing the position.  At most one
      single read on the raw stream is done to satisfy the call. The number of
      bytes returned may be less or more than requested.

I agree that this is weird, but this is a much larger issue than just bz2. We can't just "fix" this for bz2. This is worth a discussion on Python-Dev.
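
The "less or more than requested" wording can be observed directly on io.BufferedReader. This is a rough sketch, assuming CPython's behaviour of returning whatever is currently buffered. The buffer size of 16 and the sample payload are arbitrary choices for illustration, and the exact lengths printed are implementation details rather than guarantees.

```python
import io

payload = bytes(range(100))

# Small buffer so the effect is visible with little data.
reader = io.BufferedReader(io.BytesIO(payload), buffer_size=16)

# Ask for 4 bytes: the result may be longer than requested.
small_request = reader.peek(4)

# Ask for 64 bytes: the result may be shorter than requested,
# because at most one read on the raw stream is performed.
large_request = reader.peek(64)

# Neither call advances the stream position.
position = reader.tell()

print(len(small_request), len(large_request), position)
```

Code that depends on an exact length therefore has to slice or check the result itself, whichever class provides peek().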
History
Date | User | Action | Args
2022-04-11 14:57:59 | admin | set | github: 65055
2014-03-06 10:31:43 | serhiy.storchaka | set | messages: + msg212804
2014-03-06 10:05:12 | vajrasky | set | messages: + msg212802
2014-03-06 09:13:56 | serhiy.storchaka | set | status: open -> closed; nosy: + serhiy.storchaka; messages: + msg212797; resolution: not a bug; stage: resolved
2014-03-06 08:56:39 | vajrasky | create |