classification
Title: bz2.open modes behaving differently than standard open() modes
Type: behavior Stage: resolved
Components: Versions: Python 3.9
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: josh.r, philipp.freyer, rhettinger
Priority: normal Keywords:

Created on 2021-02-22 17:03 by philipp.freyer, last changed 2021-02-24 05:52 by philipp.freyer. This issue is now closed.

Messages (4)
msg387517 - Author: Philipp Freyer (philipp.freyer) Date: 2021-02-22 17:03
The documentation clearly states that bz2.open(mode='r') opens a file in binary mode. I would have to use 'rt' for text mode.

The built-in Python open(mode='r') function opens a file in text mode.
This is how I would expect any open(mode='r') function to work, especially since the built-in open() treats 'r' as a synonym for 'rt'.
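
The difference described above can be reproduced with a minimal sketch. The temporary file names are made up for the illustration; only the standard-library open() and bz2.open() are used.

```python
import bz2
import os
import tempfile

# Hypothetical scratch files, created only for this illustration.
tmpdir = tempfile.mkdtemp()
plain_path = os.path.join(tmpdir, "example.txt")
bz2_path = os.path.join(tmpdir, "example.txt.bz2")

# Write the same text to a plain file and to a bz2-compressed file.
with open(plain_path, "w", encoding="utf-8") as f:
    f.write("hello\n")
with bz2.open(bz2_path, "wt", encoding="utf-8") as f:
    f.write("hello\n")

# Built-in open(): 'r' means text mode, so read() returns str.
with open(plain_path, "r", encoding="utf-8") as f:
    plain_r = f.read()

# bz2.open(): 'r' means binary mode, so read() returns bytes.
with bz2.open(bz2_path, "r") as f:
    bz2_r = f.read()

# bz2.open(): text mode has to be requested explicitly with 'rt'.
with bz2.open(bz2_path, "rt", encoding="utf-8") as f:
    bz2_rt = f.read()

print(type(plain_r).__name__, type(bz2_r).__name__, type(bz2_rt).__name__)
```

Code that compares the bytes from bz2.open(..., 'r') with str values raises no error; the comparison is simply always False, which is one way such a mismatch can go unnoticed.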

IMHO the behavior of bz2.open() is a possible pitfall for many developers and should be in line with the standard open() behavior.

Sorry if this is in the wrong place; I am happy to put this elsewhere. However, I do see this as a conceptual issue (in our case, this bug went unnoticed for years).
msg387605 - Author: Josh Rosenberg (josh.r) (Python triager) Date: 2021-02-24 03:52
All of the compression modules (gzip, lzma) have this behavior, not just bz2; it's consistent in that sense. Changing it now, after literally decades with the old behavior, would needlessly break existing programs. As you say, it's documented clearly; I'm not seeing a gain strong enough to justify violating the existing documentation.
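
A short sketch of that consistency, assuming only the standard-library bz2, gzip, and lzma modules (the scratch file names are invented for the example): each module's open() returns bytes for mode 'r' and str for mode 'rt'.

```python
import bz2
import gzip
import lzma
import os
import tempfile

# Hypothetical scratch directory for the illustration.
tmpdir = tempfile.mkdtemp()
results = {}

for module in (bz2, gzip, lzma):
    path = os.path.join(tmpdir, "example." + module.__name__)

    # Write a small compressed file in binary mode.
    with module.open(path, "wb") as f:
        f.write(b"hello\n")

    # Mode 'r' is binary for all three modules.
    with module.open(path, "r") as f:
        default_type = type(f.read()).__name__

    # Mode 'rt' must be given explicitly to get text.
    with module.open(path, "rt", encoding="utf-8") as f:
        text_type = type(f.read()).__name__

    results[module.__name__] = (default_type, text_type)

print(results)
```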
msg387607 - Author: Raymond Hettinger (rhettinger) (Python committer) Date: 2021-02-24 05:43
I concur with Josh and will mark this as closed.

It is unfortunate, but the time to get an API right is before it lands, not years after people have come to depend on it.
msg387608 - Author: Philipp Freyer (philipp.freyer) Date: 2021-02-24 05:52
I understand and accept that, but I would recommend highlighting this difference more prominently in the documentation, since it is easy to overlook when reading.

I still think it is important to point this out more strongly, since I've seen countless code snippets and answers on Stack Overflow that either tell the user to use "r" for text mode and "rb" for binary or, when listing file open modes, do not mention the "*t" modes at all.

If you can point me to where to suggest a change (or pull request) for the documentation, I am happy to propose a change myself :-)
History
Date                 User            Action  Args
2021-02-24 05:52:12  philipp.freyer  set     messages: + msg387608
2021-02-24 05:43:21  rhettinger      set     status: open -> closed; nosy: + rhettinger; messages: + msg387607; resolution: wont fix; stage: resolved
2021-02-24 03:52:21  josh.r          set     nosy: + josh.r; messages: + msg387605
2021-02-22 17:03:22  philipp.freyer  create