classification
Title: File mode wb+ appears as rb+
Type: behavior Stage:
Components: Interpreter Core, IO, Library (Lib) Versions: Python 3.6, Python 3.5, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Arfrever, Mark.Williams, barry, benjamin.peterson, mahmoud, pitrou, serhiy.storchaka, stutzbach, xiang.zhang
Priority: normal Keywords: patch

Created on 2015-10-08 07:42 by Mark.Williams, last changed 2016-04-23 07:08 by serhiy.storchaka.

Files
File name Uploaded Description
file_mode.patch xiang.zhang, 2015-10-10 08:54 treat wb+ and rb+ differently
Messages (5)
msg252518 - Author: Mark Williams (Mark.Williams) * Date: 2015-10-08 07:42
There is at least one mode in which a file can be opened that cannot be represented in its mode attribute: wb+.  This mode instead appears as 'rb+' in the mode attribute:

Python 3.5.0 (default, Oct  3 2015, 10:40:38)
[GCC 4.2.1 Compatible FreeBSD Clang 3.4.1 (tags/RELEASE_34/dot1-final 208032)] on freebsd10
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> if os.path.exists('some_file'): os.unlink('some_file')
...
>>> with open('some_file', 'r+b') as f: print(f.mode)
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: 'some_file'
>>> with open('some_file', 'w+b') as f: print(f.mode)
...
rb+
>>> with open('some_file', 'r+b') as f: print(f.mode)
...
rb+


This means code that interacts with file objects cannot trust the mode of binary files.  For example, you can't use tempfile.TemporaryFile (the mode argument of which defaults to 'wb+') and GzipFile:


>>> import gzip
>>> from tempfile import TemporaryFile
>>> with TemporaryFile() as f:
...     gzip.GzipFile(fileobj=f).write(b'test')
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/local/lib/python3.5/gzip.py", line 249, in write
    raise OSError(errno.EBADF, "write() on read-only GzipFile object")
OSError: [Errno 9] write() on read-only GzipFile object


This occurs because without a mode argument passed to its initializer, GzipFile checks that the fp object's mode starts with 'w', 'a', or 'x'.
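
A possible workaround, sketched under the assumption that the caller already knows the underlying file is writable: pass mode explicitly so that GzipFile never consults the file object's mode attribute. The helper name gzip_roundtrip is made up for this illustration.

```python
import gzip
from tempfile import TemporaryFile


def gzip_roundtrip(payload: bytes) -> bytes:
    """Compress payload into a TemporaryFile, then read it back."""
    with TemporaryFile() as f:
        # An explicit mode means GzipFile does not infer one from f.mode.
        with gzip.GzipFile(fileobj=f, mode='wb') as gz:
            gz.write(payload)
        # Closing the GzipFile leaves f open; rewind before reading.
        f.seek(0)
        with gzip.GzipFile(fileobj=f, mode='rb') as gz:
            return gz.read()


if __name__ == '__main__':
    print(gzip_roundtrip(b'test'))
```

This avoids the failure but does not address the underlying problem that the mode attribute is misleading.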

For the sake of completeness/searchability: w+ and r+ are different modes, so rb+ and wb+ must be different modes.  Per https://docs.python.org/3/library/functions.html#open :

"""
For binary read-write access, the mode 'w+b' opens and truncates the file to 0 bytes. 'r+b' opens the file without truncation.
"""
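
To make the documented difference concrete, here is a small sketch that writes some data to a scratch file, reopens it in a given mode, and reports the resulting size. The helper name size_after_open and the sample data are assumptions made for this illustration.

```python
import os
import tempfile


def size_after_open(mode: str) -> int:
    """Return the size of a pre-populated file after opening it in mode."""
    fd, path = tempfile.mkstemp()
    try:
        # Populate the file with 13 bytes of existing data.
        with os.fdopen(fd, 'wb') as f:
            f.write(b'existing data')
        # Open and immediately close in the mode under test.
        with open(path, mode):
            pass
        return os.path.getsize(path)
    finally:
        os.unlink(path)


if __name__ == '__main__':
    print('r+b:', size_after_open('r+b'))  # existing data is kept
    print('w+b:', size_after_open('w+b'))  # file is truncated
```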


I haven't been able to test this on Windows, but I expect precisely the same behavior given my understanding of the relevant source.

_io_FileIO___init___impl in _io/fileio.c does the right thing and includes O_CREAT and O_TRUNC in the open(2) flags upon seeing 'w' in the mode:

https://hg.python.org/cpython/file/3.5/Modules/_io/fileio.c#l324

this ensures correct interaction with the file system.  But it also sets self->readable and self->writable upon seeing '+' in the mode:

https://hg.python.org/cpython/file/3.5/Modules/_io/fileio.c#l341

The open flags are not retained.  Consequently, when the mode attribute is accessed and get_mode calls mode_string, the instance has insufficient information to differentiate between 'rb+' and 'wb+':

https://hg.python.org/cpython/file/3.5/Modules/_io/fileio.c#l1043

If the FileIO instance did retain the 'flags' variable that's declared and set in its initializer, then mode_string could use it to determine the difference between wb+ and rb+.
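
The idea can be modelled in Python. This is only a rough paraphrase of the logic described above, not the C source and not the attached patch: the function names, the keyword arguments, and the choice of testing os.O_TRUNC are all assumptions made for illustration.

```python
import os


def mode_string_current(readable, writable, created=False, appending=False):
    """Model of deriving the mode from readable/writable state alone."""
    if created:
        return 'xb+' if readable else 'xb'
    if appending:
        return 'ab+' if readable else 'ab'
    if readable:
        # Both 'r+b' and 'w+b' end up here: readable and writable.
        return 'rb+' if writable else 'rb'
    return 'wb'


def mode_string_with_flags(flags, readable, writable,
                           created=False, appending=False):
    """Hypothetical variant that also consults the retained open flags."""
    truncating = bool(flags & os.O_TRUNC)
    if readable and writable and truncating and not (created or appending):
        return 'wb+'
    return mode_string_current(readable, writable, created, appending)


if __name__ == '__main__':
    w_plus = os.O_RDWR | os.O_CREAT | os.O_TRUNC  # flags implied by 'w+b'
    r_plus = os.O_RDWR                            # flags implied by 'r+b'
    print(mode_string_current(True, True))
    print(mode_string_with_flags(w_plus, True, True))
    print(mode_string_with_flags(r_plus, True, True))
```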

I would be happy to write a patch for this.
msg252696 - Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2015-10-10 08:27
I think Mark is right. Since wb+ and rb+ behave differently, they
should be treated separately.

But this behaviour, treating wb+ and rb+ as the same, is well tested and
seems to be intended.
msg252697 - Author: Mark Williams (Mark.Williams) * Date: 2015-10-10 08:48
Python's test suite may test the current behavior but that does not lessen
the problem.

I gave an example of apparently correct code that fails (that was actually
encountered by a Python user) in my original description.  Another such
example: you cannot duplicate a file object (same path, same mode) and
be sure that the duplicate is a true duplicate.  Data corruption could
occur in application code if the duplicated file were opened "rb+" instead
of "wb+", as the duplicate would not truncate existing data.

Another way to think about the problem is accuracy of intent.  The mode
attribute on file objects can be incorrect, and by "incorrect" I mean "not
describe the mode under which the file was opened."  Why have a mode
attribute at all, then?  I, for one, would prefer *no* mode attribute to
one that's sometimes incorrect.  But a correct one is even better!
msg252698 - Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2015-10-10 08:54
I made a patch that distinguishes wb+ from rb+ and updates the
corresponding tests, though I don't know whether this needs to be
fixed.
msg264051 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-04-23 07:08
> But this behaviour, treating wb+ and rb+ as the same, is well tested and
> seems to be intended.

I think this is not intended behavior. Tests just test that the current behavior is not changed accidentally. If I'm right, the patch LGTM. But since third-party code can depend on this behavior, I would fix it only in 3.6.

Tests were added in issue4362 and Barry asked the same question about "w+" (msg76134).

Barry, Benjamin, what do you think about this now?
History
Date User Action Args
2020-10-02 18:56:35  serhiy.storchaka  link    issue40391 superseder
2016-04-23 07:08:56  serhiy.storchaka  set     nosy: + serhiy.storchaka, barry
                                               messages: + msg264051
2015-10-10 17:07:42  Arfrever          set     nosy: + Arfrever
2015-10-10 08:54:58  xiang.zhang       set     files: + file_mode.patch
                                               keywords: + patch
2015-10-10 08:54:45  xiang.zhang       set     messages: + msg252698
2015-10-10 08:48:25  Mark.Williams     set     messages: + msg252697
2015-10-10 08:27:12  xiang.zhang       set     nosy: + xiang.zhang
                                               messages: + msg252696
2015-10-09 20:00:46  terry.reedy       set     nosy: + pitrou, benjamin.peterson, stutzbach
2015-10-08 09:40:24  mahmoud           set     nosy: + mahmoud
2015-10-08 07:42:58  Mark.Williams     create