Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File mode wb+ appears as rb+ #69528

Open
markrwilliams mannequin opened this issue Oct 8, 2015 · 7 comments
Open

File mode wb+ appears as rb+ #69528

markrwilliams mannequin opened this issue Oct 8, 2015 · 7 comments
Labels
3.12 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) stdlib Python modules in the Lib dir topic-IO type-bug An unexpected behavior, bug, or error

Comments

@markrwilliams
Copy link
Mannequin

markrwilliams mannequin commented Oct 8, 2015

BPO 25341
Nosy @warsaw, @pitrou, @benjaminp, @serhiy-storchaka, @markrwilliams, @zhangyangyu, @iritkatriel
Files
  • file_mode.patch: treat wb+ and rb+ differently
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2015-10-08.07:42:58.456>
    labels = ['interpreter-core', 'type-bug', 'library', 'expert-IO', '3.11']
    title = 'File mode wb+ appears as rb+'
    updated_at = <Date 2021-11-26.18:43:47.286>
    user = 'https://github.com/markrwilliams'

    bugs.python.org fields:

    activity = <Date 2021-11-26.18:43:47.286>
    actor = 'iritkatriel'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Interpreter Core', 'Library (Lib)', 'IO']
    creation = <Date 2015-10-08.07:42:58.456>
    creator = 'Mark.Williams'
    dependencies = []
    files = ['40739']
    hgrepos = []
    issue_num = 25341
    keywords = ['patch']
    message_count = 6.0
    messages = ['252518', '252696', '252697', '252698', '264051', '407081']
    nosy_count = 10.0
    nosy_names = ['barry', 'pitrou', 'benjamin.peterson', 'stutzbach', 'Arfrever', 'mahmoud', 'serhiy.storchaka', 'Mark.Williams', 'xiang.zhang', 'iritkatriel']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue25341'
    versions = ['Python 3.11']

    @markrwilliams
    Copy link
    Mannequin Author

    markrwilliams mannequin commented Oct 8, 2015

    There is at least one mode in which a file can be opened that cannot be represented in its mode attribute: wb+. This mode instead appears as 'rb+' in the mode attribute:

    Python 3.5.0 (default, Oct  3 2015, 10:40:38)
    [GCC 4.2.1 Compatible FreeBSD Clang 3.4.1 (tags/RELEASE_34/dot1-final 208032)] on freebsd10
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import os
    >>> if os.path.exists('some_file'): os.unlink('some_file')
    ...
    >>> with open('some_file', 'r+b') as f: print(f.mode)
    ...
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    FileNotFoundError: [Errno 2] No such file or directory: 'some_file'
    >>> with open('some_file', 'w+b') as f: print(f.mode)
    ...
    rb+
    >>> with open('some_file', 'r+b') as f: print(f.mode)
    rb+

    This means code that interacts with file objects cannot trust the mode of binary files. For example, you can't use tempfile.TemporaryFile (the mode argument of which defaults to 'wb+') and GzipFile:

    >>> import gzip
    >>> from tempfile import TemporaryFile
    >>> with TemporaryFile() as f:
    ...     gzip.GzipFile(fileobj=f).write(b'test')
    ...
    Traceback (most recent call last):
      File "<stdin>", line 2, in <module>
      File "/usr/local/lib/python3.5/gzip.py", line 249, in write
        raise OSError(errno.EBADF, "write() on read-only GzipFile object")
    OSError: [Errno 9] write() on read-only GzipFile object

    This occurs because without a mode argument passed to its initializer, GzipFile checks that the fp object's mode starts with 'w', 'a', or 'x'.

    For the sake of completeness/searchability: w+ and r+ are different modes, so rb+ and wb+ must be different modes. Per https://docs.python.org/3/library/functions.html#open :

    """
    For binary read-write access, the mode 'w+b' opens and truncates the file to 0 bytes. 'r+b' opens the file without truncation.
    """

    I haven't been able to test this on Windows, but I expect precisely the same behavior given my understanding of the relevant source.

    _io_FileIO___init___impl in _io/fileio.c does the right thing and includes O_CREAT and O_TRUNC in the open(2) flags upon seeing 'w' in the mode:

    https://hg.python.org/cpython/file/3.5/Modules/_io/fileio.c#l324

    this ensures correct interaction with the file system. But it also sets self->readable and self->writable upon seeing '+' in the mode:

    https://hg.python.org/cpython/file/3.5/Modules/_io/fileio.c#l341

    The open flags are not retained. Consequently, when the mode attribute is accessed and the get_mode calls the mode_string function, the instance has insufficient information to differentiate between 'rb+' and 'wb+':

    https://hg.python.org/cpython/file/3.5/Modules/_io/fileio.c#l1043

    If the FileIO instance did retain the 'flags' variable that's declared and set in its initializer, then mode_string could use it to determine the difference between wb+ and rb+.

    I would be happy to write a patch for this.

    @markrwilliams markrwilliams mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) stdlib Python modules in the Lib dir topic-IO type-bug An unexpected behavior, bug, or error labels Oct 8, 2015
    @zhangyangyu
    Copy link
    Member

    I think Mark is right. Since wb+ and rb+ have different behaviours they
    should be treat separately.

    But this behaviour treating wb+ and rb+ as the same is well tested and
    seems to intended to do so.

    @markrwilliams
    Copy link
    Mannequin Author

    markrwilliams mannequin commented Oct 10, 2015

    Python's test suite may test the current behavior but that does not lessen
    the problem.

    I gave an example of apparently correct code that fails (that was actually
    encountered by a Python user) in my original description. Another such
    example: you cannot duplicate a file object -- same path, same mode --- and
    be sure that the duplicate is a true duplicate. Data corruption could
    occur in application code if the duplicated file were opened "rb+" instead
    of "wb+", as the duplicate would not truncate existing data.

    Another way to think about the problem is accuracy of intent. The mode
    attribute on file objects can be incorrect, and by "incorrect" I mean "not
    describe the mode under which the file was opened." Why have a mode
    attribute at all, then? I, for one, would prefer *no* mode attribute to
    one that's sometimes incorrect. But a correct one is even better!

    On Sat, Oct 10, 2015 at 1:27 AM, Xiang Zhang <report@bugs.python.org> wrote:

    Xiang Zhang added the comment:

    I think Mark is right. Since wb+ and rb+ have different behaviours they
    should be treat separately.

    But this behaviour treating wb+ and rb+ as the same is well tested and
    seems to intended to do so.

    ----------
    nosy: +xiang.zhang


    Python tracker <report@bugs.python.org>
    <http://bugs.python.org/issue25341\>


    @zhangyangyu
    Copy link
    Member

    I make a patch which now identifies the difference between wb+ and rb+,
    and modifies the corresponding tests. Though I don't know whether this
    need to be fixed.

    @serhiy-storchaka
    Copy link
    Member

    But this behaviour treating wb+ and rb+ as the same is well tested and
    seems to intended to do so.

    I think this is not intended behavior. Tests just test that the current behavior is not changed accidentally. If I'm right, the patch LGTM. But since third-party code can depend on this behavior, I would fix it only in 3.6.

    Tests were added in bpo-4362 and Barry asked the same question about "w+" (msg76134).

    Barry, Benjamin, what are you think about this now?

    @iritkatriel
    Copy link
    Member

    Reproduced on 3.11.

    @iritkatriel iritkatriel added the 3.11 only security fixes label Nov 26, 2021
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @iritkatriel
    Copy link
    Member

    >>> import gzip
    >>> from tempfile import TemporaryFile
    >>> with TemporaryFile() as f:
    ...     gzip.GzipFile(fileobj=f).write(b'test')
    ...
    Traceback (most recent call last):
      File "<stdin>", line 2, in <module>
      File "/usr/local/lib/python3.5/gzip.py", line 249, in write
        raise OSError(errno.EBADF, "write() on read-only GzipFile object")
    OSError: [Errno 9] write() on read-only GzipFile object
    

    This occurs because without a mode argument passed to its initializer, GzipFile checks that the fp object's mode starts with 'w', 'a', or 'x'.

    I think this is a problem in gzip. It has only READ and WRITE modes, and it ignores the '+'.

    @iritkatriel iritkatriel added 3.12 bugs and security fixes and removed 3.11 only security fixes labels Apr 11, 2023
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.12 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) stdlib Python modules in the Lib dir topic-IO type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants