
FileIO not 64-bit safe under Windows #53820

Closed
pitrou opened this issue Aug 15, 2010 · 28 comments
Labels
extension-modules (C modules in the Modules dir), OS-windows, topic-IO, type-bug (An unexpected behavior, bug, or error)

Comments

@pitrou
Member

pitrou commented Aug 15, 2010

BPO 9611
Nosy @loewis, @amauryfa, @pitrou, @vstinner, @tjguk, @briancurtin
Files
  • read_write_32bits.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2011-07-05.09:49:52.592>
    created_at = <Date 2010-08-15.17:25:41.177>
    labels = ['extension-modules', 'type-bug', 'expert-IO', 'OS-windows']
    title = 'FileIO not 64-bit safe under Windows'
    updated_at = <Date 2011-07-05.09:51:53.940>
    user = 'https://github.com/pitrou'

    bugs.python.org fields:

    activity = <Date 2011-07-05.09:51:53.940>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2011-07-05.09:49:52.592>
    closer = 'vstinner'
    components = ['Extension Modules', 'Windows', 'IO']
    creation = <Date 2010-08-15.17:25:41.177>
    creator = 'pitrou'
    dependencies = []
    files = ['19899']
    hgrepos = []
    issue_num = 9611
    keywords = ['patch']
    message_count = 28.0
    messages = ['113974', '113980', '113985', '113988', '113990', '113992', '113995', '113997', '120388', '120389', '120394', '120430', '120431', '120434', '120443', '120444', '120445', '120446', '120448', '123041', '123042', '125257', '125313', '125961', '139836', '139838', '139840', '139841']
    nosy_count = 8.0
    nosy_names = ['loewis', 'amaury.forgeotdarc', 'pitrou', 'vstinner', 'tim.golden', 'brian.curtin', 'mspacek', 'python-dev']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue9611'
    versions = ['Python 3.1', 'Python 2.7', 'Python 3.2']

    @pitrou
    Member Author

    pitrou commented Aug 15, 2010

    Modules/_io/fileio.c assumes that read() and write() allow a Py_ssize_t length, but under Windows the length is an int, limiting chunk size to 2GB.

    It should be easy enough to read or write in multiple chunks, although testing might be difficult.
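The chunked approach can be sketched at the Python level; this is purely illustrative (the real fix belongs in Modules/_io/fileio.c, and `CHUNK` here is a small stand-in for the 2 GB Windows limit):

```python
import os
import tempfile

CHUNK = 64 * 1024  # stand-in for the ~2 GB limit; kept small for illustration

def chunked_write(fd, data):
    """Write *data* in CHUNK-sized pieces, returning total bytes written."""
    view = memoryview(data)  # avoid copying slices of a large buffer
    total = 0
    while total < len(view):
        n = os.write(fd, view[total:total + CHUNK])
        total += n  # os.write() may report a short count; account for it
    return total

fd, path = tempfile.mkstemp()
try:
    written = chunked_write(fd, b"x" * (3 * CHUNK + 123))
finally:
    os.close(fd)
with open(path, "rb") as f:
    roundtrip = f.read()
os.unlink(path)
```

Note the loop itself hides partial writes from the caller, which is exactly the error-reporting question raised below.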

@pitrou added the extension-modules, OS-windows, and type-bug labels Aug 15, 2010
    @pitrou
    Member Author

    pitrou commented Aug 15, 2010

    os.write() (in posixmodule.c) is also affected. os.read(), however, is limited to 32-bit inputs.

    @loewis
    Mannequin

    loewis mannequin commented Aug 15, 2010

    Using chunked writes is tricky. If the first write succeeds, and the second one fails, how do you report the result?

    @pitrou
    Member Author

    pitrou commented Aug 15, 2010

    The write() man page says:

       The number of bytes written may be less than count if, for example, there is  insufficient  space  on
       the underlying physical medium, or the RLIMIT_FSIZE resource limit is encountered (see setrlimit(2)),
       or the call was interrupted by a signal handler after having written less  than  count  bytes.   (See
       also pipe(7).)
    

    So, we could return the number of bytes successfully written, and let the next call fail.

    Another possibility is to only write 2GB-1 and let the caller retry.
    Most people use buffered I/O, and the buffered layer automatically retries.
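Short counts are already part of the raw-write contract. A POSIX-only sketch with a non-blocking pipe shows os.write() accepting only what fits in the kernel buffer and reporting the partial count, leaving the retry to the caller:

```python
import os

r, w = os.pipe()
os.set_blocking(w, False)        # make writes return instead of blocking
data = b"x" * (4 * 1024 * 1024)  # far larger than a typical 64 KiB pipe buffer
n = os.write(w, data)            # partial write: only the buffer's worth is taken
partial = 0 < n < len(data)
os.close(r)
os.close(w)
```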

    @loewis
    Mannequin

    loewis mannequin commented Aug 15, 2010

    Most people use buffered I/O, and the buffered layer automatically retries.

    I see. I think this is already slightly problematic: if you send an
    interrupt, it won't oblige. IMO, any such loop ought to be
    interruptable.

    @pitrou
    Member Author

    pitrou commented Aug 15, 2010

    > Most people use buffered I/O, and the buffered layer automatically retries.

    I see. I think this is already slightly problematic: if you send an
    interrupt, it won't oblige. IMO, any such loop ought to be
    interruptable.

    Well, the loop stops when an error status is returned by the raw IO
    layer. At that point, the buffered IO layer re-raises the error after a
    bit of internal cleanup.

    @loewis
    Mannequin

    loewis mannequin commented Aug 15, 2010

    Well, the loop stops when an error status is returned by the raw IO
    layer. At that point, the buffered IO layer re-raises the error after a
    bit of internal cleanup.

    Assume the following case:

    1. writing starts, and writes some data
    2. Ctrl-C is pressed, raises a signal, and interrupts the current
      system call (EINTR)
    3. having already written data, the signal is discarded, and the
      number of successfully written bytes is returned.
    4. the loop retries to write the rest. Not receiving any signal
      anymore, the subsequent write operations wait for completion.

    End consequence: the signal is discarded without any effect.

    @pitrou
    Member Author

    pitrou commented Aug 15, 2010

On Sunday, 15 August 2010 at 18:53 +0000, Martin v. Löwis wrote:

    > Well, the loop stops when an error status is returned by the raw IO
    > layer. At that point, the buffered IO layer re-raises the error after a
    > bit of internal cleanup.

    Assume the following case:

    1. writing starts, and writes some data
    2. Ctrl-C is pressed, raises a signal, and interrupts the current
      system call (EINTR)
    3. having already written data, the signal is discarded, and the
      number of successfully written bytes is returned.
    4. the loop retries to write the rest. Not receiving any signal
      anymore, the subsequent write operations wait for completion.

    Ok, I guess the loop should run PyErr_CheckSignals() somewhere.

Simulating such a situation in a unit test will be a bit tricky.
Perhaps we can use os.pipe() and rely on the fact that writes larger
than the pipe buffer size will block.
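That test idea can be sketched on POSIX: the write blocks once the pipe buffer fills, and a signal whose handler raises aborts it, so any bytes already accepted by the pipe are unaccounted for from the caller's point of view (the concern Martin describes above). The `Interrupted` exception class is an illustrative stand-in for whatever the test would actually raise:

```python
import os
import signal

class Interrupted(Exception):
    pass

def handler(signum, frame):
    raise Interrupted

r, w = os.pipe()
old = signal.signal(signal.SIGALRM, handler)
signal.alarm(1)  # fire while os.write() is blocked on the full pipe
try:
    os.write(w, b"x" * (8 * 1024 * 1024))  # much larger than the pipe buffer
    outcome = "completed"
except Interrupted:
    outcome = "interrupted"
finally:
    signal.alarm(0)
    signal.signal(signal.SIGALRM, old)
    os.close(r)
    os.close(w)
```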

    @mspacek
    Mannequin

    mspacek mannequin commented Nov 4, 2010

We've got a near duplicate, bpo-9015. This thread seems more considered though :) Note that writing more than 2**32-1 bytes at once results in a hung process at 100% CPU on 64-bit Windows, which has to be killed with Task Manager. So I think that qualifies as a crash. This is apparently due to fwrite() limitations in MSVC, even on Win64.

    @mspacek mspacek mannequin added the topic-IO label Nov 4, 2010
    @amauryfa
    Member

    amauryfa commented Nov 4, 2010

    Fortunately, the lower-level write() has no such bug, at least when used in binary mode as FileIO does: it's almost a direct call to WriteFile().
    This issue is more considered because it's not a bug in the Microsoft CRT, but in the Python implementation of the IO stack.

    About the issue, I'd suggest to just clamp the length to 2**32-1, and don't bother retrying; leave this to the buffered layer.
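A Python-level sketch of the clamping idea (the C fix eventually committed clamps to INT_MAX on Windows rather than 2**32-1; `_MAX_CHUNK` and `clamped_write` are illustrative names, not the real implementation):

```python
import os

_MAX_CHUNK = 2**31 - 1  # INT_MAX, the limit of the 32-bit CRT write()

def clamped_write(fd, data):
    # Write at most _MAX_CHUNK bytes and report the (possibly short) count;
    # a buffered caller is expected to retry with the remainder.
    return os.write(fd, memoryview(data)[:_MAX_CHUNK])

# Tiny demonstration: a small buffer passes through unchanged.
r, w = os.pipe()
n = clamped_write(w, b"hello")
echoed = os.read(r, n)
os.close(r)
os.close(w)
```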

    @pitrou
    Member Author

    pitrou commented Nov 4, 2010

    About the issue, I'd suggest to just clamp the length to 2**32-1, and
    don't bother retrying; leave this to the buffered layer.

    Yes, I think it's reasonable.
    (or perhaps clamp to something page-aligned, such as 2**32-4096).
    Also, the signal issue which was raised by Martin above has since been fixed in r84239.

    @loewis
    Mannequin

    loewis mannequin commented Nov 4, 2010

    I propose a different solution: On Windows, instead of calling write(), we call WriteFile directly. We try to faithfully follow the CRT implementation as much as possible, knowing that what we write to actually is a binary file (in particular, the file handle should get locked). We should perhaps make an exception for the standard handles (0,1,2), and fall back to call the CRT write unless we determine they are also in binary mode.

    @pitrou
    Member Author

    pitrou commented Nov 4, 2010

    I propose a different solution: On Windows, instead of calling
    write(), we call WriteFile directly.

    If I'm not mistaken, WriteFile takes the length as a DWORD, which is
    still 32 bits under Win64.

    @loewis
    Mannequin

    loewis mannequin commented Nov 4, 2010

    If I'm not mistaken, WriteFile takes the length as a DWORD, which is
    still 32 bits under Win64.

    Oops, ignore me, then... I agree that clamping is fine, assuming the
    buffering layer takes that into account.

    @amauryfa
    Member

    amauryfa commented Nov 4, 2010

    On a second thought... is there another example where a *blocking* stream does not write all the data without raising an exception?

    @loewis
    Mannequin

    loewis mannequin commented Nov 4, 2010

    On a second thought... is there another example where a *blocking* stream does not write all the data without raising an exception?

    Why do you think this would be somehow an example for a blocking stream
    that does not write all data without raising an exception?

    @amauryfa
    Member

    amauryfa commented Nov 4, 2010

    Why do you think this would be somehow an example for a blocking stream
    that does not write all data without raising an exception?
    Well, that's what "clamping" means, isn't it?

    @pitrou
    Member Author

    pitrou commented Nov 4, 2010

    On a second thought... is there another example where a *blocking*
    stream does not write all the data without raising an exception?

    It can happen with pipes or sockets, if some buffer can only store part
    of the data. Or any regular stream if a signal is received after part of
    the data has been written.

    @loewis
    Mannequin

    loewis mannequin commented Nov 4, 2010

On 04.11.2010 22:28, Amaury Forgeot d'Arc wrote:

    > Why do you think this would be somehow an example for a blocking stream
    > that does not write all data without raising an exception?
    Well, that's what "clamping" means, isn't it?

Ah, then I misunderstood. We certainly shouldn't discard any data.
    I understood the proposal as rejecting write attempts for larger blocks
    with an exception.

    @vstinner
    Member

    vstinner commented Dec 2, 2010

    I agree that clamping is a nice solution, and attached patch implements it.

    About the question of the loop: FileIO.readall() uses while(1) without checking for signals. It looks like new_buffersize() maximum size is not BIGCHUNK but (BIGCHUNK-1)*2:

        /* Keep doubling until we reach BIGCHUNK;
           then keep adding BIGCHUNK. */
        if (currentsize <= BIGCHUNK)
            return currentsize + currentsize;
        else
            return currentsize + BIGCHUNK;
    

    Is it a bug? But (BIGCHUNK-1)*2 is always smaller than INT_MAX. So readall() isn't affected by this bug. Should it be patched?

    posix.read() doesn't have the bug because it uses an "int" for the size.
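The growth logic can be transcribed to Python to check the claim; `BIGCHUNK`'s value here is an illustrative stand-in for the constant defined in fileio.c:

```python
INT_MAX = 2**31 - 1
BIGCHUNK = 512 * 1024  # illustrative; the real constant lives in fileio.c

def new_buffersize(currentsize):
    # Keep doubling until we reach BIGCHUNK; then keep adding BIGCHUNK.
    if currentsize <= BIGCHUNK:
        return currentsize + currentsize
    return currentsize + BIGCHUNK

# Measure each growth increment starting from a small buffer: doubling
# never adds more than BIGCHUNK at once (since currentsize <= BIGCHUNK in
# that branch), so every incremental request stays far below INT_MAX.
size = 8192
steps = []
while size < INT_MAX:
    new = new_buffersize(size)
    steps.append(new - size)
    size = new
max_step = max(steps)
```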

    @vstinner
    Member

    vstinner commented Dec 2, 2010

    (oops, the patch contained unrelated changes: remove trailing spaces)

    @vstinner
    Member

    vstinner commented Jan 4, 2011

As asked by Antoine, I committed my patch: r87722.

    ... But I don't know if it fixes the issue or not, I don't have access to a Windows with more than 4 GB of memory.

    @vstinner
    Member

    vstinner commented Jan 4, 2011

    r87734 fixes stdprinter.write().

    @vstinner
    Member

    r87917 removes (useless and dangerous) conversion to size_t.

    @python-dev
    Mannequin

    python-dev mannequin commented Jul 5, 2011

    New changeset 6abbc5f68e20 by Victor Stinner in branch '2.7':
    Issue bpo-9611, bpo-9015: FileIO.read(), FileIO.readinto(), FileIO.write() and
    http://hg.python.org/cpython/rev/6abbc5f68e20

    @python-dev
    Mannequin

    python-dev mannequin commented Jul 5, 2011

    New changeset 7acdf9f5eb31 by Victor Stinner in branch '3.2':
    Issue bpo-9611, bpo-9015: FileIO.read() clamps the length to INT_MAX on Windows.
    http://hg.python.org/cpython/rev/7acdf9f5eb31

    New changeset e8646f120330 by Victor Stinner in branch 'default':
    (merge 3.2) Issue bpo-9611, bpo-9015: FileIO.read() clamps the length to INT_MAX on Windows.
    http://hg.python.org/cpython/rev/e8646f120330

    @vstinner
    Member

    vstinner commented Jul 5, 2011

I backported the fixes to 2.7, and also added a new fix to FileIO.read(). I don't see anything else to do on this issue, so I'm closing it.

Note: the read() and write() methods of the file object in 2.7 are 64-bit safe on any OS. They use fread() and fwrite(), which take the length as a size_t rather than an int, even on Windows.

    @vstinner vstinner closed this as completed Jul 5, 2011
    @vstinner
    Member

    vstinner commented Jul 5, 2011

    Issue bpo-9015 has been marked as a duplicate of this issue.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022