New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FileIO not 64-bit safe under Windows #53820
Comments
Modules/_io/fileio.c assumes that read() and write() allow a Py_ssize_t length, but under Windows the length is an int, limiting chunk size to 2GB. It should be easy enough to read or write in multiple chunks, although testing might be difficult. |
os.write() (in posixmodule.c) is also affected. os.read(), however, is limited to 32-bit inputs. |
Using chunked writes is tricky. If the first write succeeds, and the second one fails, how do you report the result? |
The write() man page says:
So, we could return the number of bytes successfully written, and let the next call fail. Another possibility is to only write 2GB-1 and let the caller retry. |
I see. I think this is already slightly problematic: if you send an |
Well, the loop stops when an error status is returned by the raw IO |
Assume the following case:
End consequence: the signal is discarded without any effect. |
Le dimanche 15 août 2010 à 18:53 +0000, Martin v. Löwis a écrit :
Ok, I guess the loop should run PyErr_CheckSignals() somewhere. Simulate such a situation in an unit test will be a bit tricky. |
We've got a near duplicate bpo-9015. This thread seems more considered though :) Note that writing more than 2**32-1 bytes at once results in a hung process with 100% CPU in 64-bit Windows, which has to be killed with Task Manager. So I think that qualifies as a crash. This is apparently due to fwrite limitations in msvc, even in win64. |
Fortunately, the lower-level write() has no such bug, at least when used in binary mode as FileIO does: it's almost a direct call to WriteFile(). About the issue, I'd suggest to just clamp the length to 2**32-1, and don't bother retrying; leave this to the buffered layer. |
Yes, I think it's reasonable. |
I propose a different solution: On Windows, instead of calling write(), we call WriteFile directly. We try to faithfully follow the CRT implementation as much as possible, knowing that what we write to actually is a binary file (in particular, the file handle should get locked). We should perhaps make an exception for the standard handles (0,1,2), and fall back to call the CRT write unless we determine they are also in binary mode. |
If I'm not mistaken, WriteFile takes the length as a DWORD, which is |
Oops, ignore me, then... I agree that clamping is fine, assuming the |
On a second thought... is there another example where a *blocking* stream does not write all the data without raising an exception? |
Why do you think this would be somehow an example for a blocking stream |
|
It can happen with pipes or sockets, if some buffer can only store part |
Am 04.11.2010 22:28, schrieb Amaury Forgeot d'Arc:
Ah, then I misunderstood. We certainly should discard any data. |
I agree that clamping is a nice solution, and attached patch implements it. About the question of the loop: FileIO.readall() uses while(1) without checking for signals. It looks like new_buffersize() maximum size is not BIGCHUNK but (BIGCHUNK-1)*2:
Is it a bug? But (BIGCHUNK-1)*2 is always smaller than INT_MAX. So readall() isn't affected by this bug. Should it be patched? posix.read() doesn't have the bug because it uses an "int" for the size. |
(oops, the patch contained unrelated changes: remove trailing spaces) |
As asked by Antoine, I commited my patch: r87722. ... But I don't know if it fixes the issue or not, I don't have access to a Windows with more than 4 GB of memory. |
r87734 fixes stdprinter.write(). |
r87917 removes (useless and dangerous) conversion to size_t. |
New changeset 6abbc5f68e20 by Victor Stinner in branch '2.7': |
New changeset 7acdf9f5eb31 by Victor Stinner in branch '3.2': New changeset e8646f120330 by Victor Stinner in branch 'default': |
I backported fixes to 2.7, and also add a new fix to FileIO.read(). I don't see anything else to do on this issue, so I close it. Note: read() and write() methods the file object in 2.7 are 64-bit safe on any OS. They use fread() and frwrite() which take a length in the size_t type, not in int type even on Windows. |
Issue bpo-9015 has been marked as a duplicate of this issue. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: