Issue 9611: FileIO not 64-bit safe under Windows

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/53820

classification

Title:	FileIO not 64-bit safe under Windows
Type:	behavior	Stage:
Components:	Extension Modules, IO, Windows	Versions:	Python 3.1, Python 3.2, Python 2.7

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	amaury.forgeotdarc, brian.curtin, loewis, mspacek, pitrou, python-dev, tim.golden, vstinner
Priority:	normal	Keywords:	patch

Created on 2010-08-15 17:25 by pitrou, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
read_write_32bits.patch	vstinner, 2010-12-02 02:45

Messages (28)
msg113974 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2010-08-15 17:25
Modules/_io/fileio.c assumes that read() and write() allow a Py_ssize_t length, but under Windows the length is an int, limiting chunk size to 2GB. It should be easy enough to read or write in multiple chunks, although testing might be difficult.
msg113980 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2010-08-15 17:58
os.write() (in posixmodule.c) is also affected. os.read(), however, is limited to 32-bit inputs.
msg113985 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2010-08-15 18:12
Using chunked writes is tricky. If the first write succeeds, and the second one fails, how do you report the result?
msg113988 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2010-08-15 18:20
The write() man page says: The number of bytes written may be less than count if, for example, there is insufficient space on the underlying physical medium, or the RLIMIT_FSIZE resource limit is encountered (see setrlimit(2)), or the call was interrupted by a signal handler after having written less than count bytes. (See also pipe(7).) So, we could return the number of bytes successfully written, and let the next call fail. Another possibility is to only write 2GB-1 and let the caller retry. Most people use buffered I/O, and the buffered layer automatically retries.
msg113990 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2010-08-15 18:39
> Most people use buffered I/O, and the buffered layer automatically retries. I see. I think this is already slightly problematic: if you send an interrupt, it won't oblige. IMO, any such loop ought to be interruptable.
msg113992 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2010-08-15 18:44
> > Most people use buffered I/O, and the buffered layer automatically retries. > > I see. I think this is already slightly problematic: if you send an > interrupt, it won't oblige. IMO, any such loop ought to be > interruptable. Well, the loop stops when an error status is returned by the raw IO layer. At that point, the buffered IO layer re-raises the error after a bit of internal cleanup.
msg113995 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2010-08-15 18:53
> Well, the loop stops when an error status is returned by the raw IO > layer. At that point, the buffered IO layer re-raises the error after a > bit of internal cleanup. Assume the following case: 1. writing starts, and writes some data 2. Ctrl-C is pressed, raises a signal, and interrupts the current system call (EINTR) 3. having already written data, the signal is discarded, and the number of successfully written bytes is returned. 4. the loop retries to write the rest. Not receiving any signal anymore, the subsequent write operations wait for completion. End consequence: the signal is discarded without any effect.
msg113997 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2010-08-15 19:05
Le dimanche 15 août 2010 à 18:53 +0000, Martin v. Löwis a écrit : > Martin v. Löwis <martin@v.loewis.de> added the comment: > > > Well, the loop stops when an error status is returned by the raw IO > > layer. At that point, the buffered IO layer re-raises the error after a > > bit of internal cleanup. > > Assume the following case: > 1. writing starts, and writes some data > 2. Ctrl-C is pressed, raises a signal, and interrupts the current > system call (EINTR) > 3. having already written data, the signal is discarded, and the > number of successfully written bytes is returned. > 4. the loop retries to write the rest. Not receiving any signal > anymore, the subsequent write operations wait for completion. Ok, I guess the loop should run PyErr_CheckSignals() somewhere. Simulate such a situation in an unit test will be a bit tricky. Perhaps we can use os.pipe() and depend on the fact that writes greater than the pipe buffer size will be blocking.
msg120388 - (view)	Author: Martin Spacek (mspacek)	Date: 2010-11-04 09:50
We've got a near duplicate Issue9015. This thread seems more considered though :) Note that writing more than 2**32-1 bytes at once results in a hung process with 100% CPU in 64-bit Windows, which has to be killed with Task Manager. So I think that qualifies as a crash. This is apparently due to fwrite limitations in msvc, even in win64.
msg120389 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *	Date: 2010-11-04 10:35
Fortunately, the lower-level write() has no such bug, at least when used in binary mode as FileIO does: it's almost a direct call to WriteFile(). This issue is more considered because it's not a bug in the Microsoft CRT, but in the Python implementation of the IO stack. About the issue, I'd suggest to just clamp the length to 2**32-1, and don't bother retrying; leave this to the buffered layer.
msg120394 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2010-11-04 12:17
> About the issue, I'd suggest to just clamp the length to 232-1, and > don't bother retrying; leave this to the buffered layer. Yes, I think it's reasonable. (or perhaps clamp to something page-aligned, such as 232-4096). Also, the signal issue which was raised by Martin above has since been fixed in r84239.
msg120430 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2010-11-04 19:50
I propose a different solution: On Windows, instead of calling write(), we call WriteFile directly. We try to faithfully follow the CRT implementation as much as possible, knowing that what we write to actually is a binary file (in particular, the file handle should get locked). We should perhaps make an exception for the standard handles (0,1,2), and fall back to call the CRT write unless we determine they are also in binary mode.
msg120431 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2010-11-04 20:01
> I propose a different solution: On Windows, instead of calling > write(), we call WriteFile directly. If I'm not mistaken, WriteFile takes the length as a DWORD, which is still 32 bits under Win64.
msg120434 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2010-11-04 20:22
> If I'm not mistaken, WriteFile takes the length as a DWORD, which is > still 32 bits under Win64. Oops, ignore me, then... I agree that clamping is fine, assuming the buffering layer takes that into account.
msg120443 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *	Date: 2010-11-04 21:19
On a second thought... is there another example where a blocking stream does not write all the data without raising an exception?
msg120444 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2010-11-04 21:24
> On a second thought... is there another example where a blocking stream does not write all the data without raising an exception? Why do you think this would be somehow an example for a blocking stream that does not write all data without raising an exception?
msg120445 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *	Date: 2010-11-04 21:28
> Why do you think this would be somehow an example for a blocking stream > that does not write all data without raising an exception? Well, that's what "clamping" means, isn't it?
msg120446 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2010-11-04 21:31
> On a second thought... is there another example where a blocking > stream does not write all the data without raising an exception? It can happen with pipes or sockets, if some buffer can only store part of the data. Or any regular stream if a signal is received after part of the data has been written.
msg120448 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2010-11-04 21:37
Am 04.11.2010 22:28, schrieb Amaury Forgeot d'Arc: > > Amaury Forgeot d'Arc <amauryfa@gmail.com> added the comment: > >> Why do you think this would be somehow an example for a blocking stream >> that does not write all data without raising an exception? > Well, that's what "clamping" means, isn't it? Ah, then I misunderstood. We certainly should discard any data. I understood the proposal as rejecting write attempts for larger blocks with an exception.
msg123041 - (view)	Author: STINNER Victor (vstinner) *	Date: 2010-12-02 02:43
I agree that clamping is a nice solution, and attached patch implements it. About the question of the loop: FileIO.readall() uses while(1) without checking for signals. It looks like new_buffersize() maximum size is not BIGCHUNK but (BIGCHUNK-1)2: / Keep doubling until we reach BIGCHUNK; then keep adding BIGCHUNK. / if (currentsize <= BIGCHUNK) return currentsize + currentsize; else return currentsize + BIGCHUNK; Is it a bug? But (BIGCHUNK-1)2 is always smaller than INT_MAX. So readall() isn't affected by this bug. Should it be patched? posix.read() doesn't have the bug because it uses an "int" for the size.
msg123042 - (view)	Author: STINNER Victor (vstinner) *	Date: 2010-12-02 02:45
(oops, the patch contained unrelated changes: remove trailing spaces)
msg125257 - (view)	Author: STINNER Victor (vstinner) *	Date: 2011-01-04 00:32
As asked by Antoine, I commited my patch: r87722. ... But I don't know if it fixes the issue or not, I don't have access to a Windows with more than 4 GB of memory.
msg125313 - (view)	Author: STINNER Victor (vstinner) *	Date: 2011-01-04 13:15
r87734 fixes stdprinter.write().
msg125961 - (view)	Author: STINNER Victor (vstinner) *	Date: 2011-01-11 00:07
r87917 removes (useless and dangerous) conversion to size_t.
msg139836 - (view)	Author: Roundup Robot (python-dev)	Date: 2011-07-05 09:28
New changeset 6abbc5f68e20 by Victor Stinner in branch '2.7': Issue #9611, #9015: FileIO.read(), FileIO.readinto(), FileIO.write() and http://hg.python.org/cpython/rev/6abbc5f68e20
msg139838 - (view)	Author: Roundup Robot (python-dev)	Date: 2011-07-05 09:38
New changeset 7acdf9f5eb31 by Victor Stinner in branch '3.2': Issue #9611, #9015: FileIO.read() clamps the length to INT_MAX on Windows. http://hg.python.org/cpython/rev/7acdf9f5eb31 New changeset e8646f120330 by Victor Stinner in branch 'default': (merge 3.2) Issue #9611, #9015: FileIO.read() clamps the length to INT_MAX on Windows. http://hg.python.org/cpython/rev/e8646f120330
msg139840 - (view)	Author: STINNER Victor (vstinner) *	Date: 2011-07-05 09:49
I backported fixes to 2.7, and also add a new fix to FileIO.read(). I don't see anything else to do on this issue, so I close it. Note: read() and write() methods the file object in 2.7 are 64-bit safe on any OS. They use fread() and frwrite() which take a length in the size_t type, not in int type even on Windows.
msg139841 - (view)	Author: STINNER Victor (vstinner) *	Date: 2011-07-05 09:51
Issue #9015 has been marked as a duplicate of this issue.

History
Date	User	Action	Args
2022-04-11 14:57:05	admin	set	github: 53820
2011-07-05 09:51:53	vstinner	set	messages: + msg139841
2011-07-05 09:49:52	vstinner	set	status: open -> closed resolution: fixed messages: + msg139840
2011-07-05 09:38:43	python-dev	set	messages: + msg139838
2011-07-05 09:28:35	python-dev	set	nosy: + python-dev messages: + msg139836
2011-01-11 00:07:01	vstinner	set	nosy: loewis, amaury.forgeotdarc, pitrou, vstinner, tim.golden, brian.curtin, mspacek messages: + msg125961
2011-01-04 13:15:54	vstinner	set	nosy: loewis, amaury.forgeotdarc, pitrou, vstinner, tim.golden, brian.curtin, mspacek messages: + msg125313
2011-01-04 00:32:50	vstinner	set	nosy: loewis, amaury.forgeotdarc, pitrou, vstinner, tim.golden, brian.curtin, mspacek messages: + msg125257
2010-12-02 02:45:22	vstinner	set	files: - read_write_32bits.patch
2010-12-02 02:45:09	vstinner	set	files: + read_write_32bits.patch messages: + msg123042
2010-12-02 02:43:28	vstinner	set	files: + read_write_32bits.patch keywords: + patch messages: + msg123041
2010-11-04 21:37:33	loewis	set	messages: + msg120448
2010-11-04 21:31:22	pitrou	set	messages: + msg120446
2010-11-04 21:28:38	amaury.forgeotdarc	set	messages: + msg120445
2010-11-04 21:24:10	loewis	set	messages: + msg120444
2010-11-04 21:19:07	amaury.forgeotdarc	set	messages: + msg120443
2010-11-04 20:22:04	loewis	set	messages: + msg120434
2010-11-04 20:01:59	pitrou	set	messages: + msg120431
2010-11-04 19:50:58	loewis	set	messages: + msg120430
2010-11-04 12:17:06	pitrou	set	messages: + msg120394
2010-11-04 12:04:23	vstinner	set	nosy: + vstinner
2010-11-04 10:35:10	amaury.forgeotdarc	set	nosy: + amaury.forgeotdarc messages: + msg120389
2010-11-04 09:50:56	mspacek	set	nosy: + mspacek messages: + msg120388 components: + IO
2010-08-15 19:05:37	pitrou	set	messages: + msg113997
2010-08-15 18:53:44	loewis	set	messages: + msg113995
2010-08-15 18:44:17	pitrou	set	messages: + msg113992
2010-08-15 18:39:26	loewis	set	messages: + msg113990
2010-08-15 18:20:39	pitrou	set	messages: + msg113988
2010-08-15 18:12:37	loewis	set	nosy: + loewis messages: + msg113985
2010-08-15 18:02:51	pitrou	set	nosy: + tim.golden, brian.curtin
2010-08-15 17:58:38	pitrou	set	messages: + msg113980
2010-08-15 17:25:41	pitrou	create