Issue 4428: make io.BufferedWriter observe max_buffer_size limits

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/48678

classification

Title:	make io.BufferedWriter observe max_buffer_size limits
Type:	resource usage	Stage:
Components:	Library (Lib)	Versions:	Python 3.0, Python 2.7, Python 2.6

process

Status:	closed	Resolution:	out of date
Dependencies:		Superseder:
Assigned To:	gregory.p.smith	Nosy List:	amaury.forgeotdarc, beazley, gregory.p.smith, gvanrossum, loewis, pitrou
Priority:	normal	Keywords:	needs review, patch

Created on 2008-11-25 12:34 by beazley, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
issue4428-io-bufwrite-gps04.diff	gregory.p.smith, 2009-01-20 05:03	obey max_buffer_size and blocking semantics, with unit tests
issue4428-io-bufwrite-gps05.diff	gregory.p.smith, 2009-02-01 06:26

Messages (15)
msg76408 - (view)	Author: David M. Beazley (beazley)	Date: 2008-11-25 12:34
The Buffered I/O interface in the io module has the user specify buffer limits such as size and max_buffer_size. The first limit (size) is easy to understand as a buffering threshold at which writes will occur. However, no apparent attempt is made to strictly limit the internal buffer size to max_buffer_size. In BuffererWriter.write(), one of the first operations is self._write_buf.extend(b) which simply extends the buffer by the full data being written. If b happens to be a large string (e.g., megabytes or even the entire contents of a big file), then the internal I/O buffer makes a complete copy of the data, effectively doubling the memory requirements for carrying out the write operation. I suppose most programmers might not notice given that everyone has gigabytes of RAM these days, but you certainly don't see this kind of buffering behavior in the operating system kernel or in the C library. Some patch suggestions (details left to the maintainers of this module): 1. Don't extend self._write_buf by more than the max_buffer_size. fragment = b[:self.max_buffer_size - len(self._write_buf)] self._write_buf.extend(fragment) 2. For large blocking writes, simply carry out the remaining I/O operations in the write() method instead of in the _flush_locked() method. Try to use the original input data b as the data source instead of making copies of it. And if you have to copy the data, don't do it all at once.
msg76571 - (view)	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2008-11-29 00:54
I've attached my first attempt at fixing this as io-bufwrite-gps01. Unfortunately it causes the Lib/test/test_io.py testThreads to fail: ====================================================================== FAIL: testThreads (__main__.BufferedReaderTest) ---------------------------------------------------------------------- Traceback (most recent call last): File "Lib/test/test_io.py", line 456, in testThreads self.assertEqual(s.count(c), N) AssertionError: 30 != 1000 I haven't read the test yet, but my guess is that the test is broken and is blindly calling write() without checking the return value.
msg76573 - (view)	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2008-11-29 01:01
Yep, the test was ignoring the return value from BufferedWriter.write. Fixed in the attached io-bufwrite-gps02. This can wait for 3.0.1 and 2.6.1/2 (depending on the 2.6.x release schedule).
msg76574 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *	Date: 2008-11-29 01:15
I do not agree. User code should not have to verify the return value of write(). When a big amount of data is passed to BufferedWriter.write, - normal blocking streams should block until all data has been written. (the implementation may call self.raw.write several times) - non-blocking streams should raise BlockingIOError indicating the number of characters actually written. This applies only to buffered files, of course.
msg76576 - (view)	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2008-11-29 01:42
I agree with Amuary, write() traditionally never writes less data unless the underlying IO is in nonblocking mode. I'm working up a new patch to write to self.raw in max_buffer_size chunks with as few data copies as possible, raising an appropriate BlockingIOError (as the surrounding code does) when the IO on the underlying self.raw.write raises one through self._flush_unlocked().
msg76581 - (view)	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2008-11-29 02:38
Okay, here's a new patch that obeys the blocking vs nonblocking semantics properly. It still needs explicit unit tests for proper behavior.
msg76590 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2008-11-29 12:49
According to msg76581, the patch fails review (lacking test cases), so I'm removing the "needs review" keyword.
msg76591 - (view)	Author: David M. Beazley (beazley)	Date: 2008-11-29 13:24
I agree with previous comments that write() should definitely write all data when in blocking mode.
msg80234 - (view)	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2009-01-20 05:07
Updated patch with unit tests to verify the write behavior attached in -gps04. Up for code review at http://codereview.appspot.com/12470/show
msg80292 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2009-01-20 23:27
Hi! that sounds > like a behavior change. I'd be fine with removing it for the 3.1/2.7 version of > this code (though I hope people will be using the C implementation instead). Well, either it's supported and it will have to go through a deprecation phase, or it's unsupported and it can be ripped out right now... I don't think it should be supported at all, given that the semantics of writing an iterable of ints are totally non-obvious. Reading both the PEP and the docstrings in io.py, I only see mentions of "bytes" and "buffer", not of an iterable of ints. Perhaps Guido should pronounce. (do you know of any code relying on this behaviour? the C version obviously does not support it and all regression tests pass fine, except for an SSL bug I filed) http://codereview.appspot.com/12470
msg80299 - (view)	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2009-01-21 00:18
I know of no code relying on this behavior, I merely duplicated the behavior that existed while writing the patch. I'm happy to simplify. Adding Guido to the CC list for a pronouncement. Guido is it okay to change io.BufferedWriter.write() in release30-maint to no longer accept iterables that do not support len() and things that do not support the buffer API? Antoine already notes in the codereview that other existing io classes such as FileIO currently do not support non bytes/buffer or buffer-api supporting inputs: http://codereview.appspot.com/12470/patch/1/3
msg80304 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2009-01-21 01:01
@Gregory, that sounds like an odd enough use case to skip. However you might want to look for __length_hint__ before giving up? OTOH unless the use case is real, why not support it but making it slow?
msg80902 - (view)	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2009-02-01 06:29
I've uploaded a new patch set with more extensive unit tests. It also handles the case of writing array.array objects (or anything with a memoryview itemsize > 1). The previous code would buffer by item rather than by byte. it has been updated in codereview as well.
msg80903 - (view)	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2009-02-01 06:30
fwiw, I decided Guido and Antoine were right and took out the support for input that did not support len() to keep things a bit simpler.
msg93792 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2009-10-09 14:04
max_buffer_size is no longer used, so this issue is obsolete ;)

History
Date	User	Action	Args
2022-04-11 14:56:41	admin	set	github: 48678
2009-10-09 14:04:56	pitrou	set	status: open -> closed resolution: out of date messages: + msg93792
2009-02-01 06:30:55	gregory.p.smith	set	messages: + msg80903
2009-02-01 06:29:49	gregory.p.smith	set	messages: + msg80902
2009-02-01 06:26:49	gregory.p.smith	set	files: + issue4428-io-bufwrite-gps05.diff
2009-01-21 01:01:21	gvanrossum	set	messages: + msg80304
2009-01-21 00:18:11	gregory.p.smith	set	nosy: + gvanrossum messages: + msg80299
2009-01-20 23:27:50	pitrou	set	nosy: + pitrou messages: + msg80292 title: io.BufferedWriter does not observe buffer size limits -> make io.BufferedWriter observe max_buffer_size limits
2009-01-20 05:07:20	gregory.p.smith	set	files: - issue4428-io-bufwrite-gps03.diff
2009-01-20 05:07:06	gregory.p.smith	set	keywords: + needs review messages: + msg80234
2009-01-20 05:03:51	gregory.p.smith	set	files: + issue4428-io-bufwrite-gps04.diff
2009-01-20 04:16:55	gregory.p.smith	set	files: - issue4428-io-bufwrite-gps02.diff
2008-11-29 13:24:50	beazley	set	messages: + msg76591
2008-11-29 12:49:10	loewis	set	keywords: - needs review nosy: + loewis messages: + msg76590
2008-11-29 02:38:52	gregory.p.smith	set	files: - issue4428-io-bufwrite-gps01.diff
2008-11-29 02:38:27	gregory.p.smith	set	files: + issue4428-io-bufwrite-gps03.diff messages: + msg76581
2008-11-29 01:42:47	gregory.p.smith	set	messages: + msg76576
2008-11-29 01:15:44	amaury.forgeotdarc	set	nosy: + amaury.forgeotdarc messages: + msg76574
2008-11-29 01:01:37	gregory.p.smith	set	keywords: + needs review files: + issue4428-io-bufwrite-gps02.diff messages: + msg76573
2008-11-29 00:54:29	gregory.p.smith	set	files: + issue4428-io-bufwrite-gps01.diff keywords: + patch messages: + msg76571
2008-11-29 00:13:00	gregory.p.smith	set	priority: normal assignee: gregory.p.smith nosy: + gregory.p.smith
2008-11-25 12:34:59	beazley	create