msg76408 - (view) |
Author: David M. Beazley (beazley) |
Date: 2008-11-25 12:34 |
The Buffered I/O interface in the io module has the user specify buffer
limits such as size and max_buffer_size. The first limit (size) is
easy to understand as a buffering threshold at which writes will occur.
However, no apparent attempt is made to strictly limit the internal
buffer size to max_buffer_size.
In BuffererWriter.write(), one of the first operations is
self._write_buf.extend(b)
which simply extends the buffer by the full data being written. If b
happens to be a large string (e.g., megabytes or even the entire
contents of a big file), then the internal I/O buffer makes a complete
copy of the data, effectively doubling the memory requirements for
carrying out the write operation.
I suppose most programmers might not notice given that everyone has
gigabytes of RAM these days, but you certainly don't see this kind of
buffering behavior in the operating system kernel or in the C library.
Some patch suggestions (details left to the maintainers of this module):
1. Don't extend self._write_buf by more than the max_buffer_size.
fragment = b[:self.max_buffer_size - len(self._write_buf)]
self._write_buf.extend(fragment)
2. For large blocking writes, simply carry out the remaining I/O
operations in the write() method instead of in the _flush_locked()
method. Try to use the original input data b as the data
source instead of making copies of it. And if you have to copy
the data, don't do it all at once.
|
msg76571 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2008-11-29 00:54 |
I've attached my first attempt at fixing this as io-bufwrite-gps01.
Unfortunately it causes the Lib/test/test_io.py testThreads to fail:
======================================================================
FAIL: testThreads (__main__.BufferedReaderTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "Lib/test/test_io.py", line 456, in testThreads
self.assertEqual(s.count(c), N)
AssertionError: 30 != 1000
I haven't read the test yet, but my guess is that the test is broken and
is blindly calling write() without checking the return value.
|
msg76573 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2008-11-29 01:01 |
Yep, the test was ignoring the return value from BufferedWriter.write.
Fixed in the attached io-bufwrite-gps02.
This can wait for 3.0.1 and 2.6.1/2 (depending on the 2.6.x release
schedule).
|
msg76574 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * |
Date: 2008-11-29 01:15 |
I do not agree. User code should not have to verify the return value of write().
When a big amount of data is passed to BufferedWriter.write,
- normal blocking streams should block until all data has been written.
(the implementation may call self.raw.write several times)
- non-blocking streams should raise BlockingIOError indicating the number of
characters actually written.
This applies only to buffered files, of course.
|
msg76576 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2008-11-29 01:42 |
I agree with Amuary, write() traditionally never writes less data unless
the underlying IO is in nonblocking mode.
I'm working up a new patch to write to self.raw in max_buffer_size
chunks with as few data copies as possible, raising an appropriate
BlockingIOError (as the surrounding code does) when the IO on the
underlying self.raw.write raises one through self._flush_unlocked().
|
msg76581 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2008-11-29 02:38 |
Okay, here's a new patch that obeys the blocking vs nonblocking
semantics properly. It still needs explicit unit tests for proper behavior.
|
msg76590 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2008-11-29 12:49 |
According to msg76581, the patch fails review (lacking test cases), so
I'm removing the "needs review" keyword.
|
msg76591 - (view) |
Author: David M. Beazley (beazley) |
Date: 2008-11-29 13:24 |
I agree with previous comments that write() should definitely write all
data when in blocking mode.
|
msg80234 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2009-01-20 05:07 |
Updated patch with unit tests to verify the write behavior attached in
-gps04. Up for code review at http://codereview.appspot.com/12470/show
|
msg80292 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2009-01-20 23:27 |
Hi!
that sounds
> like a behavior change. I'd be fine with removing it for the 3.1/2.7
version of
> this code (though I hope people will be using the C implementation
instead).
Well, either it's supported and it will have to go through a deprecation
phase, or it's unsupported and it can be ripped out right now...
I don't think it should be supported at all, given that the semantics of
writing an iterable of ints are totally non-obvious. Reading both the
PEP and the docstrings in io.py, I only see mentions of "bytes" and
"buffer", not of an iterable of ints. Perhaps Guido should pronounce.
(do you know of any code relying on this behaviour? the C version
obviously does not support it and all regression tests pass fine, except
for an SSL bug I filed)
http://codereview.appspot.com/12470
|
msg80299 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2009-01-21 00:18 |
I know of no code relying on this behavior, I merely duplicated the
behavior that existed while writing the patch. I'm happy to simplify.
Adding Guido to the CC list for a pronouncement.
Guido is it okay to change io.BufferedWriter.write() in release30-maint
to no longer accept iterables that do not support len() and things that
do not support the buffer API?
Antoine already notes in the codereview that other existing io classes
such as FileIO currently do not support non bytes/buffer or buffer-api
supporting inputs: http://codereview.appspot.com/12470/patch/1/3
|
msg80304 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2009-01-21 01:01 |
@Gregory, that sounds like an odd enough use case to skip. However you
might want to look for __length_hint__ before giving up?
OTOH unless the use case is real, why not support it but making it slow?
|
msg80902 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2009-02-01 06:29 |
I've uploaded a new patch set with more extensive unit tests. It also
handles the case of writing array.array objects (or anything with a
memoryview itemsize > 1). The previous code would buffer by item rather
than by byte. it has been updated in codereview as well.
|
msg80903 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2009-02-01 06:30 |
fwiw, I decided Guido and Antoine were right and took out the support
for input that did not support len() to keep things a bit simpler.
|
msg93792 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2009-10-09 14:04 |
max_buffer_size is no longer used, so this issue is obsolete ;)
|
|
Date |
User |
Action |
Args |
2022-04-11 14:56:41 | admin | set | github: 48678 |
2009-10-09 14:04:56 | pitrou | set | status: open -> closed resolution: out of date messages:
+ msg93792
|
2009-02-01 06:30:55 | gregory.p.smith | set | messages:
+ msg80903 |
2009-02-01 06:29:49 | gregory.p.smith | set | messages:
+ msg80902 |
2009-02-01 06:26:49 | gregory.p.smith | set | files:
+ issue4428-io-bufwrite-gps05.diff |
2009-01-21 01:01:21 | gvanrossum | set | messages:
+ msg80304 |
2009-01-21 00:18:11 | gregory.p.smith | set | nosy:
+ gvanrossum messages:
+ msg80299 |
2009-01-20 23:27:50 | pitrou | set | nosy:
+ pitrou messages:
+ msg80292 title: io.BufferedWriter does not observe buffer size limits -> make io.BufferedWriter observe max_buffer_size limits |
2009-01-20 05:07:20 | gregory.p.smith | set | files:
- issue4428-io-bufwrite-gps03.diff |
2009-01-20 05:07:06 | gregory.p.smith | set | keywords:
+ needs review messages:
+ msg80234 |
2009-01-20 05:03:51 | gregory.p.smith | set | files:
+ issue4428-io-bufwrite-gps04.diff |
2009-01-20 04:16:55 | gregory.p.smith | set | files:
- issue4428-io-bufwrite-gps02.diff |
2008-11-29 13:24:50 | beazley | set | messages:
+ msg76591 |
2008-11-29 12:49:10 | loewis | set | keywords:
- needs review nosy:
+ loewis messages:
+ msg76590 |
2008-11-29 02:38:52 | gregory.p.smith | set | files:
- issue4428-io-bufwrite-gps01.diff |
2008-11-29 02:38:27 | gregory.p.smith | set | files:
+ issue4428-io-bufwrite-gps03.diff messages:
+ msg76581 |
2008-11-29 01:42:47 | gregory.p.smith | set | messages:
+ msg76576 |
2008-11-29 01:15:44 | amaury.forgeotdarc | set | nosy:
+ amaury.forgeotdarc messages:
+ msg76574 |
2008-11-29 01:01:37 | gregory.p.smith | set | keywords:
+ needs review files:
+ issue4428-io-bufwrite-gps02.diff messages:
+ msg76573 |
2008-11-29 00:54:29 | gregory.p.smith | set | files:
+ issue4428-io-bufwrite-gps01.diff keywords:
+ patch messages:
+ msg76571 |
2008-11-29 00:13:00 | gregory.p.smith | set | priority: normal assignee: gregory.p.smith nosy:
+ gregory.p.smith |
2008-11-25 12:34:59 | beazley | create | |