This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Never release buffer when MemoryError in print()
Type: crash
Stage: resolved
Components: IO, Windows
Versions: Python 3.10, Python 3.9, Python 3.8
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: Ramin Farajpour Cami, eryksun, methane, paul.moore, steve.dower, tim.golden, vstinner, zach.ware
Priority: high
Keywords: patch

Created on 2021-02-19 05:27 by Ramin Farajpour Cami, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL Status Linked
PR 24592 merged methane, 2021-02-20 03:17
PR 24617 merged methane, 2021-02-21 23:34
PR 24618 merged methane, 2021-02-21 23:38
Messages (15)
msg387282 - (view) Author: Ramin Farajpour Cami (Ramin Farajpour Cami) Date: 2021-02-19 05:27
Hi,


When print("a"*1000000000) raises a MemoryError, a subsequent print("1") raises MemoryError as well. It looks like the buffer is never released for reuse:


>>> print("a"*1000000000)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError
>>> print("1")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError

When I exit() the Python interpreter and start it again, print("1") works.

I also tested on Linux, where it works:
>>> print("a"*1000000000)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError
>>> print("1")
1
>>> print("a")
a
>>>


Thanks.
Ramin
msg387286 - (view) Author: Ramin Farajpour Cami (Ramin Farajpour Cami) Date: 2021-02-19 06:39
Version:

Python 3.8.5 (tags/v3.8.5:580fbb0, Jul 20 2020, 15:43:08) [MSC v.1926 32 bit (Intel)] on win32
msg387287 - (view) Author: Ramin Farajpour Cami (Ramin Farajpour Cami) Date: 2021-02-19 06:45
Python 3.9.1 (tags/v3.9.1:1e5d33e, Dec  7 2020, 16:33:24) [MSC v.1928 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print("a"*1000000000)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError
>>> print("1")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError
>>>
msg387299 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2021-02-19 09:38
The sys.stdout TextIOWrapper is stuck in a bad state. When the write() method is called with an ASCII string, it implements an optimization that stores a reference to the str() object in the internal pending_bytes instead of immediately encoding the string. Subsequently _textiowrapper_writeflush() attempts to create a bytes object for this ASCII string, but PyBytes_FromStringAndSize fails with a memory error. It's stuck in this state because it will never be able to flush the string. 

The first workaround I thought of was to call detach() and rewrap sys.stdout.buffer with a new TextIOWrapper instance, but detach() attempts to flush and fails. A hacky but simple and effective workaround is to just re-initialize the wrapper. For example:

    sys.stdout.__init__(sys.stdout.buffer, sys.stdout.encoding,
        sys.stdout.errors, None, True)

This clears the internal pending_bytes.
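
As an illustration, a minimal sketch of applying this recovery in a stuck session (the positional arguments correspond to buffer, encoding, errors, newline, and line_buffering):

```python
import sys

# Re-initialize the stuck TextIOWrapper in place. This drops the
# pending ASCII string that could never be flushed.
sys.stdout.__init__(sys.stdout.buffer, sys.stdout.encoding,
                    sys.stdout.errors, None, True)

print("1")  # succeeds again once pending_bytes is cleared
```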
msg387300 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2021-02-19 09:58
Issue 36748 added the ASCII optimization in write(), so I'm adding Inada Naoki to the nosy list.
msg387376 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2021-02-20 00:57
This is not a problem only in the optimization. _textiowrapper_writeflush() needs to create the buffer bytes object anyway, and if that raises MemoryError, the TextIOWrapper can never flush its internal buffer:

  stdout.write("small text")
  stdout.write("very large text")  # Calls writeflush, but can not allocate buffer.

This example would get stuck in the same situation even without the optimization.


But the optimization made the problem easier to trigger: now it happens with only one large text.

Idea to fix this problem (a rough sketch follows the list):

* If the input text is large (>1 MB?):
  * Flush the buffer before adding the large text to it.
  * Encode the text and write it to self->buffer immediately. Do not put it into the internal buffer (self->pending_bytes).
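
A rough Python model of this idea (a toy wrapper with illustrative names and a ~1 MB threshold, not the actual C implementation):

```python
import io

LARGE = 1 << 20  # illustrative "large text" threshold (~1 MB)

class ToyTextWriter:
    """Toy model of TextIOWrapper's pending-write buffering."""

    def __init__(self, buffer, chunk_size=8192):
        self.buffer = buffer        # underlying binary stream
        self.chunk_size = chunk_size
        self.pending = []           # small encoded chunks
        self.pending_count = 0

    def _writeflush(self):
        data = b"".join(self.pending)   # may raise MemoryError
        self.pending.clear()
        self.pending_count = 0
        self.buffer.write(data)

    def write(self, text):
        data = text.encode("utf-8")
        if len(data) > LARGE:
            # Flush the small pending chunks first, then send the
            # large data straight to the underlying buffer so that
            # it never sits in self.pending.
            self._writeflush()
            self.buffer.write(data)
            return len(text)
        self.pending.append(data)
        self.pending_count += len(data)
        if self.pending_count > self.chunk_size:
            self._writeflush()
        return len(text)

w = ToyTextWriter(io.BytesIO())
w.write("small text")
w.write("x" * (2 * LARGE))  # bypasses the pending list entirely
```

With this shape, a failure while handling the large text leaves only the small pending chunks queued, so later small writes still succeed.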
msg387387 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2021-02-20 04:40
> stdout.write("small text")
> stdout.write("very large text")  # Calls writeflush, but can not allocate buffer.

Without the optimization, in most cases this will likely fail in _io_TextIOWrapper_write_impl() at the line `b = (*self->encodefunc)((PyObject *) self, text)`. In some cases, it could be that the latter succeeds, but its size combined with the existing pending_bytes_count leads to a memory error in _textiowrapper_writeflush().

> * If input text is large (>1M?)

I'd change write() to only optimize ASCII writes so long as the new total size of pending writes would not exceed the text wrapper's chunk size. Then rearrange the logic to pre-flush the text wrapper if the pending bytes plus the write would exceed the chunk size. Thus the total size of a list of pending writes (aggregating small writes as a chunk), or that of a single ASCII str() object, would be limited to the chunk size, in which case PyBytes_FromStringAndSize in _textiowrapper_writeflush() shouldn't fail in any normal circumstances. For example:

    if (self->encodefunc != NULL) {

[NEW CONDITION]

        if (PyUnicode_IS_ASCII(text) &&
              (PyUnicode_GET_LENGTH(text) +
                (self->pending_bytes ? self->pending_bytes_count : 0)) <=
                  self->chunk_size &&
              is_asciicompat_encoding(self->encodefunc)) {
            b = text;
            Py_INCREF(b);
        }
        else {
            b = (*self->encodefunc)((PyObject *) self, text);
        }
        self->encoding_start_of_stream = 0;
    }
    else {
        b = PyObject_CallMethodOneArg(self->encoder, _PyIO_str_encode, text);
    }

    Py_DECREF(text);
    if (b == NULL)
        return NULL;
    if (b != text && !PyBytes_Check(b)) {
        PyErr_Format(PyExc_TypeError,
                     "encoder should return a bytes object, not '%.200s'",
                     Py_TYPE(b)->tp_name);
        Py_DECREF(b);
        return NULL;
    }

    Py_ssize_t bytes_len;
    if (b == text) {
        bytes_len = PyUnicode_GET_LENGTH(b);
    }
    else {
        bytes_len = PyBytes_GET_SIZE(b);
    }

    if (self->pending_bytes == NULL) {
        self->pending_bytes_count = 0;
        self->pending_bytes = b;
    }

[NEW PRE-FLUSH]

    else if ((self->pending_bytes_count + bytes_len) > self->chunk_size) {
        if (_textiowrapper_writeflush(self) < 0) {
            Py_DECREF(b);
            return NULL;
        }
        self->pending_bytes = b;
    }
    else if (!PyList_CheckExact(self->pending_bytes)) {
        PyObject *list = PyList_New(2);
        if (list == NULL) {
            Py_DECREF(b);
            return NULL;
        }
        PyList_SET_ITEM(list, 0, self->pending_bytes);
        PyList_SET_ITEM(list, 1, b);
        self->pending_bytes = list;
    }
    else {
        if (PyList_Append(self->pending_bytes, b) < 0) {
            Py_DECREF(b);
            return NULL;
        }
        Py_DECREF(b);
    }

    self->pending_bytes_count += bytes_len;
    if (self->pending_bytes_count > self->chunk_size || needflush ||
        text_needflush) {
        if (_textiowrapper_writeflush(self) < 0)
            return NULL;
    }
msg387388 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2021-02-20 04:53
In your code, huge data passed to .write(huge) may remain in the internal buffer.

```
[NEW PRE-FLUSH]

    else if ((self->pending_bytes_count + bytes_len) > self->chunk_size) {
        if (_textiowrapper_writeflush(self) < 0) {
            Py_DECREF(b);
            return NULL;
        }
        self->pending_bytes = b;
    }
(snip)
    self->pending_bytes_count += bytes_len;
    if (self->pending_bytes_count > self->chunk_size || needflush ||
        text_needflush) {
        if (_textiowrapper_writeflush(self) < 0)
            return NULL;
    }
```

In my opinion, when .write(huge) fails with MemoryError, TextIOWrapper must not keep `huge` in the internal buffer.

See my PR 24592.
msg387391 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2021-02-20 05:20
> In your code, huge data passed to .write(huge) may be 
> remained in the internal buffer.

If you mean the buffered writer, then I don't see the problem. A large bytes object in pending_bytes gets temporarily referenced by _textiowrapper_writeflush(), and self->pending_bytes is cleared. If the buffer's write() method fails, then the bytes object is simply deallocated.

If you mean pending_bytes in the text wrapper, then I also don't see the problem. It always gets flushed if pending_bytes_count exceeds the chunk size. If pending_bytes is a list, it never exceeds the chunk size. It gets pre-flushed to avoid growing beyond the chunk size. If pending_bytes isn't a list, then it can only exceed the chunk size if it's a bytes object -- never for an ASCII str() object. _textiowrapper_writeflush() does not call PyBytes_AsStringAndSize() for that case.
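
For reference, a rough Python model of the flush ordering described here, written against a hypothetical wrapper whose pending attribute may hold a list, an ASCII str, or bytes (attribute names are illustrative, not the C code):

```python
def writeflush(self):
    if self.pending is None:
        return
    # The bytes object is built first; if this raises MemoryError,
    # self.pending is left untouched -- the stuck state this issue
    # is about.
    if isinstance(self.pending, list):
        data = b"".join(self.pending)
    elif isinstance(self.pending, str):
        data = self.pending.encode("ascii")  # ASCII fast-path entry
    else:
        data = self.pending                  # already bytes
    # pending is cleared before writing; if buffer.write() fails,
    # `data` is simply dropped and nothing large stays queued.
    self.pending = None
    self.pending_count = 0
    self.buffer.write(data)
```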
msg387393 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2021-02-20 06:28
You are right. I misunderstood.

```
        if (PyUnicode_IS_ASCII(text) &&
              (PyUnicode_GET_LENGTH(text) +
                (self->pending_bytes ? self->pending_bytes_count : 0)) <=
                  self->chunk_size &&
              is_asciicompat_encoding(self->encodefunc)) {
            b = text;
            Py_INCREF(b);
        }
```

This seems too complex and defensive.  Isn't `PyUnicode_GET_LENGTH(text) < self->chunk_size` enough?

When `PyUnicode_GET_LENGTH(text) + self->pending_bytes_count > self->chunk_size`, `self->pending_bytes` is preflushed anyway.
msg387395 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2021-02-20 06:43
> Isn't `PyUnicode_GET_LENGTH(text) < self->chunk_size` enough?

Yes, that's simpler, except with `<=` instead of `<`, since the maximum count is chunk_size when pending_bytes is a list or an ASCII string. When I wrote the more complex check, I didn't take into account that pending_bytes would be flushed anyway if it causes pending_bytes_count to exceed chunk_size.
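
In toy Python terms, the agreed-upon check amounts to the following (an illustrative helper, not the C code):

```python
def can_defer_encoding(text: str, chunk_size: int) -> bool:
    # Take the ASCII fast path only when the text alone fits within
    # chunk_size; the pre-flush already bounds the combined pending
    # size at chunk_size, so no separate pending_bytes_count term
    # is needed here.
    return text.isascii() and len(text) <= chunk_size
```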
msg387480 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2021-02-21 23:29
New changeset 01806d5beba3d208bb56adba6829097d803bf54f by Inada Naoki in branch 'master':
bpo-43260: io: Prevent large data remains in textio buffer. (GH-24592)
https://github.com/python/cpython/commit/01806d5beba3d208bb56adba6829097d803bf54f
msg387484 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2021-02-22 01:33
New changeset d51436f95bf5978f82d917e53e9a456fdaa83a9d by Inada Naoki in branch '3.9':
bpo-43260: io: Prevent large data remains in textio buffer. (GH-24592)
https://github.com/python/cpython/commit/d51436f95bf5978f82d917e53e9a456fdaa83a9d
msg387486 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2021-02-22 01:38
New changeset 6e2f144f53982d7103d4cfc2d9361fc445a1817e by Inada Naoki in branch '3.8':
bpo-43260: io: Prevent large data remains in textio buffer. (GH-24592)
https://github.com/python/cpython/commit/6e2f144f53982d7103d4cfc2d9361fc445a1817e
msg387539 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-02-22 19:36
I didn't understand the bug but thanks for fixing it :)
History
Date User Action Args
2022-04-11 14:59:41  admin  set  github: 87426
2021-02-22 19:36:22  vstinner  set  messages: + msg387539
2021-02-22 01:38:54  methane  set  status: open -> closed; resolution: fixed; stage: patch review -> resolved
2021-02-22 01:38:33  methane  set  messages: + msg387486
2021-02-22 01:33:02  methane  set  messages: + msg387484
2021-02-21 23:38:34  methane  set  pull_requests: + pull_request23399
2021-02-21 23:34:33  methane  set  pull_requests: + pull_request23398
2021-02-21 23:29:37  methane  set  messages: + msg387480
2021-02-20 08:30:41  Ramin Farajpour Cami  set  nosy: + vstinner
2021-02-20 06:43:44  eryksun  set  messages: + msg387395
2021-02-20 06:28:51  methane  set  messages: + msg387393
2021-02-20 05:20:01  eryksun  set  messages: + msg387391
2021-02-20 04:53:52  methane  set  messages: + msg387388
2021-02-20 04:40:26  eryksun  set  messages: + msg387387
2021-02-20 03:17:35  methane  set  keywords: + patch; stage: patch review; pull_requests: + pull_request23372
2021-02-20 00:57:34  methane  set  messages: + msg387376
2021-02-19 09:58:23  eryksun  set  nosy: + methane; messages: + msg387300
2021-02-19 09:38:56  eryksun  set  priority: normal -> high; versions: + Python 3.9, Python 3.10; nosy: + eryksun; messages: + msg387299; components: + IO
2021-02-19 06:48:56  Ramin Farajpour Cami  set  type: crash
2021-02-19 06:45:44  Ramin Farajpour Cami  set  messages: + msg387287
2021-02-19 06:39:48  Ramin Farajpour Cami  set  messages: + msg387286
2021-02-19 05:28:57  Ramin Farajpour Cami  set  title: Never release buffer when MemoryError in prtin() -> Never release buffer when MemoryError in print()
2021-02-19 05:27:20  Ramin Farajpour Cami  create