This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Never release buffer when MemoryError in print()
Type: crash
Stage: resolved
Components: IO, Windows
Versions: Python 3.10, Python 3.9, Python 3.8
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: Ramin Farajpour Cami, eryksun, methane, paul.moore, steve.dower, tim.golden, vstinner, zach.ware
Priority: high
Keywords: patch

Created on 2021-02-19 05:27 by Ramin Farajpour Cami, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL Status Linked
PR 24592 merged methane, 2021-02-20 03:17
PR 24617 merged methane, 2021-02-21 23:34
PR 24618 merged methane, 2021-02-21 23:38
Messages (15)
msg387282 - (view) Author: Ramin Farajpour Cami (Ramin Farajpour Cami) Date: 2021-02-19 05:27
Hi,


When print("a"*1000000000) raises a MemoryError, a subsequent print("1") raises MemoryError as well. It looks like the buffer is never released for reuse:


>>> print("a"*1000000000)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError
>>> print("1")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError

When I exit() the Python interpreter and start it again, print("1") works.

I also tested on Linux, where it works:
>>> print("a"*1000000000)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError
>>> print("1")
1
>>> print("a")
a
>>>


Thanks.
Ramin
msg387286 - (view) Author: Ramin Farajpour Cami (Ramin Farajpour Cami) Date: 2021-02-19 06:39
Version:

Python 3.8.5 (tags/v3.8.5:580fbb0, Jul 20 2020, 15:43:08) [MSC v.1926 32 bit (Intel)] on win32
msg387287 - (view) Author: Ramin Farajpour Cami (Ramin Farajpour Cami) Date: 2021-02-19 06:45
Python 3.9.1 (tags/v3.9.1:1e5d33e, Dec  7 2020, 16:33:24) [MSC v.1928 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print("a"*1000000000)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError
>>> print("1")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError
>>>
msg387299 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2021-02-19 09:38
The sys.stdout TextIOWrapper is stuck in a bad state. When the write() method is called with an ASCII string, it implements an optimization that stores a reference to the str() object in the internal pending_bytes instead of immediately encoding the string. Subsequently _textiowrapper_writeflush() attempts to create a bytes object for this ASCII string, but PyBytes_FromStringAndSize fails with a memory error. It's stuck in this state because it will never be able to flush the string. 

The first workaround I thought of was to call detach() and rewrap sys.stdout.buffer with a new TextIOWrapper instance, but detach() attempts to flush and fails. A hacky but simple and effective workaround is to just re-initialize the wrapper. For example:

    sys.stdout.__init__(sys.stdout.buffer, sys.stdout.encoding,
        sys.stdout.errors, None, True)

This clears the internal pending_bytes.
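
As an illustration, a minimal sketch of applying this recovery in a stuck session (the positional arguments correspond to buffer, encoding, errors, newline, and line_buffering):

```python
import sys

# Re-initialize the stuck TextIOWrapper in place. This drops the
# pending ASCII string that could never be flushed.
sys.stdout.__init__(sys.stdout.buffer, sys.stdout.encoding,
                    sys.stdout.errors, None, True)

print("1")  # succeeds again once pending_bytes is cleared
```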
msg387300 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2021-02-19 09:58
Issue 36748 added the ASCII optimization in write(), so I'm adding Inada Naoki to the nosy list.
msg387376 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2021-02-20 00:57
This is not a problem only in the optimization. _textiowrapper_writeflush() needs to create the buffer bytes object anyway, and if that raises MemoryError, the TextIOWrapper can never flush its internal buffer:

  stdout.write("small text")
  stdout.write("very large text")  # Calls writeflush, but can not allocate buffer.

This example would get stuck in the same situation even without the optimization.


But the optimization made the problem easier to trigger: now it happens with only one large text.

Idea to fix this problem (a rough sketch follows the list):

* If the input text is large (>1 MB?):
  * Flush the buffer before adding the large text to it.
  * Encode the text and write it to self->buffer immediately. Do not put it into the internal buffer (self->pending_bytes).
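
A rough Python model of this idea (a toy wrapper with illustrative names and a ~1 MB threshold, not the actual C implementation):

```python
import io

LARGE = 1 << 20  # illustrative "large text" threshold (~1 MB)

class ToyTextWriter:
    """Toy model of TextIOWrapper's pending-write buffering."""

    def __init__(self, buffer, chunk_size=8192):
        self.buffer = buffer        # underlying binary stream
        self.chunk_size = chunk_size
        self.pending = []           # small encoded chunks
        self.pending_count = 0

    def _writeflush(self):
        data = b"".join(self.pending)   # may raise MemoryError
        self.pending.clear()
        self.pending_count = 0
        self.buffer.write(data)

    def write(self, text):
        data = text.encode("utf-8")
        if len(data) > LARGE:
            # Flush the small pending chunks first, then send the
            # large data straight to the underlying buffer so that
            # it never sits in self.pending.
            self._writeflush()
            self.buffer.write(data)
            return len(text)
        self.pending.append(data)
        self.pending_count += len(data)
        if self.pending_count > self.chunk_size:
            self._writeflush()
        return len(text)

w = ToyTextWriter(io.BytesIO())
w.write("small text")
w.write("x" * (2 * LARGE))  # bypasses the pending list entirely
```

With this shape, a failure while handling the large text leaves only the small pending chunks queued, so later small writes still succeed.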
msg387387 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2021-02-20 04:40
> stdout.write("small text")
> stdout.write("very large text")  # Calls writeflush, but can not allocate buffer.

Without the optimization, in most cases this will likely fail in _io_TextIOWrapper_write_impl() at the line `b = (*self->encodefunc)((PyObject *) self, text)`. In some cases, it could be that the latter succeeds, but its size combined with the existing pending_bytes_count leads to a memory error in _textiowrapper_writeflush().

> * If input text is large (>1M?)

I'd change write() to only optimize ASCII writes so long as the new total size of pending writes would not exceed the text wrapper's chunk size. Then rearrange the logic to pre-flush the text wrapper if the pending bytes plus the write would exceed the chunk size. Thus the total size of a list of pending writes (aggregating small writes as a chunk), or that of a single ASCII str() object, would be limited to the chunk size, in which case PyBytes_FromStringAndSize in _textiowrapper_writeflush() shouldn't fail in any normal circumstances. For example:

    if (self->encodefunc != NULL) {

[NEW CONDITION]

        if (PyUnicode_IS_ASCII(text) &&
              (PyUnicode_GET_LENGTH(text) +
                (self->pending_bytes ? self->pending_bytes_count : 0)) <=
                  self->chunk_size &&
              is_asciicompat_encoding(self->encodefunc)) {
            b = text;
            Py_INCREF(b);
        }
        else {
            b = (*self->encodefunc)((PyObject *) self, text);
        }
        self->encoding_start_of_stream = 0;
    }
    else {
        b = PyObject_CallMethodOneArg(self->encoder, _PyIO_str_encode, text);
    }

    Py_DECREF(text);
    if (b == NULL)
        return NULL;
    if (b != text && !PyBytes_Check(b)) {
        PyErr_Format(PyExc_TypeError,
                     "encoder should return a bytes object, not '%.200s'",
                     Py_TYPE(b)->tp_name);
        Py_DECREF(b);
        return NULL;
    }

    Py_ssize_t bytes_len;
    if (b == text) {
        bytes_len = PyUnicode_GET_LENGTH(b);
    }
    else {
        bytes_len = PyBytes_GET_SIZE(b);
    }

    if (self->pending_bytes == NULL) {
        self->pending_bytes_count = 0;
        self->pending_bytes = b;
    }

[NEW PRE-FLUSH]

    else if ((self->pending_bytes_count + bytes_len) > self->chunk_size) {
        if (_textiowrapper_writeflush(self) < 0) {
            Py_DECREF(b);
            return NULL;
        }
        self->pending_bytes = b;
    }
    else if (!PyList_CheckExact(self->pending_bytes)) {
        PyObject *list = PyList_New(2);
        if (list == NULL) {
            Py_DECREF(b);
            return NULL;
        }
        PyList_SET_ITEM(list, 0, self->pending_bytes);
        PyList_SET_ITEM(list, 1, b);
        self->pending_bytes = list;
    }
    else {
        if (PyList_Append(self->pending_bytes, b) < 0) {
            Py_DECREF(b);
            return NULL;
        }
        Py_DECREF(b);
    }

    self->pending_bytes_count += bytes_len;
    if (self->pending_bytes_count > self->chunk_size || needflush ||
        text_needflush) {
        if (_textiowrapper_writeflush(self) < 0)
            return NULL;
    }
msg387388 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2021-02-20 04:53
In your code, huge data passed to .write(huge) may remain in the internal buffer.

```
[NEW PRE-FLUSH]

    else if ((self->pending_bytes_count + bytes_len) > self->chunk_size) {
        if (_textiowrapper_writeflush(self) < 0) {
            Py_DECREF(b);
            return NULL;
        }
        self->pending_bytes = b;
    }
(snip)
    self->pending_bytes_count += bytes_len;
    if (self->pending_bytes_count > self->chunk_size || needflush ||
        text_needflush) {
        if (_textiowrapper_writeflush(self) < 0)
            return NULL;
    }
```

In my opinion, when .write(huge) fails with MemoryError, TextIOWrapper must not keep `huge` in the internal buffer.

See my PR 24592.
msg387391 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2021-02-20 05:20
> In your code, huge data passed to .write(huge) may be 
> remained in the internal buffer.

If you mean the buffered writer, then I don't see the problem. A large bytes object in pending_bytes gets temporarily referenced by _textiowrapper_writeflush(), and self->pending_bytes is cleared. If the buffer's write() method fails, then the bytes object is simply deallocated.

If you mean pending_bytes in the text wrapper, then I also don't see the problem. It always gets flushed if pending_bytes_count exceeds the chunk size. If pending_bytes is a list, it never exceeds the chunk size. It gets pre-flushed to avoid growing beyond the chunk size. If pending_bytes isn't a list, then it can only exceed the chunk size if it's a bytes object -- never for an ASCII str() object. _textiowrapper_writeflush() does not call PyBytes_AsStringAndSize() for that case.
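
For reference, a rough Python model of the flush ordering described here, written against a hypothetical wrapper whose pending attribute may hold a list, an ASCII str, or bytes (attribute names are illustrative, not the C code):

```python
def writeflush(self):
    if self.pending is None:
        return
    # The bytes object is built first; if this raises MemoryError,
    # self.pending is left untouched -- the stuck state this issue
    # is about.
    if isinstance(self.pending, list):
        data = b"".join(self.pending)
    elif isinstance(self.pending, str):
        data = self.pending.encode("ascii")  # ASCII fast-path entry
    else:
        data = self.pending                  # already bytes
    # pending is cleared before writing; if buffer.write() fails,
    # `data` is simply dropped and nothing large stays queued.
    self.pending = None
    self.pending_count = 0
    self.buffer.write(data)
```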
msg387393 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2021-02-20 06:28
You are right. I misunderstood.

```
        if (PyUnicode_IS_ASCII(text) &&
              (PyUnicode_GET_LENGTH(text) +
                (self->pending_bytes ? self->pending_bytes_count : 0)) <=
                  self->chunk_size &&
              is_asciicompat_encoding(self->encodefunc)) {
            b = text;
            Py_INCREF(b);
        }
```

This seems too complex and defensive.  Isn't `PyUnicode_GET_LENGTH(text) < self->chunk_size` enough?

When `PyUnicode_GET_LENGTH(text) + self->pending_bytes_count > self->chunk_size`, `self->pending_bytes` is preflushed anyway.
msg387395 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2021-02-20 06:43
> Isn't `PyUnicode_GET_LENGTH(text) < self->chunk_size` enough?

Yes, that's simpler, except with `<=` instead of `<`, since the maximum count is chunk_size when pending_bytes is a list or an ASCII string. When I wrote the more complex check, I didn't take into account that pending_bytes would be flushed anyway if it causes pending_bytes_count to exceed chunk_size.
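
In toy Python terms, the agreed-upon check amounts to the following (an illustrative helper, not the C code):

```python
def can_defer_encoding(text: str, chunk_size: int) -> bool:
    # Take the ASCII fast path only when the text alone fits within
    # chunk_size; the pre-flush already bounds the combined pending
    # size at chunk_size, so no separate pending_bytes_count term
    # is needed here.
    return text.isascii() and len(text) <= chunk_size
```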
msg387480 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2021-02-21 23:29
New changeset 01806d5beba3d208bb56adba6829097d803bf54f by Inada Naoki in branch 'master':
bpo-43260: io: Prevent large data remains in textio buffer. (GH-24592)
https://github.com/python/cpython/commit/01806d5beba3d208bb56adba6829097d803bf54f
msg387484 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2021-02-22 01:33
New changeset d51436f95bf5978f82d917e53e9a456fdaa83a9d by Inada Naoki in branch '3.9':
bpo-43260: io: Prevent large data remains in textio buffer. (GH-24592)
https://github.com/python/cpython/commit/d51436f95bf5978f82d917e53e9a456fdaa83a9d
msg387486 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2021-02-22 01:38
New changeset 6e2f144f53982d7103d4cfc2d9361fc445a1817e by Inada Naoki in branch '3.8':
bpo-43260: io: Prevent large data remains in textio buffer. (GH-24592)
https://github.com/python/cpython/commit/6e2f144f53982d7103d4cfc2d9361fc445a1817e
msg387539 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-02-22 19:36
I didn't understand the bug but thanks for fixing it :)
History
Date User Action Args
2022-04-11 14:59:41  admin  set  github: 87426
2021-02-22 19:36:22  vstinner  set  messages: + msg387539
2021-02-22 01:38:54  methane  set  status: open -> closed; resolution: fixed; stage: patch review -> resolved
2021-02-22 01:38:33  methane  set  messages: + msg387486
2021-02-22 01:33:02  methane  set  messages: + msg387484
2021-02-21 23:38:34  methane  set  pull_requests: + pull_request23399
2021-02-21 23:34:33  methane  set  pull_requests: + pull_request23398
2021-02-21 23:29:37  methane  set  messages: + msg387480
2021-02-20 08:30:41  Ramin Farajpour Cami  set  nosy: + vstinner
2021-02-20 06:43:44  eryksun  set  messages: + msg387395
2021-02-20 06:28:51  methane  set  messages: + msg387393
2021-02-20 05:20:01  eryksun  set  messages: + msg387391
2021-02-20 04:53:52  methane  set  messages: + msg387388
2021-02-20 04:40:26  eryksun  set  messages: + msg387387
2021-02-20 03:17:35  methane  set  keywords: + patch; stage: patch review; pull_requests: + pull_request23372
2021-02-20 00:57:34  methane  set  messages: + msg387376
2021-02-19 09:58:23  eryksun  set  nosy: + methane; messages: + msg387300
2021-02-19 09:38:56  eryksun  set  priority: normal -> high; versions: + Python 3.9, Python 3.10; nosy: + eryksun; messages: + msg387299; components: + IO
2021-02-19 06:48:56  Ramin Farajpour Cami  set  type: crash
2021-02-19 06:45:44  Ramin Farajpour Cami  set  messages: + msg387287
2021-02-19 06:39:48  Ramin Farajpour Cami  set  messages: + msg387286
2021-02-19 05:28:57  Ramin Farajpour Cami  set  title: Never release buffer when MemoryError in prtin() -> Never release buffer when MemoryError in print()
2021-02-19 05:27:20  Ramin Farajpour Cami  create