Author msakai
Recipients msakai
Date 2020-02-25.02:47:58
Message-id <1582598879.29.0.157478927981.issue39745@roundup.psfhosted.org>
In-reply-to
Content
According to https://docs.python.org/3/library/exceptions.html#BlockingIOError , 'characters_written' is "An integer containing the number of characters written to the stream before it blocked". However, in the following program I observed that it holds the number of *bytes*, not *characters*.

Program:
----
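import os
import threading

# Create a pipe and make its write end non-blocking.
r, w = os.pipe()
os.set_blocking(w, False)
f_r = os.fdopen(r, mode="rb")
f_w = os.fdopen(w, mode="w", encoding="utf-8")

# 8 characters per repeat: 3 Greek letters (2 bytes each in UTF-8)
# and 5 hiragana (3 bytes each), i.e. 21 bytes per repeat.
msg = "\u03b1\u03b2\u03b3\u3042\u3044\u3046\u3048\u304a" * (1024 * 16)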
try:
    print(msg, file=f_w, flush=True)
except BlockingIOError as e:
    print(f"BlockingIOError.characters_written == {e.characters_written}")
    written = e.characters_written

def close():
    os.set_blocking(w, True)
    f_w.close()
threading.Thread(target=close).start()

b = f_r.read()
f_r.close()

print(f"{written} characters correspond to {len(msg[:written].encode('utf-8'))} bytes in UTF-8")
print(f"{len(b)} bytes read")
----

Output:
----
BlockingIOError.characters_written == 81920
81920 characters correspond to 215040 bytes in UTF-8
81920 bytes read
----
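
The mismatch in the output can be checked with a little arithmetic on the message itself. The following is a minimal sketch, not part of the original program: the helper name `chars_in_byte_prefix` is made up for illustration, and it assumes the reported value 81920 is taken from the output above. It prints the byte count you get if 81920 is read as characters, and the character count you get if 81920 is read as bytes.

```python
import codecs

def chars_in_byte_prefix(text, nbytes, encoding="utf-8"):
    """Count the complete characters of `text` that fit in the first
    `nbytes` bytes of its encoded form (hypothetical helper)."""
    data = text.encode(encoding)[:nbytes]
    # With final=False the incremental decoder holds back a trailing
    # partial multi-byte sequence instead of raising UnicodeDecodeError.
    decoder = codecs.getincrementaldecoder(encoding)()
    return len(decoder.decode(data, final=False))

# Same message as in the program above: 8 characters = 21 UTF-8 bytes per repeat.
msg = "\u03b1\u03b2\u03b3\u3042\u3044\u3046\u3048\u304a" * (1024 * 16)
reported = 81920  # characters_written as shown in the output above

bytes_if_chars = len(msg[:reported].encode("utf-8"))  # if `reported` counted characters
chars_if_bytes = chars_in_byte_prefix(msg, reported)  # if `reported` counted bytes

print(bytes_if_chars)
print(chars_if_bytes)
```

Read as characters, 81920 would correspond to 215040 bytes, which does not match the 81920 bytes actually read. Read as bytes, it corresponds to 31207 complete characters plus 2 bytes of a partially written 3-byte character, so a byte count can also fall in the middle of a character.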

I think this behavior is confusing.
If it is intended, then it should be documented as such, and I think 'bytes_written' would be a more appropriate name.