Author vstinner
Recipients Arfrever, inada.naoki, ishimoto, loewis, mrabarnett, ncoghlan, pitrou, serhiy.storchaka, vstinner
Date 2012-08-07.02:01:33
Message-id <1344304897.02.0.734555445654.issue15216@psf.upfronthosting.co.za>
Here is a Python implementation of TextIOWrapper.set_encoding().

The main limitation is that it is not possible to set the encoding of a non-seekable stream after the first read (more precisely, when the read buffer is not empty, i.e. when there are pending decoded characters).

+        # flush the read buffer; this may require seeking backward in the
+        # underlying file object
+        if self._decoded_chars:
+            if not self.seekable():
+                raise UnsupportedOperation(
+                    "It is not possible to set the encoding "
+                    "of a non-seekable file after the first read")
+            assert self._snapshot is not None
+            dec_flags, next_input = self._snapshot
+            offset = self._decoded_chars_used - len(next_input)
+            if offset:
+                self.buffer.seek(offset, SEEK_CUR)
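
The patch rewinds the underlying buffer by hand using the private _snapshot state. For comparison, a seekable stream can be rewound with public io calls only: tell() computes the position matching the characters consumed so far, and seek() to that cookie drops the read-ahead text. The sketch below shows the idea; the helper name rewrap_text_stream is hypothetical, and it swaps in detach() plus a new wrapper instead of an in-place set_encoding(). It also shows why a non-seekable stream has to be refused: read-ahead bytes cannot be pushed back into a pipe or terminal.

```python
import io


def rewrap_text_stream(text, encoding, errors="strict"):
    """Hypothetical helper (not part of the io module): return a new
    TextIOWrapper that continues reading `text` with another codec."""
    if not text.seekable():
        # Bytes already read ahead from a pipe or terminal cannot be put back.
        raise io.UnsupportedOperation(
            "cannot change the encoding of a non-seekable stream after reading")
    # tell() returns a cookie for the position of the characters consumed so
    # far; seeking to it discards the pending decoded characters and moves
    # the underlying binary buffer back to that position.
    text.seek(text.tell())
    buffer = text.detach()  # the old wrapper is unusable after this call
    return io.TextIOWrapper(buffer, encoding=encoding, errors=errors)


raw = io.BytesIO(b"# coding: utf-8\n" + "caf\u00e9\n".encode("utf-8"))
text = io.TextIOWrapper(raw, encoding="latin-1")
header = text.readline()  # the wrapper has already read ahead past this line
text = rewrap_text_stream(text, "utf-8")
rest = text.read()
```

Decoding the rest of the data as Latin-1 would have produced mojibake for "é"; after the rewrap, the UTF-8 decoder starts exactly at the second line.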

--

I don't have a use case for setting the encoding of sys.stdout/stderr after startup, but I would like to be able to change the *error handler* after startup! My implementation allows changing the encoding, the error handler, or both.
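
Changing only the error handler of an already-open text stream is what the io.TextIOWrapper.reconfigure() method, available since Python 3.7, allows, for example sys.stdout.reconfigure(errors="backslashreplace"). The sketch below illustrates that standard-library method, not the set_encoding() patch discussed here. It uses an in-memory BytesIO in place of stdout so that the encoded bytes can be inspected.

```python
import io

buf = io.BytesIO()
out = io.TextIOWrapper(buf, encoding="ascii", errors="strict", newline="\n")

# Python 3.7+: replace only the error handler; the encoding stays "ascii".
out.reconfigure(errors="backslashreplace")

# With errors="strict" this write would raise UnicodeEncodeError.
out.write("caf\u00e9\n")
out.flush()
encoded = buf.getvalue()
```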

For example, Lib/test/regrtest.py uses the following function to force the backslashreplace error handler on sys.stdout, so that printing test results cannot fail with UnicodeEncodeError.

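import sys

def replace_stdout():
    """Set stdout encoder error handler to backslashreplace (as stderr error
    handler) to avoid UnicodeEncodeError when printing a traceback"""
    import atexit

    stdout = sys.stdout
    # Open a second text wrapper on the same file descriptor; closefd=False
    # keeps the descriptor open when this wrapper is closed.
    sys.stdout = open(stdout.fileno(), 'w',
        encoding=stdout.encoding,
        errors="backslashreplace",
        closefd=False,
        newline='\n')

    def restore_stdout():
        # Flush and close the replacement wrapper, then restore the original.
        sys.stdout.close()
        sys.stdout = stdout
    atexit.register(restore_stdout)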
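
The same pattern can be exercised on any file, not only stdout: a second text wrapper shares the file descriptor of the first, and closefd=False prevents closing it. Below is a minimal, testable sketch on a temporary file; reopen_with_errors is a hypothetical helper, not part of regrtest. It flushes the old wrapper first so that text already written through it keeps its order.

```python
import os
import tempfile


def reopen_with_errors(stream, errors):
    """Hypothetical helper modelled on replace_stdout(): open a second text
    wrapper on the same file descriptor with a different error handler."""
    stream.flush()  # write out text still buffered in the old wrapper
    return open(stream.fileno(), "w",
                encoding=stream.encoding,
                errors=errors,
                closefd=False,  # closing the new wrapper must not close the fd
                newline="\n")


with tempfile.TemporaryFile("w+", encoding="ascii", newline="\n") as original:
    original.write("ok\n")
    patched = reopen_with_errors(original, "backslashreplace")
    patched.write("caf\u00e9\n")  # "é" is escaped instead of raising
    patched.close()               # flushes; the fd stays open
    # Read the raw bytes back directly from the shared descriptor.
    os.lseek(original.fileno(), 0, os.SEEK_SET)
    data = os.read(original.fileno(), 100)
```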

The doctest module uses another trick to change the error handler:

# Excerpt from doctest.DocTestRunner.run(); `out` is a parameter of that
# method and `self._fakeout` is the runner's fake stdout.
save_stdout = sys.stdout
if out is None:
    encoding = save_stdout.encoding
    if encoding is None or encoding.lower() == 'utf-8':
        out = save_stdout.write
    else:
        # Use backslashreplace error handling on write
        def out(s):
            s = str(s.encode(encoding, 'backslashreplace'), encoding)
            save_stdout.write(s)
sys.stdout = self._fakeout
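
The doctest trick does not touch the stream at all: it round-trips each string through the target encoding with backslashreplace, so the text that reaches the real stream only contains characters that the encoding can represent. A self-contained sketch of that wrapper follows; make_safe_writer is a hypothetical name, and an io.StringIO stands in for sys.stdout.

```python
import io


def make_safe_writer(write, encoding):
    """Hypothetical helper mirroring the doctest trick: wrap a write()
    callable so that characters `encoding` cannot represent are replaced
    by backslash escapes before they reach the real stream."""
    if encoding is None or encoding.lower() == "utf-8":
        # Like doctest, skip the wrapper when the encoding is UTF-8 (or unknown).
        return write

    def out(s):
        # Encode with backslashreplace, then decode again with the same codec:
        # unencodable characters become escapes such as \xe9 or \u20ac.
        write(str(s.encode(encoding, "backslashreplace"), encoding))

    return out


sink = io.StringIO()
out = make_safe_writer(sink.write, "ascii")
out("caf\u00e9 \u20ac\n")
```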