This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: logging.FileHandler currently hardcodes errors='strict'
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.9
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: sourcejedi, vinay.sajip
Priority: normal Keywords:

Created on 2019-04-23 11:38 by sourcejedi, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (4)
msg340717 - (view) Author: Alan Jenkins (sourcejedi) * Date: 2019-04-23 11:38
## Problem description

```
import os
import logging

logging.basicConfig(filename='example.log',level=logging.DEBUG)
for name in os.listdir():
    logging.error('Hypothetical syntax error at line 1 of file: {}'.format(name))
```

The above program is incorrect.[*]  Because it is liable to raise UnicodeEncodeError.  This can be reproduced on a standard UTF-8 locale by creating a filename which is not valid UTF-8, e.g. `touch $(printf '\xff')`. Plenty of arguments have been written on this topic, but this is not my problem.

The user can report the program error, and it should not be too hard to debug.  And there is a fairly straightforward workaround, following Python's IOError: use repr() when outputing filenames.

But there is another issue with the above.  The docs advise that most programs deployed to production, will want to set `logging.raiseExceptions = false`.

My motivating example is the Linux package manager dnf, which followed this advice.  Specifically when they wanted to avoid UnicodeEncodeError.  [**][***]

Link: https://bugzilla.redhat.com/show_bug.cgi?id=1303293#c17

I think UnicodeEncodeError is an important case to handle, but the logging module does not handle it in a good way.  When we have `logging.raiseExceptions = false`, the problem messages will be lost.  Those messages could be critical to troubleshooting the user's problem.  It is even possible that all messages are lost - I think this would be very frustrating to troubleshoot.


## Alternative solutions which have been considered

* All debugging problems could of course be avoided, by simply writing correct programs in the first place.  The existence of debuggers suggests this is not a practical answer, even for Python :-).

* FileHandler can be worked around fairly simply, using StreamHandler instead.  However if you wanted to use RotatingFileHandler, there is no (documented) interface that would let you work around it.  SyslogHandler also seems important enough to be worth worrying about.


## A possible solution

When you set `raiseExceptions = false`, logging.FileHandler and friends should use errors='backslashreplace'.

errors='backslashreplace' is already the default for sys.stderr.  Matching this seems nice in case the program uses the default stderr logging in some configurations.

A log file full of encoding noise will be a specific sign, that can be used in troubleshooting.  And in cases similar to my example program, parts of the message could be entirely readable.  The end-user might be able to use the log message without noticing the incorrectness at all.  This is entirely consistent with the rationale for `logging.raiseExceptions = false`.

Previously you could set logging.raiseExceptions *after* you create the logger.  It will be a bit inconsistent if FileHandler relies on setting the `errors` parameter of `open()`.  It seems fairly easy to document this restriction.  But if that is not considered acceptable, we would either need some weird system that calls stream.reconfigure(), or muck around like dnf.i18n.UnicodeStream does:

    try:
        stream.write(s)
    except UnicodeEncodeError:
        if logging.raiseExceptions:
            raise
        else:
            stream.flush()  # make sure nothing gets out of order!
            s = s.encode(stream.encoding, 'backslashreplace')
            stream.buffer.write(s)

---

[*] C programs which segfault are said to be incorrect (or there is an error in system software or hardware).  I am using similar phrasing for python programs which raise unhandled UnicodeError's.  I am not sure if it is a good phrase to use for a Python program, but I hope my overall point is fairly clear.

[**] dnf developers did not appear to work on the correctness issue they found.  It might be a bug in gettext.

[***] In the linked case, *none* of dnf's messages were readable.  But I sympathize that dnf is so critical, it seems useful for dnf to try and hobble along.  Even in cases like this one.  As the user attempts to work towards some desired end state...  The user might temporarily ignore the non-fatal problem in dnf, because their current problem seems more important, e.g. trying to install some software needed to recover or back up document files.  At best, they might know some diagnostic software that knows how to diagnose they have a locale problem :-).  Or more serendipitously, installing certain other software that also suffers a locale problem might help them realize what is happening.
msg340726 - (view) Author: Alan Jenkins (sourcejedi) * Date: 2019-04-23 13:31
> [**] dnf developers did not appear to work on the correctness issue they found.  It might be a bug in gettext.

Nitpick: sorry, I read too quickly.  dnf *did* fix the specific correctness issue, as well as setting raiseExceptions = false.  It is explained by the PR description (and the diff)

https://github.com/rpm-software-management/dnf/pull/751
msg340923 - (view) Author: Alan Jenkins (sourcejedi) * Date: 2019-04-26 15:17
Oops. I assumed logging.raiseExceptions == True (the default) actually raises exceptions, but it doesn't.  It prints the exception to stderr and continues.

E.g. traditional UNIX daemons run with stderr connected to /dev/null.  Consider during development / debugging, or if you overlooked setting raiseExceptions altogether, or maybe you made the same mistake I did (because the reference-style documentation doesn't explain this behaviour).  My above proposal was to escape encoding errors only when raiseExceptions == False.  So when raiseExceptions == True, we might still lose log messages, and the UnicodeEncodeError might be printed to /dev/null i.e. also silently discarded.  This suggests my proposal was trying to be too clever.

I guess the simplest approach is for emit() to handle encode errors by using 'backslashreplace' to log the message AND calling handleError().  

Instead of the original code sketch, change StreamHandler.emit() to

        try:
            # issue 35046: merged two stream.writes into one.
            msg = self.format(record) + self.terminator
            stream = self.stream
            try:
                stream.write(msg)
                self.flush()
            except UnicodeEncodeError:
                # Try to log something, even pure mojibake might provide a clue
                encoding = getattr(stream, 'encoding', None)
                if encoding:
                    bytes = msg.encode(encoding, errors='backslashreplace')
                    msg = bytes.decode(encoding)
                    stream.write(msg)
                    self.flush()
                # Call handleError() as normal
                raise
        except RecursionError:  # See issue 36272
            raise
        except Exception:
            self.handleError(record)

(And I'd like a similar change for SyslogHandler at least).
msg346051 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2019-06-19 14:37
Fixed as part of the fix for bpo-37111.
History
Date User Action Args
2022-04-11 14:59:14adminsetgithub: 80885
2019-06-19 14:37:11vinay.sajipsetstatus: open -> closed
versions: + Python 3.9, - Python 3.7
messages: + msg346051

resolution: fixed
stage: resolved
2019-04-26 15:17:30sourcejedisetmessages: + msg340923
2019-04-23 13:31:48sourcejedisetmessages: + msg340726
2019-04-23 11:44:28SilentGhostsetnosy: + vinay.sajip
2019-04-23 11:38:37sourcejedicreate