This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author sourcejedi
Recipients sourcejedi
Date 2019-04-23.11:38:36
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1556019517.33.0.109747791974.issue36704@roundup.psfhosted.org>
In-reply-to
Content
## Problem description

```
import os
import logging

logging.basicConfig(filename='example.log',level=logging.DEBUG)
for name in os.listdir():
    logging.error('Hypothetical syntax error at line 1 of file: {}'.format(name))
```

The above program is incorrect.[*]  Because it is liable to raise UnicodeEncodeError.  This can be reproduced on a standard UTF-8 locale by creating a filename which is not valid UTF-8, e.g. `touch $(printf '\xff')`. Plenty of arguments have been written on this topic, but this is not my problem.

The user can report the program error, and it should not be too hard to debug.  And there is a fairly straightforward workaround, following Python's IOError: use repr() when outputing filenames.

But there is another issue with the above.  The docs advise that most programs deployed to production, will want to set `logging.raiseExceptions = false`.

My motivating example is the Linux package manager dnf, which followed this advice.  Specifically when they wanted to avoid UnicodeEncodeError.  [**][***]

Link: https://bugzilla.redhat.com/show_bug.cgi?id=1303293#c17

I think UnicodeEncodeError is an important case to handle, but the logging module does not handle it in a good way.  When we have `logging.raiseExceptions = false`, the problem messages will be lost.  Those messages could be critical to troubleshooting the user's problem.  It is even possible that all messages are lost - I think this would be very frustrating to troubleshoot.


## Alternative solutions which have been considered

* All debugging problems could of course be avoided, by simply writing correct programs in the first place.  The existence of debuggers suggests this is not a practical answer, even for Python :-).

* FileHandler can be worked around fairly simply, using StreamHandler instead.  However if you wanted to use RotatingFileHandler, there is no (documented) interface that would let you work around it.  SyslogHandler also seems important enough to be worth worrying about.


## A possible solution

When you set `raiseExceptions = false`, logging.FileHandler and friends should use errors='backslashreplace'.

errors='backslashreplace' is already the default for sys.stderr.  Matching this seems nice in case the program uses the default stderr logging in some configurations.

A log file full of encoding noise will be a specific sign, that can be used in troubleshooting.  And in cases similar to my example program, parts of the message could be entirely readable.  The end-user might be able to use the log message without noticing the incorrectness at all.  This is entirely consistent with the rationale for `logging.raiseExceptions = false`.

Previously you could set logging.raiseExceptions *after* you create the logger.  It will be a bit inconsistent if FileHandler relies on setting the `errors` parameter of `open()`.  It seems fairly easy to document this restriction.  But if that is not considered acceptable, we would either need some weird system that calls stream.reconfigure(), or muck around like dnf.i18n.UnicodeStream does:

    try:
        stream.write(s)
    except UnicodeEncodeError:
        if logging.raiseExceptions:
            raise
        else:
            stream.flush()  # make sure nothing gets out of order!
            s = s.encode(stream.encoding, 'backslashreplace')
            stream.buffer.write(s)

---

[*] C programs which segfault are said to be incorrect (or there is an error in system software or hardware).  I am using similar phrasing for python programs which raise unhandled UnicodeError's.  I am not sure if it is a good phrase to use for a Python program, but I hope my overall point is fairly clear.

[**] dnf developers did not appear to work on the correctness issue they found.  It might be a bug in gettext.

[***] In the linked case, *none* of dnf's messages were readable.  But I sympathize that dnf is so critical, it seems useful for dnf to try and hobble along.  Even in cases like this one.  As the user attempts to work towards some desired end state...  The user might temporarily ignore the non-fatal problem in dnf, because their current problem seems more important, e.g. trying to install some software needed to recover or back up document files.  At best, they might know some diagnostic software that knows how to diagnose they have a locale problem :-).  Or more serendipitously, installing certain other software that also suffers a locale problem might help them realize what is happening.
History
Date User Action Args
2019-04-23 11:38:37sourcejedisetrecipients: + sourcejedi
2019-04-23 11:38:37sourcejedisetmessageid: <1556019517.33.0.109747791974.issue36704@roundup.psfhosted.org>
2019-04-23 11:38:37sourcejedilinkissue36704 messages
2019-04-23 11:38:36sourcejedicreate