Issue36704
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2019-04-23 11:38 by sourcejedi, last changed 2022-04-11 14:59 by admin. This issue is now closed.
Messages (4) | |||
---|---|---|---|
msg340717 - (view) | Author: Alan Jenkins (sourcejedi) * | Date: 2019-04-23 11:38 | |
## Problem description ``` import os import logging logging.basicConfig(filename='example.log',level=logging.DEBUG) for name in os.listdir(): logging.error('Hypothetical syntax error at line 1 of file: {}'.format(name)) ``` The above program is incorrect.[*] Because it is liable to raise UnicodeEncodeError. This can be reproduced on a standard UTF-8 locale by creating a filename which is not valid UTF-8, e.g. `touch $(printf '\xff')`. Plenty of arguments have been written on this topic, but this is not my problem. The user can report the program error, and it should not be too hard to debug. And there is a fairly straightforward workaround, following Python's IOError: use repr() when outputing filenames. But there is another issue with the above. The docs advise that most programs deployed to production, will want to set `logging.raiseExceptions = false`. My motivating example is the Linux package manager dnf, which followed this advice. Specifically when they wanted to avoid UnicodeEncodeError. [**][***] Link: https://bugzilla.redhat.com/show_bug.cgi?id=1303293#c17 I think UnicodeEncodeError is an important case to handle, but the logging module does not handle it in a good way. When we have `logging.raiseExceptions = false`, the problem messages will be lost. Those messages could be critical to troubleshooting the user's problem. It is even possible that all messages are lost - I think this would be very frustrating to troubleshoot. ## Alternative solutions which have been considered * All debugging problems could of course be avoided, by simply writing correct programs in the first place. The existence of debuggers suggests this is not a practical answer, even for Python :-). * FileHandler can be worked around fairly simply, using StreamHandler instead. However if you wanted to use RotatingFileHandler, there is no (documented) interface that would let you work around it. SyslogHandler also seems important enough to be worth worrying about. ## A possible solution When you set `raiseExceptions = false`, logging.FileHandler and friends should use errors='backslashreplace'. errors='backslashreplace' is already the default for sys.stderr. Matching this seems nice in case the program uses the default stderr logging in some configurations. A log file full of encoding noise will be a specific sign, that can be used in troubleshooting. And in cases similar to my example program, parts of the message could be entirely readable. The end-user might be able to use the log message without noticing the incorrectness at all. This is entirely consistent with the rationale for `logging.raiseExceptions = false`. Previously you could set logging.raiseExceptions *after* you create the logger. It will be a bit inconsistent if FileHandler relies on setting the `errors` parameter of `open()`. It seems fairly easy to document this restriction. But if that is not considered acceptable, we would either need some weird system that calls stream.reconfigure(), or muck around like dnf.i18n.UnicodeStream does: try: stream.write(s) except UnicodeEncodeError: if logging.raiseExceptions: raise else: stream.flush() # make sure nothing gets out of order! s = s.encode(stream.encoding, 'backslashreplace') stream.buffer.write(s) --- [*] C programs which segfault are said to be incorrect (or there is an error in system software or hardware). I am using similar phrasing for python programs which raise unhandled UnicodeError's. I am not sure if it is a good phrase to use for a Python program, but I hope my overall point is fairly clear. [**] dnf developers did not appear to work on the correctness issue they found. It might be a bug in gettext. [***] In the linked case, *none* of dnf's messages were readable. But I sympathize that dnf is so critical, it seems useful for dnf to try and hobble along. Even in cases like this one. As the user attempts to work towards some desired end state... The user might temporarily ignore the non-fatal problem in dnf, because their current problem seems more important, e.g. trying to install some software needed to recover or back up document files. At best, they might know some diagnostic software that knows how to diagnose they have a locale problem :-). Or more serendipitously, installing certain other software that also suffers a locale problem might help them realize what is happening. |
|||
msg340726 - (view) | Author: Alan Jenkins (sourcejedi) * | Date: 2019-04-23 13:31 | |
> [**] dnf developers did not appear to work on the correctness issue they found. It might be a bug in gettext. Nitpick: sorry, I read too quickly. dnf *did* fix the specific correctness issue, as well as setting raiseExceptions = false. It is explained by the PR description (and the diff) https://github.com/rpm-software-management/dnf/pull/751 |
|||
msg340923 - (view) | Author: Alan Jenkins (sourcejedi) * | Date: 2019-04-26 15:17 | |
Oops. I assumed logging.raiseExceptions == True (the default) actually raises exceptions, but it doesn't. It prints the exception to stderr and continues. E.g. traditional UNIX daemons run with stderr connected to /dev/null. Consider during development / debugging, or if you overlooked setting raiseExceptions altogether, or maybe you made the same mistake I did (because the reference-style documentation doesn't explain this behaviour). My above proposal was to escape encoding errors only when raiseExceptions == False. So when raiseExceptions == True, we might still lose log messages, and the UnicodeEncodeError might be printed to /dev/null i.e. also silently discarded. This suggests my proposal was trying to be too clever. I guess the simplest approach is for emit() to handle encode errors by using 'backslashreplace' to log the message AND calling handleError(). Instead of the original code sketch, change StreamHandler.emit() to try: # issue 35046: merged two stream.writes into one. msg = self.format(record) + self.terminator stream = self.stream try: stream.write(msg) self.flush() except UnicodeEncodeError: # Try to log something, even pure mojibake might provide a clue encoding = getattr(stream, 'encoding', None) if encoding: bytes = msg.encode(encoding, errors='backslashreplace') msg = bytes.decode(encoding) stream.write(msg) self.flush() # Call handleError() as normal raise except RecursionError: # See issue 36272 raise except Exception: self.handleError(record) (And I'd like a similar change for SyslogHandler at least). |
|||
msg346051 - (view) | Author: Vinay Sajip (vinay.sajip) * | Date: 2019-06-19 14:37 | |
Fixed as part of the fix for bpo-37111. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:59:14 | admin | set | github: 80885 |
2019-06-19 14:37:11 | vinay.sajip | set | status: open -> closed versions: + Python 3.9, - Python 3.7 messages: + msg346051 resolution: fixed stage: resolved |
2019-04-26 15:17:30 | sourcejedi | set | messages: + msg340923 |
2019-04-23 13:31:48 | sourcejedi | set | messages: + msg340726 |
2019-04-23 11:44:28 | SilentGhost | set | nosy:
+ vinay.sajip |
2019-04-23 11:38:37 | sourcejedi | create |