classification
Title: logging can raise UnicodeEncodeError
Type:
Stage:
Components: Documentation, Unicode
Versions: Python 2.7
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To: docs@python
Nosy List: Kronuz, docs@python, eric.araujo, ezio.melotti, kiminoa, vinay.sajip
Priority: normal
Keywords:

Created on 2013-02-07 18:18 by Kronuz, last changed 2013-02-21 03:52 by kiminoa. This issue is now closed.

Messages (5)
msg181640 - Author: Germán Méndez Bravo (Kronuz) Date: 2013-02-07 18:18
I've seen *a lot* of people using `logging.exception(exc)` to log exceptions. It all seems okay until the exc object contains unicode strings, at which point logging throws a UnicodeEncodeError exception.

Example: `exc = Exception(u'operaci\xf3n'); logger.error(exc)` throws an exception, while `exc = Exception(u'operaci\xf3n'); logger.error(u"%s", exc)` does not and works as expected.
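A minimal sketch of the form reported as working, assuming Python 3. There `str` is already Unicode, so the bare `logger.error(exc)` form also succeeds and the Python 2.7 failure does not reproduce; the sketch only shows the recommended calling pattern. The logger name `unicode_demo` and the in-memory `io.StringIO` buffer are illustrative choices, not part of the report.

```python
import io
import logging

# Send a dedicated logger's output to an in-memory buffer so the
# formatted result can be inspected instead of printed.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(message)s"))

logger = logging.getLogger("unicode_demo")
logger.setLevel(logging.ERROR)
logger.addHandler(handler)
logger.propagate = False  # keep the demo out of the root logger

exc = Exception(u"operaci\xf3n")

# Format string first, exception as an argument: logging merges the two
# with % when the record is formatted.
logger.error(u"%s", exc)

result = buffer.getvalue()
```

After the call, `result` holds the message followed by the handler's newline terminator.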

The problem is that the `_fmt` string in logging is `"%(message)s"` rather than `u"%(message)s"`. Formatting with it takes the byte-string (non-unicode) version of the exception object returned by `getMessage()`, which fails because the exception contains unicode.

One solution would be to make all logging format strings unicode, including the default one, so that the object returned by `getMessage()` (the exception) is converted to unicode rather than to a byte string. This could be done, for example, by changing line 467 of logging/__init__.py to `unicode(self._fmt) % record.__dict__`.

Another solution would be to encourage users not to pass objects as the first argument to the logging methods, and perhaps even to log a warning when they do.
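The second suggestion could be prototyped outside the standard library with a `logging.Filter`. This is a sketch assuming Python 3; `NonStringMessageFilter` is a hypothetical name, and warning through the `warnings` module is one possible design, not something logging provides.

```python
import logging
import warnings


class NonStringMessageFilter(logging.Filter):
    """Hypothetical filter: warn when a logging call passes a non-string
    object (such as an exception instance) as the message itself."""

    def filter(self, record):
        if not isinstance(record.msg, str):
            warnings.warn(
                "logging call passed %s as the message; prefer "
                "logger.error('%%s', obj)" % type(record.msg).__name__
            )
        return True  # never drop the record, only flag the call pattern


logger = logging.getLogger("filter_demo")
logger.addFilter(NonStringMessageFilter())
logger.addHandler(logging.NullHandler())
logger.propagate = False

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    logger.error(Exception("boom"))        # object as message: warns
    logger.error("%s", Exception("boom"))  # recommended form: no warning
```

Because `filter()` always returns True, both records are still handled; only the first call produces a warning.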
msg181700 - Author: Éric Araujo (eric.araujo) (Python committer) Date: 2013-02-08 20:59
Hm, the correct way to use exception() is:

    except Something:
        logger.exception('problem while doing X')

In other words, this is a generic unicode-to-str-with-default-encoding problem, not something specific to logging.

Vinay, do you think logging should do something smarter than calling str when passed non-str objects?
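A runnable version of the snippet above, assuming Python 3; `do_x`, the `ValueError`, and the logger name are hypothetical stand-ins for `Something` and the real work. `logger.exception()` is meant to be called from an exception handler and attaches the current exception's traceback to the record, so the message can be a plain descriptive string.

```python
import io
import logging

buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

logger = logging.getLogger("exception_demo")
logger.addHandler(handler)
logger.propagate = False


def do_x():
    # Hypothetical operation that fails with a non-ASCII message.
    raise ValueError(u"operaci\xf3n fallida")


try:
    do_x()
except ValueError:
    # Fixed, descriptive message; exception() logs at ERROR level and
    # appends the traceback, so the exception object is not passed in.
    logger.exception("problem while doing X")

output = buffer.getvalue()
```

`output` starts with the message line and is followed by the formatted traceback, whose last line carries the exception's own text.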
msg181705 - Author: Vinay Sajip (vinay.sajip) (Python committer) Date: 2013-02-08 23:07
It is by design that logging accepts arbitrary objects, rather than just strings, see

docs.python.org/howto/logging.html#arbitrary-object-messages

and, as documented, the instance's __str__ will be called by logging calling str() on the instance. If people are being lazy and doing logging.exception(exc) where exc is an exception instance, then they need to change their code. Recall that on Python 2.x, just doing a + b can trigger a UnicodeError because of implicit bytes->Unicode conversions which use ASCII as a default (this is just how Python 2.x works - nothing to do with logging). An arbitrary exception's str() method may or may not be smart with respect to this sort of behaviour. I think the answer is for people to be more aware of Unicode issues and how Python 2.x deals with mixed Unicode and byte data. If the _fmt string you are referring to is the Formatter instance attribute, you can control that by passing whatever you want to the Formatter - a Unicode string, if you wish.
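A sketch of an arbitrary-object message combined with a caller-supplied Unicode format string, assuming Python 3; the `Operation` class and the logger name are hypothetical. Logging turns the object into text by calling `str()` on it, which invokes `__str__`, and the format string is whatever is passed to `Formatter`.

```python
import io
import logging


class Operation(object):
    """Hypothetical object logged directly as a message."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        # Called by logging (via str()) when the record is formatted.
        return u"operation %s" % self.name


buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
# A u'' literal is accepted; on Python 3 it is an ordinary str.
handler.setFormatter(logging.Formatter(u"[%(levelname)s] %(message)s"))

logger = logging.getLogger("object_demo")
logger.addHandler(handler)
logger.propagate = False

logger.warning(Operation(u"actualizaci\xf3n"))
output = buffer.getvalue()
```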

The normal logging exception handling is as per Éric's example, though of course you can also pass placeholders and arguments to the exception call, as in:

    logger.exception('Problem with %r', 'specific data')

I'm closing as invalid, because the example you quoted as working is how people are supposed to use it.
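To make the placeholder form concrete, the sketch below (Python 3 assumed; `RecordCollector` is a hypothetical helper handler) captures the raw `LogRecord` produced by the example above. The format string and its arguments are stored separately on the record and are only merged with `%` when `getMessage()` is called.

```python
import logging


class RecordCollector(logging.Handler):
    """Hypothetical handler that keeps raw LogRecords for inspection."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


collector = RecordCollector()
logger = logging.getLogger("args_demo")
logger.addHandler(collector)
logger.propagate = False

try:
    raise KeyError("missing")
except KeyError:
    logger.exception("Problem with %r", "specific data")

record = collector.records[0]
print(record.msg)           # the format string, untouched
print(record.args)          # the arguments, stored separately
print(record.getMessage())  # msg % args, computed on demand
```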
msg182575 - Author: Kim (kiminoa) Date: 2013-02-21 02:06
I'm running into similar issues with Python 2.6.7 and logging 0.4.9.6: unicode strings are fine in print statements and codecs writes, but the same string gives tracebacks when logged. If it's an education issue, I'm not finding the education I need... :-/

    import logging
    import codecs

    # Unicode string
    i = u'\u0433\u043e\u0432\u043e\u0440\u0438\u0442\u044a'

    # Print statement is fine
    print "hi, i'm the string in question in a print statement: %s" % i

    # Codecs write is fine
    with codecs.open('/tmp/utf8', 'w', 'utf-8') as f:
        f.write(i)

    # Logging gives a Traceback
    log = logging.getLogger(__name__)
    log.setLevel(logging.DEBUG)
    handler = logging.FileHandler('/tmp/out', 'w', 'utf-8')
    handler.setFormatter(logging.Formatter(u'[%(levelname)s] %(message)s'))
    # I've also tried nixing setFormatter and going with the default
    log.addHandler(handler)
    log.debug(u"process_clusters: From CSV: %s", i)
    # I've also tried a bare call to i, with and without the u in the message, and explicitly i.encode('utf8'); all Tracebacks.
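For comparison, a sketch of the same setup assuming Python 3, where text is Unicode throughout and `FileHandler` applies its `encoding` argument when writing the file. A temporary file replaces `/tmp/out`, and the logger name `file_demo` is illustrative; this does not explain the Python 2.6.7 traceback itself.

```python
import logging
import os
import tempfile

# Unicode string from the report above.
i = u'\u0433\u043e\u0432\u043e\u0440\u0438\u0442\u044a'

# Temporary directory instead of /tmp so the sketch is portable.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "out.log")

log = logging.getLogger("file_demo")
log.setLevel(logging.DEBUG)
log.propagate = False
handler = logging.FileHandler(path, mode="w", encoding="utf-8")
handler.setFormatter(logging.Formatter(u"[%(levelname)s] %(message)s"))
log.addHandler(handler)

log.debug(u"process_clusters: From CSV: %s", i)

# Close and detach the handler so the file is flushed before reading it.
handler.close()
log.removeHandler(handler)

with open(path, encoding="utf-8") as f:
    contents = f.read()
```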
msg182579 - Author: Kim (kiminoa) Date: 2013-02-21 03:52
P.S. Switching to a StreamHandler fixes my issue for now.
History
Date                 User         Action  Args
2013-02-21 03:52:43  kiminoa      set     messages: + msg182579
2013-02-21 02:06:59  kiminoa      set     nosy: + kiminoa
                                          messages: + msg182575
2013-02-08 23:07:22  vinay.sajip  set     status: open -> closed
                                          resolution: not a bug
                                          messages: + msg181705
2013-02-08 20:59:50  eric.araujo  set     nosy: + eric.araujo, vinay.sajip
                                          messages: + msg181700
                                          title: Logging throwing UnicodeEncodeError exception -> logging can raise UnicodeEncodeError
2013-02-07 18:18:25  Kronuz       create