This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: logging UnicodeDecodeError from undecodable strftime output
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 2.7
process
Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: Nosy List: iritkatriel, mark.dickinson, vinay.sajip
Priority: normal Keywords:

Created on 2018-12-06 16:39 by mark.dickinson, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (4)
msg331238 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2018-12-06 16:39
We're seeing UnicodeDecodeErrors on Windows / Python 2.7 when using logging on a Japanese customer machine. The cause turns out to be in the log record formatting, where unicode fields are combined with non-decodable bytestrings coming from strftime.

More details: we were using the following formatter:

_LOG_FORMATTER = logging.Formatter(
    "%(asctime)s %(levelname)s:%(name)s:%(message)s",
    datefmt="%Y-%m-%dT%H:%M:%S%Z",
)

In the logging internals, that `datefmt` gets passed to `time.strftime`, which on the machine in question produced a non-ASCII bytestring (i.e., type `str`). When combined with other Unicode strings in the log record, this gave the `UnicodeDecodeError`.

I'm unfortunately failing to reproduce this directly on my own macOS / UK locale machine, but it's documented that `time.strftime` returns a value encoded according to the current locale. In this particular case, the output we were getting from the `time.strftime` call looked like:

"2018-12-06T23:57:14\x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)"

which assuming an encoding of cp932 decodes to something plausible:

>>> s.decode("cp932")
u'2018-12-06T23:57:14\u6771\u4eac (\u6a19\u6e96\u6642)'
>>> print(s.decode("cp932"))
2018-12-06T23:57:14東京 (標準時)

It looks as though the logging module should be explicitly decoding the strftime output before doing formatting, using for example what's recommended in the strftime documentation [1]:

    strftime(<myformat>).decode(locale.getlocale()[1])


Code links: this is the line that's producing non-decodable bytes: 

https://github.com/python/cpython/blob/49cedc51a68b4cd2525c14ab02bd1a483d8be389/Lib/logging/__init__.py#L425

... and this is the formatting operation that then ends up raising UnicodeDecodeError as a result of those:

https://github.com/python/cpython/blob/49cedc51a68b4cd2525c14ab02bd1a483d8be389/Lib/logging/__init__.py#L469

This isn't an issue on Python 3, and I was unable to reproduce it on my non-Windows machine; that particular form of strftime output may well be specific to Windows (or possibly even specific to Japanese flavours of Windows).


[1] https://docs.python.org/2/library/time.html#time.strftime
msg331239 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2018-12-06 16:54
Here's a bug report from someone else running into the same issue on Scrapy; looks like they worked around the issue by simply removing the timezone information from the `datefmt` (which is what we ended up doing, too): 

https://github.com/scrapy/scrapy/issues/1270
msg394674 - (view) Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2021-05-28 17:24
Since this is not relevant to python 3, I think this issue can be closed.
msg394677 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2021-05-28 18:46
Agreed. Thank you!
History
Date User Action Args
2022-04-11 14:59:09adminsetgithub: 79608
2021-05-28 18:46:48mark.dickinsonsetstatus: pending -> closed

messages: + msg394677
stage: resolved
2021-05-28 17:24:28iritkatrielsetstatus: open -> pending

nosy: + iritkatriel
messages: + msg394674

resolution: out of date
2018-12-07 16:19:28xtreaksetnosy: + vinay.sajip
2018-12-06 16:54:24mark.dickinsonsetmessages: + msg331239
2018-12-06 16:39:34mark.dickinsoncreate