classification
Title: logging encoding failes some situation
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.0, Python 3.1, Python 3.2, Python 2.7, Python 2.6
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: vinay.sajip Nosy List: methane, vinay.sajip
Priority: normal Keywords: patch

Created on 2009-09-25 05:29 by methane, last changed 2009-09-25 19:02 by methane. This issue is now closed.

Files
File name Uploaded Description Edit
logging_encode.patch methane, 2009-09-25 05:29 logging.__init__ and test_logging
foo.py methane, 2009-09-25 14:54 DecodeError sample
logging_error.py methane, 2009-09-25 16:39
Messages (9)
msg93100 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2009-09-25 05:29
When stream is codecs.writer object, stream.write(string) does
string.decode() internally and it may cause UnicodeDecodeError.

Then, fallback to utf-8 is not good.
I think good fallback logic is:
* When message is unicode, message.encode(stream.encoding or 'ascii',
'backslashreplace')
* When message is bytes, message.encode('string_escape')

Attached patch contains this logic, refactoring and test.
msg93103 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2009-09-25 14:34
Thanks, but I'm not sure I understand the reasoning.
stream.write(unicode_string) should not do decode() internally, though
of course it would do encode(). Can you explain a little more (with an
illustrative example) what problem you are trying to solve, and attach a
small script which shows the problem? Thanks.
msg93104 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2009-09-25 14:54
Please see and execute an attached foo.py.

In Python 2.6.2, this cause following error:
>python foo.py
Traceback (most recent call last):
  File "foo.py", line 3, in <module>
    f.write('\xaa')
  File "C:\usr\Python2.6\lib\codecs.py", line 686, in write
    return self.writer.write(data)
  File "C:\usr\Python2.6\lib\codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xaa in position 0:
ordinal not in ran
ge(128)
msg93106 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2009-09-25 15:10
There seems to be a problem with your foo.py. In it, you are writing a
byte-string to a stream returned from codecs.open. I don't think this is
correct: you should be writing a Unicode string to that stream, which
will convert to bytes using the stream's encoding, and write those bytes
to file. The following snippet illustrates:

>>> import codecs
>>> f = codecs.open('foo.txt', 'w', encoding='utf-8')
>>> f.write(u'\u76F4\u6A39\u7A32\u7530')
>>> f.close()
>>> f = open('foo.txt', 'r')
>>> f.read()
'\xe7\x9b\xb4\xe6\xa8\xb9\xe7\xa8\xb2\xe7\x94\xb0'

As you can see, the Unicode has been converted using UTF-8.
msg93112 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2009-09-25 16:39
Another sample.

Traceback (most recent call last):
  File "C:\usr\Python2.6\lib\logging\__init__.py", line 790, in emit
    stream.write(fs % msg.encode("UTF-8"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xaa in position 0:
ordinal not in range(128)

This is because logging.FileHandler uses codecs.open internally.
msg93114 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2009-09-25 17:09
Your second example (logging_error.py) fails for the same reason -
you're writing a byte-string to a stream which is expecting Unicode. The
error occurs in logging only it tries encoding as UTF-8 as a last-ditch
attempt - and that only happens because of an earlier exception caused
by you not writing a Unicode string.

In summary: If you open a stream via codecs.open, whether directly or
through the logging module, you are expecting the stream to do encoding
for you. Therefore, you only write Unicode to the stream - never a
byte-string. If you have a byte-string in your application which you
have obtained from somewhere else, convert it to Unicode using whatever
encoding applies to the source. Then, send the resulting Unicode to the
encoding stream (or logger).
msg93124 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2009-09-25 17:58
OK, you're right.
But logging is very basic feature and used very wide modules.
"All logging code should use unicode string" is right but difficult.

And logging may be used for debbuging usually. So I think
logging should write log as safe as possible.
When log.error(...) called, no-one wants UnicodeDecodeError
from logging in their log.
msg93125 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2009-09-25 18:15
It's not about logging - your first example (foo.py) didn't have any
logging code in it.

The problem is caused only when someone doesn't understand how Unicode
and codecs.open works, and logging can't fix this.

The rule is: If you use a stream without encoding and byte strings under
Python 2.x, you'll be OK - fine if you're using ASCII or Latin-1.
However, users of systems outside this (e.g. CJK or Cyrillic) will not
be covered.

For use anywhere, you really have to work in Unicode internally, decode
stuff on the way in and encode stuff on the way out. That's what the
codecs module is for.

If third-party libraries which you are using don't use Unicode properly,
then they are broken, and logging can't fix that. Any attempt to "paper
over the cracks" will fail sooner or later. It's better to identify the
problem exactly where it occurs: Python's Zen says "Errors should never
pass silently."

I'm closing this issue, as it's not really logging-related. Hope that's OK.
msg93126 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2009-09-25 19:02
OK, I agree.
Thank you for your answer.
History
Date User Action Args
2009-09-25 19:02:01methanesetmessages: + msg93126
2009-09-25 18:15:04vinay.sajipsetstatus: open -> closed
resolution: not a bug
messages: + msg93125
2009-09-25 17:58:42methanesetstatus: pending -> open

messages: + msg93124
2009-09-25 17:09:50vinay.sajipsetstatus: open -> pending

messages: + msg93114
2009-09-25 16:39:42methanesetfiles: + logging_error.py
status: pending -> open
messages: + msg93112
2009-09-25 15:10:39vinay.sajipsetstatus: open -> pending

messages: + msg93106
2009-09-25 14:54:39methanesetfiles: + foo.py
status: pending -> open
messages: + msg93104
2009-09-25 14:34:45vinay.sajipsetstatus: open -> pending

messages: + msg93103
2009-09-25 14:08:33benjamin.petersonsetassignee: vinay.sajip

nosy: + vinay.sajip
2009-09-25 13:19:56methanesettype: behavior
2009-09-25 05:47:58methanesetversions: + Python 3.0, Python 3.1, Python 3.2
2009-09-25 05:29:25methanecreate