classification
Title: logging.StreamHandler encodes log message in UTF-8
Type: Stage:
Components: Library (Lib) Versions: Python 2.3
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: vinay.sajip Nosy List: loewis, vdvo, vinay.sajip
Priority: normal Keywords:

Created on 2003-11-03 22:45 by vdvo, last changed 2004-03-02 13:32 by vinay.sajip. This issue is now closed.

Messages (5)
msg18886 - (view) Author: Vaclav Dvorak (vdvo) Date: 2003-11-03 22:45
For some reason that I do not see,
logging.StreamHandler in Python 2.3 insists on writing
plain non-Unicode strings to the stream, and the
encoding is hard-coded as UTF-8:

            if not hasattr(types, "UnicodeType"): #if no unicode support...
                self.stream.write("%s\n" % msg)
            else:
                try:
                    self.stream.write("%s\n" % msg)
                except UnicodeError:
                    self.stream.write("%s\n" % msg.encode("UTF-8"))

This behaviour is neither documented nor reasonable.
File objects may be perfectly able to accept Unicode strings
(e.g., when wrapped with codecs.EncodedFile, or through the
default encoding of sys.stdout), and even when they are
not, UTF-8 is hardly the only possible encoding. I
propose to simply replace the above code with:

self.stream.write(msg)
self.stream.write("\n")
msg18887 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2003-11-05 20:30

That would be an incompatible change, of course, as you then
may get encoding errors where you currently get none.
msg18888 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2004-03-01 12:10

Notice that UTF-8 is only used if a UnicodeError is detected. 
By default, "%s\n" % msg is written to the stream using the 
stream's write(). If the stream can handle this without raising 
a UnicodeError, then UTF-8 will not be used. Is there a 
specific use case/test script which demonstrates a problem?
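
The fallback described in this message can be illustrated with a minimal sketch. It is a Python 3 approximation and not the Python 2.3 library code: `AsciiOnlyStream` and `emit` are hypothetical names invented for the illustration, and the stream is assumed to reject any non-ASCII text with a UnicodeError.

```python
import io


class AsciiOnlyStream:
    """Hypothetical stream that accepts only ASCII text (or raw bytes)."""

    def __init__(self):
        self.buffer = io.BytesIO()

    def write(self, data):
        if isinstance(data, str):
            # Raises UnicodeEncodeError (a UnicodeError) for non-ASCII text.
            data = data.encode("ascii")
        self.buffer.write(data)


def emit(stream, msg):
    """Write msg plus a newline; use UTF-8 only if the plain write fails."""
    try:
        stream.write("%s\n" % msg)
    except UnicodeError:
        stream.write(("%s\n" % msg).encode("utf-8"))


stream = AsciiOnlyStream()
emit(stream, "plain")       # written as-is, no fallback needed
emit(stream, "caf\u00e9")   # plain write fails, so UTF-8 bytes are written
result = stream.buffer.getvalue()
```

In this sketch the ASCII-only message never touches the UTF-8 branch; only the message the stream cannot handle is encoded.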
msg18889 - (view) Author: Vaclav Dvorak (vdvo) Date: 2004-03-02 09:22

Hmmm... I can't remember what the exact problem was, but now
that I look at it again, I see that it must have been my
error. What a poor bug report this is. :-( Sorry.

Still, I'd like the encoding to be configurable: UTF-8 can
stay as the default, but it would be nice to have an option
to use, say, "iso-8859-2" or "windows-1250".
msg18890 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2004-03-02 13:32

If you want to use some other encoding, why not use a 
stream created using codecs.open(), and if necessary use a 
Formatter which is Unicode-aware to convert from msg + args 
to the formatted message? Then the exception handler should 
never be invoked.

Or, do you mean, for the exception handler? I think UTF-8 is 
OK as the default, since it is the most commonly used. I may 
consider making this configurable for a future release, if there 
is enough demand; for now you can patch it yourself.

I'll close this bug report now; I assume that's OK with you?
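
The suggestion in the last message, handing StreamHandler a stream that does its own encoding, might look roughly like the sketch below in current Python. Two assumptions are made for the sake of a self-contained example: codecs.getwriter() wrapping an in-memory io.BytesIO is used in place of codecs.open() so that no file is created, and the logger name "encoding_demo" is invented.

```python
import codecs
import io
import logging

# In-memory byte sink standing in for a real file.
raw = io.BytesIO()

# Wrap the sink so that text written to it is encoded as ISO-8859-2.
stream = codecs.getwriter("iso-8859-2")(raw)

logger = logging.getLogger("encoding_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo output out of the root logger

handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
logger.addHandler(handler)

# The handler passes text to the stream; the codecs wrapper encodes it.
logger.info("V\u00e1clav Dvo\u0159\u00e1k")
handler.flush()

encoded = raw.getvalue()
```

Because the stream accepts the text and encodes it itself, the encoding is chosen by whoever constructs the stream rather than by the handler.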
History
Date User Action Args
2003-11-03 22:45:43 vdvo create