classification
Title: Warnings error with non-ascii chars.
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: Lukáš.Němec, ezio.melotti, python-dev, r.david.murray, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2015-03-11 08:35 by Lukáš.Němec, last changed 2015-05-17 09:24 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description
warnings_unicode.patch serhiy.storchaka, 2015-03-13 20:22
Messages (9)
msg237850 - (view) Author: Lukáš Němec (Lukáš.Němec) Date: 2015-03-11 08:35
  File "/usr/lib/python2.7/warnings.py", line 29, in _show_warning
    file.write(formatwarning(message, category, filename, lineno, line))
  File "/usr/lib/python2.7/warnings.py", line 38, in formatwarning
    s =  "%s:%s: %s: %s\n" % (filename, lineno, category.__name__, message)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xed' in position 42: ordinal not in range(128)

Only thing required to make this work is add "u" in front of the message so it is unicode. This will work for all ASCII characters, and any non-ASCII message should be passed as unicode anyway.
msg237929 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-03-12 13:04
Can you provide a complete failure example?  If message is unicode, the format string should (in python2) be auto-promoted to unicode.
msg238046 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2015-03-13 18:17
I think that the problem is actually with the file.write() in _show_warning().
If any of the arguments of formatwarning() are unicode, the result will be unicode, and if "file" (default sys.stderr) is opened in binary mode, Python will try to encode the unicode result with the ASCII codec and fail with a UnicodeEncodeError:

>>> warnings.showwarning(u'你好', DeprecationWarning, 'foo.py', 10)
foo.py:10: DeprecationWarning: 你好
>>> with open('err.log', 'wb') as f:
...     warnings.showwarning(u'你好', DeprecationWarning, 'foo.py', 10, file=f)
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/lib/python2.7/warnings.py", line 30, in _show_warning
    file.write(formatwarning(message, category, filename, lineno, line))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 31-32: ordinal not in range(128)
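The failure above can be reproduced in isolation. The following is a minimal sketch written for Python 3, where the ASCII encoding step that Python 2 performs implicitly has to be spelled out. The helper name write_like_py2 is hypothetical and is not part of the warnings module.

```python
import io


def write_like_py2(binary_file, text):
    """Mimic Python 2 writing a unicode object to a file opened in binary mode.

    Python 2 silently encodes the text with the ASCII codec before writing.
    Python 3 refuses to mix the two types, so the step is explicit here.
    """
    binary_file.write(text.encode("ascii"))


buf = io.BytesIO()

# A pure-ASCII warning line is written without trouble.
write_like_py2(buf, "foo.py:10: DeprecationWarning: hello\n")

# A warning line with non-ASCII characters fails in the encoding step.
try:
    write_like_py2(buf, "foo.py:10: DeprecationWarning: 你好\n")
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    # exc.start is the first offending index, exc.end is one past the last.
    error_span = (exc.start, exc.end)
```

The two offending characters sit at indexes 31 and 32 of the formatted line, which matches the "position 31-32" reported in the traceback.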
msg238047 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-03-13 18:48
The problem is not only with file.write(). If one of the arguments is unicode (even if it contains no non-ASCII characters) and another argument is a non-ASCII byte string (str), we get this error:

>>> warnings.showwarning(u'', DeprecationWarning, 'filè.py', 10)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/serhiy/py/cpython-2.7/Lib/warnings.py", line 33, in _show_warning
    file.write(formatwarning(message, category, filename, lineno, line))
  File "/home/serhiy/py/cpython-2.7/Lib/warnings.py", line 42, in formatwarning
    s =  "%s:%s: %s: %s\n" % (filename, lineno, category.__name__, message)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)

Non-ASCII file names are rare, and unicode warnings are rare, which is why this bug was not fixed before. I think it is worth fixing. It is better to output a modified warning (e.g. backslash-escaped) than to fail without a clear diagnostic.
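Both halves of this message can be illustrated outside the warnings module. This is a sketch in Python 3, assuming the file name reached Python as UTF-8 bytes (consistent with the 0xc3 byte in the traceback). The explicit decode stands in for the implicit ASCII decoding that Python 2 applies when a str is mixed with unicode in % formatting.

```python
# The file name as a UTF-8 terminal would hand it to Python 2: b'fil\xc3\xa8.py'
filename_bytes = "fil\xe8.py".encode("utf-8")

# Step 1: the strict ASCII decode that Python 2 attempts behind the scenes.
try:
    filename_bytes.decode("ascii")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

# Step 2: a lossy but safe alternative, escaping the bytes that cannot be decoded.
escaped = filename_bytes.decode("ascii", "backslashreplace")
```

The escaped form is not the original name, but it can always be formatted and written, so the warning still reaches the user.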
msg238051 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-03-13 20:22
Here is a patch that tries to coerce a non-ASCII filename and line to unicode using an appropriate encoding when that is needed and possible. If it is not possible, the warning just gets lost, as in the case of an IO error.
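The general idea of such a coercion can be sketched as follows. This is not the attached patch: the helper names coerce_to_text and format_or_drop are hypothetical, the choice of the file system encoding as the default is an assumption, and the code is written for Python 3, where bytes plays the role of the Python 2 str type.

```python
import sys


def coerce_to_text(value, encoding=None):
    """Return value as text, or None when its bytes cannot be decoded."""
    if isinstance(value, str):
        return value
    if encoding is None:
        # Assumption: file names are encoded with the file system encoding.
        encoding = sys.getfilesystemencoding()
    try:
        return value.decode(encoding)
    except UnicodeDecodeError:
        return None


def format_or_drop(message, category_name, filename, lineno, encoding="utf-8"):
    """Format a warning line, or return None so the caller can drop it."""
    text_filename = coerce_to_text(filename, encoding)
    if text_filename is None:
        # Mirrors "the warning just gets lost" instead of raising.
        return None
    return "%s:%s: %s: %s\n" % (text_filename, lineno, category_name, message)


# A UTF-8 encoded name is decoded and formatted.
shown = format_or_drop("", "DeprecationWarning", b"fil\xc3\xa8.py", 10)
# A Latin-1 encoded name is not valid UTF-8, so the warning is dropped.
dropped = format_or_drop("", "DeprecationWarning", b"fil\xe8.py", 10)
```

Dropping the warning trades completeness for robustness: the program keeps running rather than failing inside its own diagnostic output.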
msg239361 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-03-26 23:04
> Only thing required to make this work is add "u" in front of the message so it is unicode.

The warnings module works with non-ASCII characters if you only use byte strings. I'm not sure that we should enhance it to support the unicode type in some fields and bytes in other fields.

This issue was already fixed in Python 3 with the global switch to Unicode by default for all strings.

I would prefer to not fix this issue.

Since it looks like a new feature, it's also not good practice to add new features in minor Python versions (2.7.x).
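For comparison, here is a small sketch of the Python 3 behavior referred to above. Every field is text, so formatting cannot hit an implicit ASCII conversion, and any encoding to bytes is an explicit choice. An empty line argument is passed only so that the example does not try to read the source file.

```python
import io
import warnings

# All fields are text in Python 3, so mixing them is safe.
text = warnings.formatwarning("你好", DeprecationWarning, "filè.py", 10, line="")

# A text-mode stream accepts the result directly.
text_log = io.StringIO()
text_log.write(text)

# A binary stream needs an explicit encoding chosen by the caller.
binary_log = io.BytesIO()
binary_log.write(text.encode("utf-8"))
```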
msg239365 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-03-26 23:21
The problem is that the warnings module works with unicode messages almost all the time, except in rare circumstances. So this feature is surely used in many programs, and it works for their authors and most of their users. An exception can be considered a bug.
msg243057 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-13 08:31
One of the worst things in Python 2 is that everything can work on the author's machine in an ASCII-only environment, but then unhelpfully fail on a user's machine with non-ASCII data, especially when the failure requires a combination of several conditions. This issue is about one such case. Even worse, it makes the program fail with an unfriendly error message while trying to output a possibly helpful warning. I would very much like to solve it.

What would you say about this, Benjamin?
msg243315 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-05-16 13:43
New changeset 22e44a7ee89f by Serhiy Storchaka in branch '2.7':
Issue #23637: Showing a warning no longer fails with UnicodeError.
https://hg.python.org/cpython/rev/22e44a7ee89f
History
Date User Action Args
2015-05-17 09:24:16 serhiy.storchaka set status: open -> closed; resolution: fixed; stage: patch review -> resolved
2015-05-16 13:43:23 python-dev set nosy: + python-dev; messages: + msg243315
2015-05-13 08:31:37 serhiy.storchaka set messages: + msg243057
2015-03-26 23:21:57 serhiy.storchaka set messages: + msg239365
2015-03-26 23:04:40 vstinner set nosy: + vstinner; messages: + msg239361
2015-03-24 20:49:03 serhiy.storchaka set assignee: serhiy.storchaka
2015-03-13 20:22:36 serhiy.storchaka set files: + warnings_unicode.patch; keywords: + patch; messages: + msg238051; stage: patch review
2015-03-13 18:48:19 serhiy.storchaka set status: closed -> open; nosy: + serhiy.storchaka; messages: + msg238047; resolution: not a bug -> (no value); stage: resolved -> (no value)
2015-03-13 18:17:36 ezio.melotti set status: open -> closed; type: crash -> behavior; nosy: + ezio.melotti; messages: + msg238046; resolution: not a bug; stage: resolved
2015-03-12 13:04:27 r.david.murray set nosy: + r.david.murray; messages: + msg237929
2015-03-11 08:35:04 Lukáš.Němec create