classification
Title: Error when printing an exception containing a Unicode string
Type: behavior
Stage:
Components: Unicode
Versions: Python 2.6, Python 2.5
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: ncoghlan
Nosy List: amaury.forgeotdarc, benjamin.peterson, christoph, davidfraser, ezio.melotti, georg.brandl, ggenellina, hodgestar, lemburg, ncoghlan, pitrou
Priority: critical
Keywords: patch

Created on 2008-03-30 23:13 by christoph, last changed 2009-09-17 10:12 by ezio.melotti. This issue is now closed.

Files
File name Uploaded Description
unicode_exception_warning.patch benjamin.peterson, 2008-04-01 02:22
exception-unicode.diff hodgestar, 2008-06-09 13:39 Patch implementing BaseException.__unicode__
tp_unicode_exception.patch davidfraser, 2008-06-09 19:04 Patch to provide tp_unicode slot, and implementation for Exception
exception-unicode-with-type-fetch.diff hodgestar, 2008-06-11 11:54 Implement Nick Coghlan's suggestion from 67944.
exception-unicode-with-type-fetch-no-whitespace-changes.diff ncoghlan, 2008-06-11 14:15 Simon's patch with unneeded whitespace changes removed.
Messages (35)
msg64770 - (view) Author: Christoph Burgmer (christoph) Date: 2008-03-30 23:13
Python seems to have problems when an exception is thrown that 
contains non-ASCII text as a message and is converted to a string.

>>> try:
...     raise Exception(u'Error when printing ü')
... except Exception, e:
...     print e
...
Traceback (most recent call last):
  File "<stdin>", line 4, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 20: ordinal not in range(128)

See 
http://www.stud.uni-karlsruhe.de/~uyhc/de/content/python-and-exceptions-containing-unicode-messages
msg64771 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2008-03-30 23:21
That is because Python encodes its error messages as ASCII by default,
and "ü" is not in ASCII. You can fix this by using "print
unicode_msg.encode("utf-8")" or something similar.
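
A minimal sketch of the point made above, written so that it runs on Python 3 (where the `u` prefix is still accepted). The variable names are illustrative and not taken from the report. It shows that the ASCII codec rejects "ü" while an explicit UTF-8 encoding succeeds:

```python
message = u"Error when printing \xfc"  # \xfc is the character "ü"

# The ASCII codec only covers code points 0..127, so "ü" cannot be encoded.
try:
    message.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Choosing an encoding explicitly avoids relying on the ASCII default.
utf8_bytes = message.encode("utf-8")
```

On Python 2 the resulting byte string could be handed to `print` directly; on Python 3 it would be written to a binary stream.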
msg64779 - (view) Author: Christoph Burgmer (christoph) Date: 2008-03-31 09:47
To be more precise: I see no easy way to get the encapsulated 
non-ASCII data out of the exception as a string.
Taking e from my last post, none of the following will work:
str(e) # UnicodeEncodeError
e.__str__() # UnicodeEncodeError
e.__unicode__() # AttributeError
unicode(e) # UnicodeEncodeError
unicode(e, 'utf8') # TypeError

My workaround for this right now is to raise the exception with an 
already encoded string (see the link I provided).

But as the tutorials speak of simply "print e" I guess the behaviour 
described above is some kind of a bug.
msg64781 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2008-03-31 11:58
Use: print unicode(e.message).encode("utf-8")
msg64782 - (view) Author: Christoph Burgmer (christoph) Date: 2008-03-31 12:19
Thanks, this does work.

But, where can I find the piece of information you just gave to me in 
the docs? I couldn't find any interface definition for Exceptions.

Furthermore, will this be regarded as a bug?
From [1] I understand that "unicode(e)" and "unicode(e, 'utf8')" are 
supposed to work. No limitations are made on the type of the object. 
And I suppose that unicode() is the exact equivalent of str() in that 
it copes with unicode strings. It seems needlessly cumbersome that 
simple string conversion is expected to work for an Exception with 
ASCII text, yet no Unicode string representation is available when 
its content is non-ASCII.

Please reopen if my report does have a point.

[1] http://docs.python.org/lib/built-in-funcs.html
msg64786 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-03-31 16:13
Note the interpreter cannot print the exception either:

>>> raise Exception(u'Error when printing ü')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
Exception>>>
msg64793 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2008-03-31 21:36
I am going to reopen this issue for Py3k. The recommended encoding for
Python source files in 2.x is ASCII; I wouldn't say correctly dealing
with non-ASCII exceptions is fully supported. In 3.x, however, the
recommended encoding is UTF-8, so this should work.

In Py3k,
str(e) # str is unicode in Py3k
does work correctly, and that'll have to be used because the message
attribute is gone in 3.x.
However, the problem Amaury pointed out is not fixed. Exceptions that
cannot be encoded into ASCII are silently not printed. I think a warning
should at least be printed.
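
The Py3k behaviour described above can be checked with a short Python 3 sketch. The variable names are illustrative. It only covers converting the exception to text, not how the interpreter prints an uncaught exception:

```python
try:
    raise Exception(u"Error when printing \xfc")  # \xfc is "ü"
except Exception as e:
    text = str(e)                        # str is a Unicode type in Python 3
    first_arg = e.args[0]                # the original message is kept in args
    has_message = hasattr(e, "message")  # the 2.x-only attribute is absent
```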
msg64794 - (view) Author: Christoph Burgmer (christoph) Date: 2008-03-31 22:19
Though I welcome the reopening of the bug for Python 3.0, I must say 
that plans not to fix a core element rather surprise me.

I never believed Python to be a programming language with good Unicode 
integration. Several points were missing that would've been nice or 
even essential to have for good development with Unicode, most ignored 
for the sake of maintaining backward compatibility. This though is not 
the fault of the Unicode class itself and supporting packages.

Some modules like the one for CSV are lacking full Unicode support. 
But nevertheless the basic Python would always give you the 
possibility to use Unicode in (at least) a consistent way. For me 
raising exceptions does count as basic support like this.

So I still hope to see this solved for the 2.x versions which I read 
will be maintained even after the release of 3.0.
msg64795 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2008-03-31 22:30
>I never believed Python to be a programming language with good Unicode 
>integration. Several points were missing that would've been nice or 
>even essential to have for good development with Unicode, most ignored 
>for the sake of maintaining backward compatibility. This though is not 
>the fault of the Unicode class itself and supporting packages.
Many (including myself) agree with you. That's pretty much the whole
point of Py3k. We want to fix the Python "warts" which can only be fixed
by breaking backwards compatibility.
msg64797 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-03-31 23:10
Even in 2.5, __str__ is allowed to return a Unicode object;
we could change BaseException_str this way:

Index: exceptions.c
===================================================================
--- exceptions.c	(revision 61957)
+++ exceptions.c	(working copy)
@@ -108,6 +104,11 @@
         break;
     case 1:
         out = PyObject_Str(PyTuple_GET_ITEM(self->args, 0));
+        if (out == NULL && PyErr_ExceptionMatches(PyExc_UnicodeEncodeError))
+        {
+            PyErr_Clear();
+            out = PyObject_Unicode(PyTuple_GET_ITEM(self->args, 0));
+        }
         break;
     default:
         out = PyObject_Str(self->args);

Then str(e) still raises UnicodeEncodeError,
but unicode(e) returns the original message.

But I would like the opinion of an experienced core developer...
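
The behaviour the patch above aims for can be modelled roughly in Python 3. This is only a toy model under stated assumptions: `bytes` stands in for the 2.x `str` type, and the function names are hypothetical, not part of CPython. It shows why, with the fallback in place, a str-style conversion would still fail while a unicode-style conversion would return the original message:

```python
def patched_exception_str(arg):
    """Toy model of the patched one-argument branch (hypothetical name)."""
    try:
        return arg.encode("ascii")   # stands in for PyObject_Str
    except UnicodeEncodeError:
        return arg                   # stands in for the PyObject_Unicode fallback


def model_str(arg):
    """Model of str(e): the final result has to be a byte string."""
    result = patched_exception_str(arg)
    if isinstance(result, bytes):
        return result
    return result.encode("ascii")    # still raises for non-ASCII text


def model_unicode(arg):
    """Model of unicode(e): a text result is accepted as it is."""
    result = patched_exception_str(arg)
    if isinstance(result, bytes):
        return result.decode("ascii")
    return result
```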
msg64798 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2008-04-01 02:22
After thinking some more, I'm going to add 2.6 to this. I'm attaching a
patch for the trunk (it can be merged in Py3k, and maybe 2.5) which
displays a UnicodeWarning when an Exception cannot be displayed due to
encoding issues.

Georg, can you review Amaury's and my patches? Also, would mine be a
candidate for 2.5 backporting?
msg64802 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-04-01 06:41
Shouldn't it be an exception rather than a warning? The fact that an
exception can be downgraded to a warning (and thus involuntarily
silenced) is a bit disturbing IMHO.

Another possibility would be to display the warning, and *then* to
encode the exception message again in "replace" or "ignore" mode rather
than "strict" mode. That way exception messages are always displayed,
but not always properly. The ASCII part of the message is generally
useful, since it gives the exception name and most often the reason too.
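
As a small illustration of the "replace" and "ignore" modes mentioned above (a sketch with illustrative names, runnable on Python 3): both always produce output, at the cost of losing the non-ASCII characters.

```python
message = u"Error when printing \xfc"  # \xfc is "ü"

# "replace" substitutes a question mark for each unencodable character.
replaced = message.encode("ascii", "replace")

# "ignore" silently drops each unencodable character.
ignored = message.encode("ascii", "ignore")
```

In both cases the ASCII part of the message survives, which is the property the suggestion relies on.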
msg64807 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2008-04-01 12:44
Have you looked at PyErr_Display? There are many, many possible
exceptions, and it ignores them all because "too many callers rely on
this." So, I think all we can do is warn. I will look into encoding the
message differently.
msg64866 - (view) Author: Christoph Burgmer (christoph) Date: 2008-04-02 17:42
JFTR:
> print unicode(e.message).encode("utf-8")
only works in Python 2.5, not in earlier versions.
msg64876 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2008-04-02 20:57
We can't do much about that because only security fixes are backported
to versions < 2.5.
msg67863 - (view) Author: Simon Cross (hodgestar) Date: 2008-06-09 13:39
One of the examples Christoph tried was

  unicode(Exception(u'\xe1'))

which fails quite oddly with:

  UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in
position 0: ordinal not in range(128)

The reason for this is Exception lacks an __unicode__ method
implementation so that unicode(e) does something like unicode(str(e))
which attempts to convert the exception arguments to the default
encoding (almost always ASCII) and fails.
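
The two-step conversion described above can be modelled in a few lines. This is a sketch under assumptions: the function name is hypothetical, and Python 3 `bytes` stands in for the 2.x `str` type. It shows that the failure comes from the intermediate encoding step and that ASCII-only messages pass through unharmed:

```python
def two_step_convert(text, default_encoding="ascii"):
    """Model unicode(str(e)): encode to the default encoding, then decode."""
    as_bytes = text.encode(default_encoding)  # models str(e)
    return as_bytes.decode(default_encoding)  # models unicode(...)


ascii_result = two_step_convert(u"Foo")

try:
    two_step_convert(u"\xe1")
    failure_position = None
except UnicodeEncodeError as exc:
    failure_position = exc.start  # index of the first unencodable character
```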

Fixing this seems quite important. It's common to want to raise errors
with non-ASCII characters (e.g. when the data which caused the error
contains such characters). Usually the code raising the error has no way
of knowing how the characters should be encoded (exceptions can end up
being written to log files, displayed in web interfaces, that sort of
thing). This means raising exceptions with unicode messages. Using
unicode(e.message) is unattractive since it won't work in 3.0 and also
does not duplicate str(e)'s handling of the other exception __init__
arguments.
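
One way to picture the argument above is to keep the message as text inside the exception and choose the encoding only where it is written out. The following is a Python 3 sketch with hypothetical names (`write_error`, `log_file`, `web_page`), using in-memory streams in place of a real log file or web response:

```python
import io


def write_error(stream, exc, encoding):
    """Encode the exception text only at the output boundary."""
    stream.write(str(exc).encode(encoding, "replace"))


error = ValueError(u"bad value: \xe1")  # \xe1 is "á"

log_file = io.BytesIO()   # stands in for a UTF-8 log file
web_page = io.BytesIO()   # stands in for a Latin-1 encoded response

write_error(log_file, error, "utf-8")
write_error(web_page, error, "latin-1")
```

The code raising `error` never needs to know which encodings the consumers use. On Python 2, the `str(exc)` step is exactly what this issue reports as failing.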

I'm attaching a patch which implements __unicode__ for BaseException.
Because of the lack of a tp_unicode slot to mirror tp_str slot, this
breaks the test that calls unicode(Exception). The existing test for
unicode(e) does unicode(Exception(u"Foo")) which is a bit of a non-test.
My patch adds a test of unicode(Exception(u'\xe1')) which fails without
the patch.

A quick look through trunk suggests implementing tp_unicode actually
wouldn't be a huge job. My worry is that this would constitute a change
to the C API for PyObjects and has little chance of acceptance into 2.6
(and in 3.0 all these issues disappear anyway). If there is some chance
of acceptance, I'm willing to write a patch that adds tp_unicode.
msg67865 - (view) Author: David Fraser (davidfraser) Date: 2008-06-09 15:53
Aha - the __unicode__ method was previously there in Python 2.5, and was
ripped out because of the unicode(Exception) problem. See
http://bugs.python.org/issue1551432.

The reversion is in
http://svn.python.org/view/python/trunk/Objects/exceptions.c?rev=51837&r1=51770&r2=51837
msg67867 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2008-06-09 15:56
On Mon, Jun 9, 2008 at 8:40 AM, Simon Cross <report@bugs.python.org> wrote:
>
> Simon Cross <hodgestar@gmail.com> added the comment:
>
> One of the examples Christoph tried was
>
>  unicode(Exception(u'\xe1'))
>
> which fails quite oddly with:
>
>  UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in
> position 0: ordinal not in range(128)
>
> The reason for this is Exception lacks an __unicode__ method
> implementation so that unicode(e) does something like unicode(str(e))
> which attempts to convert the exception arguments to the default
> encoding (almost always ASCII) and fails.

What version are you using? In Py3k, str is unicode so __str__ can
return a unicode string.

>
> Fixing this seems quite important. It's common to want to raise errors
> with non-ASCII characters (e.g. when the data which caused the error
> contains such characters). Usually the code raising the error has no way
> of knowing how the characters should be encoded (exceptions can end up
> being written to log files, displayed in web interfaces, that sort of
> thing). This means raising exceptions with unicode messages. Using
> unicode(e.message) is unattractive since it won't work in 3.0 and also
> does not duplicate str(e)'s handling of the other exception __init__
> arguments.
>
> I'm attaching a patch which implements __unicode__ for BaseException.
> Because of the lack of a tp_unicode slot to mirror tp_str slot, this
> breaks the test that calls unicode(Exception). The existing test for
> unicode(e) does unicode(Exception(u"Foo")) which is a bit of a non-test.
> My patch adds a test of unicode(Exception(u'\xe1')) which fails without
> the patch.
>
> A quick look through trunk suggests implementing tp_unicode actually
> wouldn't be a huge job. My worry is that this would constitute a change
> to the C API for PyObjects and has little chance of acceptance into 2.6
> (and in 3.0 all these issues disappear anyway). If there is some chance
> of acceptance, I'm willing to write a patch that adds tp_unicode.

Email Python-dev for permission.
msg67868 - (view) Author: Simon Cross (hodgestar) Date: 2008-06-09 16:03
Concerning http://bugs.python.org/issue1551432:

I'd much rather have working unicode(e) than working unicode(Exception).
Calling unicode(C) on any class C which overrides __unicode__ is broken
without tp_unicode anyway.
msg67869 - (view) Author: Simon Cross (hodgestar) Date: 2008-06-09 16:11
Benjamin Peterson wrote:
> What version are you using? In Py3k, str is unicode so __str__ can
> return a unicode string.

I'm sorry it wasn't clear. I'm aware that this issue doesn't apply to
Python 3.0. I'm testing on both Python 2.5 and Python 2.6 for the
purposes of the bug.

The code I'm developing hits these issues through database exceptions
with unicode messages raised inside MySQLdb on Python 2.5.

The patch I submitted is against trunk.
msg67870 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2008-06-09 16:20
Removing 3.0 from the versions list.
msg67874 - (view) Author: David Fraser (davidfraser) Date: 2008-06-09 19:04
So I've got a follow-up patch that adds tp_unicode.
Caveat that I've never done anything like this before and it's almost
certain to be wrong.

It does however generate the desired result in this case :-)
msg67875 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2008-06-09 19:09
On Mon, Jun 9, 2008 at 2:04 PM, David Fraser <report@bugs.python.org> wrote:
>
> David Fraser <davidf@sjsoft.com> added the comment:
>
> So I've got a follow-up patch that adds tp_unicode.
> Caveat that I've never done anything like this before and it's almost
> certain to be wrong.

Unfortunately, adding a slot is a bit more complicated. You have to
deal with inheritance and such. Have a look in typeobject.c for all
the gory details. I'd recommend you write to python-dev before going
on the undertaking, though.

>
> It does however generate the desired result in this case :-)
>
> Added file: http://bugs.python.org/file10562/tp_unicode_exception.patch
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue2517>
> _______________________________________
>
msg67944 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-06-11 09:32
As far as I am concerned, the implementation of PyObject_Unicode in
object.c has a bug in it: it should NEVER be retrieving __unicode__ from
the instance object. The implementation of PyObject_Format in abstract.c
shows the correct way to retrieve a pseudo-slot method like __unicode__
from an arbitrary object.

Line 482 in object.c is the offending line:
	func = PyObject_GetAttr(v, unicodestr);

Fix that bug, then add a __unicode__ method back to Exception objects
and you will have the best of both worlds.
msg67946 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2008-06-11 09:47
On 2008-06-11 11:32, Nick Coghlan wrote:
> Nick Coghlan <ncoghlan@gmail.com> added the comment:
> 
> As far as I am concerned, the implementation of PyObject_Unicode in
> object.c has a bug in it: it should NEVER be retrieving __unicode__ from
> the instance object. The implementation of PyObject_Format in abstract.c
> shows the correct way to retrieve a pseudo-slot method like __unicode__
> from an arbitrary object.

The only difference I can spot is that the PyObject_Format() code
special cases non-instance objects.

> Line 482 in object.c is the offending line:
> 	func = PyObject_GetAttr(v, unicodestr);
> 
> Fix that bug, then add a __unicode__ method back to Exception objects
> and you will have the best of both worlds.

I'm not sure whether that would really solve anything.

IMHO, it's better to implement the tp_unicode slot and then
check that before trying .__unicode__ (as mentioned in the comment
in PyObject_Unicode()).
msg67947 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-06-11 10:02
Here's the key difference with the way PyObject_Format looks up the
pseudo-slot method:

		PyObject *method = _PyType_Lookup(Py_TYPE(obj),
						  str__format__);

_PyType_Lookup instead of PyObject_GetAttr - so unicode(Exception) would
only look for type.__unicode__ and avoid getting confused by the utterly
irrelevant Exception.__unicode__ method (which is intended only for
printing Exception instances, not for printing the Exception type itself).

You then need the PyInstance_Check/PyObject_GetAttr special case for
retrieving the bound method because _PyType_Lookup won't work on classic
class instances.
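
The difference between the two lookup styles can be sketched in pure Python. This is only a rough analogue: `lookup_on_type` is a hypothetical helper that walks the type's MRO in the spirit of _PyType_Lookup, and `Demo` is an illustrative class, not code from the patch.

```python
def lookup_on_type(obj, name):
    """Search the MRO of type(obj) for name, never obj's own namespace."""
    for klass in type(obj).__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    return None


class Demo(object):
    def __unicode__(self):
        return u"instance text"


# Plain attribute lookup on the class finds the method meant for instances,
# and calling it without an instance fails.
via_getattr = getattr(Demo, "__unicode__", None)
try:
    via_getattr()
    call_failed = False
except TypeError:
    call_failed = True

# Looking on type(Demo), which is `type`, finds no __unicode__ at all.
via_type = lookup_on_type(Demo, "__unicode__")

# For an instance, the type-based lookup still finds the method.
inst = Demo()
method = lookup_on_type(inst, "__unicode__")
instance_text = method(inst)
```

With the type-based lookup, converting the class itself would fall back to the default conversion instead of tripping over the instance method.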
msg67950 - (view) Author: Simon Cross (hodgestar) Date: 2008-06-11 11:54
Attached a patch which implements Nick Coghlan's suggestion. All
existing tests in test_exceptions.py and test_unicode.py pass as does
the new unicode(Exception(u"\xe1")) test.
msg67974 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-06-11 14:15
Minor cleanup of Simon's patch attached - aside from a couple of
unneeded whitespace changes, it all looks good to me.

Not checking it in yet, since it isn't critical for this week's beta
release - I'd prefer to leave it until after that has been dealt with.
msg67980 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2008-06-11 14:33
On 2008-06-11 16:15, Nick Coghlan wrote:
> Nick Coghlan <ncoghlan@gmail.com> added the comment:
> 
> Minor cleanup of Simon's patch attached - aside from a couple of
> unneeded whitespace changes, it all looks good to me.
> 
> Not checking it in yet, since it isn't critical for this week's beta
> release - I'd prefer to leave it until after that has been dealt with.
> 
> Added file: http://bugs.python.org/file10585/exception-unicode-with-type-fetch-no-whitespace-changes.diff

That approach is fine as well.

I still like the idea to add a tp_unicode slot, though, since that's
still missing for C extension types to benefit from.

Perhaps we can have both ?!
msg67984 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-06-11 14:49
I'm not sure adding a dedicated method slot would be worth the hassle
involved - Py3k drops back to just the tp_str slot anyway, and the only
thing you gain with a tp_unicode slot over _PyType_Lookup of a
__unicode__ attribute is a small reduction in memory usage and a slight
speed increase.
msg67985 - (view) Author: Simon Cross (hodgestar) Date: 2008-06-11 14:53
Re msg67974:
> Minor cleanup of Simon's patch attached - aside from a couple of
> unneeded whitespace changes, it all looks good to me.
>
> Not checking it in yet, since it isn't critical for this week's beta
> release - I'd prefer to leave it until after that has been dealt with.

Thanks for the clean-up, Nick. The mixture of tabs and spaces in the
current object.c was unpleasant :/.
msg67994 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2008-06-11 16:50
On 2008-06-11 16:49, Nick Coghlan wrote:
> Nick Coghlan <ncoghlan@gmail.com> added the comment:
> 
> I'm not sure adding a dedicated method slot would be worth the hassle
> involved - Py3k drops back to just the tp_str slot anyway, and the only
> thing you gain with a tp_unicode slot over _PyType_Lookup of a
> __unicode__ attribute is a small reduction in memory usage and a slight
> speed increase.

AFAIK, _PyType_Lookup will only work for base types, ie. objects
subclassing from object. C extension types often do not inherit from
object, since the attribute access mechanisms and object creation
are a lot simpler when not doing so.
msg68394 - (view) Author: Simon Cross (hodgestar) Date: 2008-06-19 08:24
Just prodding the issue again now that the betas are out. What's the
next step?
msg69384 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-07-07 12:53
Adding this to my personal to-do list for the next beta release.
msg69436 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-07-08 14:14
Fixed in 64791.

Blocked from being merged to Py3k (since there is no longer a
__unicode__ special method).

For MAL: the PyInstance_Check included in the patch for the benefit of
classic classes defined in Python code also covers all of the classic C
extension classes which are not instances of object.
History
Date User Action Args
2009-09-17 10:12:59 ezio.melotti set nosy: + ezio.melotti
2008-07-08 14:15:53 ncoghlan set status: open -> closed
    resolution: fixed
2008-07-08 14:14:15 ncoghlan set messages: + msg69436
2008-07-07 12:53:19 ncoghlan set priority: normal -> critical
    assignee: georg.brandl -> ncoghlan
    messages: + msg69384
2008-06-19 08:24:25 hodgestar set messages: + msg68394
2008-06-13 02:22:12 ggenellina set nosy: + ggenellina
2008-06-11 16:50:26 lemburg set messages: + msg67994
2008-06-11 14:53:04 hodgestar set messages: + msg67985
2008-06-11 14:49:20 ncoghlan set messages: + msg67984
2008-06-11 14:33:57 lemburg set messages: + msg67980
2008-06-11 14:15:11 ncoghlan set files: + exception-unicode-with-type-fetch-no-whitespace-changes.diff
    messages: + msg67974
2008-06-11 11:54:40 hodgestar set files: + exception-unicode-with-type-fetch.diff
    messages: + msg67950
2008-06-11 10:02:03 ncoghlan set messages: + msg67947
2008-06-11 09:47:17 lemburg set messages: + msg67946
2008-06-11 09:32:22 ncoghlan set nosy: + ncoghlan
    messages: + msg67944
2008-06-09 19:09:03 benjamin.peterson set messages: + msg67875
2008-06-09 19:04:55 davidfraser set files: + tp_unicode_exception.patch
    messages: + msg67874
2008-06-09 16:20:45 lemburg set nosy: + lemburg
    messages: + msg67870
    versions: - Python 3.0
2008-06-09 16:11:28 hodgestar set messages: + msg67869
2008-06-09 16:03:25 hodgestar set messages: + msg67868
2008-06-09 15:56:37 benjamin.peterson set messages: + msg67867
2008-06-09 15:53:25 davidfraser set messages: + msg67865
2008-06-09 15:44:46 davidfraser set nosy: + davidfraser
2008-06-09 13:39:22 hodgestar set files: + exception-unicode.diff
    nosy: + hodgestar
    messages: + msg67863
2008-04-02 20:57:53 benjamin.peterson set messages: + msg64876
2008-04-02 17:42:23 christoph set messages: + msg64866
2008-04-01 12:44:42 benjamin.peterson set messages: + msg64807
2008-04-01 06:41:07 pitrou set nosy: + pitrou
    messages: + msg64802
2008-04-01 02:22:28 benjamin.peterson set files: + unicode_exception_warning.patch
    versions: + Python 2.6, Python 2.5
    nosy: + georg.brandl
    messages: + msg64798
    assignee: georg.brandl
    keywords: + patch
2008-03-31 23:10:48 amaury.forgeotdarc set messages: + msg64797
2008-03-31 22:30:42 benjamin.peterson set messages: + msg64795
2008-03-31 22:19:26 christoph set messages: + msg64794
2008-03-31 21:36:01 benjamin.peterson set status: closed -> open
    priority: normal
    resolution: not a bug -> (no value)
    messages: + msg64793
    versions: + Python 3.0, - Python 2.5, Python 2.4
2008-03-31 16:13:59 amaury.forgeotdarc set nosy: + amaury.forgeotdarc
    messages: + msg64786
2008-03-31 12:19:24 christoph set messages: + msg64782
2008-03-31 11:58:21 benjamin.peterson set messages: + msg64781
2008-03-31 09:47:43 christoph set messages: + msg64779
2008-03-30 23:21:25 benjamin.peterson set status: open -> closed
    resolution: not a bug
    messages: + msg64771
    nosy: + benjamin.peterson
2008-03-30 23:13:52 christoph create