Error when printing an exception containing a Unicode string #46769

christoph · 2008-03-30T23:13:52Z

BPO	2517
Nosy	@malemburg, @birkenfeld, @amauryfa, @ncoghlan, @pitrou, @benjaminp, @ezio-melotti
Files	unicode_exception_warning.patch
	exception-unicode.diff: Patch implementing BaseException.__unicode__
	tp_unicode_exception.patch: Patch to provide tp_unicode slot, and implementation for Exception
	exception-unicode-with-type-fetch.diff: Implement Nick Coghlan's suggestion from 67944.
	exception-unicode-with-type-fetch-no-whitespace-changes.diff: Simon's patch with unneeded whitespace changes removed.

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = 'https://github.com/ncoghlan'
closed_at = <Date 2008-07-08.14:15:53.841>
created_at = <Date 2008-03-30.23:13:52.000>
labels = ['type-bug', 'expert-unicode']
title = 'Error when printing an exception containing a Unicode string'
updated_at = <Date 2019-01-10.21:18:29.713>
user = 'https://bugs.python.org/christoph'

bugs.python.org fields:

activity = <Date 2019-01-10.21:18:29.713>
actor = 'piotr.dobrogost'
assignee = 'ncoghlan'
closed = True
closed_date = <Date 2008-07-08.14:15:53.841>
closer = 'ncoghlan'
components = ['Unicode']
creation = <Date 2008-03-30.23:13:52.000>
creator = 'christoph'
dependencies = []
files = ['9915', '10559', '10562', '10580', '10585']
hgrepos = []
issue_num = 2517
keywords = ['patch']
message_count = 36.0
messages = ['64770', '64771', '64779', '64781', '64782', '64786', '64793', '64794', '64795', '64797', '64798', '64802', '64807', '64866', '64876', '67863', '67865', '67867', '67868', '67869', '67870', '67874', '67875', '67944', '67946', '67947', '67950', '67974', '67980', '67984', '67985', '67994', '68394', '69384', '69436', '333419']
nosy_count = 12.0
nosy_names = ['lemburg', 'georg.brandl', 'amaury.forgeotdarc', 'ncoghlan', 'davidfraser', 'ggenellina', 'pitrou', 'benjamin.peterson', 'christoph', 'ezio.melotti', 'hodgestar', 'piotr.dobrogost']
pr_nums = []
priority = 'critical'
resolution = 'fixed'
stage = None
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue2517'
versions = ['Python 2.6', 'Python 2.5']

christoph · 2008-03-30T23:13:51Z

Python seems to have problems when an exception is thrown that
contains non-ASCII text as a message and is converted to a string.

>>> try:
...     raise Exception(u'Error when printing ü')
... except Exception, e:
...     print e
...
Traceback (most recent call last):
  File "<stdin>", line 4, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 20: ordinal not in range(128)

See
http://www.stud.uni-karlsruhe.de/~uyhc/de/content/python-and-exceptions-containing-unicode-messages

benjaminp · 2008-03-30T23:21:25Z

That is because Python encodes its error messages as ASCII by default,
and "ü" is not in ASCII. You can fix this by using "print
unicode_msg.encode("utf-8")" or something similar.
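
The underlying failure can be reproduced without any exception object. A minimal sketch in Python 3 syntax, where text has to be encoded explicitly; the name `unicode_msg` is borrowed from the comment above and the other names are illustrative only:

```python
unicode_msg = "Error when printing ü"

# Encoding to ASCII fails because "ü" (U+00FC) has no ASCII representation.
try:
    unicode_msg.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    failing_position = exc.start  # index of the first unencodable character

# Encoding to UTF-8 succeeds, since UTF-8 can represent "ü".
utf8_bytes = unicode_msg.encode("utf-8")
```

The failing index is the same "position 20" that the traceback in the original report mentions.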

christoph · 2008-03-31T09:47:43Z

To be more precise: I see no easy way to convert the non-ASCII data
encapsulated in the exception to a string.
Taking e from my last post, none of the following will work:
str(e) # UnicodeEncodeError
e.__str__() # UnicodeEncodeError
e.__unicode__() # AttributeError
unicode(e) # UnicodeEncodeError
unicode(e, 'utf8') # TypeError

My workaround for this right now is raising an exception with an
already converted string (see the link I provided).

But as the tutorials speak of simply "print e" I guess the behaviour
described above is some kind of a bug.

benjaminp · 2008-03-31T11:58:16Z

Use: print unicode(e.message).encode("utf-8")

christoph · 2008-03-31T12:19:24Z

Thanks, this does work.

But where can I find the piece of information you just gave me in
the docs? I couldn't find any interface definition for Exceptions.

Furthermore, will this be regarded as a bug?
From [1] I understand that "unicode(e)" and "unicode(e, 'utf8')" are
supposed to work. No limitations are made on the type of the object.
And I suppose that unicode() is the exact equivalent of str(), except
that it copes with Unicode strings. Simple string conversion of an
Exception is expected to work for ASCII text, so it seems needlessly
cumbersome that it does not return a Unicode string when the content
is non-ASCII.

Please reopen if my report does have a point.

[1] http://docs.python.org/lib/built-in-funcs.html

amauryfa · 2008-03-31T16:13:59Z

Note the interpreter cannot print the exception either:

>>> raise Exception(u'Error when printing ü')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
Exception>>>

benjaminp · 2008-03-31T21:36:01Z

I am going to reopen this issue for Py3k. The recommended encoding for
Python source files in 2.x is ASCII; I wouldn't say correctly dealing
with non-ASCII exceptions is fully supported. In 3.x, however, the
recommended encoding is UTF-8, so this should work.

In Py3k,
str(e) # str is unicode in Py3k
does work correctly, and that'll have to be used because the message
attribute is gone in 3.x.
However, the problem Amaury pointed out is not fixed. Exceptions that
cannot be encoded into ASCII are silently not printed. I think a warning
should at least be printed.
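
A short sketch of the Py3k behaviour described above, assuming any Python 3 interpreter; the variable names are illustrative:

```python
e = Exception("Error when printing ü")

# In Python 3, str is a Unicode string, so no implicit ASCII encoding
# happens when the exception is converted to text.
text = str(e)

# The 2.x "message" attribute no longer exists; the arguments live in e.args.
has_message_attribute = hasattr(e, "message")
first_argument = e.args[0]
```

This only covers the conversion to text. Whether the text can then be written to a terminal still depends on the encoding of the output stream.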

christoph · 2008-03-31T22:19:24Z

Though I welcome the reopening of the bug for Python 3.0, I must say
that the plan not to fix a core element rather surprises me.

I never believed Python to be a programming language with good Unicode
integration. Several points were missing that would've been nice or
even essential to have for good development with Unicode, most ignored
for the sake of maintaining backward compatibility. This though is not
the fault of the Unicode class itself and supporting packages.

Some modules like the one for CSV are lacking full Unicode support.
But nevertheless the basic Python would always give you the
possibility to use Unicode in (at least) a consistent way. For me,
raising exceptions counts as part of this basic support.

So I still hope to see this solved for the 2.x versions which I read
will be maintained even after the release of 3.0.

benjaminp · 2008-03-31T22:30:42Z

> I never believed Python to be a programming language with good Unicode
> integration. Several points were missing that would've been nice or
> even essential to have for good development with Unicode, most ignored
> for the sake of maintaining backward compatibility. This though is not
> the fault of the Unicode class itself and supporting packages.

Many (including myself) agree with you. That's pretty much the whole
point of Py3k. We want to fix the Python "warts" which can only be fixed
by breaking backwards compatibility.

amauryfa · 2008-03-31T23:10:48Z

Even in 2.5, __str__ is allowed to return a Unicode object;
we could change BaseException_str this way:

Index: exceptions.c
===================================================================

--- exceptions.c	(revision 61957)
+++ exceptions.c	(working copy)
@@ -108,6 +104,11 @@
         break;
     case 1:
         out = PyObject_Str(PyTuple_GET_ITEM(self->args, 0));
+        if (out == NULL && PyErr_ExceptionMatches(PyExc_UnicodeEncodeError))
+        {
+            PyErr_Clear();
+            out = PyObject_Unicode(PyTuple_GET_ITEM(self->args, 0));
+        }
         break;
     default:
         out = PyObject_Str(self->args);

Then str(e) still raises UnicodeEncodeError,
but unicode(e) returns the original message.

But I would like the opinion of an experienced core developer...

benjaminp · 2008-04-01T02:22:27Z

After thinking some more, I'm going to add 2.6 to this. I'm attaching a
patch for the trunk (it can be merged in Py3k, and maybe 2.5) which
displays a UnicodeWarning when an Exception cannot be displayed due to
encoding issues.

Georg, can you review Amaury's and my patches? Also, would mine be a
candidate for 2.5 backporting?

pitrou · 2008-04-01T06:41:07Z

Shouldn't it be an exception rather than a warning? The fact that an
exception can be downgraded to a warning (and thus involuntarily
silenced) is a bit disturbing IMHO.

Another possibility would be to display the warning, and *then* to
encode the exception message again in "replace" or "ignore" mode rather
than "strict" mode. That way exception messages are always displayed,
but not always properly. The ASCII part of the message is generally
useful, since it gives the exception name and most often the reason too.
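
The difference between the error modes mentioned above can be sketched with the standard codec error handlers. This uses Python 3 syntax and an illustrative message; it is not the code path PyErr_Display actually takes:

```python
message = "Error when printing ü"

# "strict" (the default) refuses to encode the message at all.
try:
    message.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# "replace" substitutes "?" for each unencodable character.
replaced = message.encode("ascii", "replace")

# "ignore" silently drops each unencodable character.
ignored = message.encode("ascii", "ignore")
```

In both lenient modes the ASCII part of the message survives, which is the property argued for above.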

benjaminp · 2008-04-01T12:44:42Z

Have you looked at PyErr_Display? There are many, many possible
exceptions, and it ignores them all because "too many callers rely on
this." So, I think all we can do is warn. I will look into encoding the
message differently.

christoph · 2008-04-02T17:42:24Z

JFTR:

print unicode(e.message).encode("utf-8")
only works in Python 2.5, not in earlier versions.

benjaminp · 2008-04-02T20:57:53Z

We can't do much about that because only security fixes are backported
to versions < 2.5.

hodgestar · 2008-06-09T13:39:17Z

One of the examples Christoph tried was

  unicode(Exception(u'\xe1'))

which fails quite oddly with:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in
position 0: ordinal not in range(128)

The reason for this is Exception lacks an __unicode__ method
implementation so that unicode(e) does something like unicode(str(e))
which attempts to convert the exception arguments to the default
encoding (almost always ASCII) and fails.
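
The round trip described above can be modelled roughly in Python 3, where the encode and decode steps have to be written out by hand. The helper names and the `DEFAULT_ENCODING` constant are hypothetical stand-ins for what Python 2 does implicitly:

```python
DEFAULT_ENCODING = "ascii"  # assumption: the usual Python 2 default encoding


def py2_str(text):
    # Model of Python 2 str() applied to Unicode text:
    # encode with the default encoding.
    return text.encode(DEFAULT_ENCODING)


def py2_unicode_via_str(text):
    # Model of unicode(e) when no __unicode__ exists:
    # roughly unicode(str(e)), i.e. encode first, then decode back.
    return py2_str(text).decode(DEFAULT_ENCODING)
```

Under this model a pure ASCII message such as "Foo" survives the round trip unchanged, while "\xe1" already fails in the encode step at position 0, matching the error shown above.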

Fixing this seems quite important. It's common to want to raise errors
with non-ASCII characters (e.g. when the data which caused the error
contains such characters). Usually the code raising the error has no way
of knowing how the characters should be encoded (exceptions can end up
being written to log files, displayed in web interfaces, that sort of
thing). This means raising exceptions with unicode messages. Using
unicode(e.message) is unattractive since it won't work in 3.0 and also
does not duplicate str(e)'s handling of the other exception __init__
arguments.
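
The argument handling that a plain `e.message` lookup would bypass can be seen in a small sketch. It is run under Python 3, where str(e) follows the same three cases as the `switch` in BaseException_str; the example values are made up:

```python
# No arguments: the string form is empty.
no_args = str(Exception())

# Exactly one argument: the string form is that argument converted to text.
one_arg = str(Exception("disk full"))

# Several arguments: the string form is the text of the whole args tuple.
two_args = str(Exception("disk full", 28))
```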

I'm attaching a patch which implements __unicode__ for BaseException.
Because of the lack of a tp_unicode slot to mirror tp_str slot, this
breaks the test that calls unicode(Exception). The existing test for
unicode(e) does unicode(Exception(u"Foo")) which is a bit of a non-test.
My patch adds a test of unicode(Exception(u'\xe1')) which fails without
the patch.

A quick look through trunk suggests implementing tp_unicode actually
wouldn't be a huge job. My worry is that this would constitute a change
to the C API for PyObjects and has little chance of acceptance into 2.6
(and in 3.0 all these issues disappear anyway). If there is some chance
of acceptance, I'm willing to write a patch that adds tp_unicode.

davidfraser · 2008-06-09T15:53:25Z

Aha - the __unicode__ method was previously there in Python 2.5, and was
ripped out because of the unicode(Exception) problem. See
http://bugs.python.org/issue1551432.

The reversion is in
http://svn.python.org/view/python/trunk/Objects/exceptions.c?rev=51837&r1=51770&r2=51837

benjaminp · 2008-06-09T15:56:35Z

On Mon, Jun 9, 2008 at 8:40 AM, Simon Cross <report@bugs.python.org> wrote:

> Simon Cross <hodgestar@gmail.com> added the comment:
>
> One of the examples Christoph tried was
>
> unicode(Exception(u'\xe1'))
>
> which fails quite oddly with:
>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in
> position 0: ordinal not in range(128)
>
> The reason for this is Exception lacks an __unicode__ method
> implementation so that unicode(e) does something like unicode(str(e))
> which attempts to convert the exception arguments to the default
> encoding (almost always ASCII) and fails.

What version are you using? In Py3k, str is unicode so __str__ can
return a unicode string.

> Fixing this seems quite important. It's common to want to raise errors
> with non-ASCII characters (e.g. when the data which caused the error
> contains such characters). Usually the code raising the error has no way
> of knowing how the characters should be encoded (exceptions can end up
> being written to log files, displayed in web interfaces, that sort of
> thing). This means raising exceptions with unicode messages. Using
> unicode(e.message) is unattractive since it won't work in 3.0 and also
> does not duplicate str(e)'s handling of the other exception __init__
> arguments.
>
> I'm attaching a patch which implements __unicode__ for BaseException.
> Because of the lack of a tp_unicode slot to mirror tp_str slot, this
> breaks the test that calls unicode(Exception). The existing test for
> unicode(e) does unicode(Exception(u"Foo")) which is a bit of a non-test.
> My patch adds a test of unicode(Exception(u'\xe1')) which fails without
> the patch.
>
> A quick look through trunk suggests implementing tp_unicode actually
> wouldn't be a huge job. My worry is that this would constitute a change
> to the C API for PyObjects and has little chance of acceptance into 2.6
> (and in 3.0 all these issues disappear anyway). If there is some chance
> of acceptance, I'm willing to write a patch that adds tp_unicode.

Email Python-dev for permission.

hodgestar · 2008-06-09T16:03:26Z

Concerning http://bugs.python.org/issue1551432:

I'd much rather have working unicode(e) than working unicode(Exception).
Calling unicode(C) on any class C which overrides __unicode__ is broken
without tp_unicode anyway.

hodgestar · 2008-06-09T16:11:24Z

Benjamin Peterson wrote:

> What version are you using? In Py3k, str is unicode so __str__ can
> return a unicode string.

I'm sorry it wasn't clear. I'm aware that this issue doesn't apply to
Python 3.0. I'm testing on both Python 2.5 and Python 2.6 for the
purposes of the bug.

The code I'm developing hits these issues with database exceptions that
carry unicode messages and are raised inside MySQLdb on Python 2.5.

The patch I submitted is against trunk.

malemburg · 2008-06-09T16:20:45Z

Removing 3.0 from the versions list.

davidfraser · 2008-06-09T19:04:50Z

So I've got a follow-up patch that adds tp_unicode.
Caveat that I've never done anything like this before and it's almost
certain to be wrong.

It does however generate the desired result in this case :-)

benjaminp · 2008-06-09T19:09:03Z

On Mon, Jun 9, 2008 at 2:04 PM, David Fraser <report@bugs.python.org> wrote:

> David Fraser <davidf@sjsoft.com> added the comment:
>
> So I've got a follow-up patch that adds tp_unicode.
> Caveat that I've never done anything like this before and it's almost
> certain to be wrong.

Unfortunately, adding a slot is a bit more complicated. You have to
deal with inheritance and such. Have a look in typeobject.c for all
the gory details. I'd recommend you write to python-dev before
undertaking it, though.

> It does however generate the desired result in this case :-)
>
> Added file: http://bugs.python.org/file10562/tp_unicode_exception.patch

ncoghlan · 2008-06-11T09:32:10Z

As far as I am concerned, the implementation of PyObject_Unicode in
object.c has a bug in it: it should NEVER be retrieving __unicode__ from
the instance object. The implementation of PyObject_Format in abstract.c
shows the correct way to retrieve a pseudo-slot method like __unicode__
from an arbitrary object.

Line 482 in object.c is the offending line:
func = PyObject_GetAttr(v, unicodestr);

Fix that bug, then add a __unicode__ method back to Exception objects
and you will have the best of both worlds.

malemburg · 2008-06-11T09:47:15Z

On 2008-06-11 11:32, Nick Coghlan wrote:

> Nick Coghlan <ncoghlan@gmail.com> added the comment:
>
> As far as I am concerned, the implementation of PyObject_Unicode in
> object.c has a bug in it: it should NEVER be retrieving __unicode__ from
> the instance object. The implementation of PyObject_Format in abstract.c
> shows the correct way to retrieve a pseudo-slot method like __unicode__
> from an arbitrary object.

The only difference I can spot is that the PyObject_Format() code
special cases non-instance objects.

> Line 482 in object.c is the offending line:
> func = PyObject_GetAttr(v, unicodestr);
>
> Fix that bug, then add a __unicode__ method back to Exception objects
> and you will have the best of both worlds.

I'm not sure whether that would really solve anything.

IMHO, it's better to implement the tp_unicode slot and then
check that before trying .__unicode__ (as mentioned in the comment
in PyObject_Unicode()).

ncoghlan · 2008-06-11T10:02:01Z

Here's the key difference with the way PyObject_Format looks up the
pseudo-slot method:

		PyObject *method = _PyType_Lookup(Py_TYPE(obj),
						  str__format__);

_PyType_Lookup instead of PyObject_GetAttr - so unicode(Exception) would
only look for type.__unicode__ and avoid getting confused by the utterly
irrelevant Exception.__unicode__ method (which is intended only for
printing Exception instances, not for printing the Exception type itself).

You then need the PyInstance_Check/PyObject_GetAttr special case for
retrieving the bound method because _PyType_Lookup won't work on classic
class instances.
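
The difference between the two lookups can be sketched in pure Python. The `Report` class and its `__describe__` method are hypothetical stand-ins for Exception and `__unicode__`, and the two helpers only approximate what PyObject_GetAttr and _PyType_Lookup do in C. Python 3 has no classic classes, so the PyInstance_Check special case has no counterpart here:

```python
class Report:
    """Hypothetical class with a pseudo-slot method, standing in for Exception."""

    def __describe__(self):
        return "instance of Report"


def describe_via_getattr(obj):
    # Approximates PyObject_GetAttr(v, name): the lookup starts on the object
    # itself, so passing the class finds the method meant for its instances.
    return getattr(obj, "__describe__")()


def describe_via_type(obj):
    # Approximates _PyType_Lookup(Py_TYPE(obj), name): search only the MRO of
    # type(obj), then bind the function found there to obj by hand.
    for klass in type(obj).__mro__:
        if "__describe__" in vars(klass):
            return vars(klass)["__describe__"].__get__(obj, type(obj))()
    # Nothing found on the type: fall back to a generic description.
    return "<%s object>" % type(obj).__name__
```

For an instance both helpers agree. For the class itself, `describe_via_getattr(Report)` raises TypeError because the instance method is called without `self`, whereas `describe_via_type(Report)` looks on `type` and falls through to the generic description. This is the same confusion that broke unicode(Exception).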

hodgestar · 2008-06-11T11:54:38Z

Attached a patch which implements Nick Coghlan's suggestion. All
existing tests in test_exceptions.py and test_unicode.py pass as does
the new unicode(Exception(u"\xe1")) test.

ncoghlan · 2008-06-11T14:15:08Z

Minor cleanup of Simon's patch attached - aside from a couple of
unneeded whitespace changes, it all looks good to me.

Not checking it in yet, since it isn't critical for this week's beta
release - I'd prefer to leave it until after that has been dealt with.

malemburg · 2008-06-11T14:33:57Z

On 2008-06-11 16:15, Nick Coghlan wrote:

> Nick Coghlan <ncoghlan@gmail.com> added the comment:
>
> Minor cleanup of Simon's patch attached - aside from a couple of
> unneeded whitespace changes, it all looks good to me.
>
> Not checking it in yet, since it isn't critical for this week's beta
> release - I'd prefer to leave it until after that has been dealt with.
>
> Added file: http://bugs.python.org/file10585/exception-unicode-with-type-fetch-no-whitespace-changes.diff

That approach is fine as well.

I still like the idea to add a tp_unicode slot, though, since that's
still missing for C extension types to benefit from.

Perhaps we can have both ?!

ncoghlan · 2008-06-11T14:49:20Z

I'm not sure adding a dedicated method slot would be worth the hassle
involved - Py3k drops back to just the tp_str slot anyway, and the only
thing you gain with a tp_unicode slot over _PyType_Lookup of a
__unicode__ attribute is a small reduction in memory usage and a slight
speed increase.

hodgestar · 2008-06-11T14:53:04Z

Re msg67974:

> Minor cleanup of Simon's patch attached - aside from a couple of
> unneeded whitespace changes, it all looks good to me.
>
> Not checking it in yet, since it isn't critical for this week's beta
> release - I'd prefer to leave it until after that has been dealt with.

Thanks for the clean-up, Nick. The mixture of tabs and spaces in the
current object.c was unpleasant :/.

malemburg · 2008-06-11T16:50:21Z

On 2008-06-11 16:49, Nick Coghlan wrote:

> Nick Coghlan <ncoghlan@gmail.com> added the comment:
>
> I'm not sure adding a dedicated method slot would be worth the hassle
> involved - Py3k drops back to just the tp_str slot anyway, and the only
> thing you gain with a tp_unicode slot over _PyType_Lookup of a
> __unicode__ attribute is a small reduction in memory usage and a slight
> speed increase.

AFAIK, _PyType_Lookup will only work for base types, ie. objects
subclassing from object. C extension types often do not inherit from
object, since the attribute access mechanisms and object creation
are a lot simpler when not doing so.

hodgestar · 2008-06-19T08:24:25Z

Just prodding the issue again now that the betas are out. What's the
next step?

ncoghlan · 2008-07-07T12:53:19Z

Adding this to my personal to-do list for the next beta release.

ncoghlan · 2008-07-08T14:14:13Z

Fixed in 64791.

Blocked from being merged to Py3k (since there is no longer a
__unicode__ special method).

For MAL: the PyInstance_Check included in the patch for the benefit of
classic classes defined in Python code also covers all of the classic C
extension classes which are not instances of object.

piotrdobrogost · 2019-01-10T21:18:30Z

Benjamin Peterson in comment https://bugs.python.org/issue2517#msg64771 wrote:

"That is because Python encodes it's error messages as ASCII by default…"

Could somebody please point out where in the source code of Python 2 this happens?

christoph mannequin added the topic-unicode and type-bug labels Mar 30, 2008

benjaminp closed this as completed Mar 30, 2008

benjaminp added the invalid label Mar 30, 2008

benjaminp reopened this Mar 31, 2008

benjaminp removed the invalid label Mar 31, 2008

benjaminp assigned birkenfeld Apr 1, 2008

ncoghlan assigned ncoghlan and unassigned birkenfeld Jul 7, 2008

ncoghlan closed this as completed Jul 8, 2008

ezio-melotti transferred this issue from another repository Apr 10, 2022
