
Author hodgestar
Recipients amaury.forgeotdarc, benjamin.peterson, christoph, georg.brandl, hodgestar, pitrou
Date 2008-06-09.13:39:16
Message-id <1213018824.25.0.137378819627.issue2517@psf.upfronthosting.co.za>
In-reply-to
Content
One of the examples Christoph tried was

  unicode(Exception(u'\xe1'))

which fails quite oddly with:

  UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 0: ordinal not in range(128)

The reason is that Exception lacks a __unicode__ method, so unicode(e)
does something like unicode(str(e)). The str(e) step attempts to encode
the exception arguments using the default encoding (almost always ASCII)
and fails.
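
The failure path can be approximated with a small sketch. It is written
for Python 3, where the encoding step has to be spelled out explicitly,
so it only imitates the 2.x behaviour. The helper names legacy_str and
legacy_unicode are hypothetical stand-ins, not real interpreter functions.

```python
# Python 3 imitation of the 2.x failure path (an assumption-laden sketch,
# not the interpreter's actual code).

def legacy_str(exc, default_encoding="ascii"):
    """Hypothetical stand-in for 2.x str(e): produce a byte string."""
    if not exc.args:
        message = ""
    elif len(exc.args) == 1:
        message = str(exc.args[0])
    else:
        message = str(exc.args)
    # This is the implicit step that blows up for non-ASCII text.
    return message.encode(default_encoding)


def legacy_unicode(exc):
    """Hypothetical stand-in for unicode(e) when no __unicode__ exists."""
    return legacy_str(exc).decode("ascii")


failure = None
try:
    legacy_unicode(Exception("\xe1"))
except UnicodeEncodeError as err:
    failure = err  # same error class and position as in the report
```

An ASCII-only message such as Exception("Foo") passes through both
helpers unchanged, which is why a test using only "Foo" cannot catch
the problem.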

Fixing this seems quite important. It's common to want to raise errors
with non-ASCII characters (e.g. when the data which caused the error
contains such characters). Usually the code raising the error has no way
of knowing how the characters should be encoded (exceptions can end up
being written to log files, displayed in web interfaces, that sort of
thing). This means raising exceptions with unicode messages. Using
unicode(e.message) is unattractive since it won't work in 3.0 and also
does not duplicate str(e)'s handling of the other exception __init__
arguments.

I'm attaching a patch which implements __unicode__ for BaseException.
Because there is no tp_unicode slot to mirror the tp_str slot, this
breaks the test that calls unicode(Exception). The existing test for
unicode(e) does unicode(Exception(u"Foo")) which is a bit of a non-test.
My patch adds a test of unicode(Exception(u'\xe1')) which fails without
the patch.
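
For illustration, here is a pure-Python sketch of what a __unicode__
method mirroring str(e)'s argument handling might look like. It is not
the attached C patch; the class name UnicodeAwareError and the text_type
alias are made up for this example, and the exact argument handling is
an assumption modelled on str(e).

```python
import sys

# text_type is a hypothetical alias: unicode on 2.x, str on 3.x.
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821


class UnicodeAwareError(Exception):
    """Illustrative exception with a __unicode__ that never touches ASCII."""

    def __unicode__(self):
        # Assumed to mirror str(e): no args, one arg, or the whole tuple.
        if len(self.args) == 0:
            return text_type("")
        if len(self.args) == 1:
            return text_type(self.args[0])
        return text_type(self.args)


# On 2.x, unicode(e) would pick up __unicode__; calling the method
# directly keeps the sketch runnable on 3.x as well.
converted = UnicodeAwareError(u"\xe1").__unicode__()
```

Because the single-argument case converts the argument directly to
text, the non-ASCII character survives without passing through the
default encoding.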

A quick look through trunk suggests implementing tp_unicode actually
wouldn't be a huge job. My worry is that this would constitute a change
to the C API for PyObjects and has little chance of acceptance into 2.6
(and in 3.0 all these issues disappear anyway). If there is some chance
of acceptance, I'm willing to write a patch that adds tp_unicode.