Message 81841 - Python tracker


Author	ezio.melotti
Recipients	ezio.melotti, giampaolo.rodola, loewis, vstinner
Date	2009-02-13.00:17:25
Message-id	<1234484248.33.0.430492360019.issue5110@psf.upfronthosting.co.za>

Content

I've also noticed that if an error contains non-encodable characters,
they are escaped:
>>> raise ValueError("\u2620 can't be printed here, but '\u00e8' works fine!")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: \u2620 can't be printed here, but 'è' works fine!

but:
>>> "\u2620 can't be printed here, but '\u00e8' works fine!"
UnicodeEncodeError: 'charmap' codec can't encode character '\u2620' in position 1: character maps to <undefined>

The mechanism used to escape errors is even better than my patch,
because it escapes only the chars that can't be encoded. My patch
instead escapes every non-ASCII char when at least one char can't be
encoded:
>>> "\u2620 can't be printed here, but '\u00e8' works fine!"
"\u2620 can't be printed here, but '\xe8' works fine!"

I wonder if we can reuse the same mechanism here.
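
For illustration, here is a minimal Python sketch of the difference
between the two behaviours. It assumes the selective escaping seen in
tracebacks is equivalent to encoding with the 'backslashreplace' error
handler, and it uses cp1252 as a stand-in for the console's 'charmap'
codec. The helper name escape_unencodable is hypothetical; it is not
taken from any patch on this issue.

```python
def escape_unencodable(text, encoding):
    """Escape only the characters that `encoding` cannot represent."""
    # 'backslashreplace' substitutes a backslash escape for each character
    # the codec rejects and leaves every encodable character untouched.
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


sample = "\u2620 can't be printed here, but '\u00e8' works fine!"

# Selective escaping: U+2620 becomes a literal backslash escape, while
# U+00E8 survives because cp1252 (an assumed console codec) can encode it.
selective = escape_unencodable(sample, "cp1252")

# Blanket escaping: ascii() escapes every non-ASCII character in the repr,
# including U+00E8, regardless of what the output encoding could handle.
blanket = ascii(sample)
```

A display hook built on the first approach would only need the
encoding of sys.stdout to decide what to escape, which is why it looks
like a candidate for reuse here.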

By the way, the patch I proposed in msg80852 is just a proof of concept.
If you think it's OK, someone will probably have to implement it in C.

History
Date	User	Action	Args
2009-02-13 00:17:28	ezio.melotti	set	recipients: + ezio.melotti, loewis, vstinner, giampaolo.rodola
2009-02-13 00:17:28	ezio.melotti	set	messageid: <1234484248.33.0.430492360019.issue5110@psf.upfronthosting.co.za>
2009-02-13 00:17:26	ezio.melotti	link	issue5110 messages
2009-02-13 00:17:25	ezio.melotti	create