Message 184315 - Python tracker

Author	techtonik
Recipients	ezio.melotti, techtonik
Date	2013-03-16.13:59:33
Message-id	<1363442374.27.0.764896075169.issue17439@psf.upfronthosting.co.za>

Content
When Python 2.x compares an ordinary string with a unicode string, it tries to convert the former to unicode and emits a warning if the conversion fails. The attached example with Russian strings gives the following:

russian.py:11: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  print(nonu2 == ustr2)

This message does not say which encoding Python used for the conversion. russian.py is encoded in UTF-8, so naming the encoding would at least hint at what encoding is expected.
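To make the request concrete, here is a minimal sketch that emulates the comparison and names the codec in the warning. It is an illustration, not the interpreter's implementation: the helper `py2_style_equal` and the sample values `ustr` and `nonu` are hypothetical (the attached russian.py is not reproduced here), and the sketch assumes the implicit conversion uses the default encoding (`sys.getdefaultencoding()`, normally ASCII on Python 2) rather than the file's coding header.

```python
import warnings


def py2_style_equal(byte_str, uni_str, encoding="ascii"):
    """Hypothetical helper emulating Python 2's `str == unicode` comparison.

    `encoding` stands in for the default encoding used by the implicit
    conversion; it is a parameter here only to make the sketch testable.
    """
    try:
        decoded = byte_str.decode(encoding)
    except UnicodeDecodeError:
        # Python 2 emits a UnicodeWarning at this point; the only change
        # proposed by this sketch is that the message names the codec.
        warnings.warn(
            "Unicode equal comparison failed to convert both arguments "
            "to Unicode using the %r codec - interpreting them as being "
            "unequal" % encoding,
            UnicodeWarning,
            stacklevel=2,
        )
        return False
    return decoded == uni_str


# Hypothetical stand-ins for the strings in the attached example.
ustr = u"\u043f\u0440\u0438\u0432\u0435\u0442"  # a Russian word as text
nonu = ustr.encode("utf-8")                     # the same word as UTF-8 bytes
```

Calling `py2_style_equal(nonu, ustr)` warns and returns False, while passing `encoding="utf-8"` makes the two values compare equal, which is the hint the original message lacks.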


A slightly different question. As you can see, russian.py has a coding header set to UTF-8. When Python parses a source file, it reads and stores the string literals it encounters in that file. Are those literals linked to their source file? And does Python store the coding information anywhere? If it does, the conversion could be done automatically without side effects, and the warning above could name the encoding and explain where it was taken from (i.e. from the file header).
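For reference, the coding header can be read back from a file's bytes. The sketch below uses `tokenize.detect_encoding` from the Python 3 standard library; it only shows how the declared encoding can be recovered and says nothing about what the Python 2 interpreter keeps after parsing. The helper name `declared_source_encoding` and the two sample sources are made up for illustration.

```python
import io
import tokenize


def declared_source_encoding(source_bytes):
    """Return the encoding declared by a source file's BOM or coding header.

    Hypothetical helper: wraps tokenize.detect_encoding, which inspects at
    most the first two lines. Without a header, Python 3 reports 'utf-8'
    (Python 2's default source encoding was ASCII).
    """
    readline = io.BytesIO(source_bytes).readline
    encoding, _lines_read = tokenize.detect_encoding(readline)
    return encoding


# Made-up sources: the same kind of literal under two different headers.
utf8_source = b"# -*- coding: utf-8 -*-\nnonu = '\xd0\xbf'\n"
cp1251_source = b"# -*- coding: cp1251 -*-\nnonu = '\xef'\n"
```

A warning that wanted to mention the header would need this value to be kept alongside the literal, which is exactly what the question above asks about.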

When Python evaluates strings read from stdin, they also have some encoding. Is the problem solved for this case? Where does Python store the encoding of stdin input?
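As a partial pointer for the stdin case, file objects expose an `encoding` attribute, and `sys.stdin.encoding` is where that value is visible from Python code. A small sketch follows; the helper `stream_encoding` and its ASCII fallback are assumptions for illustration, since the attribute can be missing or None (for example when input is piped on Python 2).

```python
import io
import sys


def stream_encoding(stream, fallback="ascii"):
    """Return the encoding a stream reports, or `fallback` if it has none.

    Hypothetical helper: byte streams have no `encoding` attribute and
    some text streams report None, so both cases use the fallback.
    """
    return getattr(stream, "encoding", None) or fallback
```

`stream_encoding(sys.stdin)` reports what the interpreter chose for standard input in the current environment; the result depends on the terminal and locale.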
