Issue 17439: insufficient error message for failed unicode conversion

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/61641

classification

Title:	insufficient error message for failed unicode conversion
Type:	behavior	Stage:	resolved
Components:	Unicode	Versions:	Python 2.7

process

Status:	closed	Resolution:	not a bug
Dependencies:		Superseder:
Assigned To:		Nosy List:	ezio.melotti, r.david.murray, techtonik
Priority:	normal	Keywords:

Created on 2013-03-16 13:59 by techtonik, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description
russian.py	techtonik, 2013-03-16 13:59

Messages (4)
msg184315	Author: anatoly techtonik (techtonik)	Date: 2013-03-16 13:59
When Python 2.x compares an ordinary string with a unicode string, it tries to convert the former to unicode and shows a warning if the conversion fails. The attached example with Russian strings gives the following:

    russian.py:11: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
      print(nonu2 == ustr2)

This message is missing information about which encoding Python used for the conversion. russian.py is encoded in UTF-8, so this information would at least give a hint about which encoding is expected.

A slightly different question. As you can see, russian.py has a coding header set to UTF-8. When Python parses source files, it reads and stores the string literals it encounters in the file. Are those literals linked to the source file? And does Python store this coding information somewhere? If it does, the conversion could be done automatically without side effects, and the message above could mention the encoding and explain where it was taken from (i.e. from the file header).

When Python evaluates strings from the stdin file, they also have some encoding. Is this problem solved for that case? Where does Python store the encoding for stdin input?
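
To make the reported behavior concrete, here is a minimal Python 3 sketch that imitates the Python 2 comparison. It assumes the implicit conversion goes through the default ASCII codec (what `sys.getdefaultencoding()` returns on a stock Python 2.7). The names `py2_style_equal`, `ustr`, and `nonu` are hypothetical and are not taken from russian.py, and the extended warning text only illustrates what a more informative message might look like.

```python
import warnings


def py2_style_equal(byte_str, text, implicit_encoding="ascii"):
    """Hypothetical helper imitating Python 2's `str == unicode` comparison.

    Assumption: the byte string is decoded with the default codec
    (ASCII on a stock Python 2.7) before the two values are compared.
    """
    try:
        converted = byte_str.decode(implicit_encoding)
    except UnicodeDecodeError as exc:
        # Unlike the real warning, this one names the codec that was tried.
        warnings.warn(
            "comparison failed to convert bytes using %r (%s); "
            "interpreting operands as unequal" % (exc.encoding, exc.reason),
            UnicodeWarning,
        )
        return False
    return converted == text


# A Russian word written with escapes so the sketch does not depend on the
# encoding of the file it is saved in.
ustr = "\u043f\u0440\u0438\u0432\u0435\u0442"
nonu = ustr.encode("utf-8")  # what a UTF-8 source file stores for a byte string

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = py2_style_equal(nonu, ustr)

# Decoding explicitly with the encoding the bytes were written in avoids
# the failed implicit conversion altogether.
explicit_result = nonu.decode("utf-8") == ustr
```

In this sketch `result` is false and one `UnicodeWarning` is recorded, while `explicit_result` is true because the caller supplied the encoding.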
msg184326	Author: R. David Murray (r.david.murray) *	Date: 2013-03-16 16:03
Python doesn't store the encoding information anywhere. The coding cookie is used to correctly convert the bytes in the file into unicode; otherwise they are just treated as bytes. For the stdin case, the encoding is associated with the input stream, and again you get either unicode or bytes; no encoding information is carried along with the data. So, when the conversion is attempted, there is no encoding information available to add to the error message.
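
The point that data carries no encoding can be sketched in Python 3, where the bytes/text split is explicit. This is an illustration under assumptions rather than a description of interpreter internals: the sample word, the choice of cp1251 as the "wrong" codec, and the in-memory stream standing in for stdin are all made up for the example.

```python
import io

# Sample text (a Russian word written with escapes) and two encodings of it.
text = "\u043f\u0440\u0438\u0432\u0435\u0442"
utf8_bytes = text.encode("utf-8")
cp1251_bytes = text.encode("cp1251")

# The same text gives different byte sequences, and a bytes object has no
# attribute recording which codec produced it.
same_bytes = utf8_bytes == cp1251_bytes
has_encoding_attr = hasattr(utf8_bytes, "encoding")

# Decoding with the wrong codec can succeed silently and yield other text,
# so the bytes alone cannot tell a reader which codec is the right one.
misread = utf8_bytes.decode("cp1251")

# A text stream knows its encoding, but the string read from it does not.
stream = io.TextIOWrapper(io.BytesIO(utf8_bytes), encoding="utf-8")
line = stream.read()
stream_encoding = stream.encoding
line_has_encoding_attr = hasattr(line, "encoding")
```

The encoding is a property of the decoding step (the source file's coding cookie or the stream object), not of the resulting value, which is why nothing is left to report once the comparison runs.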
msg184365	Author: anatoly techtonik (techtonik)	Date: 2013-03-17 07:33
Ok. Do the data (string literals) have a scope? Does Python know at runtime whether a string literal stored in its memory was defined in the input stream or in a file?
msg184366	Author: R. David Murray (r.david.murray) *	Date: 2013-03-17 09:02
No.

History
Date	User	Action	Args
2022-04-11 14:57:42	admin	set	github: 61641
2013-03-17 09:02:04	r.david.murray	set	messages: + msg184366
2013-03-17 07:33:00	techtonik	set	messages: + msg184365
2013-03-16 16:03:12	r.david.murray	set	status: open -> closed type: behavior nosy: + r.david.murray messages: + msg184326 resolution: not a bug stage: resolved
2013-03-16 13:59:34	techtonik	create