Issue 19159: 2to3 incorrectly converts two parameter unicode() constructor to str()

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/63358

classification

Title:	2to3 incorrectly converts two parameter unicode() constructor to str()
Type:	behavior	Stage:	resolved
Components:		Versions:	Python 3.3, Python 3.4, Python 2.7

process

Status:	closed	Resolution:	wont fix
Dependencies:		Superseder:
Assigned To:		Nosy List:	gregory.p.smith, serhiy.storchaka
Priority:	normal	Keywords:

Created on 2013-10-04 00:57 by gregory.p.smith, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (4)
msg198929	Author: Gregory P. Smith (gregory.p.smith)	Date: 2013-10-04 00:57
From a conversion through 2to3:

< default_value=unicode("", "utf-8"),
---
> default_value=str("", "utf-8"),

The Python 2 unicode constructor takes an optional second parameter which is the codec to use to convert when the first parameter is non-unicode.

2to3 should check the parameters on uses of unicode() and if there is a second parameter and the first is explicitly b"" bytes it should turn it into

default_value=b"whatever".decode(second_param)

if the first is valid utf-8 and the second is "utf-8" (or its other spellings) it should leave it as is and simply become:

default_value="thing passed to unicode() that was already utf-8"
msg198936	Author: Serhiy Storchaka (serhiy.storchaka)	Date: 2013-10-04 08:07
This is not a bug: str accepts the encoding argument in Python 3. And in contrast to the decode method, it works with arbitrary bytes-like objects (e.g. array.array).
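The point about bytes-like objects can be illustrated with a minimal Python 3 sketch. The sample value and variable names are illustrative assumptions, not taken from the issue.

```python
import array

# UTF-8 encoding of "caf\u00e9", held both as bytes and as an array of unsigned bytes.
payload = b"caf\xc3\xa9"
raw = array.array("B", payload)

# str() with an encoding argument decodes any object exposing the buffer protocol.
from_bytes = str(payload, "utf-8")
from_array = str(raw, "utf-8")
from_view = str(memoryview(payload), "utf-8")

# array.array has no decode() method, so the method-based spelling is unavailable.
array_has_decode = hasattr(raw, "decode")

print(from_bytes, from_array, from_view, array_has_decode)
```

All three calls go through the same two-argument form of str(), which is why the 2to3 rename from unicode to str is valid whenever the first argument really is bytes-like.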
msg198966	Author: Gregory P. Smith (gregory.p.smith)	Date: 2013-10-04 21:08
Correct, my characterization above was wrong (I shouldn't write these up without the interpreter right in front of me).

What is wrong with the conversion is:

unicode("", "utf-8") in python 2.x should become either str(b"", "utf-8") or, better, just "" in Python 3.x.

The better version could be done if the codec and value can be represented in the encoding of the output 3.x source code file as is but that optimization is not critical.

In order for str() to take a second arg (the codec) the first cannot be a unicode string already:

>>> str("foo", "utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: decoding str is not supported
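A small Python 3 sketch of the difference described here, assuming a hypothetical helper named try_decode that is not part of the issue or of any library:

```python
def try_decode(value, encoding="utf-8"):
    """Call str(value, encoding) and report whether it succeeded.

    Hypothetical helper for illustration only.
    Returns (True, decoded_text) on success and (False, error_message)
    when str() refuses to decode the argument.
    """
    try:
        return True, str(value, encoding)
    except TypeError as exc:
        return False, str(exc)


# What 2to3 emits for unicode("", "utf-8"): the first argument is already text.
ok_text, detail_text = try_decode("")

# The form the message asks for: the first argument is a bytes literal.
ok_bytes, detail_bytes = try_decode(b"")

print(ok_text, detail_text)
print(ok_bytes, repr(detail_bytes))
```

The first call fails because a str cannot be decoded again. The second succeeds and yields an empty string, the same value as the plain "" literal suggested as the better conversion.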
msg198968	Author: Serhiy Storchaka (serhiy.storchaka)	Date: 2013-10-04 21:36
Just add the "b" prefix to the literal string argument of unicode() in Python 2.
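The suggested workaround can be sketched as follows. The sketch assumes Python 2.6 or later on the source side, where the b"" prefix is accepted, and assumes that 2to3 renames unicode to str while leaving bytes literals alone. The Python 2 line appears only as a comment because unicode() does not exist in Python 3.

```python
# Python 2 source, with the "b" prefix added before running 2to3:
#     default_value = unicode(b"", "utf-8")
#
# Assumed 2to3 output: unicode is renamed to str and the b"" literal
# is left untouched, so the call is valid Python 3.
default_value = str(b"", "utf-8")

# The same pattern with a non-empty bytes literal.
greeting = str(b"hello", "utf-8")

print(repr(default_value), repr(greeting))
```

In Python 2, b"" and "" are the same str type, so the prefix does not change behavior there. It only makes the intent explicit enough to survive the conversion.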

History
Date	User	Action	Args
2022-04-11 14:57:51	admin	set	github: 63358
2017-03-07 18:39:15	serhiy.storchaka	set	status: pending -> closed resolution: wont fix stage: needs patch -> resolved
2016-11-28 23:35:56	serhiy.storchaka	set	status: open -> pending
2013-10-04 21:36:24	serhiy.storchaka	set	messages: + msg198968
2013-10-04 21:08:39	gregory.p.smith	set	messages: + msg198966
2013-10-04 08:07:58	serhiy.storchaka	set	nosy: + serhiy.storchaka messages: + msg198936
2013-10-04 00:57:01	gregory.p.smith	create