Message 241763 - Python tracker

Author	martin.panter
Recipients	eric.smith, ezio.melotti, mahmoud, martin.panter, r.david.murray, vstinner
Date	2015-04-22.01:47:42
Message-id	<1429667263.45.0.272108110685.issue24019@psf.upfronthosting.co.za>
In-reply-to

Content
Okay, I was trying to confirm your proposal in Python 3 terms, because in Python 2, str has a different meaning and I was confused.

I agree that the existence of the decoding mode is a design bug, so how would you feel about deprecating it, at least in the documentation? That is, in Python 3, deprecate usage like str(buffer, "utf-8") in favour of buffer.decode("utf-8") or using the codecs module directly. If this were done, it would clearly remove the need for an encoding parameter to str() in all cases. I would be in favour of deprecating the complementary bytes() and bytearray() encoding modes as well.
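
For illustration, here is a minimal sketch of the spellings being compared. The sample values are made up for this example; the calls themselves are standard Python 3.

```python
import codecs

# Hypothetical sample data: the UTF-8 encoding of "café".
buffer = b"caf\xc3\xa9"

# "Decoding mode" of str(): an encoding argument turns it into a decoder.
via_constructor = str(buffer, "utf-8")

# The alternatives suggested above give the same result.
via_method = buffer.decode("utf-8")
via_codecs = codecs.decode(buffer, "utf-8")

# Without an encoding, str() does something quite different:
# it returns the repr of the bytes object rather than decoding it.
as_repr = str(buffer)

# The complementary encoding modes of bytes() and bytearray().
text = "café"
encoded_via_constructor = bytes(text, "utf-8")
encoded_via_method = text.encode("utf-8")
mutable_via_constructor = bytearray(text, "utf-8")
mutable_via_method = bytearray(text.encode("utf-8"))
```

In each pair the method-based spelling produces the same object as the constructor with an encoding argument, which is why the constructor modes look redundant.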

Do you have an example use case in Python 3 that would benefit from always allowing an encoding parameter? I can understand that your to_unicode() function could be useful in Python 2. But in Python 3, byte strings tend to hold raw data that is not necessarily textual at all. There are some places (warts in my opinion) such as the binascii module where ASCII-encoded byte strings are common, but I still don’t think this proposal would be very helpful with that.
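
As a small illustration of the binascii case mentioned above (the input bytes are an arbitrary example), hexlify() takes raw bytes and returns another byte string that happens to contain only ASCII characters, so getting text out of it needs an explicit decode:

```python
import binascii

# Arbitrary raw data, not text in any encoding.
raw = bytes([0x00, 0xFF, 0x10])

# hexlify() returns an ASCII-encoded byte string, not a str.
hex_bytes = binascii.hexlify(raw)

# An explicit decode is needed to obtain text.
hex_text = hex_bytes.decode("ascii")

# Round trip back to the raw data.
round_trip = binascii.unhexlify(hex_bytes)
```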

History
Date	User	Action	Args
2015-04-22 01:47:43	martin.panter	set	recipients: + martin.panter, vstinner, eric.smith, ezio.melotti, r.david.murray, mahmoud
2015-04-22 01:47:43	martin.panter	set	messageid: <1429667263.45.0.272108110685.issue24019@psf.upfronthosting.co.za>
2015-04-22 01:47:43	martin.panter	link	issue24019 messages
2015-04-22 01:47:42	martin.panter	create