Message 165435 - Python tracker


Author	ncoghlan
Recipients	barry, belopolsky, benjamin.peterson, cben, eric.araujo, flox, georg.brandl, gvanrossum, jcea, lemburg, loewis, ncoghlan, petri.lehtinen, ssbarnea, vstinner
Date	2012-07-14.07:36:41
Message-id	<1342251402.78.0.0152149184069.issue7475@psf.upfronthosting.co.za>

Content
FWIW, I've been thinking further about this recently, and I think implementing this feature as builtin methods is the wrong way to approach it.

Instead, I propose the addition of codecs.encode and codecs.decode functions that are type neutral (leaving any type checks entirely up to the codecs themselves), while the str.encode and bytes.decode methods retain their current strict type restrictions, which follow from the text model.
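As a rough illustration of what "type neutral" means here, the sketch below uses the codecs.encode and codecs.decode functions from the standard library with a bytes-to-bytes codec and a str-to-str codec. It assumes a Python 3 version where the bz2_codec and rot_13 codecs are available; the exact exception raised by str.encode for a non-text codec varies between versions, so the sketch only records its type rather than asserting one.

```python
import codecs

# bytes -> bytes: the codec itself decides which input types it accepts.
payload = b"hello world" * 4
compressed = codecs.encode(payload, "bz2_codec")
restored = codecs.decode(compressed, "bz2_codec")

# str -> str: the same functions work, because they do no type checks themselves.
rot = codecs.encode("hello", "rot_13")

# The str.encode method keeps its text-model restriction and rejects
# a codec that is not a str -> bytes text encoding.
try:
    "text".encode("bz2_codec")
except (LookupError, TypeError) as exc:
    rejected_with = type(exc).__name__
else:
    rejected_with = None
```

The design point is that the module-level functions act as a generic entry point for any registered codec, while the methods on str and bytes stay limited to text encodings.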

Also, I now think my previous proposal for nice error messages was massively over-engineered. A much simpler approach is to just replace the status quo:

>>> "".encode("bz2_codec")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ncoghlan/devel/py3k/Lib/encodings/bz2_codec.py", line 17, in bz2_encode
    return (bz2.compress(input), len(input))
  File "/home/ncoghlan/devel/py3k/Lib/bz2.py", line 443, in compress
    return comp.compress(data) + comp.flush()
TypeError: 'str' does not support the buffer interface

with a better error that carries more context, such as:

UnicodeEncodeError: encoding='bz2_codec', errors='strict', codec_error="TypeError: 'str' does not support the buffer interface"

A similar change would be straightforward on the decoding side.

This would be a good use case for __cause__, but the codec error should still be included in the string representation.
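A minimal sketch of the wrapping idea, for both the encoding and decoding sides, might look like the following. The names CodecContextError, encode_with_context and decode_with_context are hypothetical and exist only for this illustration; the message above proposes reusing UnicodeEncodeError, which is not done here because that exception's constructor expects position arguments that do not apply to this case. The original error is chained via __cause__ and also repeated in the string representation.

```python
import codecs


class CodecContextError(Exception):
    """Hypothetical exception carrying codec name, error mode and the codec's own error."""


def _wrap(operation, obj, encoding, errors):
    # Shared helper (an assumption of this sketch): run the codec and,
    # on a type failure, re-raise with context while keeping the cause.
    try:
        return operation(obj, encoding, errors)
    except TypeError as exc:
        codec_error = "{}: {}".format(type(exc).__name__, exc)
        message = "encoding={!r}, errors={!r}, codec_error={!r}".format(
            encoding, errors, codec_error
        )
        # "from exc" sets __cause__; the message still embeds the codec error.
        raise CodecContextError(message) from exc


def encode_with_context(obj, encoding, errors="strict"):
    return _wrap(codecs.encode, obj, encoding, errors)


def decode_with_context(obj, encoding, errors="strict"):
    return _wrap(codecs.decode, obj, encoding, errors)
```

Calling encode_with_context("", "bz2_codec") then fails with a single exception whose text names the codec and error mode, and whose __cause__ is the underlying TypeError. The exact wording of the embedded codec error depends on the Python version.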

History
Date	User	Action	Args
2012-07-14 07:36:43	ncoghlan	set	recipients: + ncoghlan, lemburg, gvanrossum, loewis, barry, georg.brandl, jcea, cben, belopolsky, vstinner, benjamin.peterson, eric.araujo, ssbarnea, flox, petri.lehtinen
2012-07-14 07:36:42	ncoghlan	set	messageid: <1342251402.78.0.0152149184069.issue7475@psf.upfronthosting.co.za>
2012-07-14 07:36:42	ncoghlan	link	issue7475 messages
2012-07-14 07:36:41	ncoghlan	create