Message187704
Passing the wrong types to codecs can currently lead to rather confusing exceptions, like:
====================
>>> b"ZXhhbXBsZQ==\n".decode("base64_codec")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.2/encodings/base64_codec.py", line 20, in base64_decode
    return (base64.decodebytes(input), len(input))
  File "/usr/lib64/python3.2/base64.py", line 359, in decodebytes
    raise TypeError("expected bytes, not %s" % s.__class__.__name__)
TypeError: expected bytes, not memoryview
====================
>>> import codecs
>>> codecs.decode("example", "utf8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.2/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
TypeError: 'str' does not support the buffer interface
====================
This situation could be improved by having the affected APIs use exception chaining to wrap these errors in a more informative exception that also displays information about the codec involved. Note that UnicodeEncodeError and UnicodeDecodeError are not appropriate, as those are specific to text encoding operations, while these new wrappers would apply to arbitrary codecs, regardless of whether or not they use the unicode error handlers. Furthermore, for backwards compatibility with existing exception handling, it is probably necessary to limit ourselves to specific exception types and ensure that the wrapper exceptions are subclasses of those types.
These new wrappers would have __cause__ set to the exception raised by the codec, but would carry a message along the lines of the following:
==============
codecs.DecodeTypeError: encoding='utf8', details="TypeError: 'str' does not support the buffer interface"
==============
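A minimal sketch of the chaining idea for a single case, assuming a hypothetical `DecodeTypeError` defined locally (it is the name proposed here, not something the real `codecs` module provides) and a hypothetical helper `decode_with_context` standing in for the changed `codecs.decode`. Because the wrapper subclasses `TypeError`, an existing `except TypeError:` handler still catches it, and `raise ... from exc` records the codec's original exception as `__cause__`.

```python
import codecs


class DecodeTypeError(TypeError):
    """Hypothetical wrapper for a TypeError raised by a codec while decoding."""

    def __init__(self, encoding, cause):
        self.encoding = encoding
        # Proposed message format: codec name plus the original error's type and text.
        details = "{}: {}".format(type(cause).__name__, cause)
        super().__init__("encoding={!r}, details={!r}".format(encoding, details))


def decode_with_context(obj, encoding):
    """Hypothetical: call codecs.decode, re-raising TypeError with the codec name attached."""
    try:
        return codecs.decode(obj, encoding)
    except TypeError as exc:
        # "from exc" stores the codec's own exception in __cause__, so the
        # original traceback is still printed above the wrapper's.
        raise DecodeTypeError(encoding, exc) from exc


try:
    decode_with_context("example", "utf8")
except TypeError as err:  # a pre-existing "except TypeError" handler still matches
    wrapped = err
```

Here `wrapped` is a `DecodeTypeError` whose message names the `utf8` codec, while `wrapped.__cause__` is the plain `TypeError` raised inside the codec.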
Wrapping TypeError and ValueError should cover most cases, which would mean four new exception types in the codecs module:
Raised by codecs.decode, bytes.decode and bytearray.decode:
* codecs.DecodeTypeError
* codecs.DecodeValueError
Raised by codecs.encode and str.encode:
* codecs.EncodeTypeError
* codecs.EncodeValueError
Instances of UnicodeError wouldn't be wrapped, since they already contain codec information.
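The proposed hierarchy could be sketched roughly as follows. The four class names are the ones proposed above, defined locally since the real `codecs` module does not provide them; `wrapped_decode` and `wrapped_encode` are hypothetical helpers standing in for the changes to `codecs.decode`, `bytes.decode`, `codecs.encode`, `str.encode` and friends. One detail matters for the "don't wrap UnicodeError" rule: `UnicodeError` is itself a subclass of `ValueError`, so it has to be re-raised explicitly before the `ValueError` branch, or it would be wrapped too.

```python
import codecs


# Hypothetical exception types from the proposal (not in the real codecs
# module). Each subclasses the builtin it wraps, so existing
# "except TypeError" / "except ValueError" handlers keep working.
class DecodeTypeError(TypeError):
    pass


class DecodeValueError(ValueError):
    pass


class EncodeTypeError(TypeError):
    pass


class EncodeValueError(ValueError):
    pass


def _describe(encoding, exc):
    """Build a message in the proposed format: encoding=..., details=..."""
    details = "{}: {}".format(type(exc).__name__, exc)
    return "encoding={!r}, details={!r}".format(encoding, details)


def _call_wrapped(func, obj, encoding, type_error, value_error):
    """Call func(obj, encoding), wrapping TypeError/ValueError but not UnicodeError."""
    try:
        return func(obj, encoding)
    except UnicodeError:
        # UnicodeError subclasses ValueError but already names the codec,
        # so it is re-raised unchanged instead of being wrapped.
        raise
    except TypeError as exc:
        raise type_error(_describe(encoding, exc)) from exc
    except ValueError as exc:
        raise value_error(_describe(encoding, exc)) from exc


def wrapped_decode(obj, encoding):
    """Hypothetical stand-in for codecs.decode / bytes.decode with wrapping."""
    return _call_wrapped(codecs.decode, obj, encoding,
                         DecodeTypeError, DecodeValueError)


def wrapped_encode(obj, encoding):
    """Hypothetical stand-in for codecs.encode / str.encode with wrapping."""
    return _call_wrapped(codecs.encode, obj, encoding,
                         EncodeTypeError, EncodeValueError)
```

With this sketch, `wrapped_decode("example", "utf8")` raises `DecodeTypeError`, `wrapped_decode(b"abc", "base64_codec")` raises `DecodeValueError` (the underlying `binascii.Error` is a `ValueError` subclass), and `wrapped_decode(b"\xff", "utf8")` still raises a plain `UnicodeDecodeError`.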

Date | User | Action | Args
2013-04-24 14:09:58 | ncoghlan | set | recipients: + ncoghlan
2013-04-24 14:09:58 | ncoghlan | set | messageid: <1366812598.92.0.4116744942.issue17828@psf.upfronthosting.co.za>
2013-04-24 14:09:58 | ncoghlan | link | issue17828 messages
2013-04-24 14:09:58 | ncoghlan | create |