Message 203037 - Python tracker


Author	ncoghlan
Recipients	doerwalter, ezio.melotti, lemburg, ncoghlan, serhiy.storchaka, vstinner
Date	2013-11-16.12:44:33
Message-id	<1384605873.95.0.519157543259.issue19619@psf.upfronthosting.co.za>

Content
Now that I understand Victor's proposal better, I actually agree with it; I just think the attribute names need to be "encodes_to" and "decodes_to".

With Victor's proposal, *input* validity checks (including type checks) would remain the responsibility of the codec itself. What the new attributes would enable is *output* type checks *without having to perform the encoding or decoding operation first*. Codecs will be free to leave these as None to retain the current behaviour of "try it and see".

The specific field names "input_type" and "output_type" aren't accurate, since the acceptable input types for encoding or decoding are likely to be more permissive than the specific output type for the other operation. Most of the binary codecs, for example, accept any bytes-like object as input, but produce bytes objects as output for both encoding and decoding. For Unicode encodings, encoding is strictly str -> bytes, but decoding is generally the more permissive bytes-like object -> str.
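
As a quick illustration of that asymmetry, here is a small sketch using only the existing public codecs.encode() and codecs.decode() functions. The choice of "base64" and "utf-8" as the example codecs is mine, and it assumes a Python version where the binary transform aliases are available:

```python
import codecs

# Binary transform: bytes-like objects are accepted as input,
# but the output is a plain bytes object in both directions.
encoded = codecs.encode(bytearray(b"hello"), "base64")
decoded = codecs.decode(memoryview(encoded), "base64")
assert type(encoded) is bytes
assert type(decoded) is bytes

# Unicode text encoding: encoding is str -> bytes, while decoding
# accepts any bytes-like object and produces str.
text_bytes = codecs.encode("hello", "utf-8")
text = codecs.decode(bytearray(text_bytes), "utf-8")
assert type(text_bytes) is bytes
assert type(text) is str
```

In both cases the accepted input types are broader than the single output type, which is why naming the attributes after the output seems less misleading.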

I would still suggest providing the following helper function in the codecs module (the name has changed from my earlier suggestion and I now suggest implementing it in terms of Victor's suggestion with more appropriate field names):

    def is_text_encoding(name):
        """Returns true if the named encoding is a Unicode text encoding"""
        info = codecs.lookup(name)
        return info.encodes_to is bytes and info.decodes_to is str

This approach covers all the current stdlib codecs:

- the text encodings encode to bytes and decode to str
- the binary transforms encode to bytes and also decode to bytes
- the lone text transform (rot_13) encodes and decodes to str
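
To make the three categories concrete, here is a rough sketch that emulates the proposed attributes with a lookup table, since CodecInfo does not currently have "encodes_to" or "decodes_to". The table, the output_types() helper and the handful of codecs listed are all hypothetical stand-ins for illustration, not an existing API:

```python
import codecs

# Hypothetical stand-in for the proposed CodecInfo attributes:
# maps a codec name to (encodes_to, decodes_to).
_RAW_TABLE = {
    "utf-8": (bytes, str),     # text encoding
    "latin-1": (bytes, str),   # text encoding
    "base64": (bytes, bytes),  # binary transform
    "hex": (bytes, bytes),     # binary transform
    "rot13": (str, str),       # text transform
}

# Key the table by the canonical name reported by codecs.lookup(),
# so that aliases such as "utf8" or "rot_13" resolve to the same entry.
_OUTPUT_TYPES = {
    codecs.lookup(name).name: types for name, types in _RAW_TABLE.items()
}


def output_types(name):
    """Return (encodes_to, decodes_to); (None, None) means "try it and see"."""
    return _OUTPUT_TYPES.get(codecs.lookup(name).name, (None, None))


def is_text_encoding(name):
    """Return True if the named codec is known to be a Unicode text encoding."""
    encodes_to, decodes_to = output_types(name)
    return encodes_to is bytes and decodes_to is str


print(is_text_encoding("utf8"))    # text encoding
print(is_text_encoding("base64"))  # binary transform
print(is_text_encoding("rot_13"))  # text transform
```

Note that a codec missing from the table reports (None, None), so is_text_encoding() returns False for it even if it really is a text encoding; with real attributes on CodecInfo, each codec would declare its own output types instead.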

This approach would also let a type inference engine (like mypy) analyse codec use, and it could be expanded in 3.5 to offer type-checked binary and text transform APIs that filter codecs according to their output types.

History
Date	User	Action	Args
2013-11-16 12:44:33	ncoghlan	set	recipients: + ncoghlan, lemburg, doerwalter, vstinner, ezio.melotti, serhiy.storchaka
2013-11-16 12:44:33	ncoghlan	set	messageid: <1384605873.95.0.519157543259.issue19619@psf.upfronthosting.co.za>
2013-11-16 12:44:33	ncoghlan	link	issue19619 messages
2013-11-16 12:44:33	ncoghlan	create