Message 203742 - Python tracker

Author	lemburg
Recipients	doerwalter, ezio.melotti, lemburg, ncoghlan, serhiy.storchaka, vstinner
Date	2013-11-22.12:09:27
Message-id	<528F496F.80503@egenix.com>
In-reply-to	<1385120605.25.0.594725951158.issue19619@psf.upfronthosting.co.za>

Content
On 22.11.2013 12:43, STINNER Victor wrote:
> 
> STINNER Victor added the comment:
> 
> If _is_text_encoding may change in Python 3.5, you should add a comment to warn users to not use it and explain its purpose, maybe with a reference to this issue.

+1

> --
> 
> We are talking about very few codecs:
> 
> * base64: bytes => bytes
> * bz2: bytes => bytes
> * hex: bytes => bytes; decode supports also ASCII string (str) => bytes
> * quopri: bytes => bytes
> * rot_13: str => str
> * uu: bytes => bytes
> * zlib: bytes => bytes
> 
> I suppose that supporting ASCII string input to the hex decoder is a side effect of its implementation. I don't know if it is expected *for the codec*.
> 
> If we simplify the hex decoder to reject str types, all these codecs would have just one type: the same input and output type. Anyway, if you want something based on types, the special case for the hex decoder cannot be expressed with a type or an ABC. "ASCII string" is not a type.
> 
> So instead of _is_text_encoding=False, it could be transform=bytes or transform=str. (I don't care about the name: transform_type, type, codec_type, data_type, etc.)
> 
> I know that bytes is not exact: bytearray, memoryview and any other bytes-like object are accepted, but it is probably enough for now.
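
To make the type signatures in the quoted list concrete, here is a minimal sketch using codecs.encode() and codecs.decode(), assuming Python 3.4 or later. The sample inputs are made up for illustration, and only some of the listed codecs are exercised.

```python
import codecs

# bytes => bytes transforms
b64 = codecs.encode(b"hello", "base64")
assert codecs.decode(b64, "base64") == b"hello"

packed = codecs.encode(b"hello" * 10, "zlib")
assert codecs.decode(packed, "zlib") == b"hello" * 10

# hex: bytes => bytes, but the decoder also accepts an ASCII-only str
hexed = codecs.encode(b"hello", "hex")
from_bytes = codecs.decode(hexed, "hex")
from_str = codecs.decode(hexed.decode("ascii"), "hex")

# rot_13: str => str
rot = codecs.encode("hello", "rot_13")
```

The hex decoder is the only one of these where the accepted input type is wider than the output type, which is the special case discussed in the quote.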

I think it's better to go with something that's explicitly internal
now than to lock in a public API in the form of a constructor parameter
this late in the release process.
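
For reference, a small sketch of how the internal marker can be observed from Python code, assuming Python 3.4 or later. The helper names is_text_encoding() and try_str_encode() are hypothetical, and _is_text_encoding is a private attribute that may change or disappear, so the lookup falls back to True when it is missing.

```python
import codecs


def is_text_encoding(name):
    """Return the private text-encoding marker for a codec (True if absent)."""
    info = codecs.lookup(name)
    # _is_text_encoding is private and not a supported API; do not rely on it.
    return getattr(info, "_is_text_encoding", True)


def try_str_encode(text, name):
    """Return the encoded bytes, or None if str.encode() rejects the codec."""
    try:
        return text.encode(name)
    except LookupError:
        # Raised for codecs that are not text encodings (and for unknown names).
        return None
```

With this, is_text_encoding("base64") is expected to be False, and try_str_encode("abc", "base64") returns None because str.encode() refuses the codec, while codecs.encode() still handles it.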

For 3.5 it may make sense to declare a few codec feature flags which
would then make lookups such as the one done for the blacklist easier
to implement and faster to check as well.

Such flags could provide introspection at a higher level than what
would be possible with type mappings (even though I still like the
idea of adding those to CodecInfo at some point).

One possible use for such flags would be to declare whether a
codec is reversible or not - in other words, whether .decode(.encode(x))
works for all possible inputs x. This flag could then be used to
quickly check whether a codec would fail on a Unicode str which
has non-Latin-1 code points or to create a list of valid encodings
for certain applications, e.g. a list which only contains reversible
Unicode encodings such as the UTF ones.
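
No such flag exists today, so the following is only a sketch of what it would save: without a declared flag, reversibility has to be probed by actually round-tripping a sample. The helper round_trips(), the candidate list, and the sample string are assumptions made for illustration. A single sample does not prove reversibility for all inputs, which is exactly the information a flag would carry.

```python
def round_trips(text, encoding):
    """Return True if text survives encode followed by decode unchanged."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeError:
        # The codec cannot represent some code point in the sample.
        return False


candidates = ["ascii", "latin-1", "utf-8", "utf-16", "utf-32"]

# U+00E9 fits in Latin-1; U+20AC (euro sign) does not.
sample = "caf\u00e9 \u20ac"

usable = [name for name in candidates if round_trips(sample, name)]
```

Here only the UTF encodings remain in usable, since ascii and latin-1 both fail on the sample.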

Anyway: Thanks to Nick for implementing this, to Serhiy for the
blacklist idea and to Victor for the attribute idea :-)

History
Date	User	Action	Args
2013-11-22 12:09:27	lemburg	set	recipients: + lemburg, doerwalter, ncoghlan, vstinner, ezio.melotti, serhiy.storchaka
2013-11-22 12:09:27	lemburg	link	issue19619 messages
2013-11-22 12:09:27	lemburg	create