Message 203029 - Python tracker


Author	lemburg
Recipients	doerwalter, ezio.melotti, lemburg, ncoghlan, serhiy.storchaka, vstinner
Date	2013-11-16.12:01:26
Message-id	<52875E93.4020803@egenix.com>
In-reply-to	<1384593411.8.0.669225419121.issue19619@psf.upfronthosting.co.za>

Content

On 16.11.2013 10:16, Nick Coghlan wrote:
> 
> Nick Coghlan added the comment:
> 
> The full input/output type specifications can't be implemented sensibly without also defining at least a ByteSequence ABC. While I think it's a good idea in the long run, there's no feasible way to design such a system in the time remaining before the Python 3.4 feature freeze.
> 
> However, we could do something much simpler as a blacklist API:
> 
>     def is_unicode_codec(name):
>         """Returns true if this is the name of a known Unicode text encoding"""
> 
>     def set_as_non_unicode(name):
>         """Indicates that the named codec is not a Unicode codec"""
> 
> And then the codecs module would just maintain a set internally of all the names explicitly flagged as non-unicode.

That doesn't look flexible enough to cover the various different
input/output types.

> Such an API remains useful even if the input/output type support is added in Python 3.5 (since "codecs.is_unicode_codec(name)" is a bit simpler thing to explain than the exact type restrictions).
> 
> Alternatively, implementing just the "encodes_to" and "decodes_to" attributes would be enough for str.encode, bytes.decode and bytearray.decode to reject known bad encodings early, leaving the input type checks to the codecs for now (since it is correctly defining "encode_from" and "decode_from" for many stdlib codecs that would need the ByteSequence ABC).

The original idea we discussed some time ago was to add a mapping
or list attribute to CodecInfo which lists all supported type
combinations.
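A minimal sketch of what such an attribute could look like. The subclass
TypedCodecInfo and the attribute name supported_types are hypothetical
(neither exists in the codecs module), and storing (input_type, output_type)
pairs for the encode direction is only one possible layout:

```python
import codecs


class TypedCodecInfo(codecs.CodecInfo):
    """Hypothetical CodecInfo subclass that lists supported type pairs."""

    def __new__(cls, *args, supported_types=(), **kwargs):
        self = super().__new__(cls, *args, **kwargs)
        # Assumed layout: (input_type, output_type) pairs for encoding.
        self.supported_types = frozenset(supported_types)
        return self


# Wrap the existing UTF-8 codec functions and declare str -> bytes.
_utf8 = codecs.lookup("utf-8")
utf8_info = TypedCodecInfo(
    _utf8.encode,
    _utf8.decode,
    name=_utf8.name,
    supported_types=[(str, bytes)],
)
```

A str -> str codec such as rot-13 would list (str, str) instead, which a
plain "is Unicode / is not Unicode" flag cannot express.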

The codecs module could then make this information available through
a simple type check API (which also caches the lookups for performance
reasons), e.g.

codecs.types_supported(encoding, input_type, output_type) -> boolean.

    Returns True/False depending on whether the codec for
    encoding supports the given input and output types.

Usage:

if not codecs.types_supported(encoding, str, bytes):
    # not a Unicode -> 8-bit codec
    ...
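
A rough sketch of how the check could behave, written as a standalone
function because codecs.types_supported does not exist. The table
_SUPPORTED_TYPES is a hypothetical stand-in for the information that would
live on CodecInfo, and functools.lru_cache stands in for the lookup cache:

```python
import codecs
from functools import lru_cache

# Hypothetical data: (input_type, output_type) pairs for the encode
# direction, keyed by the canonical codec name.
_SUPPORTED_TYPES = {
    "utf-8": frozenset({(str, bytes)}),
    "rot-13": frozenset({(str, str)}),
    "base64": frozenset({(bytes, bytes)}),
}


@lru_cache(maxsize=None)
def types_supported(encoding, input_type, output_type):
    """Return True if the codec supports the given input/output types."""
    # codecs.lookup() resolves aliases and raises LookupError for
    # unknown encodings.
    name = codecs.lookup(encoding).name
    return (input_type, output_type) in _SUPPORTED_TYPES.get(name, frozenset())


if not types_supported("rot13", str, bytes):
    # not a Unicode -> 8-bit codec
    pass
```

Codecs missing from the table are reported as unsupported here; whether that
is the right default is a design question this sketch leaves open.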
