Message 202520 - Python tracker

Author	lemburg
Recipients	lemburg, martin.panter, ncoghlan
Date	2013-11-10.11:40:06
Message-id	<527F7086.1080708@egenix.com>
In-reply-to	<1384075247.27.0.0100141647279.issue19543@psf.upfronthosting.co.za>

Content
On 10.11.2013 10:20, Nick Coghlan wrote:
> 
> The long discussion in issue 7475 and some subsequent discussions I had with Armin Ronacher have made it clear to me that the key distinction between the codec systems in Python 2 and Python 3 lies in the following differences in the type signatures of various operations:
> 
> Python 2 (8 bit str):
> 
>     codecs module: object <-> object
>     convenience methods: basestring <-> basestring
>     available codecs: unicode <-> str, str <-> str, unicode <-> unicode
> 
> Python 3 (Unicode str):
> 
>     codecs module: object <-> object
>     convenience methods: str <-> bytes
>     available codecs: str <-> bytes, bytes <-> bytes, str <-> str
> 
> The significant distinction is the fact that, in Python 2, the convenience methods covered all standard library codecs, but for Python 3, the codecs module needs to be used directly for the bytes <-> bytes codecs and the one str <-> str codec (since those codecs no longer satisfy the constraints of the text model related convenience methods).

Please remember that the codec subsystem is extensible. It's
easy to add more codecs by registering codec search
functions.

Whatever you add as a warning has to account for the fact that
there may be codecs in the system that are not part of the
stdlib, and these can potentially use type combinations other
than the ones you listed above.
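
To illustrate the point, here is a minimal sketch of a third-party
codec registered through a search function. The codec name
"x_demo_hex_int" and the helper functions are hypothetical, made up
for this example only. It maps int <-> str, a type combination that
appears in neither of the lists quoted above:

```python
import codecs

# Hypothetical codec for illustration: encodes int -> str (hex text)
# and decodes str -> int. Not part of the stdlib.
CODEC_NAME = "x_demo_hex_int"


def _hex_encode(obj, errors="strict"):
    # Codec functions return a (result, length consumed) tuple.
    return hex(obj), 1


def _hex_decode(obj, errors="strict"):
    return int(obj, 16), len(obj)


def _search(name):
    # A search function returns a CodecInfo for names it knows,
    # and None for everything else.
    if name == CODEC_NAME:
        return codecs.CodecInfo(
            encode=_hex_encode, decode=_hex_decode, name=CODEC_NAME
        )
    return None


codecs.register(_search)

# The codecs module functions are object <-> object, so they accept it.
encoded = codecs.encode(255, CODEC_NAME)
decoded = codecs.decode(encoded, CODEC_NAME)
print(repr(encoded), decoded)
```

Any check or warning keyed to a fixed list of stdlib codec names would
not know about a codec like this one.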

History
Date	User	Action	Args
2013-11-10 11:40:06	lemburg	set	recipients: + lemburg, ncoghlan, martin.panter
2013-11-10 11:40:06	lemburg	link	issue19543 messages
2013-11-10 11:40:06	lemburg	create