Author ncoghlan
Recipients ncoghlan
Date 2013-11-10.09:20:46
Message-id <1384075247.27.0.0100141647279.issue19543@psf.upfronthosting.co.za>
In-reply-to
Content
The long discussion in issue 7475, along with some subsequent discussions I had with Armin Ronacher, has made it clear to me that the key distinction between the codec systems in Python 2 and Python 3 lies in the type signatures of the various operations:

Python 2 (8 bit str):

    codecs module: object <-> object
    convenience methods: basestring <-> basestring
    available codecs: unicode <-> str, str <-> str, unicode <-> unicode

Python 3 (Unicode str):

    codecs module: object <-> object
    convenience methods: str <-> bytes
    available codecs: str <-> bytes, bytes <-> bytes, str <-> str

The significant distinction is that in Python 2 the convenience methods covered all standard library codecs, whereas in Python 3 the codecs module needs to be used directly for the bytes <-> bytes codecs and the one str <-> str codec, since those codecs no longer satisfy the constraints of the convenience methods, which are now tied to the text model.
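As a rough illustration of that split on Python 3, the sketch below calls the codecs module functions directly for two bytes <-> bytes codecs and for rot_13, which I take to be the str <-> str codec referred to above. The codec names and sample data are only examples, and the exception raised by the convenience method has varied between 3.x releases, so both plausible types are caught.

```python
import codecs

data = b"hello world"

# bytes <-> bytes codecs: go through the codecs module functions.
hexed = codecs.encode(data, "hex_codec")
compressed = codecs.encode(data, "zlib_codec")
restored = codecs.decode(compressed, "zlib_codec")

# str <-> str codec: also only reachable via the codecs module.
rotated = codecs.encode("hello", "rot_13")

# The str.encode convenience method is restricted to str -> bytes codecs,
# so asking it for a non-text codec fails instead of returning a str.
try:
    "hello".encode("rot_13")
    method_rejected = False
except (LookupError, TypeError):
    method_rejected = True
```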

After attempting to implement a 2to3 fixer for these non-Unicode codecs in issue 17823, I realised that approach can't work properly (the error is data-driven, since it depends on the behaviour of the named codec), so I'm rejecting that proposal and replacing it with this one for additional Py3k warnings in Python 2.7.7.

My proposal is to take the following cases and make them produce warnings under Python 2.7.7 when Py3k warnings are enabled (remember, these are the 2.7 types, not the 3.x ones):

- the str.encode method is called (redirect to codecs.encode to handle arbitrary input types in a forward compatible way)

- the unicode.decode method is called (redirect to codecs.decode to handle arbitrary input types)

- PyUnicode_AsEncodedString produces something other than an 8-bit string (redirect to codecs.encode for arbitrary output types)

- PyUnicode_Decode produces something other than a unicode string (redirect to codecs.decode for arbitrary output types)
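The last two conditions above can be sketched in pure Python. This is only an approximation of the idea, not the proposed C implementation: `encode_text` and `decode_bytes` are hypothetical helper names, the message wording is invented, and DeprecationWarning stands in for the Py3k warning machinery.

```python
import codecs
import warnings

TEXT_TYPE = type(u"")  # unicode on Python 2, str on Python 3


def encode_text(text, encoding):
    """Hypothetical stand-in for the text encoding path."""
    result = codecs.encode(text, encoding)
    if not isinstance(result, bytes):
        # The encoder produced something other than an 8-bit string.
        warnings.warn(
            "%r encoder returned %s; use codecs.encode() for arbitrary "
            "output types" % (encoding, type(result).__name__),
            DeprecationWarning, stacklevel=2)
    return result


def decode_bytes(data, encoding):
    """Hypothetical stand-in for the decoding path."""
    result = codecs.decode(data, encoding)
    if not isinstance(result, TEXT_TYPE):
        # The decoder produced something other than a text string.
        warnings.warn(
            "%r decoder returned %s; use codecs.decode() for arbitrary "
            "output types" % (encoding, type(result).__name__),
            DeprecationWarning, stacklevel=2)
    return result


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    encode_text(u"hello", "utf-8")           # text -> bytes, no warning
    encode_text(u"hello", "rot_13")          # text -> text, warns
    decode_bytes(b"68656c6c6f", "hex_codec")  # bytes -> bytes, warns

warning_count = len(caught)
```

Note that the check depends on what the named codec returns for the given call, which is why a static 2to3 fixer cannot detect these cases.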

For the latter two cases, issue 17828 includes updates to the Python 3 error messages to similarly redirect to the convenience functions in the codecs module. However, the removed convenience methods will continue to simply trigger AttributeError in Python 3 with no special casing.
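For reference, a minimal check of the AttributeError behaviour on Python 3, where bytes has no encode method and str has no decode method. The variable names are illustrative only.

```python
# On Python 3 the removed convenience methods are simply absent.
bytes_has_encode = hasattr(b"", "encode")
str_has_decode = hasattr("", "decode")

try:
    b"68656c6c6f".encode("hex_codec")
    raised_attribute_error = False
except AttributeError:
    raised_attribute_error = True
```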
History
Date User Action Args
2013-11-10 09:20:47 ncoghlan set recipients: + ncoghlan
2013-11-10 09:20:47 ncoghlan set messageid: <1384075247.27.0.0100141647279.issue19543@psf.upfronthosting.co.za>
2013-11-10 09:20:47 ncoghlan link issue19543 messages
2013-11-10 09:20:46 ncoghlan create