
Author lemburg
Recipients benspiller, docs@python, ezio.melotti, josh.r, lemburg, serhiy.storchaka, steven.daprano, terry.reedy
Date 2016-05-20.08:01:11
Message-id <1463731271.39.0.897765908373.issue26369@psf.upfronthosting.co.za>
In-reply-to
Content
Ben, the methods on strings and Unicode objects in Python 2.x are direct interfaces to the underlying codecs. The codecs can handle any number of input and output types, so some work only on 8-bit strings (bytes) while others take Unicode as input.
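
To illustrate the point about input and output types, here is a small sketch. It is written for Python 3, where these codecs are reached through the codecs module rather than through string methods, and the variable names are only illustrative:

```python
import codecs

# Text -> bytes: a classic character encoding.
utf8_bytes = codecs.encode(u"caf\xe9", "utf-8")

# bytes -> bytes: the hex codec never touches Unicode at all.
hex_bytes = codecs.encode(b"hello", "hex")

# Text -> text: rot13 maps Unicode to Unicode.
rot13_text = codecs.encode(u"hello", "rot13")

print(repr(utf8_bytes), repr(hex_bytes), repr(rot13_text))
```

In Python 2.x all three were available directly as methods on both string types, which is why the implicit conversions come into play.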

As a result, you sometimes see errors caused by the implicit conversion of an 8-bit string to Unicode (in the case where the codec expects Unicode input).

As an example, take the UTF-8 codec. It expects Unicode input when encoding, so when you pass in an 8-bit string, Python first converts it to Unicode using the default encoding (normally 'ascii') and then applies the codec operation.

When the 8-bit string is plain ASCII this works great. If not, chances are high that you'll run into a Unicode error.
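
A rough emulation of that two-step behaviour, written for Python 3 so it can be run today. The helper name py2_style_encode and its default_encoding parameter are made up for this sketch and are not part of any Python API:

```python
def py2_style_encode(data, encoding, default_encoding="ascii"):
    """Emulate Python 2's str.encode(): implicit decode, then the codec."""
    if isinstance(data, bytes):
        # Implicit step: 8-bit string -> Unicode via the default encoding.
        data = data.decode(default_encoding)
    # Explicit step: the codec operation the caller actually asked for.
    return data.encode(encoding)


# Plain ASCII passes through the implicit step without trouble.
ascii_result = py2_style_encode(b"hello", "utf-8")

# Non-ASCII bytes fail in the implicit step, before the codec ever runs.
try:
    py2_style_encode(b"caf\xc3\xa9", "utf-8")
    failure = None
except UnicodeDecodeError as exc:
    failure = exc

print(ascii_result, type(failure).__name__)
```

Note that the failure is a decode error naming the 'ascii' codec, even though the caller asked to encode to UTF-8. That mismatch is what makes the Python 2 behaviour confusing.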

Now, in Python 2.x you can change the default encoding. You can either make this work by assuming that all your 8-bit strings are UTF-8 (set it to 'utf-8' in sitecustomize.py), or disable the automatic conversion altogether by setting the default encoding to 'undefined', a codec created specifically for this purpose. The latter raises an exception on any attempt to convert an 8-bit string to Unicode, similar to what Python 3 does, except that the error type is different.
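
The two settings can be approximated in Python 3 as follows. This is only a sketch: implicit_to_unicode is a hypothetical stand-in for the conversion Python 2 performs behind the scenes, and the second case uses the standard library's 'undefined' codec, which rejects every conversion:

```python
def implicit_to_unicode(data, default_encoding):
    # Hypothetical stand-in for Python 2's implicit str -> unicode step.
    return data.decode(default_encoding)


# Default encoding 'utf-8': UTF-8 encoded 8-bit strings convert cleanly.
as_utf8 = implicit_to_unicode(b"caf\xc3\xa9", "utf-8")

# Default encoding 'undefined': every conversion fails, even for ASCII.
try:
    implicit_to_unicode(b"hello", "undefined")
    undefined_error = None
except UnicodeError as exc:
    undefined_error = exc

# Python 3 has no implicit conversion at all and raises a TypeError.
try:
    u"abc" + b"def"
    py3_error = None
except TypeError as exc:
    py3_error = exc

print(type(undefined_error).__name__, type(py3_error).__name__)
```

In real Python 2 code the setting would be made with sys.setdefaultencoding() in sitecustomize.py, since site.py removes that function from sys once startup is complete.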

Hope that helps.
History
Date User Action Args
2016-05-20 08:01:11 lemburg set recipients: + lemburg, terry.reedy, ezio.melotti, steven.daprano, docs@python, serhiy.storchaka, josh.r, benspiller
2016-05-20 08:01:11 lemburg set messageid: <1463731271.39.0.897765908373.issue26369@psf.upfronthosting.co.za>
2016-05-20 08:01:11 lemburg link issue26369 messages
2016-05-20 08:01:11 lemburg create