
Author lemburg
Recipients lemburg, pitrou, vstinner
Date 2010-06-07.21:25:02
Message-id <4C0D63AC.4060705@egenix.com>
In-reply-to <1275848637.97.0.606582513318.issue8922@psf.upfronthosting.co.za>
Content
STINNER Victor wrote:
> 
> New submission from STINNER Victor <victor.stinner@haypocalc.com>:
> 
> PyUnicode_Decode() and PyUnicode_AsEncodedString() call builtin decoders/encoders directly for some known encodings (e.g. "utf-8") instead of taking the slow path (calling PyCodec_Decode() / PyCodec_Encode()).
> 
> PyUnicode_Decode() does normalize the encoding name: it converts it to lowercase and replaces "_" with "-", as normalizestring() does. But PyUnicode_AsEncodedString() doesn't normalize the encoding name; it just uses strcmp(). PyUnicode_Decode() has a shortcut for ISO-8859-1, whereas PyUnicode_AsEncodedString() doesn't (it only has one for "latin-1").
> 
> The attached patch creates a static helper function, normalize_encoding(), uses it in PyUnicode_Decode() and PyUnicode_AsEncodedString(), and adds a shortcut for ISO-8859-1 to PyUnicode_AsEncodedString().
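A minimal sketch of the approach described in the quoted submission, assuming a helper that copies the encoding name into a caller-supplied buffer, lowercases ASCII letters and maps "_" to "-", and then compares the result against the shortcut names with strcmp(). The names normalize_encoding() and has_shortcut(), the 20-byte buffer and the shortcut list are illustrative assumptions, not the code from the attached patch or from CPython. The real functions would call the builtin codecs (e.g. PyUnicode_DecodeUTF8()) where this sketch only reports whether a fast path exists.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `encoding` into `lower`, converting ASCII
   'A'-'Z' to lowercase and '_' to '-'.  `lower_len` must be at least 1.
   Returns 1 on success, or 0 if the name does not fit in `lower_len`
   bytes including the terminating NUL. */
static int
normalize_encoding(const char *encoding, char *lower, size_t lower_len)
{
    const char *e = encoding;
    char *l = lower;
    char *l_end = lower + lower_len - 1;   /* reserve room for the NUL */

    while (*e) {
        if (l == l_end)
            return 0;                      /* too long: no shortcut */
        if (*e >= 'A' && *e <= 'Z')
            *l++ = (char)(*e - 'A' + 'a');
        else if (*e == '_')
            *l++ = '-';
        else
            *l++ = *e;
        e++;
    }
    *l = '\0';
    return 1;
}

/* Hypothetical shortcut check: 1 if the normalized name matches one of
   the (illustrative) builtin fast paths, 0 if the slow codec lookup
   would be needed instead. */
static int
has_shortcut(const char *encoding)
{
    static const char *const shortcuts[] = {
        "utf-8", "latin-1", "iso-8859-1", "ascii"
    };
    char lower[20];
    size_t i;

    if (!normalize_encoding(encoding, lower, sizeof lower))
        return 0;
    for (i = 0; i < sizeof shortcuts / sizeof shortcuts[0]; i++) {
        if (strcmp(lower, shortcuts[i]) == 0)
            return 1;
    }
    return 0;
}
```

Because the comparison runs on the normalized copy, "UTF_8", "utf-8" and "Latin_1" all hit a fast path, while a name such as "cp1252" would go through the codec registry.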

The normalization in PyUnicode_Decode() must have been added in
Python 3 only. It is not present in Python 2.

I'm not sure whether it's a good idea to extend this further:
the shortcuts were meant for Python internal use only. Python
itself and its stdlib should only use the shortcut names
for the respective special encodings, not variants of them.

Dealing with variants and normalization is left to the encodings
package and its alias machinery.

Since the Python stdlib and the core already mostly use
the shortcut names, adding normalization won't buy us much.

Note that your change has also made it impossible for the
compiler to do loop unrolling: there's no longer an upper
limit on the size of the lower buffer.
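For comparison, a sketch of the bounded form the loop-unrolling remark refers to, assuming the normalization runs inline over a fixed-size local array. The loop condition includes a limit derived from sizeof lower, a compile-time constant, so at most 19 characters are ever copied regardless of the input. Whether a particular compiler actually unrolls this loop depends on the compiler and its optimization flags. The function name is_utf8_name() and the 20-byte size are illustrative assumptions, not CPython code.

```c
#include <string.h>

/* Hypothetical example: returns 1 if `encoding` normalizes to "utf-8".
   The loop bound depends only on sizeof lower, so the trip count is
   capped at sizeof lower - 1 iterations; longer names are truncated. */
static int
is_utf8_name(const char *encoding)
{
    char lower[20];
    char *l = lower;
    const char *e = encoding;

    while (*e && l < lower + sizeof lower - 1) {
        if (*e >= 'A' && *e <= 'Z')
            *l++ = (char)(*e - 'A' + 'a');   /* ASCII lowercase */
        else if (*e == '_')
            *l++ = '-';                      /* "UTF_8" -> "utf-8" */
        else
            *l++ = *e;
        e++;
    }
    *l = '\0';
    return strcmp(lower, "utf-8") == 0;
}
```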

In terms of coding style, "static" and the return type should go on a separate line from the function name.
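The layout in question follows PEP 7, the C style guide for CPython, which puts the function name in column 1 with the storage class and return type on the line above it. The function below is hypothetical and only demonstrates that layout.

```c
/* PEP 7 layout: "static int" on its own line, function name in column 1,
   outermost braces in column 1, blank line after local declarations. */
static int
count_underscores(const char *name)
{
    int n = 0;

    for (; *name; name++) {
        if (*name == '_')
            n++;
    }
    return n;
}
```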
History
Date                 User     Action  Args
2010-06-07 21:25:04  lemburg  set     recipients: + lemburg, pitrou, vstinner
2010-06-07 21:25:03  lemburg  link    issue8922 messages
2010-06-07 21:25:02  lemburg  create