
Author vstinner
Recipients belopolsky, eric.araujo, ezio.melotti, jcea, lemburg, sdaoden, vstinner
Date 2011-02-24.23:56:19
Message-id <1298591781.45.0.843671847574.issue11303@psf.upfronthosting.co.za>
In-reply-to
Content
>> That won't work, Victor, since it makes invalid encoding
>> names valid, e.g. 'utf(=)-8'.

> .. but this *is* valid: ...

Ah yes, it's because of encodings.normalize_encoding(). It's funny: we have 3 functions to normalize an encoding name, and each one does something different :-) E.g. encodings.normalize_encoding() doesn't replace non-ASCII letters and doesn't convert to lowercase.
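To make the point about encodings.normalize_encoding() concrete, here is a small sketch that runs it over the names quoted in this thread. The sample list is only illustrative, and the exact results may differ between Python versions, so treat it as a way to observe the behavior rather than a specification of it.

```python
import encodings

# Names taken from this discussion; the list itself is just for illustration.
samples = ["utf(=)-8", "~~ utf#8 ~~", "UTF-8"]

# normalize_encoding() is the pure-Python normalizer in the encodings package.
normalized = {name: encodings.normalize_encoding(name) for name in samples}

for name, result in normalized.items():
    # Expectation: runs of punctuation are collapsed, while case is left as-is.
    print(f"{name!r} -> {result!r}")
```

Comparing the output for "UTF-8" with the other two shows that this function handles punctuation but leaves lowercasing to some other layer.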

more_aggressive_normalization.patch changes all 3 normalization functions and adds tests for encodings.normalize_encoding().

I think that speed and backward compatibility are more important than conforming to IANA or other standards.

Even if "~~ utf#8 ~~" is ugly, I don't think it really matters that we accept it.

--

If you don't want to touch the normalization functions and just want to add more aliases to the C fast paths, we should also add utf8, utf16 and utf32.
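As a quick check, the hyphen-less spellings already resolve through the regular codec registry. The sketch below only shows which codec each alias maps to; it says nothing about whether a C fast path is taken, which is not observable from Python code like this.

```python
import codecs

# Hyphen-less spellings mentioned above.
aliases = ["utf8", "utf16", "utf32"]

# codecs.lookup() returns a CodecInfo whose .name is the canonical codec name.
canonical = {alias: codecs.lookup(alias).name for alias in aliases}

for alias, name in canonical.items():
    print(f"{alias} -> {name}")
```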

Uses of "utf8" in Python: random.Random.seed(), smtpd.SMTPChannel.collect_incoming_data(), tarfile, multiprocessing.connection (XML serialization)

PS: On error, the UTF-8 decoder raises a UnicodeDecodeError with "utf8" as the encoding name :-)
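The name reported by the decoder can be inspected directly. This is a minimal sketch: the invalid byte is an arbitrary choice, and the spelling that gets reported ("utf8" or "utf-8") depends on the Python version in use.

```python
reported_name = None

try:
    # 0xFF can never appear in valid UTF-8, so decoding always fails here.
    b"\xff".decode("utf-8")
except UnicodeDecodeError as exc:
    # exc.encoding holds the codec name as the decoder itself reports it.
    reported_name = exc.encoding

print(reported_name)
```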
History
Date                 User      Action  Args
2011-02-24 23:56:21  vstinner  set     recipients: + vstinner, lemburg, jcea, belopolsky, ezio.melotti, eric.araujo, sdaoden
2011-02-24 23:56:21  vstinner  set     messageid: <1298591781.45.0.843671847574.issue11303@psf.upfronthosting.co.za>
2011-02-24 23:56:19  vstinner  link    issue11303 messages
2011-02-24 23:56:19  vstinner  create