Author terry.reedy
Recipients Jim.Jewett, cvrebert, docs@python, eli.bendersky, eric.araujo, ezio.melotti, flox, giampaolo.rodola, nadeem.vawda, ncoghlan, paul.moore, pitrou, terry.reedy, tshepang, vstinner
Date 2012-02-17.22:25:23
Message-id <1329517527.26.0.827944331099.issue13997@psf.upfronthosting.co.za>
Content
I agree that there should be no new builtin, and I appreciate that option being taken off the table.

I think the right place for this is the Unicode HOWTO, and I think that document should be renamed "Encodings and Unicode HOWTO". There are two reasons. First, one has to understand the concept of encoding characters and text as numbers before Unicode itself makes sense. Second, this issue (and the python-ideas discussion) is not really about Unicode; it is about using pre-Unicode (and non-Unicode) encodings with Python 3's bytes and str types, and how that differs from using Python 2's unicode and str types. If only Unicode encodings were used, with UTF-8 dominant on the Internet (it is now the most common encoding for web pages), the problems of concern here would not exist.
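To make the second point concrete, here is a minimal Python 3 sketch (the string "café" and the cp1252 codec are arbitrary choices for illustration) showing that the same text becomes different bytes under a pre-Unicode encoding and under UTF-8, and what happens when bytes are decoded with the wrong codec:

```python
# Same text, two encodings: a pre-Unicode codec (cp1252) and UTF-8.
text = "café"

legacy = text.encode("cp1252")  # one byte per character
utf8 = text.encode("utf-8")     # 'é' becomes two bytes
print(legacy, utf8)

# Decoding UTF-8 bytes as cp1252 does not fail; it silently yields mojibake.
print(utf8.decode("cp1252"))

# Decoding cp1252 bytes as UTF-8 fails outright.
try:
    legacy.decode("utf-8")
except UnicodeDecodeError as exc:
    print("UnicodeDecodeError:", exc.reason)

# Python 3 refuses to mix str and bytes implicitly.
try:
    text + legacy
except TypeError as exc:
    print("TypeError:", exc)
```

Under Python 2, mixing str and unicode would instead trigger an implicit ASCII decode that only fails once non-ASCII bytes show up; Python 3 rejects the mix up front, which is exactly where people used to Python 2 get surprised.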

Learning about Unicode itself would mean learning about code units versus code points, normal characters versus surrogates, BMP versus supplementary (non-BMP) characters (all of which are non-issues in wide builds and in Python 3.3), 65,536-code-point planes, BOMs, normalization forms, and character properties. While sometimes useful, these subjects are not the issue here.
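For comparison, a few of the Unicode-specific topics listed above can be seen directly in Python 3.3+. This is a minimal sketch; U+1F600 is just an arbitrary non-BMP character, and the UTF-16 and BOM lines only illustrate how those encodings represent text, not anything this issue proposes:

```python
import unicodedata

# A character outside the BMP: U+1F600 GRINNING FACE.
ch = "\U0001F600"

# Since Python 3.3 (PEP 393), str counts code points, so len() is 1.
print(len(ch))

# In UTF-16 the same code point takes two 16-bit code units (a surrogate pair).
utf16 = ch.encode("utf-16-le")
print(len(utf16) // 2, utf16.hex())

# The "utf-8-sig" codec writes a UTF-8 BOM before the encoded text.
print("a".encode("utf-8-sig"))

# Normalization: 'é' as one precomposed code point vs 'e' + combining acute.
composed = "\u00e9"
decomposed = "e\u0301"
print(composed == decomposed)
print(unicodedata.normalize("NFC", decomposed) == composed)
```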
History
Date                 User         Action  Args
2012-02-17 22:25:27  terry.reedy  set     recipients: + terry.reedy, paul.moore, ncoghlan, pitrou, vstinner, giampaolo.rodola, nadeem.vawda, ezio.melotti, eric.araujo, eli.bendersky, cvrebert, flox, docs@python, tshepang, Jim.Jewett
2012-02-17 22:25:27  terry.reedy  set     messageid: <1329517527.26.0.827944331099.issue13997@psf.upfronthosting.co.za>
2012-02-17 22:25:24  terry.reedy  link    issue13997 messages
2012-02-17 22:25:23  terry.reedy  create