Message 153606 - Python tracker

Author	terry.reedy
Recipients	Jim.Jewett, cvrebert, docs@python, eli.bendersky, eric.araujo, ezio.melotti, flox, giampaolo.rodola, nadeem.vawda, ncoghlan, paul.moore, pitrou, terry.reedy, tshepang, vstinner
Date	2012-02-17.22:25:23
Message-id	<1329517527.26.0.827944331099.issue13997@psf.upfronthosting.co.za>

Content
I agree with no new builtin and appreciate that being taken off the table.

I think the right place for this is the Unicode How-to, and I think that document should be renamed Encodings and Unicode How-to. The reasons are 1) one first has to understand the concept of encoding characters and text as numbers, and 2) this issue (and the python-ideas discussion) is not about Unicode, but about using pre-Unicode (and non-Unicode) encodings with Python 3's bytes and string types, and how that differs from using Python 2's unicode and string types. If only Unicode encodings were used, with UTF-8 dominant on the Internet (it is now the most common encoding for web pages), the problems of concern here would not exist.
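
A minimal sketch of "encoding characters and text as numbers" with Python 3's bytes and string types. The sample string and the choice of latin-1 as a stand-in for any pre-Unicode encoding are illustrative assumptions, not something taken from the issue itself:

```python
# Illustrative only: "café" and latin-1 are assumed examples.
text = "café"

# Encoding maps each character to one or more numbers (bytes).
latin1_bytes = text.encode("latin-1")  # legacy encoding: one byte per character
utf8_bytes = text.encode("utf-8")      # Unicode encoding: 'é' takes two bytes

print(list(latin1_bytes))  # [99, 97, 102, 233]
print(list(utf8_bytes))    # [99, 97, 102, 195, 169]

# Decoding with the wrong encoding does not fail here; it quietly
# produces the wrong characters.
mojibake = utf8_bytes.decode("latin-1")
print(mojibake)  # cafÃ©

# Python 3 refuses to mix bytes and str implicitly.
try:
    latin1_bytes + text
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)  # False
```

The same text yields different byte sequences under different encodings, which is why the encoding has to be known (or guessed) before bytes can be turned back into a string.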

Learning about Unicode would mean learning about code units versus code points, normal versus surrogate chars, BMP versus extended chars (all of which are non-issues in wide builds and Py 3.3), 256-char planes, BOMs, normalization forms, and character properties. While sometimes useful, these subjects are not the issue here.
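
For readers unfamiliar with the Unicode-specific topics listed above, a brief sketch of three of them (code units versus code points, normalization forms, and BOMs). The sample characters are assumptions chosen for illustration, and the length result assumes Python 3.3 or later (or a wide build):

```python
import codecs
import unicodedata

# Code points versus code units: one character outside the BMP.
astral = "\U0001F600"
code_points = len(astral)                           # 1 on Python 3.3+
utf16_units = len(astral.encode("utf-16-le")) // 2  # 2 (a surrogate pair)
print(code_points, utf16_units)

# Normalization forms: 'é' as one code point versus 'e' plus a combining accent.
composed = "\u00e9"
decomposed = "e\u0301"
same_before = composed == decomposed
same_after = unicodedata.normalize("NFC", decomposed) == composed
print(same_before, same_after)  # False True

# BOMs: the utf-8-sig codec prepends a byte order mark when encoding.
has_bom = "a".encode("utf-8-sig").startswith(codecs.BOM_UTF8)
print(has_bom)  # True
```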
