Author ncoghlan
Recipients docs@python, gvanrossum, ncoghlan, pitrou, vstinner
Date 2014-06-05.12:34:07
Message-id <1401971648.18.0.0439993458759.issue21667@psf.upfronthosting.co.za>
Content
If someone doesn't understand what "Unicode code point" means, that's going to be the least of their problems when it comes to writing a conformant Python implementation. We could link to http://unicode.org/glossary/#code_point, but that doesn't really add much beyond "value from 0 to 0x10FFFF". If you try to dive into the formal Unicode spec instead, you end up in a twisty maze of definitions of things that are all closely related, but generally not the same thing (code positions, code units, code spaces, abstract characters, glyphs, graphemes, etc.).

The main advantage of using the more formal "code point" over the informal "character" is that it discourages people from assuming they already know what code points are (the usual mistaken assumption being that Unicode code points correspond directly to glyphs, the way ASCII and Extended ASCII printable characters correspond to their glyphs). The rest of the paragraph then provides the mechanical details of the two meaningful interpretations of code points in Python (as length 1 strings and as numbers in a particular range) and the operations for translating between those two formats (chr and ord).
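
To make the distinction concrete, here is a rough sketch (assuming Python 3.3 or later; the variable names are just for illustration) of the two interpretations, the valid range, and a case where one glyph is made of two code points:

```python
import unicodedata

# chr and ord translate between the two views of a code point:
# a length 1 string and an integer.
print(ord("A"))          # 65
print(chr(0x41))         # A

# The integer view is limited to 0 through 0x10FFFF inclusive.
highest = chr(0x10FFFF)
try:
    chr(0x110000)
except ValueError:
    out_of_range = True
else:
    out_of_range = False

# One glyph on screen, two code points in the string:
# "e" followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "e\u0301"
# NFC normalization folds the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))  # 2 1
```

Both `decomposed` and `composed` normally render as the same glyph, which is exactly why "character" is a misleading word for what `len` counts.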

Fair point about the slicing - it may be better to just talk about indexing.
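
For what it's worth, a minimal illustration of the indexing behaviour, assuming Python 3.3 or later (earlier narrow builds indexed by UTF-16 code unit instead); the sample string is made up for the example:

```python
# "a", then U+1F600 (outside the Basic Multilingual Plane), then "b"
text = "a\U0001F600b"

# Indexing yields a length 1 string holding a single code point.
middle = text[1]
print(len(text))                      # 3
print(len(middle), hex(ord(middle)))  # 1 0x1f600

# Iteration visits the same code points that indexing does.
code_points = [ord(ch) for ch in text]
```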
History
Date                 User      Action  Args
2014-06-05 12:34:08  ncoghlan  set     recipients: + ncoghlan, gvanrossum, pitrou, vstinner, docs@python
2014-06-05 12:34:08  ncoghlan  set     messageid: <1401971648.18.0.0439993458759.issue21667@psf.upfronthosting.co.za>
2014-06-05 12:34:08  ncoghlan  link    issue21667 messages
2014-06-05 12:34:07  ncoghlan  create