classification
Title: Definition of a "character" is wrong
Type: enhancement
Stage: resolved
Components: Documentation, Unicode
Versions: Python 3.1, Python 3.2, Python 2.7, Python 2.6
process
Status: closed
Resolution: duplicate
Dependencies:
Superseder: Issues in Unicode HOWTO (View: 20906)
Assigned To: docs@python
Nosy List: BreamoreBoy, Rhamphoryncus, ajaksu2, docs@python, ezio.melotti, georg.brandl, lemburg, loewis
Priority: normal
Keywords:

Created on 2006-10-20 10:13 by Rhamphoryncus, last changed 2014-07-06 19:35 by ezio.melotti. This issue is now closed.

Messages (9)
msg61023 - (view) Author: Adam Olsen (Rhamphoryncus) Date: 2006-10-20 10:13
Python's definition of a character does not match that
of Unicode.  Python's documentation should, at a
minimum, explain how Python's definition compares to
Unicode's definitions of a code unit, code point, glyph,
grapheme cluster, and character.

Unicode's definition of a character can be found here:
http://unicode.org/reports/tr17/

Python seems to use the Code Units option given here:
http://www.unicode.org/faq/char_combmark.html#7
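
To make the distinction concrete, here is a minimal sketch, assuming Python 3.3 or later, where len() counts code points. On the narrow builds of the versions this issue was filed against, len() counted UTF-16 code units instead, so the last string below would have reported a length of 2. The helper name utf16_code_units is made up for this illustration and is not part of any library.

```python
import unicodedata

# "é" written as a base letter plus a combining acute accent: two code points,
# but a single user-perceived character (grapheme cluster).
decomposed = "e\u0301"

# NFC normalization composes the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

# U+1D11E MUSICAL SYMBOL G CLEF lies outside the Basic Multilingual Plane,
# so UTF-16 needs a surrogate pair (two code units) to represent it.
clef = "\U0001D11E"


def utf16_code_units(text):
    """Count UTF-16 code units (hypothetical helper for this example)."""
    # "utf-16-le" writes no byte order mark; each code unit is two bytes.
    return len(text.encode("utf-16-le")) // 2


print(len(decomposed), len(composed))      # code points in each spelling of "é"
print(len(clef), utf16_code_units(clef))   # code points vs. UTF-16 code units
```

Neither count is a grapheme cluster count: the standard library has no grapheme segmentation, which is why the two spellings of "é" report different lengths.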
msg61024 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2006-10-20 15:40
The Python string type is not at all Unicode compliant, so I
don't see a need to use Unicode terminology to explain it.
msg61025 - (view) Author: Adam Olsen (Rhamphoryncus) Date: 2006-10-20 23:35
Sorry, I wasn't clear.  I only intended this to be about the
unicode type.
msg61026 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2006-10-21 07:11
Ok. Can you come up with a patch?
msg61027 - (view) Author: Adam Olsen (Rhamphoryncus) Date: 2006-10-21 10:00
Not at the moment.
msg84524 - (view) Author: Daniel Diniz (ajaksu2) (Python triager) Date: 2009-03-30 07:20
Anyone brave enough can find the mentioned definitions in the thread
below. Reading all of it is necessary, as there are some contradictory
quotes and interpretations before an agreement is (sort of) achieved.

http://mail.python.org/pipermail/python-dev/2008-July/080886.html
msg84554 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2009-03-30 14:42
See this talk for an explanation of the various Unicode terms and how
they map to Python's implementation:

http://www.egenix.com/library/presentations/#PythonAndUnicode

Also note that the Unicode standard has evolved a lot since Unicode
support was added to Python in late 1999. Some terms used in Python
differ from those used in Unicode 5.0 or have since been defined more
strictly than was common at the time.

And finally: don't forget that Python provides ways of *working* with
Unicode, i.e. it does not guarantee that a Python Unicode string always
contains everything required to be well-formed as, e.g., UTF-16. It is
entirely possible to store lone surrogates and invalid or unassigned
code points in a Python Unicode string.
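
The point about lone surrogates can be checked directly. This is a small sketch, assuming Python 3; the helper encodes_strictly is a name invented for this example. It stores a lone surrogate and a noncharacter in a str and shows that only the strict encoder rejects the surrogate.

```python
import unicodedata

# A lone high surrogate: not a valid Unicode scalar value, yet str stores it.
lone_surrogate = "\ud800"

# U+FFFF is a noncharacter, permanently reserved by the Unicode standard.
noncharacter = "\uffff"


def encodes_strictly(text, encoding="utf-8"):
    """Return True if text encodes with the default 'strict' error handler."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# The string holds one code point whose general category is Cs (surrogate).
print(len(lone_surrogate), unicodedata.category(lone_surrogate))

# Strict UTF-8 encoding refuses the surrogate but accepts the noncharacter.
print(encodes_strictly(lone_surrogate), encodes_strictly(noncharacter))

# The 'surrogatepass' error handler writes the surrogate out anyway.
print(lone_surrogate.encode("utf-8", "surrogatepass"))
```

In other words, the validity check happens at the encoding boundary, not when the string is created.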
msg112466 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-08-02 13:36
Without a patch, I don't see how this issue can be moved forward.

Adding a list of such Unicode term definitions would at best cause additional confusion and would only address people knowledgeable in the Unicode field.

Note that Python's use of code units and code points matches that of the Unicode standard in most respects. Glyphs and all higher-level definitions are out of scope for Python.
msg214532 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2014-03-23 01:10
Can this be tied in with the work being done on the Unicode HOWTO (#20906)?
History
Date                 User                    Action  Args
2014-07-06 19:35:50  ezio.melotti            set     status: languishing -> closed
                                                     superseder: Issues in Unicode HOWTO
                                                     resolution: duplicate
                                                     stage: needs patch -> resolved
2014-03-23 01:10:31  BreamoreBoy             set     nosy: + BreamoreBoy
                                                     messages: + msg214532
2010-08-02 13:39:40  eric.araujo             set     assignee: docs@python
                                                     stage: needs patch
                                                     nosy: + docs@python
                                                     versions: + Python 2.6, Python 3.2
2010-08-02 13:36:47  lemburg                 set     status: open -> languishing
                                                     messages: + msg112466
2010-07-29 15:31:48  georg.brandl            set     assignee: georg.brandl -> (no value)
2009-07-05 19:35:08  ezio.melotti            set     nosy: + ezio.melotti
2009-03-30 14:42:01  lemburg                 set     nosy: + lemburg
                                                     messages: + msg84554
2009-03-30 07:20:16  ajaksu2                 set     assignee: georg.brandl
                                                     type: enhancement
                                                     components: + Unicode
                                                     versions: + Python 3.1, Python 2.7
                                                     nosy: + georg.brandl, ajaksu2
                                                     messages: + msg84524
2006-10-20 10:13:07  rhamphoryncus.historic  create