Author gnosis
Date 2003-07-03.01:52:58
Content
The alleged code points unichr(0xFFFE) and
unichr(0xFFFF) are not Unicode characters.  This document:

  http://www.unicode.org/charts/PDF/UFFF0.pdf

contains:

  Noncharacters
  These codes are intended for process internal uses, but
  are not permitted for interchange.

  FFFE <not a character>
  • the value FFFE is guaranteed not to be
    a Unicode character at all
  • may be used to detect byte order by
    contrast with FEFF which is a character
    FEFF zero width no-break space

  FFFF <not a character>
  • the value FFFF is guaranteed not to be
    a Unicode character at all

In particular, an XML document that contains such an
alleged Unicode entity is not well-formed.
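
The well-formedness claim can be checked against a parser. The following is a minimal sketch, assuming Python 3 and the expat-backed xml.etree.ElementTree module from the standard library; the helper name is_well_formed is hypothetical and not part of any library.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses, False if the parser rejects it.

    Hypothetical helper for illustration only.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# A character reference to an ordinary code point is accepted.
print(is_well_formed("<doc>&#x41;</doc>"))

# U+FFFE and U+FFFF fall outside the XML 1.0 Char production, so a
# conforming parser is expected to reject references to them.
print(is_well_formed("<doc>&#xFFFE;</doc>"))
print(is_well_formed("<doc>&#xFFFF;</doc>"))
```

Character references are used here rather than literal characters so that the result does not depend on how the test string itself is encoded.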

All Unicode-aware versions of Python treat these
codepoints in the same manner as other codepoints, e.g.
both unichr(0xFFFE) and u'\uffff' pass without complaint.
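
The behavior described above can be reproduced with a few lines. This sketch assumes Python 3, where chr() takes the place of unichr() and plain string literals are already Unicode; it only shows that the values are accepted, and prints lengths and code point values rather than the characters themselves.

```python
import unicodedata

# Python 3 spelling of the report's unichr(0xFFFE) and u'\uffff'.
fffe = chr(0xFFFE)
ffff = "\uffff"

# Both are accepted and behave like any other one-code-point string.
print(len(fffe), hex(ord(fffe)))
print(len(ffff), hex(ord(ffff)))

# The Unicode database reports a general category for each of them;
# neither is classified as an assigned character.
print(unicodedata.category(fffe), unicodedata.category(ffff))
```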

I believe the correct behavior would be for Python to
raise an exception, or at least a warning, on access to
these spurious characters.
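
One way the requested behavior might look is sketched below. This is only an illustration under stated assumptions, not a proposed patch: it assumes Python 3, the name checked_chr and its strict parameter are hypothetical, and the set covers only the two code points named in this report.

```python
import warnings

# Only the two code points named in this report; illustrative, not
# an exhaustive list.
NONCHARACTERS = frozenset({0xFFFE, 0xFFFF})


def checked_chr(codepoint, strict=True):
    """Like chr(), but flag U+FFFE and U+FFFF (hypothetical helper).

    With strict=True a ValueError is raised; with strict=False a
    UnicodeWarning is issued and the character is still returned.
    """
    if codepoint in NONCHARACTERS:
        message = "U+%04X is a Unicode noncharacter" % codepoint
        if strict:
            raise ValueError(message)
        warnings.warn(message, UnicodeWarning, stacklevel=2)
    return chr(codepoint)


# Ordinary code points pass through unchanged.
print(checked_chr(0x41))

# The noncharacters raise in strict mode.
try:
    checked_chr(0xFFFE)
except ValueError as exc:
    print("rejected:", exc)
```

The strict flag mirrors the two options in the report: an exception by default, or a warning for callers that need the value for process-internal use.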

History
Date User Action Args
2007-08-23 14:14:30 admin link issue765036 messages
2007-08-23 14:14:30 admin create