Message 16831 - Python tracker

Author	gnosis
Date	2003-07-03.01:52:58

Content
The alleged code points unichr(0xFFFE) and
unichr(0xFFFF) are not Unicode characters.  This document:

  http://www.unicode.org/charts/PDF/UFFF0.pdf

contains:

  Noncharacters
  These codes are intended for process internal uses, but
  are not permitted for interchange.

  FFFE  <not a character>
  - the value FFFE is guaranteed not to be
    a Unicode character at all
  - may be used to detect byte order by
    contrast with FEFF which is a character
    FEFF zero width no-break space

  FFFF  <not a character>
  - the value FFFF is guaranteed not to be
    a Unicode character at all

In particular, an XML document that contains such an
alleged Unicode character is not well-formed.
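
To make the XML point concrete, here is a minimal sketch of the Char
production from the XML 1.0 specification, written for Python 3 (where
chr() replaces unichr()). The name is_xml_char is made up for this
illustration and is not part of any library.

```python
def is_xml_char(cp):
    """Return True if code point cp matches the XML 1.0 Char production.

    Char ::= #x9 | #xA | #xD | [#x20-#xD7FF]
           | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    """
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


# FEFF is an ordinary character; FFFE and FFFF fall outside every range.
for cp in (0xFEFF, 0xFFFD, 0xFFFE, 0xFFFF):
    print("U+%04X allowed in XML 1.0: %s" % (cp, is_xml_char(cp)))
```

The upper bound of the third range is FFFD, which is what excludes both
FFFE and FFFF from well-formed XML 1.0 documents.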

All Unicode-aware versions of Python treat these
code points in the same manner as other code points, e.g.
both unichr(0xFFFE) and u'\uffff' pass without complaint.
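
A short sketch of that behaviour, assuming Python 3 (where chr() takes
the place of unichr() and plain string literals are Unicode). The
variable name results is only for this example.

```python
import unicodedata

# Both values are constructed without any error or warning.
samples = [chr(0xFFFE), "\uffff"]

# Record (code point, length, general category) for each sample.
# The Unicode database reports category "Cn" (not assigned) for both.
results = [(ord(ch), len(ch), unicodedata.category(ch)) for ch in samples]

for cp, length, category in results:
    print("U+%04X length=%d category=%s" % (cp, length, category))
```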

I believe the correct behavior would be for Python to
raise an exception, or at least a warning, on access to
these spurious characters.
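
Roughly what I have in mind, as a sketch only and not a patch: a
hypothetical wrapper (strict_chr and NONCHARACTERS are names invented
here, and the code assumes Python 3's chr()) that either raises or
warns for the two values discussed in this report.

```python
import warnings

# Only the two values named in this report; an assumption of the sketch.
NONCHARACTERS = frozenset((0xFFFE, 0xFFFF))


def strict_chr(cp, strict=True):
    """Hypothetical chr() wrapper that rejects U+FFFE and U+FFFF.

    With strict=True a ValueError is raised; with strict=False a
    UnicodeWarning is issued and the value is returned anyway.
    """
    if cp in NONCHARACTERS:
        message = "U+%04X is not a Unicode character" % cp
        if strict:
            raise ValueError(message)
        warnings.warn(message, UnicodeWarning)
    return chr(cp)


# Ordinary characters pass through unchanged.
print(repr(strict_chr(0xFEFF)))

# The noncharacters are refused in strict mode.
try:
    strict_chr(0xFFFE)
except ValueError as exc:
    print("rejected:", exc)
```

Whether an exception or a warning is the better default is open; the
strict flag just shows both options side by side.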

History
Date	User	Action	Args
2007-08-23 14:14:30	admin	link	issue765036 messages
2007-08-23 14:14:30	admin	create