Message16831
The alleged codepoints unichr(0xFFFE) and
unichr(0xFFFF) are not Unicode characters. This document:
http://www.unicode.org/charts/PDF/UFFF0.pdf
contains:
Noncharacters
These codes are intended for process internal uses, but
are not permitted for interchange.
FFFE <not a character>
  * the value FFFE is guaranteed not to be
    a Unicode character at all
  * may be used to detect byte order by
    contrast with FEFF which is a character
  FEFF zero width no-break space
FFFF <not a character>
  * the value FFFF is guaranteed not to be
    a Unicode character at all
In particular, an XML document that contains such an
alleged Unicode character is not well-formed.
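
To make the well-formedness point concrete, here is a minimal sketch that tests code points against the Char production of XML 1.0, which stops at #xFFFD in the BMP and so excludes both FFFE and FFFF. It is written for Python 3, where chr() replaces unichr(); the helper names is_xml_char and first_illegal_xml_char are made up for this illustration and are not part of any library.

```python
def is_xml_char(cp):
    """Return True if code point cp matches the XML 1.0 Char production:
    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    """
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def first_illegal_xml_char(text):
    """Return the index of the first character XML 1.0 forbids, or -1."""
    for i, ch in enumerate(text):
        if not is_xml_char(ord(ch)):
            return i
    return -1
```

With these helpers, is_xml_char(0xFEFF) is true while is_xml_char(0xFFFE) and is_xml_char(0xFFFF) are false, so a string containing either value could be rejected before it is written into an XML document.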
All Unicode-aware versions of Python treat these
codepoints in the same manner as other codepoints, e.g.
both unichr(0xFFFE) and u'\uffff' pass without complaint.
I believe the correct behavior would be for Python to
raise an exception, or at least a warning, on access to
these spurious characters.
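
A rough sketch of what such a check might look like, written as a user-level wrapper rather than a change to the interpreter. It assumes Python 3, where chr() replaces unichr(). The names is_noncharacter and checked_chr and the strict flag are hypothetical. The test also covers the other noncharacters Unicode defines (U+FDD0..U+FDEF and the last two code points of every plane), which goes beyond the two values quoted above.

```python
import warnings


def is_noncharacter(cp):
    """True for U+FDD0..U+FDEF and for any code point ending in FFFE or FFFF."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def checked_chr(cp, strict=True):
    """Like chr(), but raise (strict) or warn (non-strict) on noncharacters.

    Hypothetical helper for illustration; not an existing Python API.
    """
    if is_noncharacter(cp):
        msg = "U+%04X is a Unicode noncharacter" % cp
        if strict:
            raise ValueError(msg)
        warnings.warn(msg, UnicodeWarning)
    return chr(cp)


# The built-in accepts both values silently, as described above.
assert len(chr(0xFFFE)) == 1 and len("\uffff") == 1
```

Here checked_chr(0xFEFF) returns the zero width no-break space as usual, checked_chr(0xFFFE) raises ValueError, and checked_chr(0xFFFF, strict=False) returns the character after issuing a UnicodeWarning. Whether an exception or a warning is the better default is exactly the open question in this report.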
Date                | User  | Action | Args
2007-08-23 14:14:30 | admin | link   | issue765036 messages
2007-08-23 14:14:30 | admin | create |