Message72316
Marc, I don't understand what you're saying. UTF-16's surrogates are
not optional. Unicode 2.0 and later require them, and Python is
supposed to support it.
Likewise, UCS-4 originally allowed a much larger range of code points,
but it no longer does; allowing them would mean supporting only old,
archaic versions of the standards, which is clearly not desirable.
You are right that I shouldn't have said "a pair of ill-formed code
units". I should have said "a pair of unassigned code points", which is
how UCS-2 always has classified them and always will.
Although Python may allow ill-formed sequences to be created internally
(primarily lone surrogates on UTF-16 builds), it cannot encode or decode
them. The standard is clear that these are to be treated as errors,
which .decode()'s "errors" argument controls. You could add a new
value for "errors" to pass the garbage through, but I fail to see a use
case for it.
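To make the point about "errors" concrete, here is a minimal sketch of how a lone surrogate behaves at the codec boundary. It assumes a current Python 3 interpreter, which may not match the builds under discussion; the helper names try_encode and try_decode are made up for illustration. The last line uses the "surrogatepass" handler that later Python 3 releases provide, which is roughly the kind of pass-through value described above.

```python
# Sketch only: assumes a current Python 3 interpreter.
# try_encode / try_decode are hypothetical helpers, not part of any library.

lone = "\ud800"            # a lone high surrogate; the str can be created internally
raw = b"\xed\xa0\x80"      # the bytes UTF-8 would use for U+D800 if it were allowed


def try_encode(text, errors="strict"):
    """Return the UTF-8 bytes, or the exception class name if encoding fails."""
    try:
        return text.encode("utf-8", errors)
    except UnicodeError as exc:
        return type(exc).__name__


def try_decode(data, errors="strict"):
    """Return the decoded str, or the exception class name if decoding fails."""
    try:
        return data.decode("utf-8", errors)
    except UnicodeError as exc:
        return type(exc).__name__


strict_encode = try_encode(lone)           # default handler rejects the surrogate
strict_decode = try_decode(raw)            # default handler rejects the byte sequence
replaced = try_decode(raw, "replace")      # ill-formed bytes become U+FFFD
passed = try_encode(lone, "surrogatepass") # explicit opt-in lets the surrogate through

print(strict_encode, strict_decode, repr(replaced), passed)
```

With the default "strict" handler both directions fail; any other outcome has to be requested explicitly through "errors".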
Date | User | Action | Args
2008-09-02 06:51:55 | Rhamphoryncus | set | recipients: + Rhamphoryncus, lemburg, terry.reedy, benjamin.peterson, ezio.melotti
2008-09-02 06:51:54 | Rhamphoryncus | set | messageid: <1220338314.81.0.154303338973.issue3297@psf.upfronthosting.co.za>
2008-09-02 06:51:54 | Rhamphoryncus | link | issue3297 messages
2008-09-02 06:51:53 | Rhamphoryncus | create |