Author Rhamphoryncus
Recipients Rhamphoryncus, benjamin.peterson, ezio.melotti, lemburg, terry.reedy
Date 2008-09-02.06:51:53
Message-id <1220338314.81.0.154303338973.issue3297@psf.upfronthosting.co.za>
Content
Marc, I don't understand what you're saying.  UTF-16's surrogates are
not optional.  Unicode 2.0 and later require them, and Python is
supposed to support it.
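
To make the surrogate mechanism concrete, here is a minimal sketch of how a code point above U+FFFF maps to a UTF-16 surrogate pair. The helper name to_surrogate_pair is hypothetical and used only for illustration; the arithmetic follows the UTF-16 definition, and the sketch assumes a current Python 3 interpreter.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary-plane code point into (high, low) surrogates.

    Hypothetical helper for illustration; not part of any Python API.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF need a surrogate pair")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


# U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the BMP.
high, low = to_surrogate_pair(0x1D11E)

# The built-in codec should produce the same two 16-bit code units.
encoded = chr(0x1D11E).encode("utf-16-be")
units = (int.from_bytes(encoded[0:2], "big"), int.from_bytes(encoded[2:4], "big"))
print(hex(high), hex(low), units == (high, low))
```

Any code point outside the BMP needs both code units, which is why UTF-16 without surrogates cannot represent the full Unicode range.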

Likewise, UCS-4 originally allowed a much larger range of code points,
but it no longer does; allowing them would mean supporting only old,
archaic versions of the standards, which is clearly not desirable.

You are right that I shouldn't have said "a pair of ill-formed code
units".  I should have said "a pair of unassigned code points", which is
how UCS-2 always has classified them and always will.

Although Python may allow ill-formed sequences to be created internally
(primarily lone surrogates on UTF-16 builds), it cannot encode or decode
them.  The standard is clear that these are to be treated as errors,
which the "errors" argument of .decode() controls.  You could add a new
value for "errors" to pass the garbage through, but I fail to see a use
case for it.
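
As a rough illustration of the "errors" argument, the sketch below shows how a lone surrogate is handled by the UTF-8 codec. It assumes a current Python 3 interpreter, whose behaviour differs from the builds discussed in this thread, and it relies on the "surrogatepass" handler that later Python 3 releases provide. The helper names try_encode and try_decode are hypothetical.

```python
LONE_SURROGATE = "\ud800"  # a high surrogate with no low surrogate after it


def try_encode(text, errors="strict"):
    """Return the UTF-8 bytes, or None if the codec rejects the text."""
    try:
        return text.encode("utf-8", errors)
    except UnicodeEncodeError:
        return None


def try_decode(data, errors="strict"):
    """Return the decoded text, or None if the codec rejects the bytes."""
    try:
        return data.decode("utf-8", errors)
    except UnicodeDecodeError:
        return None


# Strict mode treats the lone surrogate as an error in both directions.
strict_encoded = try_encode(LONE_SURROGATE)
strict_decoded = try_decode(b"\xed\xa0\x80")

# "surrogatepass" passes the ill-formed sequence through unchanged.
passed_bytes = try_encode(LONE_SURROGATE, "surrogatepass")
passed_text = try_decode(b"\xed\xa0\x80", "surrogatepass")

print(strict_encoded, strict_decoded, passed_bytes, passed_text == LONE_SURROGATE)
```

The default handler is "strict", so callers only get the pass-through behaviour when they ask for it explicitly.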