
Author doerwalter
Date 2002-03-15.17:06:11
Content

For encoding it's always (end-start)*u"?":
>>> u"ää".encode("ascii", "replace")
'??'
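
A minimal sketch of the encoding case, written in Python 3 syntax and using `codecs.register_error` as found in current Python (which may differ from the patch under discussion). The handler name "trace-encode" and the helper `encode_with_trace` are made up for illustration. The callback records each (start, end) span it is given and returns (end-start)*"?":

```python
import codecs

def encode_with_trace(text, encoding="ascii"):
    """Encode text, recording every (start, end) span passed to the callback."""
    spans = []

    def handler(exc):
        if not isinstance(exc, UnicodeEncodeError):
            raise exc
        spans.append((exc.start, exc.end))
        # One "?" per unencodable character, then resume after the span.
        return ("?" * (exc.end - exc.start), exc.end)

    # "trace-encode" is a hypothetical name chosen for this example.
    codecs.register_error("trace-encode", handler)
    return text.encode(encoding, "trace-encode"), spans

encoded, spans = encode_with_trace("ää")
print(encoded, spans)
```

However the encoder groups the unencodable characters into callback calls, the span lengths add up to the number of "?" characters produced.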

But for decoding, it is neither one replacement character per 
illegal character nor one for the whole illegal sequence:
>>> "\\Ux\\U".decode("unicode-escape", "replace")
u'\ufffd\ufffd'

i.e. a sequence of 5 illegal characters was replaced by two 
replacement characters. This might mean that decoders can't 
collect all the illegal characters and call the callback 
once. They might have to call the callback for every single 
illegal byte sequence to get the old behaviour.
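
To see how a given decoder groups illegal input, something like the following could be used. It is only a sketch in Python 3 syntax, assuming the `codecs.register_error` API of current Python rather than the patch itself. The handler name "trace-decode" and the helper `decode_with_trace` are hypothetical. The callback returns one U+FFFD per call, so the number of replacement characters in the result equals the number of callback invocations:

```python
import codecs

def decode_with_trace(data, encoding):
    """Decode data, recording every (start, end) span passed to the callback."""
    spans = []

    def handler(exc):
        if not isinstance(exc, UnicodeDecodeError):
            raise exc
        spans.append((exc.start, exc.end))
        # One replacement character per call, then resume after the span.
        return ("\ufffd", exc.end)

    # "trace-decode" is a hypothetical name chosen for this example.
    codecs.register_error("trace-decode", handler)
    return data.decode(encoding, "trace-decode"), spans

# Two illegal bytes in an otherwise valid ASCII string.
text, spans = decode_with_trace(b"a\xff\xfeb", "ascii")
print(repr(text), spans)

# The input from the example above; the spans show how it was split up.
esc_text, esc_spans = decode_with_trace(b"\\Ux\\U", "unicode-escape")
print(repr(esc_text), esc_spans)
```

Because the handler mirrors what "replace" does, the decoded text matches the built-in behaviour while the recorded spans expose where the decoder drew the boundaries.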

(It seems that this patch would be much, much simpler if 
we only changed the encoders.)
History
Date User Action Args
2007-08-23 15:06:07 admin link issue432401 messages
2007-08-23 15:06:07 admin create