Message 36798 - Python tracker


Author	doerwalter
Date	2002-03-15.17:06:11

Content

For encoding, the replacement is always (end-start)*u"?", i.e. one "?" per unencodable character:
>>> u"ää".encode("ascii", "replace")
'??'

But for decoding, it is neither one replacement per illegal character nor a single replacement for the whole sequence:
>>> "\\Ux\\U".decode("unicode-escape", "replace")
u'\ufffd\ufffd'

i.e. a sequence of 5 illegal characters was replaced by two 
replacement characters. This might mean that decoders can't 
collect all the illegal characters and call the callback 
once. They might have to call the callback for every single 
illegal byte sequence to get the old behaviour.
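
The difference between "collect the run and call the callback once" and "call the callback for every illegal byte" can be sketched with an error callback that records the ranges it is handed. This is only an illustration, not the patch under discussion: it is written in Python 3 syntax against the codecs.register_error API of current CPython, the handler name "logging_replace" and the `calls` list are made up for the example, and it uses the ASCII codec for both directions rather than unicode-escape.

```python
import codecs

# Hypothetical bookkeeping for this sketch: one entry per callback invocation.
calls = []

def logging_replace(exc):
    """Record the error range, then mimic the built-in "replace" behaviour."""
    calls.append((type(exc).__name__, exc.start, exc.end))
    if isinstance(exc, UnicodeEncodeError):
        # One "?" per unencodable character in the reported range.
        return ("?" * (exc.end - exc.start), exc.end)
    if isinstance(exc, UnicodeDecodeError):
        # One U+FFFD for whatever range the decoder reported.
        return ("\ufffd", exc.end)
    raise exc

# The name "logging_replace" is an assumption made for this example.
codecs.register_error("logging_replace", logging_replace)

encoded = "\xe4\xe4".encode("ascii", "logging_replace")
decoded = b"\xe4\xe4".decode("ascii", "logging_replace")

print(encoded, repr(decoded))
for call in calls:
    print(call)
```

Inspecting `calls` afterwards shows how many times each direction invoked the callback and with which (start, end) range, which is the property the paragraph above is concerned about for decoders.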

(It seems that this patch would be much, much simpler if 
we only changed the encoders.)

History
Date	User	Action	Args
2007-08-23 15:06:07	admin	link	issue432401 messages
2007-08-23 15:06:07	admin	create