Message 100687 - Python tracker

Author	vstinner
Recipients	vstinner
Date	2010-03-09.01:11:54
Message-id	<1268097120.16.0.950101613769.issue8092@psf.upfronthosting.co.za>

Content
This issue is a regression introduced by r72208, which fixed issue #3672.

The attached patch fixes PyUnicode_EncodeUTF8() when unicode_encode_call_errorhandler() returns a Unicode string (e.g. with the backslashreplace error handler). I don't know the unicodeobject.c code very well, so my patch is probably far from perfect.
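The failure mode can be illustrated at the Python level. This sketch assumes a current Python 3 interpreter rather than the 2010 tree the patch targets, and `demo.question_marks` is a hypothetical handler name registered only for this example. It encodes a string containing a lone surrogate in strict mode, with the built-in backslashreplace handler, and with a custom handler whose replacement is a str rather than bytes.

```python
import codecs

# "\udc80" is a lone surrogate: it has no valid UTF-8 encoding.
text = "abc\udc80def"

try:
    text.encode("utf-8")  # default errors="strict"
except UnicodeEncodeError as exc:
    print("strict mode fails:", exc.reason)

# Built-in handler: the replacement is the str "\\udc80" (ASCII only).
print(text.encode("utf-8", "backslashreplace"))


# Hypothetical custom handler: returns a (str, resume_position) tuple,
# so the encoder has to convert that str into output bytes itself.
def question_marks(exc):
    if isinstance(exc, UnicodeEncodeError):
        return ("?" * (exc.end - exc.start), exc.end)
    raise exc


codecs.register_error("demo.question_marks", question_marks)
print(text.encode("utf-8", "demo.question_marks"))
```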

I assume that the maximum length of an escaped character is 8 bytes (the xmlcharrefreplace error handler for U+DFFF). When the first lone surrogate is found, the buffer is reallocated to size*8 bytes. The replacement string must be pure ASCII; otherwise a UnicodeEncodeError is raised.
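The 8-byte bound can be checked against every lone surrogate (U+D800 to U+DFFF). This is a sketch run on a current Python 3, not the patched C code; `worst_case_buffer()` and the `demo.non_ascii` handler are hypothetical names for illustration. It assumes that today's CPython UTF-8 encoder still rejects non-ASCII str replacements, as described above.

```python
import codecs

# Lone surrogates occupy the range U+D800..U+DFFF.
SURROGATES = [chr(cp) for cp in range(0xD800, 0xE000)]


def max_replacement_len(handler):
    """Longest output the UTF-8 encoder emits for one lone surrogate with `handler`."""
    return max(len(ch.encode("utf-8", handler)) for ch in SURROGATES)


xml_max = max_replacement_len("xmlcharrefreplace")  # "&#55296;" .. "&#57343;"
bs_max = max_replacement_len("backslashreplace")    # "\\ud800" .. "\\udfff"
print("xmlcharrefreplace:", xml_max, "bytes; backslashreplace:", bs_max, "bytes")


def worst_case_buffer(n_code_points, max_bytes_per_char=8):
    """Hypothetical helper: the size*8 bound taken when the first surrogate is seen.

    A valid code point needs at most 4 UTF-8 bytes and an escaped surrogate
    needs at most 8 with these two handlers, so 8 bytes per input code point
    covers both. A custom handler could still return something longer.
    """
    return n_code_points * max_bytes_per_char


# Hypothetical handler returning a non-ASCII replacement string.
def non_ascii(exc):
    return ("\u00e9", exc.end)


codecs.register_error("demo.non_ascii", non_ascii)
try:
    "\udfff".encode("utf-8", "demo.non_ascii")
except UnicodeEncodeError as exc:
    print("non-ASCII replacement rejected:", exc.reason)
```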

Note: unicode_encode_ucs1() doesn't hardcode a maximum length for the escaped string. Its code might be reused in PyUnicode_EncodeUTF8() to remove the hardcoded limits.
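A rough model of that alternative, written in Python rather than C. Instead of preallocating size*8 bytes, the output buffer grows by exactly the length of each replacement, so no maximum escape length has to be assumed. `encode_utf8_growing()` and the `demo.verbose` handler are hypothetical; this is not the code of unicode_encode_ucs1() or PyUnicode_EncodeUTF8().

```python
import codecs


def encode_utf8_growing(text, errors="strict"):
    """Hypothetical encoder loop with no hardcoded replacement-length limit.

    The output is a bytearray that grows by exactly what each chunk needs,
    so a replacement of any length fits without a size*8 preallocation.
    """
    handler = codecs.lookup_error(errors)  # "strict" raises the exception
    out = bytearray()
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if not 0xD800 <= ord(ch) <= 0xDFFF:
            out += ch.encode("utf-8")  # 1 to 4 bytes for a non-surrogate
            pos += 1
            continue
        # Treat a run of consecutive lone surrogates as one error range.
        end = pos
        while end < len(text) and 0xD800 <= ord(text[end]) <= 0xDFFF:
            end += 1
        exc = UnicodeEncodeError("utf-8", text, pos, end, "surrogates not allowed")
        rep, newpos = handler(exc)
        if isinstance(rep, str):
            # Same rule as described above: a str replacement must be ASCII.
            if not rep.isascii():
                raise exc
            rep = rep.encode("ascii")
        out += rep  # grows by len(rep), whatever that length is
        # Handlers may return a negative position, counted from the end.
        pos = newpos if newpos >= 0 else len(text) + newpos
    return bytes(out)


# Hypothetical handler with a 16-byte replacement, longer than the 8-byte bound.
codecs.register_error(
    "demo.verbose",
    lambda exc: ("<lone surrogate>" * (exc.end - exc.start), exc.end),
)
print(encode_utf8_growing("a\udc80b", "demo.verbose"))
```

The trade-off is one up-front allocation versus possibly several reallocations; a bytearray amortizes the growth, much as a C encoder could by overallocating.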

History
Date	User	Action	Args
2010-04-20 19:36:44	vstinner	unlink	issue8092 messages
2010-03-09 01:12:00	vstinner	set	recipients: + vstinner
2010-03-09 01:12:00	vstinner	set	messageid: <1268097120.16.0.950101613769.issue8092@psf.upfronthosting.co.za>
2010-03-09 01:11:58	vstinner	link	issue8092 messages
2010-03-09 01:11:57	vstinner	create