Message 167831 - Python tracker

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	vstinner
Recipients	Arfrever, ishimoto, loewis, methane, mrabarnett, ncoghlan, pitrou, rurpy2, serhiy.storchaka, vstinner
Date	2012-08-09.20:42:47
Message-id	<1344544970.59.0.677493089722.issue15216@psf.upfronthosting.co.za>

Content
Oh, set_encoding.patch is wrong:

+            offset = self._decoded_chars_used - len(next_input)

self._decoded_chars_used counts Unicode characters, whereas len(next_input) counts bytes. The subtraction therefore only works with 7- and 8-bit encodings like ascii or latin1, where one character is always one byte, but not with multibyte encodings like utf8 or ucs-4.
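
To illustrate the mismatch, here is a minimal sketch (not part of either patch). The helper name char_and_byte_counts is hypothetical and only compares the two lengths for a given encoding:

```python
def char_and_byte_counts(text, encoding):
    """Return (number of Unicode characters, number of encoded bytes)."""
    data = text.encode(encoding)
    return len(text), len(data)


# The two counts agree for a single-byte encoding and diverge as soon as
# one character needs more than one byte.
for encoding in ("latin-1", "utf-8", "utf-32-le"):
    chars, nbytes = char_and_byte_counts("héllo", encoding)
    print(encoding, chars, nbytes)
```

Any offset computed by subtracting a byte count from a character count is only meaningful in the first case.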

> peeking into the underlying buffer would be enough to
> handle encoding detection.

I wrote a new patch using this idea. It does not work (yet?) with non-seekable streams: the raw read buffer (bytes string) is not stored in the _snapshot attribute if the stream is not seekable. This could be changed to solve the issue.
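
For readers unfamiliar with the peeking idea, here is a rough sketch of BOM-based detection done at the buffered layer, before any text wrapper exists. This is not what set_encoding-2.patch does internally; the function detect_encoding and its default argument are hypothetical names chosen for illustration:

```python
import codecs
import io


def detect_encoding(buffered, default="utf-8"):
    """Guess an encoding from a BOM without consuming any bytes."""
    # peek() may return more or fewer bytes than requested, so slice it.
    head = buffered.peek(4)[:4]
    # UTF-32 BOMs are checked first: BOM_UTF32_LE starts with BOM_UTF16_LE.
    candidates = (
        (codecs.BOM_UTF32_LE, "utf-32"),
        (codecs.BOM_UTF32_BE, "utf-32"),
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    )
    for bom, name in candidates:
        if head.startswith(bom):
            return name
    return default


raw = io.BytesIO(codecs.BOM_UTF16_LE + "héllo".encode("utf-16-le"))
buffered = io.BufferedReader(raw)
encoding = detect_encoding(buffered)

# Nothing was consumed by peek(), so the wrapper still sees the BOM
# and the codec can use it to pick the byte order.
text = io.TextIOWrapper(buffered, encoding=encoding).read()
print(encoding, text)
```

Because peek() does not advance the stream position, this approach does not rely on seeking. It only covers the case where the encoding is chosen before the first text read, not switching encodings midway through a stream.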

set_encoding-2.patch is still a work in progress. For example, it does not patch the _io module.

History
Date	User	Action	Args
2012-08-09 20:42:50	vstinner	set	recipients: + vstinner, loewis, ishimoto, ncoghlan, pitrou, mrabarnett, Arfrever, methane, rurpy2, serhiy.storchaka
2012-08-09 20:42:50	vstinner	set	messageid: <1344544970.59.0.677493089722.issue15216@psf.upfronthosting.co.za>
2012-08-09 20:42:49	vstinner	link	issue15216 messages
2012-08-09 20:42:49	vstinner	create