Issue 6213: Incremental encoder incompatibility between 2.x and py3k

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/50462

classification

Title:	Incremental encoder incompatibility between 2.x and py3k
Type:	behavior	Stage:
Components:		Versions:	Python 2.7

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	doerwalter, pitrou, vstinner
Priority:	normal	Keywords:	patch

Created on 2009-06-05 20:17 by pitrou, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description
utf_8_16.patch	vstinner, 2010-07-24 03:41

Messages (8)
msg88972 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2009-06-05 20:17
The behaviour of several incremental encoders is inconsistent between 2.x and py3k.

In 2.x:

>>> enc = codecs.getincrementalencoder('utf-16')()
>>> enc.getstate()
0
>>> enc.setstate(0)
>>> enc.encode(u'abc')
'\xff\xfea\x00b\x00c\x00'

In py3k:

>>> enc = codecs.getincrementalencoder('utf-16')()
>>> enc.getstate()
2
>>> enc.setstate(0)
>>> enc.encode('abc')
b'a\x00b\x00c\x00'
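The py3k half of the report can be reproduced as a small script. This is a sketch that assumes a Python 3 interpreter; the variable names (`native`, `fresh_state`, and so on) are illustrative and not part of the original report, and the expected bytes are built for the platform's native byte order rather than hard-coded.

```python
import codecs
import sys

# Native-endian codec name, used to build the expected BOM-less output.
native = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"

enc = codecs.getincrementalencoder("utf-16")()
fresh_state = enc.getstate()      # state before anything was encoded

with_bom = enc.encode("abc")      # the first call emits the BOM
after_state = enc.getstate()      # state once the BOM has been written

enc.setstate(0)                   # 0 means "BOM already written"
without_bom = enc.encode("abc")

enc.setstate(fresh_state)         # back to "no BOM written yet"
with_bom_again = enc.encode("abc")

print(fresh_state, after_state)
print(with_bom, without_bom, with_bom_again)
```

On the unpatched 2.x encoder described in the report, `setstate(0)` had no effect, so the second `encode()` call would still have produced the BOM.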
msg89073 - (view)	Author: Walter Dörwald (doerwalter) *	Date: 2009-06-08 11:13
This was done because the codec state is part of the return value of tell(). To have a reasonable return value (i.e. one with just the position itself) in as many cases as possible, it makes sense to design the codec state so that the most common state is 0. This is what was done for py3k: the default state (no BOM read/written yet) is 2, not 0.
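To illustrate why a state of 0 is convenient, here is a simplified model of folding a codec state into a tell()-style cookie. The helper names `pack_cookie` and `unpack_cookie` are hypothetical, and the 64-bit field layout is an assumption made for illustration; it is not taken from this issue.

```python
# Assumed width of the position field in this simplified model.
POSITION_BITS = 64


def pack_cookie(position, codec_state=0):
    """Hypothetical helper: combine a byte position and a codec state."""
    return position | (codec_state << POSITION_BITS)


def unpack_cookie(cookie):
    """Hypothetical helper: split a cookie back into (position, state)."""
    position = cookie & ((1 << POSITION_BITS) - 1)
    codec_state = cookie >> POSITION_BITS
    return position, codec_state


plain = pack_cookie(10, 0)      # state 0: the cookie is just the position
flagged = pack_cookie(10, 2)    # state 2: the cookie carries extra high bits

print(plain, flagged, unpack_cookie(flagged))
```

With this layout, a stream whose codec is in state 0 reports a cookie equal to the plain byte offset, which is the "reasonable return value" the message above refers to.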
msg89074 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2009-06-08 11:19
Yes, I agree with py3k's behaviour. But it should be backported to 2.x as well. I don't know where the changes must be done so if someone else could do it it would be nice :-) (I'm backporting the py3k IO lib and I had to disable two tests because of this)
msg89075 - (view)	Author: Walter Dörwald (doerwalter) *	Date: 2009-06-08 11:59
AFAICR the difference is: 2.x may return any object in getstate(), but py3k must return a (buffered input, integer) tuple. Simply moving py3k's getstate/setstate implementation over to 2.x might do the trick.
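The (buffered input, integer) tuple mentioned above can be observed on the decoder side. The following is a sketch that assumes a Python 3 interpreter and feeds the utf-16 incremental decoder a native-order stream in two pieces; the variable names are illustrative only.

```python
import codecs

# Native-order BOM followed by one 2-byte code unit: 4 bytes in total.
data = "a".encode("utf-16")

dec = codecs.getincrementaldecoder("utf-16")()
state_fresh = dec.getstate()          # nothing read yet

# Feed the BOM plus half of the first code unit.
text_partial = dec.decode(data[:3])
state_partial = dec.getstate()        # the odd byte stays buffered

# Feed the remaining byte.
text_rest = dec.decode(data[3:])
state_done = dec.getstate()

print(state_fresh, state_partial, state_done)
print(repr(text_partial), repr(text_rest))
```

The first element of each tuple is the undecoded input still held by the decoder, and the second is the integer flag discussed in this thread.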
msg111423 - (view)	Author: STINNER Victor (vstinner) *	Date: 2010-07-24 03:41
Codecs are inconsistent: utf-32 has working getstate() / setstate() methods, whereas utf-8-sig and utf-16 don't (getstate() always returns 0, setstate() does nothing).

> Simply moving py3ks getstate/setstate implementation
> over to 2.x might do the trick.

That's what my patch does :-) It's just a copy/paste of the Python 3 code. It does fix the #5006 tests (which are re-enabled by the patch).

Using the patch, it's possible to save/restore the state of the utf-8-sig and utf-16 codecs.
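Saving and restoring the utf-8-sig encoder state might look like the sketch below. It assumes a Python 3 interpreter, whose implementation the patch is said to copy; it is not taken from utf_8_16.patch itself, and the variable names are illustrative.

```python
import codecs

enc = codecs.getincrementalencoder("utf-8-sig")()

first = enc.encode("a")       # the first call writes the BOM, then the text
saved = enc.getstate()        # remember the "BOM already written" state

enc.reset()                   # the encoder would now write a BOM again
pending = enc.getstate()

enc.setstate(saved)           # restore the saved state instead
second = enc.encode("b")      # no second BOM is produced

print(first, second)
```

With a setstate() that does nothing, as described above for the unpatched 2.x codecs, the last call would have emitted a second BOM in the middle of the stream.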
msg111745 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2010-07-27 22:47
The patch looks ok to me (I suppose you have tested it).
msg111760 - (view)	Author: STINNER Victor (vstinner) *	Date: 2010-07-28 01:45
> The patch looks ok to me

Ok, committed to 2.7 (r83198).

> (I suppose you have tested it)

I ran test_io which does test the incremental encoders.

--

I'm not brave enough to commit it to 2.6 (test_io in 2.6 doesn't use incremental encoders).
msg111762 - (view)	Author: STINNER Victor (vstinner) *	Date: 2010-07-28 01:59
> I'm not brave enough to commit it to 2.6
> (test_io in 2.6 doesn't use incremental encoders)

Oh, I just remembered that I chose to fix this issue to be able to backport #5006 to 2.6 :-) So r83199 is the incremental encoder fix for 2.6, and r83200 is the BOM fix for the io library.

History
Date	User	Action	Args
2022-04-11 14:56:49	admin	set	github: 50462
2010-07-28 01:59:06	vstinner	set	messages: + msg111762
2010-07-28 01:45:22	vstinner	set	status: open -> closed resolution: fixed messages: + msg111760
2010-07-27 22:47:30	pitrou	set	messages: + msg111745 versions: - Python 3.2
2010-07-24 03:41:56	vstinner	set	files: + utf_8_16.patch nosy: + vstinner messages: + msg111423 keywords: + patch
2009-06-08 11:59:30	doerwalter	set	messages: + msg89075
2009-06-08 11:19:07	pitrou	set	messages: + msg89074
2009-06-08 11:13:55	doerwalter	set	nosy: + doerwalter messages: + msg89073
2009-06-05 20:17:41	pitrou	create