Issue 25863: ISO-2022 seeking forgets state


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/70050

classification

Title:	ISO-2022 seeking forgets state
Type:	behavior	Stage:	test needed
Components:	Extension Modules, IO, Unicode	Versions:	Python 3.6, Python 3.5, Python 2.7

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	ezio.melotti, johnwalker, martin.panter, serhiy.storchaka, vstinner
Priority:	normal	Keywords:	patch

Created on 2015-12-15 02:37 by martin.panter, last changed 2022-04-11 14:58 by admin.

Files
File name	Uploaded	Description	Edit
25863-unittest.patch	johnwalker, 2015-12-30 05:27		review

Messages (4)
msg256431 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-12-15 02:37
>>> from io import *
>>> text = TextIOWrapper(BytesIO(), "iso-2022-jp")
>>> text.write(u"P")
1
>>> text.tell()
1
>>> text.write(u"anter 正")
7
>>> text.tell()
12
>>> text.write(u"孝")
1
>>> text.seek(12)
12
>>> text.read()  # Should return 孝, not ASCII "9'"
>>> text.buffer.getvalue()
b"Panter \x1b$B@59'"
>>> text.seek(1)
1
>>> text.read(7)
'anter 正'
>>> text.tell()  # Another bug?
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'iso2022_jp' codec can't decode bytes in position 2-3: illegal multibyte sequence
msg257144 - (view)	Author: Ezio Melotti (ezio.melotti) *	Date: 2015-12-28 22:43
I confirmed the problem on default (3.6) and verified that it works as expected using utf-8 instead of iso-2022-jp. The code in the above message should be converted into a unit test, and related codecs should be checked as well. The problem is probably in Modules/cjkcodecs/_codecs_iso2022.c.
msg257227 - (view)	Author: John Walker (johnwalker) *	Date: 2015-12-30 05:27
Here is Martin's message as a unit test. It checks utf-8 and the iso-2022 family except iso-2022-cn and iso-2022-cn-ext because they are not supported. The errors occur with all iso-2022 charsets.
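
For readers without the attached patch, here is a minimal sketch of the kind of check such a test might perform. It is not the contents of 25863-unittest.patch; the helper name `seek_roundtrip_ok` is hypothetical, and it only covers the first symptom (write, seek, then read).

```python
import io


def seek_roundtrip_ok(encoding):
    """Hypothetical helper: write text, seek back to a tell() position
    taken mid-stream, and check that the text written after that
    position is read back unchanged."""
    text = io.TextIOWrapper(io.BytesIO(), encoding)
    text.write("P")
    text.write("anter 正")
    pos = text.tell()   # position just before the final character
    text.write("孝")
    text.seek(pos)
    return text.read() == "孝"


# utf-8 was reported in this thread as working as expected.
print("utf-8:", seek_roundtrip_ok("utf-8"))
```

Calling the same helper with "iso-2022-jp" is the case reported as failing in this issue; the result is left unasserted here because it depends on whether the interpreter in use still has the bug.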
msg258635 - (view)	Author: Martin Panter (martin.panter) *	Date: 2016-01-20 02:09
After thinking about Issue 26158, I realize the seek() magic numbers don’t store any _encoder_ state, only _decoder_ state. That would explain the first bug (write, seek, then read). Though for this codec I suspect the decoder state is not recorded either, hence the bug with tell(). Personally I don’t care much for seeking text files. But if someone wanted to fix the second bug, that might require fixing the incremental decoder’s getstate() implementation.
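
The encoder-state point can be illustrated without TextIOWrapper at all. The sketch below uses only the standard `codecs` incremental encoder and decoder, and the byte values are the ones shown in the buffer dump in the first message. It assumes a fresh decoder is a fair stand-in for what read() does after a seek() that restores no shift state.

```python
import codecs

# One encoder instance keeps its shift state between calls.
encoder = codecs.getincrementalencoder("iso-2022-jp")()
first = encoder.encode("正")    # emits the escape sequence, then the character
second = encoder.encode("孝")   # already in JIS X 0208 mode: no escape emitted

# A decoder that starts in the initial (ASCII) state, as happens when
# the shift state is not restored, misreads the second chunk.
fresh = codecs.getincrementaldecoder("iso-2022-jp")().decode(second)

# A decoder that has seen the escape sequence decodes it correctly.
full = codecs.getincrementaldecoder("iso-2022-jp")().decode(first + second)

print(first, second, repr(fresh), repr(full))
```

The bytes for 孝 are only meaningful together with the escape sequence written earlier, which is why a position recorded after that escape cannot be decoded on its own.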

History
Date	User	Action	Args
2022-04-11 14:58:24	admin	set	github: 70050
2016-01-20 02:09:55	martin.panter	set	messages: + msg258635
2015-12-30 05:27:05	johnwalker	set	files: + 25863-unittest.patch keywords: + patch messages: + msg257227
2015-12-30 00:11:10	johnwalker	set	nosy: + johnwalker
2015-12-28 22:43:44	ezio.melotti	set	messages: + msg257144 components: + Extension Modules stage: needs patch -> test needed
2015-12-18 18:34:21	serhiy.storchaka	set	nosy: + serhiy.storchaka stage: needs patch
2015-12-15 02:37:24	martin.panter	create