Issue 26990: file.tell affect decoding


This issue has been migrated to GitHub: https://github.com/python/cpython/issues/71177

Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: ezio.melotti, martin.panter, mfmain, vstinner
Priority: normal
Keywords:

Created on 2016-05-10 03:34 by mfmain, last changed 2022-04-11 14:58 by admin.

Messages (2)
msg265224	Author: mfmain (mfmain)	Date: 2016-05-10 03:34
C:\tmp>hexdump badtell.txt
000000: 61 20 6B 0D 0A D2 BB B0-E3                       a k......

C:\tmp>type test.py
with open(r'c:\tmp\badtell.txt', "r", encoding='gbk') as f:
    while True:
        pos = f.tell()
        line = f.readline();
        if not line: break
        print(line)

C:\tmp>python test.py
a k

Traceback (most recent call last):
  File "test.py", line 4, in <module>
    line = f.readline();
UnicodeDecodeError: 'gbk' codec can't decode byte 0xd2 in position 0: incomplete multibyte sequence

When I remove the f.tell() statement, the file decodes successfully.
I tried Python 3.4 and 3.5 (x64) on Windows 7 and Windows 10; the result is the same on all of them.
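
The report above depends on a file at c:\tmp\badtell.txt. A self-contained variant of the same test might look like the sketch below. It writes the nine bytes from the hexdump to a temporary file and reads them back with and without the tell() call. The names SAMPLE, read_lines and run_demo are made up for this illustration and are not part of the original report. The outcome of the tell() variant depends on the Python version, so the script only reports what it observes.

```python
import os
import tempfile

# Bytes copied from the hexdump in the report: "a k\r\n" followed by
# four bytes that form two GBK-encoded characters.
SAMPLE = bytes.fromhex("61 20 6B 0D 0A D2 BB B0 E3")


def read_lines(path, call_tell):
    """Read every line with encoding='gbk', optionally calling tell() first.

    Returns (lines, error); error is None if no exception was raised.
    """
    lines = []
    try:
        with open(path, "r", encoding="gbk") as f:
            while True:
                if call_tell:
                    f.tell()
                line = f.readline()
                if not line:
                    break
                lines.append(line)
    except (UnicodeDecodeError, OSError) as exc:
        # UnicodeDecodeError is the failure shown in the report. tell() can
        # also raise OSError when it cannot reconstruct the file position.
        return lines, exc
    return lines, None


def run_demo():
    # Hypothetical helper: uses a temporary file instead of c:\tmp\badtell.txt.
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "wb") as raw:
            raw.write(SAMPLE)
        return read_lines(path, False), read_lines(path, True)
    finally:
        os.remove(path)


if __name__ == "__main__":
    without_tell, with_tell = run_demo()
    print("without tell():", without_tell)
    print("with tell():   ", with_tell)
```

If the two printed results differ, the interpreter in use shows the behavior described in this issue.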
msg268994	Author: Martin Panter (martin.panter)	Date: 2016-06-21 13:21
See also the second part of Issue 25863, a similar symptom with the iso-2022-jp codec. I suspect many of the multibyte CJK type codecs don’t properly support saving and restoring their state.
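
One way to probe the suspicion above is to check whether an incremental decoder can be put back into a mid-character state through getstate() and setstate(). The sketch below is an illustration only and does not come from the tracker discussion. The helper name state_round_trips is hypothetical, and the result for a given codec may differ between Python versions.

```python
import codecs


def state_round_trips(encoding, char):
    """Return True if getstate()/setstate() preserve a half-fed character."""
    data = char.encode(encoding)
    if len(data) < 2:
        raise ValueError("char must encode to more than one byte")
    decoder = codecs.getincrementaldecoder(encoding)()
    # Feed only the first byte so the decoder is left mid-character.
    decoder.decode(data[:1])
    saved = decoder.getstate()
    # Throw the pending byte away, then try to restore the saved state.
    decoder.reset()
    decoder.setstate(saved)
    try:
        # If the pending byte was restored, the remaining bytes complete
        # the character; otherwise they decode to something else or fail.
        return decoder.decode(data[1:], final=True) == char
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    for name in ("utf-8", "gbk"):
        print(name, state_round_trips(name, "\u4e00"))
```

A False result for a codec would be consistent with that codec not saving and restoring its pending bytes.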

History
Date	User	Action	Args
2022-04-11 14:58:30	admin	set	github: 71177
2016-06-21 13:21:28	martin.panter	set	nosy: + martin.panter messages: + msg268994
2016-05-10 03:34:33	mfmain	create