
classification
Title: file.tell affect decoding
Type:
Stage:
Components: Unicode
Versions: Python 3.5
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: ezio.melotti, martin.panter, mfmain, vstinner
Priority: normal
Keywords:

Created on 2016-05-10 03:34 by mfmain, last changed 2022-04-11 14:58 by admin.

Messages (2)
msg265224 - Author: mfmain (mfmain) Date: 2016-05-10 03:34
C:\tmp>hexdump badtell.txt

    000000: 61 20 6B 0D 0A D2 BB B0-E3                       a k......

C:\tmp>type test.py

    with open(r'c:\tmp\badtell.txt', "r", encoding='gbk') as f:
        while True:
            pos = f.tell()
            line = f.readline();
            if not line: break
            print(line)

C:\tmp>python test.py

    a k
    
    Traceback (most recent call last):
      File "test.py", line 4, in <module>
        line = f.readline();
    UnicodeDecodeError: 'gbk' codec can't decode byte 0xd2 in position 0:  incomplete multibyte sequence


When I remove the f.tell() call, the file decodes successfully.
I tried Python 3.4 and 3.5 (x64) on Windows 7 and Windows 10; the behavior is the same on all of them.
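
For anyone who wants to try this without creating badtell.txt by hand, here is a self-contained variant of the script above. It is only a sketch: the helper names `read_lines` and `run_case` are made up for illustration, the sample bytes are copied from the hexdump, and whether the tell() case actually fails may depend on the Python version in use.

```python
import os
import tempfile

# Bytes copied from the hexdump above: "a k", CR LF, then two GBK characters.
SAMPLE = bytes.fromhex("61206B0D0AD2BBB0E3")


def read_lines(path, call_tell):
    """Read the file line by line, optionally calling tell() before each read."""
    lines = []
    with open(path, "r", encoding="gbk") as f:
        while True:
            if call_tell:
                f.tell()
            line = f.readline()
            if not line:
                break
            lines.append(line)
    return lines


def run_case(call_tell):
    """Write SAMPLE to a temporary file and return the lines read,
    or the UnicodeDecodeError instance if decoding fails."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(SAMPLE)
        try:
            return read_lines(path, call_tell)
        except UnicodeDecodeError as exc:
            return exc
    finally:
        os.remove(path)


if __name__ == "__main__":
    print("without tell():", run_case(False))
    print("with tell():   ", run_case(True))
```

On an affected interpreter the second line should show the UnicodeDecodeError from the traceback above; if both lines show the same list, that interpreter does not exhibit the problem.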
msg268994 - Author: Martin Panter (martin.panter) (Python committer) Date: 2016-06-21 13:21
See also the second part of Issue 25863, a similar symptom with the iso-2022-jp codec. I suspect many of the multibyte CJK type codecs don’t properly support saving and restoring their state.
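
One way to check that suspicion for a given codec is to probe the incremental decoder directly. This is a rough sketch, not a description of what TextIOWrapper does internally: the function name `state_round_trips` is hypothetical, and it only tests whether a `getstate()` / `setstate()` pair survives an intervening `reset()` when the input is split in the middle of a character.

```python
import codecs


def state_round_trips(encoding, data, split):
    """Return True if the decoder's saved state can be restored mid-character.

    Decodes data[:split], saves the state, resets the decoder, restores the
    saved state, then decodes the rest and compares against a one-shot decode.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    head = decoder.decode(data[:split])
    saved = decoder.getstate()
    decoder.reset()            # throw away any pending bytes
    decoder.setstate(saved)    # a working setstate() should bring them back
    try:
        tail = decoder.decode(data[split:], final=True)
    except UnicodeDecodeError:
        return False
    return head + tail == data.decode(encoding)


if __name__ == "__main__":
    # Split the first GBK character from the report (D2 BB) between its bytes.
    print("gbk:  ", state_round_trips("gbk", b"\xd2\xbb", 1))
    # UTF-8 is included for comparison.
    print("utf-8:", state_round_trips("utf-8", "\u00e9".encode("utf-8"), 1))
```

A result of False for a codec would be consistent with its decoder losing the pending lead byte across a save and restore, which matches the "incomplete multibyte sequence" error reported for byte 0xd2.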
History
Date                 User           Action  Args
2022-04-11 14:58:30  admin          set     github: 71177
2016-06-21 13:21:28  martin.panter  set     nosy: + martin.panter; messages: + msg268994
2016-05-10 03:34:33  mfmain         create