classification
Title: codecs: StreamReader readline() breaks on undocumented characters
Type:
Stage: resolved
Components: Documentation
Versions: Python 3.3, Python 3.4, Python 2.7
process
Status: closed
Resolution: duplicate
Dependencies:
Superseder: codecs.open interprets FS, RS, GS as line ends (View: 18291)
Assigned To: docs@python
Nosy List: docs@python, guettli, serhiy.storchaka
Priority: normal
Keywords:

Created on 2013-07-01 07:30 by guettli, last changed 2013-07-01 09:36 by serhiy.storchaka. This issue is now closed.

Messages (2)
msg192112 - Author: Thomas Guettler (guettli) Date: 2013-07-01 07:30
The stream reader returned by codecs.open() splits lines on characters that the documentation does not mention:

http://docs.python.org/2/library/codecs.html?highlight=codecs%20readline#codecs.StreamReader.readline

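# Python 2 reproducer: write raw bytes containing \x85 to a temporary file.
import tempfile
temp = tempfile.mktemp()
fd = open(temp, 'wb')
fd.write('abc\ndef\x85ghi')
fd.close()

# Read the file back line by line through the codecs stream reader.
import codecs
fd = codecs.open(temp, 'rb', 'latin1')
while True:
    line = fd.readline()
    if not line:
        break
    print repr(line)
fd.close()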

Result:
u'abc\n'
u'def\x85'
u'ghi'

Related: http://stackoverflow.com/questions/16227114/utf-8-files-read-in-python-will-line-break-at-character-x85
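
For readers on Python 3, the same behaviour can be sketched without a temporary file. This is only an illustration, not part of the original report: the helper names read_lines_streamreader and read_lines_textio are hypothetical, and the comparison against io.TextIOWrapper is an addition to show a reader that splits only on \n, \r and \r\n.

```python
import codecs
import io

# Same bytes as in the report: a newline, then \x85 inside the second line.
DATA = b'abc\ndef\x85ghi'


def read_lines_streamreader(raw, encoding='latin1'):
    """Collect lines via codecs.StreamReader.readline() (hypothetical helper)."""
    reader = codecs.getreader(encoding)(io.BytesIO(raw))
    lines = []
    while True:
        line = reader.readline()
        if not line:
            break
        lines.append(line)
    return lines


def read_lines_textio(raw, encoding='latin1'):
    """Collect lines via io.TextIOWrapper for comparison (hypothetical helper)."""
    # newline='' keeps line endings untranslated; only \n, \r, \r\n end a line.
    text = io.TextIOWrapper(io.BytesIO(raw), encoding=encoding, newline='')
    return list(text)


if __name__ == '__main__':
    print(read_lines_streamreader(DATA))
    print(read_lines_textio(DATA))
```

Under these assumptions the codecs reader yields three lines, matching the result above, while the io-based reader keeps 'def\x85ghi' together as one line.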
msg192117 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-07-01 09:36
Thank you for your report. This is a duplicate of issue18291.
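
The superseding issue's title names FS, RS and GS as characters treated as line ends. A minimal Python 3 sketch of why those characters, along with \x85 from this report, matter: str.splitlines() treats all of them as line boundaries, whereas bytes.splitlines() does not. The names SAMPLES and splits_line are hypothetical and exist only for this illustration.

```python
# Separator characters relevant to this report and the superseding issue.
SAMPLES = {
    'FS': '\x1c',   # file separator
    'GS': '\x1d',   # group separator
    'RS': '\x1e',   # record separator
    'NEL': '\x85',  # next line (C1 control code)
}


def splits_line(ch):
    """Return True if str.splitlines() treats ch as a line boundary."""
    return len(('a' + ch + 'b').splitlines()) == 2


if __name__ == '__main__':
    for name, ch in SAMPLES.items():
        print(name, repr(ch), splits_line(ch))
    # Bytes are split only on \n, \r and \r\n, so \x85 stays inside the line.
    print(b'def\x85ghi'.splitlines())
```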
History
Date User Action Args
2013-07-01 09:36:06 serhiy.storchaka set status: open -> closed
superseder: codecs.open interprets FS, RS, GS as line ends
versions: + Python 3.3, Python 3.4
nosy: + serhiy.storchaka
messages: + msg192117
resolution: duplicate
stage: resolved
2013-07-01 07:30:57 guettli create