Author guettli
Recipients docs@python, guettli
Date 2013-07-01.07:30:57
Message-id <1372663857.57.0.575592634078.issue18337@psf.upfronthosting.co.za>
Content
The stream reader returned by codecs.open() breaks lines at characters that the readline() documentation does not mention (here, '\x85'):

http://docs.python.org/2/library/codecs.html?highlight=codecs%20readline#codecs.StreamReader.readline

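# Python 2
import codecs
import tempfile

# Write a byte string containing '\n' and '\x85' to a temporary file.
temp = tempfile.mktemp()
fd = open(temp, 'wb')
fd.write('abc\ndef\x85ghi')
fd.close()

# Read the file back line by line through the codecs stream reader.
fd = codecs.open(temp, 'rb', 'latin1')
while True:
    line = fd.readline()
    if not line:
        break
    print repr(line)
fd.close()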

Result:
u'abc\n'
u'def\x85'
u'ghi'

Related: http://stackoverflow.com/questions/16227114/utf-8-files-read-in-python-will-line-break-at-character-x85
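
A possible explanation, offered as a sketch rather than a confirmed diagnosis: the split positions in the result match what splitlines() does on a unicode string, which treats U+0085 (NEL) as a line boundary in addition to '\n' and '\r'. The snippet below assumes Python 3 (the report itself is for Python 2) and compares that with io.open(), which in text mode only recognises '\n', '\r' and '\r\n'. Using io.open() as an alternative is an assumption of this example and is not suggested in the report.

```python
import io
import os
import tempfile

# Write the same bytes as in the report to a temporary file.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'abc\ndef\x85ghi')

# io.open() in text mode only treats '\n', '\r' and '\r\n' as line endings,
# so '\x85' stays inside the second line.
with io.open(path, 'r', encoding='latin-1') as f:
    io_lines = list(f)

# splitlines() additionally splits on U+0085 (NEL), which reproduces the
# three-line result shown in the report.
split_lines = u'abc\ndef\x85ghi'.splitlines(True)

os.remove(path)

print(io_lines)
print(split_lines)
```

If only '\n'-terminated lines are wanted, reading through io.open() (or splitting the decoded text on '\n' explicitly) avoids the extra break at '\x85'.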
History
Date                 User     Action  Args
2013-07-01 07:30:57  guettli  set     recipients: + guettli, docs@python
2013-07-01 07:30:57  guettli  set     messageid: <1372663857.57.0.575592634078.issue18337@psf.upfronthosting.co.za>
2013-07-01 07:30:57  guettli  link    issue18337 messages
2013-07-01 07:30:57  guettli  create