classification
Title: StreamReader Readlines behavior odd
Type: behavior
Stage: resolved
Components: Unicode
Versions: Python 2.7
process
Status: closed
Resolution: duplicate
Dependencies:
Superseder: When I use codecs.open(...) and f.readline() follow up by f.read() return bad result (issue 8260)
Assigned To:
Nosy List: Thomas.Barnet-Lamb, haypo, lemburg, serhiy.storchaka
Priority: normal
Keywords:

Created on 2011-06-30 01:43 by Thomas.Barnet-Lamb, last changed 2012-12-07 20:03 by serhiy.storchaka. This issue is now closed.

Messages (3)
msg139457 - (view) Author: Thomas Barnet-Lamb (Thomas.Barnet-Lamb) Date: 2011-06-30 01:43
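It appears that StreamReader's readlines method returns truncated results if the StreamReader has, in a previous read operation, decoded more characters than the user asked for. This can happen when read is called with both the chars and size parameters, but only in some circumstances: for example, when a multi-byte character makes the first size bytes decode to fewer than chars characters, so that a second chunk is read and decoded past the requested count.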

See the following:

Python 2.7.2 (default, Jun 26 2011, 02:56:25) 
[GCC 4.0.1 (Apple Inc. build 5490)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import codecs
>>> 
>>> ## First make a file
... with codecs.open('temp.tmp','wb', encoding='utf8') as f:
...     f.write(u'This\u00ab is a test line\nThis is another test line\n')
... 
>>> 
>>> ## Now open it for reading 
... UTF8Reader = codecs.getreader('utf-8')
>>> with UTF8Reader(codecs.open('temp.tmp','rb')) as f:
...     print(repr(f.read(size=5, chars=5)))
...     print(f.readlines())
... 
u'This\xab'
[u' is ']
# The expected output is 
# u'This\xab'
# [u' is a test line\n', u'This is another test line\n']
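The session above can also be run without a temporary file. The following is a minimal sketch, assuming only the standard codecs and io modules; the variable names (text, reader, first, rest) are illustrative and not part of the original report. It prints whether readlines() returned the whole remainder of the stream, so the result depends on the codecs.py shipped with the interpreter it runs on.

```python
import codecs
import io

# Same text the report writes to temp.tmp, kept in memory instead.
text = u'This\u00ab is a test line\nThis is another test line\n'
raw = text.encode('utf-8')

UTF8Reader = codecs.getreader('utf-8')
reader = UTF8Reader(io.BytesIO(raw))

# Ask for 5 characters while reading the underlying stream 5 bytes at a time.
# The 2-byte character straddles the first 5-byte chunk, so a second chunk
# is read and more than 5 characters end up decoded.
first = reader.read(size=5, chars=5)

# readlines() is expected to return everything that is left in the stream.
rest = reader.readlines()

print(repr(first))
print(rest)
# True only if nothing was lost between the two calls.
print(first + u''.join(rest) == text)
```

If the last line prints False, the interpreter shows the behavior described in this report.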


I believe the culprit is codecs.py, lines 466-467 (the two starred lines below). I think they ought to be replaced with 'pass'.


            if chars < 0:
                if size < 0:
*                   if self.charbuffer:
*                       break      
                elif len(self.charbuffer) >= size:
                    break

Best wishes,
Thomas

PS - I apologize in advance for any oversights or mistakes in the formatting etc. of this bug report; this is my first time!
msg139464 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-06-30 08:07
See also #8260.
msg177121 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-12-07 20:03
Yes, this is obviously a duplicate of issue8260.
History
Date | User | Action | Args
2012-12-07 20:03:22 | serhiy.storchaka | set | status: open -> closed; superseder: When I use codecs.open(...) and f.readline() follow up by f.read() return bad result; nosy: + serhiy.storchaka; messages: + msg177121; resolution: duplicate; stage: resolved
2011-06-30 08:07:59 | haypo | set | messages: + msg139464
2011-06-30 05:22:02 | amaury.forgeotdarc | set | nosy: + lemburg, haypo
2011-06-30 01:44:18 | Thomas.Barnet-Lamb | set | title: StreamReader Readlines -> StreamReader Readlines behavior odd
2011-06-30 01:43:53 | Thomas.Barnet-Lamb | set | type: behavior
2011-06-30 01:43:25 | Thomas.Barnet-Lamb | create |