StreamReader Readlines behavior odd #56655

ThomasBarnet-Lamb · 2011-06-30T01:43:26Z

BPO	12446
Nosy	@malemburg, @vstinner, @serhiy-storchaka
Superseder	bpo-8260: When I use codecs.open(...) and f.readline() follow up by f.read() return bad result

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = <Date 2012-12-07.20:03:22.658>
created_at = <Date 2011-06-30.01:43:25.689>
labels = ['type-bug', 'expert-unicode']
title = 'StreamReader Readlines behavior odd'
updated_at = <Date 2012-12-07.20:03:22.656>
user = 'https://bugs.python.org/ThomasBarnet-Lamb'

bugs.python.org fields:

activity = <Date 2012-12-07.20:03:22.656>
actor = 'serhiy.storchaka'
assignee = 'none'
closed = True
closed_date = <Date 2012-12-07.20:03:22.658>
closer = 'serhiy.storchaka'
components = ['Unicode']
creation = <Date 2011-06-30.01:43:25.689>
creator = 'Thomas.Barnet-Lamb'
dependencies = []
files = []
hgrepos = []
issue_num = 12446
keywords = []
message_count = 3.0
messages = ['139457', '139464', '177121']
nosy_count = 4.0
nosy_names = ['lemburg', 'vstinner', 'Thomas.Barnet-Lamb', 'serhiy.storchaka']
pr_nums = []
priority = 'normal'
resolution = 'duplicate'
stage = 'resolved'
status = 'closed'
superseder = '8260'
type = 'behavior'
url = 'https://bugs.python.org/issue12446'
versions = ['Python 2.7']

ThomasBarnet-Lamb · 2011-06-30T01:43:25Z

StreamReader's readlines method appears to misbehave when a previous read operation has decoded more characters than the caller asked for. This can happen when both the chars and size parameters are passed to read, though only in some circumstances. In that case readlines returns only the leftover buffered characters instead of the rest of the stream.

See the following:

Python 2.7.2 (default, Jun 26 2011, 02:56:25) 
[GCC 4.0.1 (Apple Inc. build 5490)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import codecs
>>> 
>>> ## First make a file
... with codecs.open('temp.tmp','wb', encoding='utf8') as f:
...     f.write(u'This\u00ab is a test line\nThis is another test line\n')
... 
>>> 
>>> ## Now open it for reading 
... UTF8Reader = codecs.getreader('utf-8')
>>> with UTF8Reader(codecs.open('temp.tmp','rb')) as f:
...     print(repr(f.read(size=5, chars=5)))
...     print(f.readlines())
... 
u'This\xab'
[u' is ']
# The expected output is 
# u'This\xab'
# [u' is a test line\n', u'This is another test line\n']
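For anyone who wants to try this without a temporary file, here is a minimal sketch of the same sequence against an in-memory stream. It is written for Python 3 (the session above is 2.7), and the names `DATA` and `read_then_readlines` are made up for illustration. On an interpreter that still has the behavior reported here, the second value should be the short leftover list; on one where the superseding issue (bpo-8260) has been addressed, it should contain both remaining lines.

```python
import codecs
import io

# Same text as in the report, encoded up front so no file is needed.
DATA = u"This\u00ab is a test line\nThis is another test line\n".encode("utf-8")


def read_then_readlines(data=DATA):
    """Do a short read with both size and chars set, then call readlines()."""
    reader = codecs.getreader("utf-8")(io.BytesIO(data))
    # size=5 cuts the two-byte character in half, so the reader has to
    # fetch a second chunk and ends up decoding more than 5 characters.
    first = reader.read(size=5, chars=5)
    rest = reader.readlines()
    return first, rest


if __name__ == "__main__":
    first, rest = read_then_readlines()
    print(repr(first))
    print(rest)
```

The interesting part is the second printed value: anything shorter than two lines means the buffered leftovers were returned without reading the rest of the stream.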

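I believe the culprit is in codecs.py, lines 466-467 (the two starred lines below). When read() is called with neither size nor chars, they make it return as soon as the character buffer holds anything, without reading the rest of the stream. I think they ought to be replaced with 'pass'.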

            if chars < 0:
                if size < 0:
*                   if self.charbuffer:
*                       break      
                elif len(self.charbuffer) >= size:
                    break
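The effect of the two starred lines can be sketched with a stripped-down model of the read loop. This is not the codecs.py source: `ToyReader`, its `early_break` flag, `DATA` and `demo` are hypothetical names, and error handling and the line buffer are left out. With `early_break=True` the model keeps the starred check; with `early_break=False` it behaves as if the check were replaced with 'pass'.

```python
import codecs

DATA = u"This\u00ab is a test line\nThis is another test line\n".encode("utf-8")


class ToyReader:
    """Simplified stand-in for the buffering loop (illustration only)."""

    def __init__(self, data, early_break=True):
        self.pending = data                       # bytes not yet read
        self.decoder = codecs.getincrementaldecoder("utf-8")()
        self.charbuffer = ""                      # decoded, not yet returned
        self.early_break = early_break            # keep the starred lines?

    def _pull(self, size):
        # Mimic stream.read(size); a negative size means "everything".
        if size < 0:
            raw, self.pending = self.pending, b""
        else:
            raw, self.pending = self.pending[:size], self.pending[size:]
        return raw

    def read(self, size=-1, chars=-1):
        while True:
            if chars < 0:
                if size < 0:
                    # The two starred lines: stop once anything is buffered.
                    if self.early_break and self.charbuffer:
                        break
                elif len(self.charbuffer) >= size:
                    break
            elif len(self.charbuffer) >= chars:
                break
            raw = self._pull(size)
            self.charbuffer += self.decoder.decode(raw)
            if not raw:                           # stream exhausted
                break
        if chars < 0:
            result, self.charbuffer = self.charbuffer, ""
        else:
            result = self.charbuffer[:chars]
            self.charbuffer = self.charbuffer[chars:]
        return result

    def readlines(self):
        return self.read().splitlines(True)


def demo(early_break):
    reader = ToyReader(DATA, early_break=early_break)
    return reader.read(size=5, chars=5), reader.readlines()


if __name__ == "__main__":
    print(demo(early_break=True)[1])    # [' is ']
    print(demo(early_break=False)[1])   # both remaining lines
```

In this model the first read leaves ' is ' in the character buffer, because the second 5-byte chunk decodes to more characters than were still needed. The early break then hands back only that leftover, which matches the output in the report.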

Best wishes,
Thomas

PS: I apologize in advance for any oversights or mistakes in the formatting etc. of this bug report; this is my first time!

vstinner · 2011-06-30T08:08:00Z

ThomasBarnet-Lamb mannequin commented Jun 30, 2011

vstinner commented Jun 30, 2011

serhiy-storchaka commented Dec 7, 2012
