Message 139457 - Python tracker


Author	Thomas.Barnet-Lamb
Recipients	Thomas.Barnet-Lamb
Date	2011-06-30.01:43:25
Message-id	<1309398206.26.0.914481831308.issue12446@psf.upfronthosting.co.za>

Content

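It appears that StreamReader's readlines method returns incomplete data if the StreamReader has, in a previous read operation, decoded more characters than the user asked for: instead of the remaining lines of the file, it returns only the characters that were decoded but not yet handed back. This over-decoding can happen when both the chars and size parameters are passed to read, though only in some circumstances.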

See the following:

Python 2.7.2 (default, Jun 26 2011, 02:56:25) 
[GCC 4.0.1 (Apple Inc. build 5490)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import codecs
>>> 
>>> ## First make a file
... with codecs.open('temp.tmp','wb', encoding='utf8') as f:
...     f.write(u'This\u00ab is a test line\nThis is another test line\n')
... 
>>> 
>>> ## Now open it for reading 
... UTF8Reader = codecs.getreader('utf-8')
>>> with UTF8Reader(codecs.open('temp.tmp','rb')) as f:
...     print(repr(f.read(size=5, chars=5)))
...     print(f.readlines())
... 
u'This\xab'
[u' is ']
# The expected output is 
# u'This\xab'
# [u' is a test line\n', u'This is another test line\n']
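
For anyone who wants to check this without creating temp.tmp, here is a minimal sketch of the same sequence of calls against an in-memory stream. It is written for Python 3, and the use of io.BytesIO and the helper name readlines_after_partial_read are my own assumptions rather than part of the session above. The script only reports what it observes, since the result of readlines() may differ between interpreter versions.

```python
import codecs
import io

# Same text as in the session above, held in memory instead of temp.tmp.
DATA = 'This\u00ab is a test line\nThis is another test line\n'


def readlines_after_partial_read():
    """Return (first_read, remaining_lines) for the calls in the report."""
    stream = io.BytesIO(DATA.encode('utf-8'))
    reader = codecs.getreader('utf-8')(stream)
    # Ask for 5 characters while reading 5 bytes at a time; the fifth
    # byte is only half of the two-byte character U+00AB, so the reader
    # has to fetch and decode more than was requested.
    first = reader.read(size=5, chars=5)
    rest = reader.readlines()
    return first, rest


if __name__ == '__main__':
    first, rest = readlines_after_partial_read()
    print(repr(first))
    print(rest)
    expected = DATA[len(first):].splitlines(True)
    print('readlines() returned everything:', rest == expected)
```

If the last line prints False, the interpreter shows the behavior described in this report.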


I believe the culprit is codecs.py, lines 466-467 (the two starred lines below). I think they ought to be replaced with 'pass'.


            if chars < 0:
                if size < 0:
*                   if self.charbuffer:
*                       break      
                elif len(self.charbuffer) >= size:
                    break
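
To illustrate why I suspect those two lines, here is a toy model of the exit logic. Only the nested condition is taken from codecs.py as quoted above; the function read_model, its parameters, and the rest of the loop are hypothetical simplifications of mine (in particular, decoding is skipped and size is counted in characters rather than bytes).

```python
def read_model(charbuffer, pending, chars=-1, size=-1, early_break=True):
    """Toy model of the loop exit logic (not the real StreamReader.read).

    charbuffer: text already decoded but not yet returned to the caller
    pending:    text still waiting in the underlying stream
    Returns (result, leftover_charbuffer, leftover_pending).
    """
    while True:
        if chars < 0:
            if size < 0:
                if early_break and charbuffer:
                    break  # models the two starred lines
            elif len(charbuffer) >= size:
                break
        elif len(charbuffer) >= chars:
            break
        # More data is needed: take everything, or at most `size` characters.
        newdata = pending if size < 0 else pending[:size]
        pending = pending[len(newdata):]
        charbuffer += newdata
        if not newdata:
            break  # the stream is exhausted
    if chars < 0:
        result, charbuffer = charbuffer, ''
    else:
        result, charbuffer = charbuffer[:chars], charbuffer[chars:]
    return result, charbuffer, pending


leftover = ' is '
remaining = 'a test line\nThis is another test line\n'
print(repr(read_model(leftover, remaining)[0]))
print(repr(read_model(leftover, remaining, early_break=False)[0]))
```

With early_break=True the model stops as soon as it finds leftover characters in the buffer and never touches the stream, which mirrors the [u' is '] result above. With early_break=False it reads on until the stream is exhausted and returns the full remaining text.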

Best wishes,
Thomas

PS - I apologize in advance for any oversights or mistakes in the formatting etc. of this bug report; this is my first time!

History
Date	User	Action	Args
2011-06-30 01:43:26	Thomas.Barnet-Lamb	set	recipients: + Thomas.Barnet-Lamb
2011-06-30 01:43:26	Thomas.Barnet-Lamb	set	messageid: <1309398206.26.0.914481831308.issue12446@psf.upfronthosting.co.za>
2011-06-30 01:43:25	Thomas.Barnet-Lamb	link	issue12446 messages
2011-06-30 01:43:25	Thomas.Barnet-Lamb	create