When I use codecs.open(...) and f.readline() follow up by f.read() return bad result #52507

harobed · 2010-03-29T15:09:54Z

BPO	8260
Nosy	@malemburg, @amauryfa, @ncoghlan, @vstinner, @devdanzin, @merwok, @bitdancer, @serhiy-storchaka
Files	codecs_read.patch codecs_read-2.patch codecs_read-3.patch

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = 'https://github.com/serhiy-storchaka'
closed_at = <Date 2014-01-26.17:34:32.355>
created_at = <Date 2010-03-29.15:09:53.729>
labels = ['type-bug', 'library']
title = 'When I use codecs.open(...) and f.readline() follow up by f.read() return bad result'
updated_at = <Date 2014-01-26.17:34:32.354>
user = 'https://bugs.python.org/harobed'

bugs.python.org fields:

activity = <Date 2014-01-26.17:34:32.354>
actor = 'serhiy.storchaka'
assignee = 'serhiy.storchaka'
closed = True
closed_date = <Date 2014-01-26.17:34:32.355>
closer = 'serhiy.storchaka'
components = ['Library (Lib)']
creation = <Date 2010-03-29.15:09:53.729>
creator = 'harobed'
dependencies = []
files = ['16705', '16706', '33410']
hgrepos = []
issue_num = 8260
keywords = ['patch']
message_count = 15.0
messages = ['101892', '101980', '101987', '101988', '101990', '122823', '138265', '138273', '139465', '177119', '177123', '207875', '209330', '209335', '209337']
nosy_count = 10.0
nosy_names = ['lemburg', 'amaury.forgeotdarc', 'ncoghlan', 'vstinner', 'ajaksu2', 'eric.araujo', 'r.david.murray', 'harobed', 'python-dev', 'serhiy.storchaka']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue8260'
versions = ['Python 2.7', 'Python 3.3', 'Python 3.4']

harobed · 2010-03-29T15:09:53Z

Here is an example; the last assert fails:

f = open('data.txt', 'w')
f.write("""line 1
line 2
line 3
line 4
line 5
line 6
line 7
line 8
line 9
line 10
line 11
""")
f.close()


f = open('data.txt', 'r')
assert f.readline() == 'line 1\n'

assert f.read() == """line 2
line 3
line 4
line 5
line 6
line 7
line 8
line 9
line 10
line 11
"""

f.close()

import codecs

f = codecs.open('data.txt', 'r', 'utf8')

assert f.read() == """line 1
line 2
line 3
line 4
line 5
line 6
line 7
line 8
line 9
line 10
line 11
"""

f.close()

f = codecs.open('data.txt', 'r', 'utf8')

assert f.readline() == 'line 1\n'

# this assert fails
assert f.read() == """line 2
line 3
line 4
line 5
line 6
line 7
line 8
line 9
line 10
line 11
"""

f.close()

Regards,
Stephane

devdanzin · 2010-03-31T06:12:21Z

Hi Stephane,

I think you're seeing different buffering behavior, which I suspect is correct according to the docs.

codecs.open should default to line buffering[1], while open uses the system default[2].

The read() where the assert fails is returning the remaining buffer from the readline (which read 72 chars).

Asserting e.g. "f.read(1024) == ..." will give you the expected result.

[1] http://docs.python.org/library/codecs.html#codecs.open
[2] http://docs.python.org/library/functions.html#open
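The read-ahead mentioned above can be observed with a small sketch. It assumes a current Python 3 interpreter (where this issue is already fixed) and uses an in-memory stream instead of data.txt. `RecordingStream` is a hypothetical helper written for this illustration, not part of the codecs module, and the exact sizes it records are an implementation detail of the interpreter in use.

```python
import codecs
import io

# Same content as data.txt in the report, held in memory.
DATA = "".join("line %d\n" % i for i in range(1, 12)).encode("utf-8")


class RecordingStream(io.BytesIO):
    """Hypothetical helper: remembers the size passed to every read()."""

    def __init__(self, data):
        super().__init__(data)
        self.requested_sizes = []

    def read(self, size=-1):
        self.requested_sizes.append(size)
        return super().read(size)


stream = RecordingStream(DATA)
reader = codecs.getreader("utf-8")(stream)

first_line = reader.readline()
# Sizes the reader asked the raw stream for while serving one readline().
sizes_after_readline = list(stream.requested_sizes)
rest = reader.read()

print(repr(first_line))
print(sizes_after_readline)
print(len(rest))
```

If the recorded size is larger than the first line, the reader has pulled extra bytes into its own buffer, which is the data that the failing read() handed back.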

amauryfa · 2010-03-31T10:00:19Z

Buffering applies when writing, not when reading a file.

There is indeed a problem in codecs.py: after a readline(), read() will return the content of the internal buffer, and not more.

The "size" parameter is a hint, and should not be used to decide whether the character buffer is enough to satisfy the read() request.
Patch is attached, with test.
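To make the failure mode concrete, here is a deliberately simplified model. `ToyReader` and its `read_past_buffer` flag are hypothetical names invented for this sketch. It is not the code in codecs.py and not the attached patch; it only imitates the idea that readline() reads ahead in 72-byte chunks and that the old read() stopped as soon as the character buffer was non-empty.

```python
import io

TEXT = "".join("line %d\n" % i for i in range(1, 12))


class ToyReader:
    """Hypothetical, simplified stand-in for a buffering stream reader."""

    def __init__(self, raw, read_past_buffer):
        self.raw = raw
        self.charbuffer = ""
        # False imitates the reported behavior, True the intended one.
        self.read_past_buffer = read_past_buffer

    def readline(self):
        # Read ahead one 72-byte chunk, return one line, keep the rest.
        if "\n" not in self.charbuffer:
            self.charbuffer += self.raw.read(72).decode("ascii")
        line, newline, self.charbuffer = self.charbuffer.partition("\n")
        return line + newline

    def read(self):
        if self.charbuffer and not self.read_past_buffer:
            # Reported behavior: a non-empty buffer ends the read early.
            result = self.charbuffer
        else:
            # Intended behavior: buffer plus everything left in the stream.
            result = self.charbuffer + self.raw.read().decode("ascii")
        self.charbuffer = ""
        return result


def readline_then_read(read_past_buffer):
    reader = ToyReader(io.BytesIO(TEXT.encode("ascii")), read_past_buffer)
    reader.readline()
    return reader.read()


old_result = readline_then_read(False)
new_result = readline_then_read(True)
print(repr(old_result[-10:]))
print(repr(new_result[-10:]))
```

In this model the old result is cut off in the middle of the last line, because the 72-byte chunk ends there, while the new result runs to the end of the data.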

malemburg · 2010-03-31T10:28:20Z

Amaury Forgeot d'Arc wrote:

> Buffering applies when writing, not when reading a file.
>
> There is indeed a problem in codecs.py: after a readline(), read() will return the content of the internal buffer, and not more.
>
> The "size" parameter is a hint, and should not be used to decide whether the character buffer is enough to satisfy the read() request.
> Patch is attached, with test.

Agreed.

The patch looks good except the if-line should read:

if chars >= 0 and len(self.charbuffer) >= chars:
  ...

Thanks,

Marc-Andre Lemburg
eGenix.com

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::

eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/

amauryfa · 2010-03-31T11:11:07Z

Updated patch.

[I also tried to avoid reading the underlying file if len(self.bytebuffer) >= size, but that does not work with multibyte characters when size=1]
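One plausible reading of the multibyte problem is that a single buffered byte need not be a complete character, so a byte count cannot promise that a character is available. The sketch below shows this with the standard incremental decoder; it is an illustration of the encoding behavior, not a reproduction of the abandoned optimization.

```python
import codecs

# U+00E9 is encoded as two bytes in UTF-8.
encoded = "\u00e9".encode("utf-8")

decoder = codecs.getincrementaldecoder("utf-8")()
# Feed the decoder one byte at a time, as a size=1 read would.
pieces = [decoder.decode(encoded[i:i + 1]) for i in range(len(encoded))]

print(len(encoded))
print(pieces)
```

The first byte alone decodes to an empty string; the character only appears once the second byte has arrived.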

merwok · 2010-11-29T16:17:29Z

I applied the diff to test_codecs in py3k, removed the u prefixes and ran: failure. I applied the fix and the test passed.

harobed · 2011-06-13T17:55:55Z

Bump: I think this patch has not been applied in Python 3.3a0.

bitdancer · 2011-06-13T19:43:50Z

According to this ticket it hasn't been applied anywhere yet (a message will be posted here when it is).

vstinner · 2011-06-30T08:08:06Z

See also bpo-12446.

serhiy-storchaka · 2012-12-07T19:52:34Z

I think the patch is wrong, or at least not optimal, for the case when chars is -1 but size is not.

If we want to read all data in any case, then we should call self.stream.read() without argument if chars < 0 or size < 0.

If we want to read no more than size bytes, then all loop code should be totally rewritten.

Perhaps I am wrong.

serhiy-storchaka · 2012-12-07T20:04:06Z

As shown in bpo-12446, bpo-14475 and bpo-16636, there are different ways to reproduce this bug (read(size, chars) + readlines(), readline() + readlines()). All of these cases should be tested.
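The two combinations named above can be exercised as follows. This is a sketch that assumes a current Python 3 interpreter, where the fix is in place, and an in-memory stream with the same content as data.txt from the original report; `new_reader` is a helper name made up for the example.

```python
import codecs
import io

LINES = ["line %d\n" % i for i in range(1, 12)]
DATA = "".join(LINES).encode("utf-8")


def new_reader():
    # Fresh StreamReader over an in-memory copy of the data.
    return codecs.getreader("utf-8")(io.BytesIO(DATA))


# readline() followed by readlines()
reader = new_reader()
first = reader.readline()
remaining_lines = reader.readlines()

# read(size, chars) followed by readlines()
reader = new_reader()
head = reader.read(40, 5)
tail_lines = reader.readlines()

print(repr(first), len(remaining_lines))
print(repr(head), len(tail_lines))
```

On an interpreter without the fix, the second call in each pair would be expected to stop at the end of the internal buffer instead of returning the rest of the data.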

serhiy-storchaka · 2014-01-10T19:40:38Z

Here is a revised patch.

Behavior is changed less. read() is less greedy and uses characters from the buffer when read() is called with only one argument (size). It is now a little closer to an io stream's read() than with the previous patch.
Added tests for the cases from bpo-12446 and bpo-16636.
Fixed read() for the TransformCodecTest.test_read test added in 3.4. Actually the uu_codec and zlib_codec are broken.
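The "uses characters from the buffer" behavior can be checked indirectly by watching the position of the underlying stream. This is a sketch assuming a current Python 3 interpreter and an in-memory stream; how many characters read(5) returns has varied between Python versions, so the example only looks at whether the raw stream was touched.

```python
import codecs
import io

DATA = "".join("line %d\n" % i for i in range(1, 12)).encode("utf-8")

stream = io.BytesIO(DATA)
reader = codecs.getreader("utf-8")(stream)

reader.readline()
# Bytes consumed from the raw stream so far (includes the read-ahead).
position_before = stream.tell()

chunk = reader.read(5)
position_after = stream.tell()

print(repr(chunk))
print(position_before, position_after)
```

An unchanged position means the request was served from characters that readline() had already buffered.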

ncoghlan · 2014-01-26T15:40:05Z

Patch looks good to me, but if any specific features are needed to work around misbehaving codecs (as per bpo-20132), a comment in the appropriate place referencing that issue would be helpful.

And if that workaround means we can remove the special casing from the test_readlines test for the binary transform, cool :)

serhiy-storchaka · 2014-01-26T16:23:24Z

Actually this patch doesn't work around misbehaving codecs. It just makes
specific tests (one readline, one read) pass. More complex tests which use
multiple readline() or read() calls can still fail with these misbehaving codecs.

python-dev · 2014-01-26T17:30:37Z

New changeset e24265eb2271 by Serhiy Storchaka in branch '2.7':
Issue bpo-8260: The read(), readline() and readlines() methods of
http://hg.python.org/cpython/rev/e24265eb2271

New changeset 9c96c266896e by Serhiy Storchaka in branch '3.3':
Issue bpo-8260: The read(), readline() and readlines() methods of
http://hg.python.org/cpython/rev/9c96c266896e

New changeset b72508a785de by Serhiy Storchaka in branch 'default':
Issue bpo-8260: The read(), readline() and readlines() methods of
http://hg.python.org/cpython/rev/b72508a785de

harobed mannequin added the stdlib (Python modules in the Lib dir) and type-bug (An unexpected behavior, bug, or error) labels Mar 29, 2010

malemburg changed the title ~~When I use codecs.open(...) and f.readline() follow up by f.read() return bad result~~ When I use codecs.open(...) and f.readline() follow up by f.read() return bad result Mar 31, 2010

merwok changed the title ~~When I use codecs.open(...) and f.readline() follow up by f.read() return bad result~~ When I use codecs.open(...) and f.readline() follow up by f.read() return bad result Nov 29, 2010

serhiy-storchaka self-assigned this Jan 26, 2014

serhiy-storchaka closed this as completed Jan 26, 2014

ezio-melotti transferred this issue from another repository Apr 10, 2022
