With codecs.open(...), f.readline() followed by f.read() returns a bad result #52507
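Here is an example; the last assert fails:

import codecs

DATA = """line 1
line 2
line 3
line 4
line 5
line 6
line 7
line 8
line 9
line 10
line 11
"""
REST = DATA[len('line 1\n'):]  # everything after the first line

f = open('data.txt', 'w')
f.write(DATA)
f.close()

# Built-in open(): readline() followed by read() returns the rest of the file.
f = open('data.txt', 'r')
assert f.readline() == 'line 1\n'
assert f.read() == REST
f.close()

# codecs.open(): a single read() returns the whole file.
f = codecs.open('data.txt', 'r', 'utf8')
assert f.read() == DATA
f.close()

# codecs.open(): readline() followed by read().
f = codecs.open('data.txt', 'r', 'utf8')
assert f.readline() == 'line 1\n'
assert f.read() == REST  # this assert fails
f.close()

Regards,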
Hi Stephane, I think you're seeing different buffering behavior, which I suspect is correct according to the docs. codecs.open should default to line buffering[1], while open uses the system default[2]. The read() where the assert fails is returning the remaining buffer from the readline() (which read 72 chars). Asserting e.g. "f.read(1024) == ..." will give you the expected result.
[1] http://docs.python.org/library/codecs.html#codecs.open
Buffering applies when writing, not when reading a file. There is indeed a problem in codecs.py: after a readline(), read() will return the content of the internal buffer, and not more. The "size" parameter is a hint, and should not be used to decide whether the character buffer is enough to satisfy the read() request.
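A minimal sketch of this failure mode, under stated assumptions: `ToyReader` is a hypothetical class that only imitates the structure described here (an internal character buffer plus an early-exit check that also consults `size`); it is our simplified model, not the codecs.py source. With the check as modeled, a plain `read()` stops as soon as the character buffer is non-empty, so it returns only what `readline()` left behind.

```python
import codecs
import io


class ToyReader:
    """Hypothetical, heavily simplified model of a buffered text reader.

    NOT the real codecs.StreamReader: it only mimics an early-exit check
    of the kind described above, so the reported symptom can be reproduced.
    """

    def __init__(self, raw: bytes, encoding: str = "utf-8") -> None:
        self.stream = io.BytesIO(raw)
        self.decoder = codecs.getincrementaldecoder(encoding)()
        self.charbuffer = ""

    def _fill(self, size: int) -> bool:
        """Decode up to `size` more bytes (all if size < 0); False at EOF."""
        data = self.stream.read() if size < 0 else self.stream.read(size)
        self.charbuffer += self.decoder.decode(data, final=not data)
        return bool(data)

    def read(self, size: int = -1, chars: int = -1) -> str:
        while True:
            # Modeled old-style check: for a plain read() any leftover text
            # counts as "enough", so the stream is never consulted again.
            if chars < 0:
                if size < 0:
                    if self.charbuffer:
                        break
                elif len(self.charbuffer) >= size:
                    break
            elif len(self.charbuffer) >= chars:
                break
            if not self._fill(size):
                break
        if chars < 0:
            result, self.charbuffer = self.charbuffer, ""
        else:
            result = self.charbuffer[:chars]
            self.charbuffer = self.charbuffer[chars:]
        return result

    def readline(self) -> str:
        # Pull 72-byte chunks (the chunk size mentioned in the thread) until
        # a newline is buffered, then keep the tail in charbuffer.
        while "\n" not in self.charbuffer:
            if not self._fill(72):
                break
        line, sep, rest = self.charbuffer.partition("\n")
        self.charbuffer = rest
        return line + sep


DATA = "".join("line %d\n" % i for i in range(1, 12))  # 79 characters

reader = ToyReader(DATA.encode("utf-8"))
print(repr(reader.readline()))  # 'line 1\n'
print(len(reader.read()))       # 65: only what readline() left in the buffer
```

In this model the missing tail ("ine 11\n") only shows up on a second read() call, which matches the symptom of read() returning the buffer "and not more".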
Amaury Forgeot d'Arc wrote:
Agreed. The patch looks good except the if-line should read: if chars >= 0 and len(self.charbuffer) >= chars:
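To compare the two conditions in isolation, here is a sketch with two hypothetical predicate functions. `old_buffer_is_enough` is our simplified reading of the pre-patch check (not the literal source); `fixed_buffer_is_enough` encodes the if-line proposed above, where only an explicit `chars` count can let the loop stop early and `size` is ignored.

```python
def old_buffer_is_enough(buffered: int, size: int = -1, chars: int = -1) -> bool:
    """Simplified pre-patch test (our reading): may the loop stop now?"""
    if chars < 0:
        if size < 0:
            return buffered > 0        # any leftover text ends a plain read()
        return buffered >= size        # the byte-count hint used as a char count
    return buffered >= chars


def fixed_buffer_is_enough(buffered: int, size: int = -1, chars: int = -1) -> bool:
    """The proposed condition: chars >= 0 and len(charbuffer) >= chars."""
    return chars >= 0 and buffered >= chars


# 65 characters left over after readline(), then a plain read():
print(old_buffer_is_enough(65))    # True  -> returns early with 65 chars
print(fixed_buffer_is_enough(65))  # False -> keeps reading from the stream
```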
...
Thanks,
Marc-Andre Lemburg
Updated patch. [I also tried to avoid reading the underlying file if len(self.bytebuffer) >= size, but it does not work with multibyte chars when size=1.]
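The multibyte caveat can be illustrated with the standard library's UTF-8 incremental decoder; this shows the general reason, not the patch itself, and `decode_in_chunks` is a hypothetical helper. With size=1, one buffered byte already satisfies len(bytebuffer) >= size, yet if it is the first half of a two-byte character it decodes to zero characters, so the reader still has to go back to the file.

```python
import codecs


def decode_in_chunks(data: bytes, chunk_size: int) -> list:
    """Feed `data` to a UTF-8 incremental decoder `chunk_size` bytes at a
    time and return the text produced after each chunk (plus a final flush)."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    produced = []
    for start in range(0, len(data), chunk_size):
        produced.append(decoder.decode(data[start:start + chunk_size]))
    produced.append(decoder.decode(b"", final=True))  # flush at end of input
    return produced


# "é" is two bytes in UTF-8 (b"\xc3\xa9"); the first byte alone yields nothing.
print(decode_in_chunks("aé".encode("utf-8"), 1))  # ['a', '', 'é', '']
```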
I applied the diff to test_codecs in py3k, removed the u prefixes, and ran the test: it failed. Then I applied the fix and the test passed.
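The diff to test_codecs is not reproduced in this thread. A minimal regression test in the same spirit might look like the following; the class and method names are hypothetical, and it wraps an in-memory io.BytesIO with codecs.getreader("utf-8") instead of writing a file to disk. It passes on a current CPython; on a build without the fix, the second assertion in the first test is the one expected to fail.

```python
import codecs
import io
import unittest

DATA = "".join("line %d\n" % i for i in range(1, 12))


class ReadlineThenReadTest(unittest.TestCase):
    """Hypothetical regression test for readline() followed by read()."""

    def _reader(self):
        # An in-memory byte stream wrapped in the UTF-8 StreamReader.
        return codecs.getreader("utf-8")(io.BytesIO(DATA.encode("utf-8")))

    def test_read_after_readline_returns_rest(self):
        reader = self._reader()
        self.assertEqual(reader.readline(), "line 1\n")
        self.assertEqual(reader.read(), DATA[len("line 1\n"):])

    def test_plain_read_returns_everything(self):
        self.assertEqual(self._reader().read(), DATA)


if __name__ == "__main__":
    unittest.main()
```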
Bump: I think this patch hasn't been applied in Python 3.3a0.
According to this ticket, it hasn't been applied anywhere yet (a message will be posted here when it is).
See also bpo-12446.
I think the patch is wrong, or at least not optimal, for the case when chars is -1 but size is not. If we want to read all data in any case, then we should call self.stream.read() without an argument if chars < 0 or size < 0. If we want to read no more than size bytes, then the whole loop should be rewritten. Perhaps I am wrong.
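The first option above can be sketched as the loop's "we need more data" step. `fetch_more` is a hypothetical helper written for illustration, not code from any of the patches: when the caller wants everything (chars < 0 or size < 0) it reads to EOF in one call; otherwise it treats size as the per-call byte count.

```python
import io


def fetch_more(stream, size: int = -1, chars: int = -1) -> bytes:
    """Hypothetical helper: decide how much to pull from `stream`.

    If the caller wants everything (chars < 0 or size < 0), read to EOF in
    one call; otherwise use `size` as the per-call byte count.
    """
    if chars < 0 or size < 0:
        return stream.read()
    return stream.read(size)


print(fetch_more(io.BytesIO(b"abcdef"), size=2))           # b'abcdef'
print(fetch_more(io.BytesIO(b"abcdef"), size=2, chars=3))  # b'ab'
```

Note that with this choice a size hint alone no longer limits the underlying read, which is the trade-off the comment raises against rewriting the loop to honor size strictly.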
Here is a revised patch.
The patch looks good to me, but if any specific features are needed to work around misbehaving codecs (as per bpo-20132), a comment in the appropriate place referencing that issue would be helpful. And if that workaround means we can remove the special casing from the test_readlines test for the binary transform, cool :)
Actually this patch doesn't work around misbehaving codecs. It just makes |
New changeset e24265eb2271 by Serhiy Storchaka in branch '2.7'
New changeset 9c96c266896e by Serhiy Storchaka in branch '3.3'
New changeset b72508a785de by Serhiy Storchaka in branch 'default'