codecs.open() buffering doc needs fix #54553
codecs.readline has an internal buffer of 72 chars, so calling codecs.open with buffering=0 doesn't work as expected, although buffering is passed to the underlying __builtin__.open call. Example session:
Python 3.2a3+ (py3k, Nov 6 2010, 16:17:14)
[GCC 4.5.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import codecs
>>> f = codecs.open("foo.txt", "w", "utf-8")
>>> word = "bar\n"
>>> content = word * 1000
>>> f.write(content)
>>> f.close()
>>> f = codecs.open("foo.txt", "rb", "utf-8", buffering=0)
>>> f.readline()
'bar\n'
>>> f.tell()
72
Antoine, should codecs.open() be removed or simply aliased to open()?
Amaury Forgeot d'Arc wrote:
Both is not possible: codecs.open() provides a different API than [...]
Regarding the issue itself: I think this is a wrong interpretation of [...]
Besides, switching buffering off in open() is only allowed for [...]
The only way to implement "unbuffered" .readline() in the way [...]
I think we should close this issue as "won't fix".
Marc-Andre Lemburg wrote:
Ok. But the builtin's readline buffering works like I expected. So there is a difference in behavior between the builtin readline and codecs.readline (and it bit me). Maybe it should be noted in the documentation?
Python 3.2a3+ (py3k, Nov 6 2010, 16:17:14)
[GCC 4.5.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> f = open("foo.txt", "rb", buffering=0)
>>> f.readline()
b'bar\n'
>>> f.tell()
4
Please suggest a specific alteration in the codecs.readline doc that we can then discuss.
Something seems wrong somewhere. First, the Note says "Files are always opened in binary mode, even if no binary mode was specified." However, the code is (buffering=1) means line buffered. However, the doc for builtin open() says about buffering "1 to select line buffering (only usable in text mode)". So the default buffering is one that is not usable in the normal forced binary mode. Marc-Andre, can you explain this? (The doc for open() does not specify what happens when the buffering conflicts with the mode.)
The doc for StreamReader.readline() says "size, if given, is passed as size argument to the stream's readline() method." If that were true, size would be the maximum number of bytes to read. However, the docstring for the same method in codecs.py says "size, if given, is passed as size argument to the read() method.", and that is what the code does. If not given, 72 is used as the default. (Why not 80?)
So, while the doc needs a minor tweak, I do not see what the OP's original posted result has to do with buffering. .readline does not have a fixed internal buffer of 72 chars that I can see. Rather, that is the default number of chars to read. So that is what it read, given that the file is longer than that. I believe this is what Marc-Andre said, in different words, in his first post, in between the distraction of whether to remove open.
Santiago, yes, there is a difference between open.readline and codecs.readline. It will be more obvious when the codecs.readline size doc is corrected to specify that it is passed to read(), not readline(), and that it defaults to 72.
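To make the point about size concrete, here is a minimal sketch that reproduces the OP's observation without touching the filesystem. It assumes the generic codecs.StreamReader.readline() implementation is in use (as it is for utf-8 in CPython) and wraps an in-memory io.BytesIO; the helper name first_readline_position is made up for this illustration and is not part of codecs.

```python
import codecs
import io


def first_readline_position(size=None):
    """Call StreamReader.readline() once and report how far the
    underlying byte stream was advanced.

    Hypothetical helper for illustration only.
    """
    raw = io.BytesIO(b"bar\n" * 1000)          # same content as foo.txt above
    reader = codecs.getreader("utf-8")(raw)    # StreamReader wrapping the stream
    if size is None:
        line = reader.readline()               # no size: a default read size is used
    else:
        line = reader.readline(size)           # size is handed to read(), not readline()
    return line, raw.tell()


if __name__ == "__main__":
    # The returned line is short, but the stream position shows the read-ahead.
    print(first_readline_position())
    print(first_readline_position(8))
```

In both calls the returned line is 'bar\n', while the position of the underlying stream reflects the number of bytes requested from read(), not the length of the line. The extra lines that were read ahead are kept by the reader and returned by later readline() calls.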
What I described is the behavior of codecs.StreamReader. However, the streamreader associated with a particular encoding (codec) might do differently. My understanding is that StreamReader is an example that a particular codec can use, derive from, or merely mimic the interface of.
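One way to check which case applies to a given encoding is to look at the reader class the codec registers. The sketch below is only an illustration: uses_generic_readline is a hypothetical helper name, and the check simply compares the class's readline attribute with the one defined on codecs.StreamReader.

```python
import codecs


def uses_generic_readline(encoding):
    """Return True if the codec's stream reader inherits readline()
    unchanged from codecs.StreamReader.

    Hypothetical helper for illustration only.
    """
    reader_cls = codecs.getreader(encoding)    # class registered for this codec
    # getattr with a default: a reader that only mimics the interface
    # might not define readline at all.
    return getattr(reader_cls, "readline", None) is codecs.StreamReader.readline


if __name__ == "__main__":
    for name in ("utf-8", "latin-1", "utf-16"):
        print(name, codecs.getreader(name).__name__, uses_generic_readline(name))
```

A codec for which this returns False supplies its own readline(), so the 72-character default described above need not apply to it.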
A couple of specific problems have been raised by Terry here. Checking each against the current Python 3 status, some have already been fixed:
So that leaves these three problems, as I see it:
Alright, I closed this old issue.