
classification
Title: codecs.open() buffering doc needs fix
Type: behavior Stage: patch review
Components: Documentation Versions: Python 3.2, Python 3.3, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: Santiago.Piccinini, amaury.forgeotdarc, docs@python, lemburg, martin.panter, terry.reedy, vstinner
Priority: normal Keywords: patch

Created on 2010-11-06 23:00 by Santiago.Piccinini, last changed 2022-04-11 14:57 by admin.

Pull Requests
URL Status Linked Edit
PR 4842 merged izbyshev, 2017-12-13 16:33
Messages (9)
msg120652 - (view) Author: Santiago Piccinini (Santiago.Piccinini) Date: 2010-11-06 23:00
codecs.readline has an internal buffer of 72 chars, so calling codecs.open with buffering=0 doesn't work as expected, even though buffering is passed to the underlying __builtin__.open call.

Example session:

Python 3.2a3+ (py3k, Nov  6 2010, 16:17:14) 
[GCC 4.5.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import codecs
>>> f = codecs.open("foo.txt", "w", "utf-8")
>>> word = "bar\n"
>>> content = word * 1000
>>> f.write(content)
>>> f.close()
>>> f = codecs.open("foo.txt", "rb", "utf-8", buffering=0)
>>> f.readline()
'bar\n'
>>> f.tell()
72
msg120653 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-11-07 00:11
Antoine, should codecs.open() be removed or simply aliased to open()?
msg120656 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-11-07 01:07
Amaury Forgeot d'Arc wrote:
> 
> Amaury Forgeot d'Arc <amauryfa@gmail.com> added the comment:
> 
> Antoine, should codecs.open() be removed or simply aliased to open()?

Neither is possible: codecs.open() provides a different API than
open(). Unlike open(), codecs.open() allows the use of all available
codecs, not just ones that decode to Unicode.

Regarding the issue itself: I think this is a wrong interpretation of
what the buffering parameter does. File buffering is different
from .readline() buffering (which can be customized on a per-call
basis by specifying a size parameter).

Besides, switching buffering off in open() is only allowed for
binary files, so open() wouldn't "solve" the mentioned behavior.

The only way to implement "unbuffered" .readline() in the way
that Santiago appears to be after would be to set the size parameter
to 1 for all .readline() calls. That would result in very poor
performance, though.

I think we should close this issue as "won't fix".
msg120658 - (view) Author: Santiago Piccinini (Santiago.Piccinini) Date: 2010-11-07 01:41
Marc-Andre Lemburg wrote:
>Regarding the issue itself: I think this is a wrong interpretation of
>what the buffering parameter does. File buffering is different
>from .readline() buffering (which can be customized on a per-call
>basis by specifying a size parameter).

OK. But the builtin's readline buffering works like I expected. So there is a difference in behavior between the builtin's readline and codecs.readline (and it bit me). Maybe it should be noted in the documentation?

Python 3.2a3+ (py3k, Nov  6 2010, 16:17:14)
[GCC 4.5.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> f = open("foo.txt", "rb", buffering=0)
>>> f.readline()
b'bar\n'
>>> f.tell()
4
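The two sessions above can be combined into one script. This is only a sketch: it uses codecs.getreader() over an unbuffered file instead of codecs.open() (an assumption made to keep the example independent of codecs.open() itself), and the temporary file stands in for foo.txt.

```python
import codecs
import os
import tempfile

# Write the same sample data as in the sessions above to a temporary file.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"bar\n" * 1000)
    path = tmp.name

try:
    # Built-in open(), unbuffered binary: readline() consumes only one line.
    with open(path, "rb", buffering=0) as raw:
        builtin_line = raw.readline()
        builtin_pos = raw.tell()

    # A codecs StreamReader wrapped around an unbuffered file: readline()
    # reads a chunk from the file and keeps the unused remainder internally,
    # so the file position moves past the end of the first line.
    with open(path, "rb", buffering=0) as raw:
        reader = codecs.getreader("utf-8")(raw)
        codec_line = reader.readline()
        codec_pos = raw.tell()
finally:
    os.remove(path)

print(builtin_line, builtin_pos)
print(codec_line, codec_pos)
```

Both calls return the same first line; only the position of the underlying file differs, which is the behavior reported in the first message.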
msg121052 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-11-12 17:59
Please suggest a specific alteration in the codecs.readline doc that we can then discuss.
msg151163 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-01-13 09:08
Something seems wrong somewhere. First,
codecs.open(filename, mode[, encoding[, errors[, buffering]]]) 
in the doc, should be, to match the code, in the current style
codecs.open(filename, mode='rb', encoding=None, errors='strict', buffering=1)
The other entries below follow this style.

The Note says "Files are always opened in binary mode, even if no binary mode was specified.". However, the code is
    if encoding is not None and \
       'b' not in mode:
        # Force opening of the file in binary mode
        mode = mode + 'b'
so the forcing only happens when an encoding is given. Since the intent is that codecs.open == open when no encoding is given, I believe the Note should be revised rather than the code.
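The effect of the quoted check can be sketched in isolation. The helper name effective_mode is hypothetical and exists only for illustration; it mirrors the condition shown above and is not part of the codecs module.

```python
def effective_mode(mode, encoding):
    """Hypothetical helper mirroring the mode check quoted above."""
    if encoding is not None and "b" not in mode:
        # Binary mode is forced only when an encoding is given.
        mode = mode + "b"
    return mode


print(effective_mode("r", None))      # no encoding: mode is left unchanged
print(effective_mode("r", "utf-8"))   # encoding given: 'b' is appended
print(effective_mode("rb", "utf-8"))  # already binary: unchanged
```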

(buffering=1) means line buffered. However, the doc for builtin open() says about buffering "1 to select line buffering (only usable in text mode)". So the default buffering is one that is not usable in the normal forced binary mode. Marc-Andre, can you explain this? (The doc for open() does not specify what happens when the buffering conflicts with the mode.)

The doc for StreamReader.readline() says "size, if given, is passed as size argument to the stream’s readline() method.". If that were true, size would be the max bytes to read. However, the docstring for the same in codecs.py says "size, if given, is passed as size argument to the read() method.", and that is what the code does. If not given, 72 is used as the default. (Why not 80?)

So, while the doc needs a minor tweak, I do not see what the OP's posted original result has to do with buffering. .readline does not have a fixed internal buffer of 72 chars that I can see. Rather, that is the default number of chars to read. So that is what it read, given that the file is longer than that.

I believe this is what Marc-Andre said, in different words, in his first post, in between the distraction of whether to remove open.

Santiago, yes, there is a difference between open.readline and codecs.readline. It will be more obvious when the codecs.readline size doc is corrected to specify that it is passed to read(), not readline(), and that it defaults to 72.
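A small sketch can make the read() behavior visible. RecordingBytesIO and first_readline are hypothetical helpers written for this illustration; the stream records the size passed to each read() call made by the StreamReader, and the data is the same ASCII sample used earlier in this issue.

```python
import codecs
import io


class RecordingBytesIO(io.BytesIO):
    """Hypothetical helper: remembers the size passed to each read() call."""

    def __init__(self, data):
        super().__init__(data)
        self.read_sizes = []

    def read(self, size=-1):
        self.read_sizes.append(size)
        return super().read(size)


def first_readline(size=None):
    """Call readline() once and report what happened on the stream."""
    stream = RecordingBytesIO(b"bar\n" * 1000)
    reader = codecs.getreader("utf-8")(stream)
    line = reader.readline() if size is None else reader.readline(size)
    return line, stream.read_sizes, stream.tell()


default_result = first_readline()   # size omitted: the default is used
sized_result = first_readline(10)   # explicit size is forwarded to read()
print(default_result)
print(sized_result)
```

In both cases the returned line is the same; what changes is how many bytes were requested from the stream, and therefore where the stream position ends up.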
msg151164 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-01-13 09:16
What I described is the behavior of codecs.StreamReader. However, the StreamReader associated with a particular encoding (codec) might behave differently. My understanding is that StreamReader is an example that a particular codec can use, derive from, or merely mimic the interface of.
msg244354 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-05-28 23:48
A couple of specific problems have been raised by Terry here. Checking each against the current Python 3 status, some have already been fixed:

* The codecs.open() signature has been fixed in Issue 19548.

* The StreamReader.readline(size=...) parameter documentation has been fixed to match the docstring in Issue 18336.

So that leaves these three problems, as I see it:

1. The notice about opening in binary mode still needs fixing for encoding=None.

2. The buffering parameter is applied to the underlying builtins.open() call, so should be clarified in the documentation.

3. codecs.open(filename, encoding=...) will by default call builtins.open(filename, "rb", buffering=1), which makes no sense according to the documentation.
msg328099 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-10-20 00:22
New changeset a2670565d8f5c502388378aba1fe73023fd8c8d4 by Victor Stinner (Alexey Izbyshev) in branch 'master':
bpo-32236: open() emits RuntimeWarning if buffering=1 for binary mode (GH-4842)
https://github.com/python/cpython/commit/a2670565d8f5c502388378aba1fe73023fd8c8d4
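The warning described in this changeset can be observed with a short script. This is a sketch that assumes a Python version containing the change (3.8 or later, as far as I know); the temporary file is only a stand-in.

```python
import os
import tempfile
import warnings

with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"bar\n")
    path = tmp.name

try:
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones that would normally be filtered.
        warnings.simplefilter("always")
        # buffering=1 requests line buffering, which only applies to text mode.
        with open(path, "rb", buffering=1) as f:
            data = f.read()
finally:
    os.remove(path)

runtime_warnings = [w for w in caught if issubclass(w.category, RuntimeWarning)]
for w in runtime_warnings:
    print(w.category.__name__, w.message)
```

The file is still opened and read normally; the warning only signals that the requested line buffering is not applied in binary mode.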
History
Date User Action Args
2022-04-11 14:57:08 admin set github: 54553
2018-10-20 00:22:36 vstinner set nosy: + vstinner; messages: + msg328099
2017-12-13 16:33:05 izbyshev set keywords: + patch; stage: needs patch -> patch review; pull_requests: + pull_request4731
2015-05-28 23:49:16 pitrou set nosy: - pitrou
2015-05-28 23:48:59 martin.panter set nosy: + martin.panter; title: codecs.StreamReader.readline doc needs fix -> codecs.open() buffering doc needs fix; messages: + msg244354; stage: needs patch
2012-01-13 09:16:54 terry.reedy set messages: + msg151164; title: codecs.readline doesn't care buffering=0 -> codecs.StreamReader.readline doc needs fix
2012-01-13 09:08:04 terry.reedy set messages: + msg151163; versions: + Python 3.3, - Python 3.1
2010-11-12 17:59:50 terry.reedy set nosy: + terry.reedy, docs@python; messages: + msg121052; assignee: docs@python; components: + Documentation, - Library (Lib)
2010-11-07 01:41:40 Santiago.Piccinini set messages: + msg120658
2010-11-07 01:07:52 lemburg set nosy: + lemburg; messages: + msg120656
2010-11-07 00:11:14 amaury.forgeotdarc set nosy: + amaury.forgeotdarc, pitrou; messages: + msg120653
2010-11-06 23:00:16 Santiago.Piccinini create