codecs.open() buffering doc needs fix #54553

Closed
SantiagoPiccinini mannequin opened this issue Nov 6, 2010 · 11 comments
Labels
docs Documentation in the Doc dir type-bug An unexpected behavior, bug, or error

Comments

@SantiagoPiccinini
Mannequin

SantiagoPiccinini mannequin commented Nov 6, 2010

BPO 10344
Nosy @malemburg, @terryjreedy, @amauryfa, @vstinner, @vadmium
PRs
  • bpo-32236: Issue RuntimeWarning if buffering=1 for open() in binary mode #4842
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2010-11-06.23:00:16.336>
    labels = ['type-bug', 'docs']
    title = 'codecs.open() buffering doc needs fix'
    updated_at = <Date 2018-10-20.00:22:36.886>
    user = 'https://bugs.python.org/SantiagoPiccinini'

    bugs.python.org fields:

    activity = <Date 2018-10-20.00:22:36.886>
    actor = 'vstinner'
    assignee = 'docs@python'
    closed = False
    closed_date = None
    closer = None
    components = ['Documentation']
    creation = <Date 2010-11-06.23:00:16.336>
    creator = 'Santiago.Piccinini'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 10344
    keywords = ['patch']
    message_count = 9.0
    messages = ['120652', '120653', '120656', '120658', '121052', '151163', '151164', '244354', '328099']
    nosy_count = 7.0
    nosy_names = ['lemburg', 'terry.reedy', 'amaury.forgeotdarc', 'vstinner', 'docs@python', 'Santiago.Piccinini', 'martin.panter']
    pr_nums = ['4842']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue10344'
    versions = ['Python 2.7', 'Python 3.2', 'Python 3.3']

    @SantiagoPiccinini
    Mannequin Author

    SantiagoPiccinini mannequin commented Nov 6, 2010

    codecs.readline() has an internal buffer of 72 chars, so calling codecs.open() with buffering=0 doesn't work as expected, even though buffering is passed to the underlying __builtin__.open() call.

    Example session:

    Python 3.2a3+ (py3k, Nov  6 2010, 16:17:14) 
    [GCC 4.5.1] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import codecs
    >>> f = codecs.open("foo.txt", "w", "utf-8")
    >>> word = "bar\n"
    >>> content = word * 1000
    >>> f.write(content)
    >>> f.close()
    >>> f = codecs.open("foo.txt", "rb", "utf-8", buffering=0)
    >>> f.readline()
    'bar\n'
    >>> f.tell()
    72
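A self-contained version of the session above, offered as a sketch that assumes a recent Python 3. It writes to a temporary file instead of foo.txt and silences the DeprecationWarning that codecs.open() may emit on newer releases. tell() is forwarded to the underlying unbuffered file object, while the characters read past the first line stay in the StreamReader's own character buffer.

```python
import codecs
import os
import tempfile
import warnings


def codecs_readline_position():
    """Write 1000 lines of "bar\\n", then read one line through codecs.open()."""
    with warnings.catch_warnings():
        # codecs.open() is deprecated in recent Python versions; ignore that here.
        warnings.simplefilter("ignore", DeprecationWarning)
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "foo.txt")
            with codecs.open(path, "w", "utf-8") as f:
                f.write("bar\n" * 1000)
            with codecs.open(path, "rb", "utf-8", buffering=0) as f:
                line = f.readline()
                # tell() is looked up on the wrapped FileIO object, so it reports
                # how many bytes StreamReader pulled from the file, not the line length.
                pos = f.tell()
    return line, pos


line, pos = codecs_readline_position()
print(repr(line), pos)  # 'bar\n' 72
```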

    @SantiagoPiccinini SantiagoPiccinini mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Nov 6, 2010
    @amauryfa
    Member

    amauryfa commented Nov 7, 2010

    Antoine, should codecs.open() be removed or simply aliased to open()?

    @malemburg
    Member

    Amaury Forgeot d'Arc wrote:

    Amaury Forgeot d'Arc <amauryfa@gmail.com> added the comment:

    Antoine, should codecs.open() be removed or simply aliased to open()?

    Neither is possible: codecs.open() provides a different API than
    open(). Unlike open(), codecs.open() allows use of all available
    codecs, not just ones that decode to Unicode.

    Regarding the issue itself: I think this is a wrong interpretation of
    what the buffering parameter does. File buffering is different
    from .readline() buffering (which can be customized on a per-call
    basis by specifying a size parameter).

    Besides, switching buffering off in open() is only allowed for
    binary files, so open() wouldn't "solve" the mentioned behavior.
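A quick check of the point above, as a sketch using a temporary file: the builtin open() refuses buffering=0 for text files and raises ValueError.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foo.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("bar\n")
    error = None
    try:
        # Unbuffered I/O is only accepted for binary files.
        open(path, "r", encoding="utf-8", buffering=0)
    except ValueError as exc:
        error = exc

print(type(error).__name__, error)
```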

    The only way to implement "unbuffered" .readline() in the way
    that Santiago appears to be after would be to set the size parameter
    to 1 for all .readline() calls. That would result in very poor
    performance, though.
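The per-call size parameter can be seen with an in-memory stream. This sketch assumes CPython's current codecs.StreamReader.readline(), which passes size to read() and, when size is given, calls read() only once. A size shorter than the line therefore returns a partial line, so an "unbuffered" reader built on size=1 would need to loop over many such calls.

```python
import codecs
import io


def make_reader():
    # An in-memory byte stream stands in for the file from the session above.
    stream = io.BytesIO(b"bar\n" * 1000)
    return stream, codecs.getreader("utf-8")(stream)


# Default: readline() asks read() for 72 characters.
stream, reader = make_reader()
print(repr(reader.readline()), stream.tell())   # 'bar\n' 72

# A per-call size limits how much is pulled from the byte stream.
stream, reader = make_reader()
print(repr(reader.readline(4)), stream.tell())  # 'bar\n' 4

# With size=1 only one read() call is made, so only one character comes back.
stream, reader = make_reader()
print(repr(reader.readline(1)), stream.tell())  # 'b' 1
```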

    I think we should close this issue as "won't fix".

    @SantiagoPiccinini
    Mannequin Author

    SantiagoPiccinini mannequin commented Nov 7, 2010

    Marc-Andre Lemburg wrote:

    Regarding the issue itself: I think this is a wrong interpretation of
    what the buffering parameter does. File buffering is different
    from .readline() buffering (which can be customized on a per-call
    basis by specifying a size parameter).

    OK. But the builtin readline() buffering works as I expected, so there is a difference in behavior between the builtin readline() and codecs' readline() (and it bit me). Maybe it should be noted in the documentation?

    Python 3.2a3+ (py3k, Nov  6 2010, 16:17:14)
    [GCC 4.5.1] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> f = open("foo.txt", "rb", buffering=0)
    >>> f.readline()
    b'bar\n'
    >>> f.tell()
    4
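For comparison, a sketch of the builtin behavior using a temporary file. An unbuffered binary file stops reading right after the newline. A buffered binary file also reports 4, because BufferedReader.tell() accounts for the data still sitting in its buffer, whereas codecs' StreamReader keeps its extra characters in a buffer that tell() on the underlying stream knows nothing about.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foo.txt")
    with open(path, "wb") as f:
        f.write(b"bar\n" * 1000)

    # Unbuffered binary file (FileIO): readline() stops after the newline.
    with open(path, "rb", buffering=0) as f:
        raw_line, raw_pos = f.readline(), f.tell()

    # Buffered binary file: tell() reports the logical position,
    # not how much the buffer actually read from the OS.
    with open(path, "rb") as f:
        buf_line, buf_pos = f.readline(), f.tell()

print(raw_line, raw_pos)  # b'bar\n' 4
print(buf_line, buf_pos)  # b'bar\n' 4
```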

    @terryjreedy
    Member

    Please suggest a specific alteration in the codecs.readline doc that we can then discuss.

    @terryjreedy terryjreedy added docs Documentation in the Doc dir and removed stdlib Python modules in the Lib dir labels Nov 12, 2010
    @terryjreedy
    Member

    Something seems wrong somewhere. First, the signature in the doc,
    codecs.open(filename, mode[, encoding[, errors[, buffering]]])
    should, to match the code and the current style, be
    codecs.open(filename, mode='rb', encoding=None, errors='strict', buffering=1)
    The other entries below follow this style.

    The Note says "Files are always opened in binary mode, even if no binary mode was specified." However, the code is

        if encoding is not None and \
           'b' not in mode:
            # Force opening of the file in binary mode
            mode = mode + 'b'

    so the forcing only happens when an encoding is given. Since the intent is that codecs.open == open when no encoding is given, I believe the Note should be revised rather than the code.
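The mode handling described above can be checked directly. This is a sketch assuming a recent Python 3, where codecs.open() defaults to mode 'r' and may emit a DeprecationWarning (silenced here). Without an encoding the builtin text file comes back unchanged; with one, "b" is appended and the StreamReaderWriter wraps a binary file.

```python
import codecs
import os
import tempfile
import warnings

with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)  # codecs.open() deprecation
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "foo.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("bar\n")

        # No encoding: the mode is passed through untouched, giving a text file.
        with codecs.open(path, "r") as f:
            plain_type, plain_mode = type(f), f.mode

        # With an encoding: "b" is appended; .mode is forwarded to the binary file.
        with codecs.open(path, "r", "utf-8") as f:
            wrapped_type, wrapped_mode = type(f), f.mode

print(plain_type.__name__, plain_mode)      # TextIOWrapper r
print(wrapped_type.__name__, wrapped_mode)  # StreamReaderWriter rb
```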

    buffering=1 means line buffered. However, the doc for the builtin open() says about buffering: "1 to select line buffering (only usable in text mode)". So the default buffering is one that is not usable in the normally forced binary mode. Marc-Andre, can you explain this? (The doc for open() does not specify what happens when the buffering conflicts with the mode.)
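To illustrate the open() documentation quoted above, a sketch with a temporary file: in text mode, buffering=1 produces a line-buffered TextIOWrapper, and writing a newline flushes the data so that a second reader can already see it before close().

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "log.txt")
    # buffering=1 selects line buffering, which only text mode supports.
    with open(path, "w", encoding="utf-8", buffering=1) as f:
        line_buffered = f.line_buffering
        f.write("first line\n")
        # The newline triggered a flush, so a separate reader sees the data now.
        with open(path, encoding="utf-8") as reader:
            seen_before_close = reader.read()

print(line_buffered, repr(seen_before_close))  # True 'first line\n'
```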

    The doc for StreamReader.readline() says "size, if given, is passed as size argument to the stream’s readline() method." If that were true, size would be the maximum number of bytes to read. However, the docstring for the same method in codecs.py says "size, if given, is passed as size argument to the read() method.", and that is what the code does. If not given, 72 is used as the default. (Why not 80?)

    So, while the doc needs a minor tweak, I do not see what the OP's posted original result has to do with buffering. .readline does not have a fixed internal buffer of 72 chars that I can see. Rather, that is the default number of chars to read. So that is what it read, given that the file is longer than that.
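The point that 72 is a default read size rather than a fixed buffer can be seen with a line longer than 72 characters. This sketch assumes CPython's current StreamReader.readline(), which, when no size is given, keeps calling read() and doubles the request (up to a limit) until it finds a line end.

```python
import codecs
import io

# One 100-character line (longer than the default request of 72), then more data.
stream = io.BytesIO(b"x" * 100 + b"\n" + b"y" * 1000)
reader = codecs.getreader("utf-8")(stream)

line = reader.readline()
# The first read() asks for 72 characters (no newline yet), the retry asks for
# 144, so 72 + 144 bytes have been consumed from the byte stream.
print(len(line), stream.tell())  # 101 216
```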

    I believe this is what Marc-Andre said, in different words, in his first post, in between the distraction of whether to remove open.

    Santiago, yes, there is a difference between open.readline and codecs.readline. It will be more obvious when the codecs.readline size doc is corrected to specify that it is passed to read(), not readline(), and that it defaults to 72.

    @terryjreedy
    Member

    What I described is the behavior of codecs.StreamReader. However, the StreamReader associated with a particular encoding (codec) might behave differently. My understanding is that StreamReader is an example that a particular codec can use, derive from, or merely mimic the interface of.
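Whether a given codec reuses the generic implementation can be checked through codecs.lookup(). In this sketch, assuming CPython's bundled UTF-8 codec, the codec's reader class derives from codecs.StreamReader and inherits its readline() unchanged, so the behavior discussed above applies to it.

```python
import codecs

reader_cls = codecs.lookup("utf-8").streamreader

# The UTF-8 reader derives from the generic codecs.StreamReader...
print(issubclass(reader_cls, codecs.StreamReader))          # True
# ...and does not override readline().
print(reader_cls.readline is codecs.StreamReader.readline)  # True
```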

    @terryjreedy terryjreedy changed the title codecs.readline doesn't care buffering=0 codecs.StreamReader.readline doc needs fix Jan 13, 2012
    @vadmium
    Member

    vadmium commented May 28, 2015

    A couple of specific problems have been raised by Terry here. Checking each against the current Python 3 status, some have already been fixed:

    • The codecs.open() signature has been fixed in bpo-19548.

    • The StreamReader.readline(size=...) parameter documentation has been fixed to match the docstring in bpo-18336.

    So that leaves these three problems, as I see it:

    1. The notice about opening in binary mode still needs fixing for encoding=None.

    2. The buffering parameter is applied to the underlying builtins.open() call, so should be clarified in the documentation.

    3. codecs.open(filename, encoding=...) will by default call builtins.open(filename, "rb", buffering=1), which makes no sense according to the documentation.
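The conflict in item 3 is observable on Python 3.8 and later, where bpo-32236 (the changeset referenced later in this thread) makes open() emit a RuntimeWarning when buffering=1 is combined with binary mode. A sketch using a temporary file:

```python
import os
import tempfile
import warnings

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foo.txt")
    with open(path, "wb") as f:
        f.write(b"bar\n")

    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        # Line buffering requested in binary mode: the request cannot be honoured.
        with open(path, "rb", buffering=1) as f:
            data = f.readline()

runtime_warnings = [w for w in caught if issubclass(w.category, RuntimeWarning)]
print(data, len(runtime_warnings))
for w in runtime_warnings:
    print(w.message)
```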

    @vadmium vadmium changed the title codecs.StreamReader.readline doc needs fix codecs.open() buffering doc needs fix May 28, 2015
    @vstinner
    Member

    New changeset a267056 by Victor Stinner (Alexey Izbyshev) in branch 'master':
    bpo-32236: open() emits RuntimeWarning if buffering=1 for binary mode (GH-4842)

    @furkanonder
    Sponsor Contributor

    @vstinner The issue seems to have been resolved. I think we can close the issue.

    @vstinner vstinner closed this as completed Jun 6, 2023
    @vstinner
    Member

    vstinner commented Jun 6, 2023

    Alright, I closed this old issue.


    6 participants