Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify that readlines() is not needed to iterate over a file #57719

Closed
peterotten mannequin opened this issue Nov 30, 2011 · 17 comments
Closed

Clarify that readlines() is not needed to iterate over a file #57719

peterotten mannequin opened this issue Nov 30, 2011 · 17 comments
Assignees
Labels
docs Documentation in the Doc dir easy type-feature A feature request or enhancement

Comments

@peterotten
Copy link
Mannequin

peterotten mannequin commented Nov 30, 2011

BPO 13510
Nosy @terryjreedy, @pitrou, @ezio-melotti, @merwok, @meadori, @kushaldas, @ashwch
Files
  • demote-readlines.patch: Reorganize reading lines from a file based on comments.
  • demote-readlines-v2.patch: Incorporate Ezio's comment.
  • demote-readlines-v3.patch: Update patch to include changes to io.rst.
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/ezio-melotti'
    closed_at = <Date 2013-04-15.16:13:13.182>
    created_at = <Date 2011-11-30.17:42:01.866>
    labels = ['easy', 'type-feature', 'docs']
    title = 'Clarify that readlines() is not needed to iterate over a file'
    updated_at = <Date 2013-04-15.16:13:13.181>
    user = 'https://bugs.python.org/peterotten'

    bugs.python.org fields:

    activity = <Date 2013-04-15.16:13:13.181>
    actor = 'ezio.melotti'
    assignee = 'ezio.melotti'
    closed = True
    closed_date = <Date 2013-04-15.16:13:13.182>
    closer = 'ezio.melotti'
    components = ['Documentation']
    creation = <Date 2011-11-30.17:42:01.866>
    creator = 'peter.otten'
    dependencies = []
    files = ['29822', '29859', '29868']
    hgrepos = []
    issue_num = 13510
    keywords = ['patch', 'easy']
    message_count = 17.0
    messages = ['148679', '148693', '148703', '148704', '173730', '173761', '185488', '186818', '186827', '186936', '186937', '186940', '186965', '186966', '187002', '187005', '187006']
    nosy_count = 13.0
    nosy_names = ['terry.reedy', 'peter.otten', 'pitrou', 'ezio.melotti', 'eric.araujo', 'cvrebert', 'meador.inge', 'docs@python', 'python-dev', 'mikehoy', 'kushal.das', 'ashwch', 'dan.riti']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue13510'
    versions = ['Python 2.7', 'Python 3.3', 'Python 3.4']

    @peterotten
    Copy link
    Mannequin Author

    peterotten mannequin commented Nov 30, 2011

    I've been looking at code on the tutor mailing list for some time, and

    for line in file.readlines(): ...

    is a common idiom there. I suppose this is because the readlines() method is easily discoverable while the proper way (iterate over the file object directly) is not.

    A note added to the readlines() documentation might help:

    """
    You don't need the readlines() method to loop over the lines of a file.
    for line in file: process(line)
    consumes less memory and is often faster.
    """

    @peterotten peterotten mannequin assigned docspython Nov 30, 2011
    @peterotten peterotten mannequin added docs Documentation in the Doc dir type-feature A feature request or enhancement labels Nov 30, 2011
    @terryjreedy
    Copy link
    Member

    This current line is
    "Read and return a list of lines from the stream. hint can be specified to control the number of lines read: no more lines will be read if the total size (in bytes/characters) of all lines so far exceeds hint."

    I would like to see added
    "Since a file is a line iterator, file.readlines() == list(file). To save time and space, iterate over lines of a file with for line in file: process(line)."
    (with code markup for the two snippets).

    @meadori
    Copy link
    Member

    meadori commented Dec 1, 2011

    I am skeptical that such a note will help. The iterator behavior is clearly pointed out in the Python Tutorial [1] and in the IOBase documentation [2]. I suspect this bad code pattern is just being copied and pasted from other sources without folks ever even looking at the Python documentation.

    [1] http://docs.python.org/dev/tutorial/inputoutput.html?highlight=readlines#methods-of-file-objects
    [2] http://docs.python.org/dev/library/io.html#io.IOBase

    @ezio-melotti
    Copy link
    Member

    FWIW I've seen several persons using "for line in file.readlines(): ..." or even "while 1: line = file.readline()". IMHO it's a good idea to document that "without sizehint, it's equivalent to list(file)" and that "for line in file: ..." can be used directly. Even if some people don't read the doc, the ones who do will benefit from this. The same note might also be added to the docstring (I think it's somewhat common to learn about readlines() through dir(file) + help(file.readlines)).

    @cvrebert
    Copy link
    Mannequin

    cvrebert mannequin commented Oct 25, 2012

    file.readlines() (and perhaps dare I say even file.readline()) should not even be mentioned in the tutorial, IMO. It is difficult to imagine a use case where just iterating over the file object isn't superior. I cannot remember the last time that I have used either of these methods. They ought to be relegated to the library docs. Presenting `for line in a_file:` as merely "An alternative approach" in the official tutorial is practically archaic.

    @merwok
    Copy link
    Member

    merwok commented Oct 25, 2012

    readlines might be discouraged; readline has its use cases (two days ago I used it to get one line to pass to csv.Sniffer.sniff), but next(file) works too, so it could be de-emphasized. There may be a difference with respect to the trailing newline however; I don’t remember if __iter__ or readline keep it or not.

    @kushaldas
    Copy link
    Member

    Working on a patch for this.

    @danriti
    Copy link
    Mannequin

    danriti mannequin commented Apr 13, 2013

    After reading the comments, I generated a patch that does the following:

    • Reorganize to present `for line in f:` as the first approach for reading lines. I refrained from saying it's the *preferred* approach, however I can add that if desired.
    • Reorganize to present `f.readlines()` as the alternative approach.

    Any feedback is more then welcome! Thanks.

    @pitrou
    Copy link
    Member

    pitrou commented Apr 13, 2013

    In 3.x I think we'll want to drop the following sentence: "Since the two approaches manage line buffering differently, they should not be mixed". But it can wait for another issue.

    @ezio-melotti
    Copy link
    Member

    I would actually remove the whole section about readlines() or possibly just mention it briefly (something like "If you want to read all the lines of a file in a list you can also use f.readlines().")
    The sizehint arg is rarely used, so I don't see the point of going in such details about it in the tutorial. In Lib/, there are only a couple of places where it's actually used:
    Lib/fileinput.py:358: self._buffer = self._file.readlines(self._bufsize)
    Lib/idlelib/GrepDialog.py:90: block = f.readlines(100000)

    @pitrou
    Copy link
    Member

    pitrou commented Apr 14, 2013

    I would actually remove the whole section about readlines() or
    possibly just mention it briefly (something like "If you want to read
    all the lines of a file in a list you can also use f.readlines().")
    The sizehint arg is rarely used, so I don't see the point of going in
    such details about it in the tutorial.

    You are right, Ezio.

    @terryjreedy
    Copy link
    Member

    I agree with removing .readlines section. If .readlines did not exist, I do not think we would add it now.

    @danriti
    Copy link
    Mannequin

    danriti mannequin commented Apr 15, 2013

    Added a new version of the patch to incorporate Ezio's comment!

    @ezio-melotti
    Copy link
    Member

    Patch LGTM. I think it would also be better to say something like "Note that it's already possible to iterate on file objects using for line in file: ... without calling file.readlines()." in Doc/library/io.rst:readlines, as suggested in msg148703.

    @danriti
    Copy link
    Mannequin

    danriti mannequin commented Apr 15, 2013

    Agreed Ezio, I've updated the patch to include the change to Doc/library/io.rst:readlines.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Apr 15, 2013

    New changeset 1e8be05a4039 by Ezio Melotti in branch '3.3':
    bpo-13510: clarify that f.readlines() is note necessary to iterate over a file. Patch by Dan Riti.
    http://hg.python.org/cpython/rev/1e8be05a4039

    New changeset 7f4325dc4256 by Ezio Melotti in branch 'default':
    bpo-13510: merge with 3.3.
    http://hg.python.org/cpython/rev/7f4325dc4256

    New changeset 6a4746b0afaf by Ezio Melotti in branch '2.7':
    bpo-13510: clarify that f.readlines() is note necessary to iterate over a file. Patch by Dan Riti.
    http://hg.python.org/cpython/rev/6a4746b0afaf

    @ezio-melotti
    Copy link
    Member

    Fixed, thanks for the patch!
    I also realized I missed Terry suggestion about file.readlines() == list(file), so I added that too.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    docs Documentation in the Doc dir easy type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants