classification
Title: tarfile's extractfile documentation is misleading
Type: Stage: resolved
Components: Documentation, Library (Lib) Versions: Python 3.10, Python 3.9
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: docs@python Nosy List: akuchling, docs@python, dorosch, ethan.furman, josh.r, miss-islington
Priority: normal Keywords: easy, newcomer friendly, patch

Created on 2020-02-20 07:12 by josh.r, last changed 2020-10-20 14:32 by akuchling. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 18639 merged dorosch, 2020-02-24 06:03
PR 22819 merged miss-islington, 2020-10-20 14:05
Messages (4)
msg362298 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2020-02-20 07:12
The documentation for extractfile ( https://docs.python.org/3/library/tarfile.html#tarfile.TarFile.extractfile ) says:

"Extract a member from the archive as a file object. member may be a filename or a TarInfo object. If member is a regular file or a link, an io.BufferedReader object is returned. Otherwise, None is returned."

Before reading further, answer for yourself: What do you think happens when a provided filename doesn't exist, based on that documentation?

In teaching a Python class that uses tarfile in the final project, and expects students to catch predictable errors (e.g. a random tarball being provided, rather than one produced by a different mode of the program with specific expected files) and convert them to user-friendly error messages, I've found this documentation to confuse students repeatedly (if they actually read it, rather than just guessing and checking interactively).

Specifically, the documentation:

1. Says nothing about what happens if member doesn't exist (TarFile.getmember does mention KeyError, but extractfile doesn't describe itself in terms of getmember)
2. Loosely implies that it should return None in such a scenario "If member is a regular file or a link, an io.BufferedReader object is returned. Otherwise, None is returned." The intent is likely to mean "all other member types are None, and we're saying nothing about non-existent members", but everyone I've taught who has read the docs came away with a different impression until they tested it.

Perhaps just reword from:

"If member is a regular file or a link, an io.BufferedReader object is returned. Otherwise, None is returned."

to:

"If member is a regular file or a link, an io.BufferedReader object is returned. For all other existing members, None is returned. If member does not appear in the archive, KeyError is raised."

Similar adjustments may be needed for extract, and/or both of them could be adjusted to explicitly refer to getmember by stating that filenames are converted to TarInfo objects via getmember.
msg379126 - (view) Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2020-10-20 14:05
New changeset ec42789e6e14f6b6ac13569aeadc13798d7173a8 by Andrey Doroschenko in branch 'master':
bpo-39693: mention KeyError in tarfile extractfile documentation (GH-18639)
https://github.com/python/cpython/commit/ec42789e6e14f6b6ac13569aeadc13798d7173a8
msg379129 - (view) Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2020-10-20 14:31
New changeset b249aeae89f55b9da5cdff082ba271e2b15b7825 by Miss Skeleton (bot) in branch '3.9':
bpo-39693: mention KeyError in tarfile extractfile documentation (GH-18639) 
https://github.com/python/cpython/commit/b249aeae89f55b9da5cdff082ba271e2b15b7825
msg379130 - (view) Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2020-10-20 14:32
Closing after applying the PR.  Thanks!
History
Date User Action Args
2020-10-20 14:32:18akuchlingsetstatus: open -> closed
versions: + Python 3.10, - Python 3.7, Python 3.8
messages: + msg379130

resolution: fixed
stage: patch review -> resolved
2020-10-20 14:31:22akuchlingsetmessages: + msg379129
2020-10-20 14:05:36miss-islingtonsetnosy: + miss-islington
pull_requests: + pull_request21775
2020-10-20 14:05:17akuchlingsetnosy: + akuchling
messages: + msg379126
2020-02-24 06:03:28doroschsetkeywords: + patch
nosy: + dorosch

pull_requests: + pull_request18005
stage: patch review
2020-02-20 07:33:22xtreaksetnosy: + ethan.furman
2020-02-20 07:12:11josh.rcreate