classification
Title: tarfile's extractfile documentation is misleading
Type: Stage: patch review
Components: Documentation, Library (Lib) Versions: Python 3.9, Python 3.8, Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: docs@python, dorosch, ethan.furman, josh.r
Priority: normal Keywords: easy, newcomer friendly, patch

Created on 2020-02-20 07:12 by josh.r, last changed 2020-02-24 06:03 by dorosch.

Pull Requests
URL: PR 18639 Status: open Linked: dorosch, 2020-02-24 06:03
Messages (1)
msg362298 - Author: Josh Rosenberg (josh.r) (Python triager) Date: 2020-02-20 07:12
The documentation for extractfile ( https://docs.python.org/3/library/tarfile.html#tarfile.TarFile.extractfile ) says:

"Extract a member from the archive as a file object. member may be a filename or a TarInfo object. If member is a regular file or a link, an io.BufferedReader object is returned. Otherwise, None is returned."

Before reading further, answer for yourself: What do you think happens when a provided filename doesn't exist, based on that documentation?

I teach a Python class whose final project uses tarfile and expects students to catch predictable errors (e.g. a random tarball being provided, rather than one produced by a different mode of the program with specific expected files) and convert them to user-friendly error messages. This documentation has repeatedly confused the students who actually read it rather than just guessing and checking interactively.

Specifically, the documentation:

1. Says nothing about what happens if member doesn't exist (TarFile.getmember does mention KeyError, but extractfile doesn't describe itself in terms of getmember)
2. Loosely implies that it returns None in such a scenario: "If member is a regular file or a link, an io.BufferedReader object is returned. Otherwise, None is returned." The intent is likely "all other member types return None, and we're saying nothing about non-existent members", but everyone I've taught who has read the docs came away with a different impression until they tested it.
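
The behavior described in the two points above can be checked with a short script. This is a minimal sketch, not part of the original report: the helper names `build_sample_archive` and `probe`, and the member names `hello.txt`, `subdir`, and `missing.txt`, are made up for illustration. It builds a small archive in memory and reports what extractfile does for a regular file, a directory, and a name that is not in the archive.

```python
import io
import tarfile


def build_sample_archive() -> bytes:
    """Build an in-memory tar archive with one regular file and one directory.

    The member names are hypothetical and exist only for this demonstration.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        data = b"hello\n"
        file_info = tarfile.TarInfo(name="hello.txt")
        file_info.size = len(data)
        tar.addfile(file_info, io.BytesIO(data))

        dir_info = tarfile.TarInfo(name="subdir")
        dir_info.type = tarfile.DIRTYPE
        tar.addfile(dir_info)
    return buf.getvalue()


def probe(archive: bytes, name: str) -> str:
    """Describe what TarFile.extractfile does for the given member name."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        try:
            reader = tar.extractfile(name)
        except KeyError:
            # Raised when the name is not in the archive.
            return "KeyError"
        if reader is None:
            # Existing member that is neither a regular file nor a link.
            return "None"
        with reader:
            if isinstance(reader, io.BufferedReader):
                return "BufferedReader"
            return type(reader).__name__


if __name__ == "__main__":
    archive = build_sample_archive()
    for name in ("hello.txt", "subdir", "missing.txt"):
        print(name, "->", probe(archive, name))
```

The directory member is the "Otherwise, None is returned" case from the current wording; the missing name is the case the current wording does not mention.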

Perhaps just reword from:

"If member is a regular file or a link, an io.BufferedReader object is returned. Otherwise, None is returned."

to:

"If member is a regular file or a link, an io.BufferedReader object is returned. For all other existing members, None is returned. If member does not appear in the archive, KeyError is raised."

Similar adjustments may be needed for extract, and/or both of them could be adjusted to explicitly refer to getmember by stating that filenames are converted to TarInfo objects via getmember.
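
For the teaching scenario described above, the lookup through getmember can be made explicit so that each failure mode maps to its own message. The following is a sketch under assumptions: `read_member` is a hypothetical helper, the choice of ValueError and the message texts are illustrative, and it assumes the archive is already available as bytes.

```python
import io
import tarfile


def read_member(archive: bytes, name: str) -> bytes:
    """Return the contents of a regular-file member, or raise ValueError.

    Hypothetical helper: turns the three predictable failures into
    user-friendly messages instead of letting raw exceptions escape.
    """
    try:
        tar = tarfile.open(fileobj=io.BytesIO(archive), mode="r")
    except tarfile.ReadError as exc:
        # The input is not something tarfile can open as an archive.
        raise ValueError("input is not a readable tar archive") from exc

    with tar:
        try:
            # getmember is documented to raise KeyError for unknown names.
            info = tar.getmember(name)
        except KeyError as exc:
            raise ValueError(f"archive has no member named {name!r}") from exc

        reader = tar.extractfile(info)
        if reader is None:
            # The member exists but is not a regular file or a link.
            raise ValueError(f"{name!r} is not a regular file or link")
        with reader:
            return reader.read()
```

Calling getmember first keeps the "missing member" case separate from the "member exists but is not a file" case, which is the distinction the proposed rewording tries to make visible in the documentation.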
History
Date                 User     Action  Args
2020-02-24 06:03:28  dorosch  set     keywords: + patch; nosy: + dorosch; pull_requests: + pull_request18005; stage: patch review
2020-02-20 07:33:22  xtreak   set     nosy: + ethan.furman
2020-02-20 07:12:11  josh.r   create