This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Classification
Title: tarfile: add support for creating an archive of potentially changing files
Type: enhancement
Stage: patch review
Components: Library (Lib)
Versions: Python 3.11

Process
Status: open
Resolution: (none)
Dependencies: (none)
Superseder: (none)
Assigned To: (none)
Nosy List: ethan.furman, marko-tuononen
Priority: normal
Keywords: patch

Created on 2021-08-12 11:43 by marko-tuononen, last changed 2022-04-11 14:59 by admin.

Files
File name | Uploaded | Description
tarfile_ut.py | marko-tuononen, 2021-10-29 13:43 | Unit test case to reproduce the problem in question

Pull Requests
URL | Status | Linked
PR 30402 | closed | marko-tuononen, 2022-01-04 14:21
PR 30426 | open | marko-tuononen, 2022-01-06 08:16
Messages (2)
msg399443 - Author: Marko Tuononen (marko-tuononen) - Date: 2021-08-12 11:43
I have a use case where I need to create a tar archive from a collection of files that may change while they are being archived. I need to use system resources sparingly, so it is not possible to make a copy of the files first.

Current state of the tarfile library: creating a tar archive is aborted with OSError "unexpected end of data" (example below) if any of the files changes while it is being archived. Using the tarfile library in streaming mode does not work either. This bug report may be relevant: https://bugs.python.org/issue26877

   File "/usr/lib64/python3.7/tarfile.py", line 1946, in add
     self.addfile(tarinfo, f)
   File "/usr/lib64/python3.7/tarfile.py", line 1974, in addfile
     copyfileobj(fileobj, self.fileobj, tarinfo.size, bufsize=bufsize)
   File "/usr/lib64/python3.7/tarfile.py", line 249, in copyfileobj
     raise exception("unexpected end of data")
   OSError: unexpected end of data
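The traceback follows from how tarfile copies file data: add() records the file's size with a stat call (via gettarinfo()) when it builds the header, and addfile() then copies exactly that many bytes, raising OSError as soon as a read comes back short. A file that shrinks between those two steps therefore aborts the archive, while a file that grows is silently cut off at the recorded size. The sketch below is an independent, self-contained reproduction (not the attached tarfile_ut.py); the function name reproduce_shrink_error is hypothetical.

```python
import io
import os
import tarfile
import tempfile


def reproduce_shrink_error():
    """Hypothetical reproducer: shrink a file after its header size is recorded."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "data.bin")
        with open(path, "wb") as f:
            f.write(b"x" * 100)

        with tarfile.open(fileobj=io.BytesIO(), mode="w") as tar:
            # gettarinfo() stats the file now, so tarinfo.size == 100.
            tarinfo = tar.gettarinfo(path, arcname="data.bin")

            # Simulate another process truncating the file to 10 bytes.
            with open(path, "wb") as f:
                f.write(b"x" * 10)

            with open(path, "rb") as f:
                try:
                    # addfile() expects 100 bytes, but only 10 are left.
                    tar.addfile(tarinfo, f)
                except OSError as exc:
                    return str(exc)
    return None


print(reproduce_shrink_error())  # prints: unexpected end of data
```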

Target state of the tarfile library: creating a tar archive is not interrupted even if a file changes while it is being archived. Instead, the tarfile library's add() method would return a value indicating that some files were changed while being archived. See, for example, how GNU tar handles a similar situation: https://man7.org/linux/man-pages/man1/tar.1.html#RETURN_VALUE
msg405303 - Author: Marko Tuononen (marko-tuononen) - Date: 2021-10-29 13:20
Please find attached an example of how to reproduce the problem.

$ python3 -m unittest tarfile_ut.py
E
======================================================================
ERROR: test_stat (tarfile_ut.TestClass)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib64/python3.6/unittest/mock.py", line 1183, in patched
    return func(*args, **keywargs)
  File "/var/work/mtuonone/tarfile_ut.py", line 39, in test_stat
    tar.add(TEMP_FILENAME)
  File "/usr/lib64/python3.6/tarfile.py", line 1952, in add
    self.addfile(tarinfo, f)
  File "/usr/lib64/python3.6/tarfile.py", line 1980, in addfile
    copyfileobj(fileobj, self.fileobj, tarinfo.size, bufsize=bufsize)
  File "/usr/lib64/python3.6/tarfile.py", line 257, in copyfileobj
    raise exception("unexpected end of data")
OSError: unexpected end of data

----------------------------------------------------------------------
Ran 1 test in 0.006s

FAILED (errors=1)
$
History
Date | User | Action | Args
2022-04-11 14:59:48 | admin | set | github: 89062
2022-01-06 08:16:54 | marko-tuononen | set | pull_requests: + pull_request28632
2022-01-04 14:21:46 | marko-tuononen | set | keywords: + patch; stage: patch review; pull_requests: + pull_request28610
2021-10-29 13:44:06 | marko-tuononen | set | files: - tarfile_ut.py
2021-10-29 13:43:37 | marko-tuononen | set | files: + tarfile_ut.py
2021-10-29 13:20:50 | marko-tuononen | set | files: + tarfile_ut.py; messages: + msg405303
2021-08-18 00:13:16 | ethan.furman | set | nosy: + ethan.furman; versions: + Python 3.11, - Python 3.7
2021-08-12 11:43:12 | marko-tuononen | create |