Title: shutil.copyfile -- allow sparse copying
Type: enhancement
Stage:
Components: Library (Lib)
Versions: Python 3.2
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: tarek
Nosy List: Samuel Shapiro, giampaolo.rodola, karaken12, pitrou, r.david.murray, tarek
Priority: normal
Keywords: patch

Created on 2010-10-02 14:16 by karaken12, last changed 2018-06-12 10:21 by giampaolo.rodola.

Files
File name            Uploaded                      Description
shutil-2.6.patch     karaken12, 2010-10-02 14:16   Patch for shutil (from Python 2.6)
shutil-2.7.patch     karaken12, 2010-10-02 14:17   Patch for shutil (from Python 2.7)
shutil-3.2.1.patch   karaken12, 2010-10-02 14:18   Patch for shutil (from Python 3.2.1)
Messages (8)
msg117878 - Author: Tom Potts (karaken12) Date: 2010-10-02 14:16
Copying a sparse file under Linux using shutil.copyfile will not result in a sparse file at the end of the process.  I'm submitting a patch that will remedy this.

Note that I am only concerned with Linux at the moment -- as far as I know this patch will not mess things up on other platforms, but this will need to be tested.  It depends on the behaviour of os.truncate() when the pointer is past the end of the file, which according to the docs is platform dependent.


P.S. This is my first time submitting an issue -- if there's anything I need to do and haven't, please let me know.
msg117879 - Author: Tom Potts (karaken12) Date: 2010-10-02 14:17
(see opening message)
msg117881 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2010-10-02 15:21
You are right that this needs to be tested on other platforms.  In order to so test it (and in any case!), the patch will need unit tests.  It also needs doc updates.

In general the patch itself looks good to me, modulo the concern you raise about truncate.  You could move the '\0'*buflen constant outside the loop.  Also, the py3k IO module doesn't define constants for 'seek', the docs just refer to the integers, so it might be best not to use the os constants even though they are equivalent (the new io module is not a wrapper around os functions the way the old file implementation was).
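
For readers without the attached patches to hand, the approach under discussion can be sketched roughly as follows. This is not the submitted patch: `copy_sparse` and its `buflen` parameter are hypothetical names, and the sketch simply assumes that seeking over all-zero blocks and then calling truncate() leaves holes in the destination, which depends on the platform and filesystem. It follows the two suggestions above: the zero block is built once outside the loop, and seek() is given the integer 1 rather than an os constant.

```python
def copy_sparse(src_path, dst_path, buflen=16 * 1024):
    """Copy src_path to dst_path, skipping over all-zero blocks.

    Hypothetical helper for illustration; not the code from the patch.
    """
    zeros = b"\0" * buflen  # built once, outside the loop
    with open(src_path, "rb") as fsrc, open(dst_path, "wb") as fdst:
        while True:
            buf = fsrc.read(buflen)
            if not buf:
                break
            if buf == zeros[:len(buf)]:
                # All zeros: advance the position instead of writing.
                # 1 means "relative to the current position".
                fdst.seek(len(buf), 1)
            else:
                fdst.write(buf)
        # If the source ended in zeros, the position is now past the end
        # of the data written so far; truncate() with no argument resizes
        # the file to the current position.
        fdst.truncate()
```

Whether the destination actually ends up sparse on disk is up to the filesystem; the sketch only guarantees that the byte contents match.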

FYI, patches should (currently, pending the hg migration) be against the py3k trunk, and whoever commits it would backport it if appropriate.  In this case, however, it is a new feature and so can only go into py3k trunk.
msg117930 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2010-10-03 20:24
The use of fdst.truncate() is indeed wrong, since truncate() in 3.x is defined as truncating up to the current file position (which has been moved forward by the latest seek()).
msg117943 - Author: Tom Potts (karaken12) Date: 2010-10-04 11:12
Hmm... the online docs and the contents of the doc directory on the trunk branch say differently:
> Resize the stream to the given *size* in bytes (or the current position if *size* is not specified).  The current stream position isn't changed. This resizing can extend or reduce the current file size.  In case of extension, the contents of the new file area depend on the platform (on most systems, additional bytes are zero-filled, on Windows they're undetermined).  The new file size is returned.
Unless you know something else about this, I'm going to assume it's still okay to use.
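
A small experiment along these lines can check the documented behaviour: write some data, seek past the end without writing, and call truncate() with no argument. The helper name `extend_with_truncate` is made up for this illustration, and the sketch assumes a Python 3 buffered binary file object as returned by open().

```python
import os


def extend_with_truncate(path, data, hole):
    """Write data, seek `hole` bytes past it, then truncate() there.

    Returns (size reported by truncate(), size reported by the OS).
    Hypothetical helper for illustration only.
    """
    with open(path, "wb") as f:
        f.write(data)
        f.seek(hole, 1)          # move past the end without writing
        new_size = f.truncate()  # no argument: resize to current position
    return new_size, os.path.getsize(path)
```

If the docs quoted above are accurate, both returned values should equal len(data) + hole, i.e. the file is extended rather than cut short.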

Thanks for your comments -- I'm trying to put together some unit tests and documentation, against the Subversion trunk.

msg117944 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2010-10-04 11:57
Ok, after experimenting, I now understand what the truncate() call is for.

However, your heuristic for detecting sparse files is wrong. The unit for st_blocks is undefined as per the POSIX standard, although it gives recommendations:

“The unit for the st_blocks member of the stat structure is not defined within IEEE Std 1003.1-2001. In some implementations it is 512 bytes. It may differ on a file system basis. There is no correlation between values of the st_blocks and st_blksize, and the f_bsize (from <sys/statvfs.h>) structure members.

Traditionally, some implementations defined the multiplier for st_blocks in <sys/param.h> as the symbol DEV_BSIZE.”


Under Linux, 512 turns out to be the right multiplier (and not st_blksize):

>>> import os
>>> f = open("foo", "wb")
>>> f.write(b"x" * 4096)
>>> f.truncate(16384)
>>> f.close()
>>> st = os.stat("foo")
>>> st.st_size
>>> st.st_blocks
>>> st.st_blocks * st.st_blksize
>>> st.st_blocks * 512

Also, GNU `cp` uses S_BLKSIZE rather than DEV_BSIZE when trying to detect the st_blocks unit size (both are 512 under Linux).
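
As a rough illustration of the heuristic being discussed (not the code from the patch), a sparse-file check based on st_blocks might look like the sketch below. The names `ST_BLOCKS_UNIT`, `is_sparse` and `looks_sparse` are hypothetical, and the 512-byte unit is an assumption that holds on Linux but is not guaranteed by POSIX, as quoted above.

```python
import os

# Assumption: st_blocks counts 512-byte units. True on Linux, but POSIX
# leaves the unit undefined, so this may be wrong elsewhere.
ST_BLOCKS_UNIT = 512


def is_sparse(st_size, st_blocks, unit=ST_BLOCKS_UNIT):
    """Return True if fewer bytes are allocated than the file's size."""
    return st_blocks * unit < st_size


def looks_sparse(path):
    """Apply is_sparse() to a file on disk (hypothetical helper)."""
    st = os.stat(path)
    blocks = getattr(st, "st_blocks", None)  # not available on Windows
    if blocks is None:
        return False
    return is_sparse(st.st_size, blocks)
```

Using st_blksize as the multiplier instead would overestimate the allocated size on Linux, so a file with holes could go undetected.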
msg117946 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2010-10-04 12:03
By the way:

> Thanks for your comments -- I'm trying to put together some unit tests
> and documentation, against the Subversion trunk.

Please ignore trunk; all development (new features) should be done against branches/py3k.
msg282614 - Author: Samuel Shapiro (Samuel Shapiro) Date: 2016-12-07 10:48
Patch fails on CentOS 6 -- python 2.6

[root@LG-E1A-LNX python2.6]# patch --dry-run -l -p1 -i shutil-2.6.patch
patching file
Hunk #1 succeeded at 22 (offset 1 line).
Hunk #2 succeeded at 52 with fuzz 1 (offset 1 line).
Hunk #3 FAILED at 61.
1 out of 3 hunks FAILED -- saving rejects to file
History
Date                  User              Action   Args
2018-06-12 10:21:52   giampaolo.rodola  set      nosy: + giampaolo.rodola
2016-12-07 10:48:26   Samuel Shapiro    set      nosy: + Samuel Shapiro
                                                 messages: + msg282614
2010-10-04 12:03:53   pitrou            set      messages: + msg117946
2010-10-04 11:57:43   pitrou            set      messages: + msg117944
2010-10-04 11:12:48   karaken12         set      messages: + msg117943
2010-10-04 08:57:33   tarek             set      assignee: tarek
2010-10-03 20:24:20   pitrou            set      nosy: + pitrou
                                                 messages: + msg117930
2010-10-02 15:21:08   r.david.murray    set      nosy: + tarek, r.david.murray
                                                 messages: + msg117881
                                                 versions: - Python 2.6, Python 2.7
2010-10-02 14:18:07   karaken12         set      files: + shutil-3.2.1.patch
2010-10-02 14:17:36   karaken12         set      files: + shutil-2.7.patch
                                                 messages: + msg117879
2010-10-02 14:16:54   karaken12         create