Author serhiy.storchaka
Recipients bmwiedemann, lars.gustaebel, rhettinger, serhiy.storchaka
Date 2017-06-18.17:21:59
Message-id <1497806519.77.0.494594715182.issue30693@psf.upfronthosting.co.za>
In-reply-to
Content
The patch for a similar issue with the glob module was rejected recently because it is easy to sort the result of glob.glob() (see issue30461). This issue looks similar, but there are differences. On one hand, the command-line tar utility has no option for sorting file names and does not seem to sort them by default (I haven't checked). External sorting is possible with the tarfile module, just as with the tar utility: generate the list of all files and directories, sort it, and pass every item to TarFile.add() with recursive=False. On the other hand, this is not as easy as it is for glob.glob(), and the overhead of sorting is expected to be smaller than for glob.glob(). These may be considered additional arguments for approving the patch.
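The external-sorting workaround described above could look roughly like the following. This is only a sketch: the helper name add_sorted is hypothetical and not part of the tarfile module, and plain lexicographic path order is assumed to be the desired order.

```python
import os
import tarfile


def add_sorted(tar, top):
    """Add `top` and everything below it to `tar` in sorted path order.

    Hypothetical helper, not part of tarfile. Assumes plain
    lexicographic ordering of the full paths is what is wanted.
    """
    # Collect every directory and file under `top`, including `top` itself.
    paths = [top]
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames + filenames:
            paths.append(os.path.join(dirpath, name))
    # recursive=False makes add() store only the entry itself, so the
    # archive order is exactly the sorted order computed here.
    for path in sorted(paths):
        tar.add(path, recursive=False)
```

It would be used as `with tarfile.open("out.tar", "w") as tar: add_sorted(tar, "some_dir")`, where both names are placeholders. A directory always sorts before its contents because its path is a proper prefix of theirs.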

If this approach is approved, it should also be applied to ZIP archives.

FYI, the order of archived files can affect the compression ratio of a compressed tar archive. For example, the 7-Zip archiver sorts files by extension, which increases the chance that files of the same type (text, multimedia, spreadsheets, executables, etc.) are grouped together and share a common dictionary for global compression. This isn't directly related to this issue; it is just material for a possible future enhancement.
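To illustrate the idea of grouping by extension, here is a minimal sketch of such an ordering. The function name extension_order is hypothetical, and this is not 7-Zip's actual algorithm, only a simple sort key that puts files with the same extension next to each other.

```python
import os


def extension_order(paths):
    """Return `paths` sorted by file extension, then by full path.

    Hypothetical helper for illustration only; it does not reproduce
    7-Zip's real ordering.
    """
    return sorted(paths, key=lambda p: (os.path.splitext(p)[1].lower(), p))
```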