classification
Title: shutil: sort files before archiving for consistency
Type: enhancement
Stage: resolved
Components: Distutils
Versions: Python 3.3
process
Status: closed
Resolution: wont fix
Dependencies:
Superseder:
Assigned To: tarek
Nosy List: dstufft, eric.araujo, iritkatriel, tarek, techtonik
Priority: low
Keywords: patch

Created on 2010-06-03 21:13 by techtonik, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description
sort_files_in_zip.26.patch techtonik, 2010-06-03 21:13 python 2.6 bugfix
sort_files_in_zip.27.patch techtonik, 2010-06-03 21:14 python 2.7 bugfix
Messages (16)
msg106983 - (view) Author: anatoly techtonik (techtonik) Date: 2010-06-03 21:13
I am troubleshooting a local issue with distutils and UAC on Windows, and I need to compare the resulting binary archives. Unfortunately, files are added to bdist archives in random order, and this complicates comparisons. This patch makes distutils archives more deterministic.
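
For illustration only, here is a minimal sketch of the idea of sorting files before archiving. It is not the attached patch; the function name `make_sorted_zip` and its parameters are hypothetical, and it uses only `os.walk` and `zipfile.ZipFile` from the standard library.

```python
import os
import zipfile


def make_sorted_zip(zip_path, base_dir):
    """Add every file under base_dir to zip_path in sorted order.

    Hypothetical helper for illustration; zip_path should live outside
    base_dir so the archive does not try to include itself.
    """
    with zipfile.ZipFile(zip_path, "w") as archive:
        for dirpath, dirnames, filenames in os.walk(base_dir):
            # Sorting dirnames in place makes os.walk descend in a stable order.
            dirnames.sort()
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                # Store names relative to base_dir.
                archive.write(path, os.path.relpath(path, base_dir))
    return zip_path
```

With this approach the member order no longer depends on the order in which the filesystem lists directory entries. Timestamps stored in the archive can still differ between runs.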
msg106984 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-06-03 21:32
This would be a new feature, so it can’t go into 2.6 unless I’m mistaken. It may not even go into 2.7.

Your renaming of z to zip does not add much value and shadows a builtin; I advise against doing that.
msg106986 - (view) Author: anatoly techtonik (techtonik) Date: 2010-06-03 21:40
On Fri, Jun 4, 2010 at 12:32 AM, Éric Araujo <report@bugs.python.org> wrote:
>
> This would be a new feature, so it can’t go into 2.6 unless I’m mistaken. It may not even go into 2.7.

It is not a feature, but a bugfix for wrong order of files in archive.
That means that on different filesystems you will get different
archives. I doubt that having files in random order is a feature.

> Your renaming of z to zip does not add much value and shadows a builtin; I advise against doing that.

I've just copy-pasted the same block from Python 2.7, so the bug with
shadowing is already there.
msg106989 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-06-03 22:05
> It is not a feature, but a bugfix for wrong order of files in archive.

That’s debatable. If the docs did not advertise ordering and if it’s not a regression from an older version, it’s a new feature. I’m not saying I don’t like it, just clarifying Python’s process.

> I've just copy-pasted the same block from Python 2.7, so the bug with
> shadowing is already there.

I see in the log that Tarek has done this to respect PEP 8, which advocates using meaningful names. Not a big deal here.

FTR, this function has been moved to shutil, still with the zip shadowing and without the sorting.

I’ll shut up now and let Tarek and the release manager judge whether this is a bugfix or a new feature :)
msg106990 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-06-03 22:10
Sorry for writing when tired. Clearer first sentence: If it does not change the code to match the docs or to fix a regression from an older version, it’s a feature.
msg107020 - (view) Author: anatoly techtonik (techtonik) Date: 2010-06-04 09:47
On Fri, Jun 4, 2010 at 1:11 AM, Éric Araujo <report@bugs.python.org> wrote:
>
> Sorry for writing when tired. Clearer first sentence: If it does not change the code to match the docs or to fix a regression from an older version, it’s a feature.

This is the biggest problem with the rigidity of the Python process. In
this specific case the patch doesn't make Python any less stable, and
according to policy it won't be integrated into Python 2.7 unless the
release manager chooses otherwise. But the release manager is
overwhelmed, so it is VERY unlikely that he will include this patch,
because it is a distraction, and there is always more important stuff
to judge. In addition, the RM may not be competent in this particular
part of the Python dist and so cannot take the risk of making random
decisions.

To resolve this bottleneck and help release managers make decisions,
community members should be able to vote on patches. Then release
managers would be able to make releases that satisfy more Python
users. In addition the part of this decision for particular component
of Python dist could be delegated to component maintainers preserving
RM's right to veto any opinion.
msg107021 - (view) Author: Tarek Ziadé (tarek) * (Python committer) Date: 2010-06-04 10:02
> community members should be able to vote on patches

*or* the core dev responsible for the development of the incriminated package, which is me for distutils. This is an improvement, not a feature, and this won't make it to 2.7.

While distutils is now frozen, I agree that we can add it in 3.2
msg107022 - (view) Author: Tarek Ziadé (tarek) * (Python committer) Date: 2010-06-04 10:12
by the way, I am not sure what you call a binary sirting of zip files (since two equivalent zip files can have different metadata) but if you mean comparing a unzip -l output, you could use zipinfo instead, to sort the output.

Overall, you need to compare the size and CRC of each file. I don't know if zipinfo does this.

Maybe this could be a feature in the zipfile module in python. a same_archive() function.
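
As a rough sketch of the size-and-CRC comparison suggested above: `same_archive()` does not exist in the `zipfile` module, so the function below is a hypothetical illustration built on `ZipFile.infolist()` and the `file_size` and `CRC` attributes of `ZipInfo`.

```python
import zipfile


def same_archive(path_a, path_b):
    """Return True if both zip files hold the same members.

    Hypothetical helper: members are matched by name and compared by
    uncompressed size and CRC, so member order and timestamps are ignored.
    """
    def summary(path):
        with zipfile.ZipFile(path) as archive:
            return {
                info.filename: (info.file_size, info.CRC)
                for info in archive.infolist()
            }

    return summary(path_a) == summary(path_b)
```

This treats two archives as equivalent even when their bytes differ, which is a different goal from producing byte-identical archives.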
msg107023 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-06-04 11:21
“This is an improvement, not a feature”
I used the two terms with the same meaning :)

Do we add this to distutils in 3.2 and distutils2 too?
msg107030 - (view) Author: anatoly techtonik (techtonik) Date: 2010-06-04 11:50
> Tarek Ziadé <ziade.tarek@gmail.com> added the comment:
>
>> community members should be able to vote on patches
>
> *or* the core dev responsible for the development of the incriminated package, which is me for distutils.

This is said in the last part of the quoted msg107020:
"""In addition the part of this decision for particular component
of Python dist could be delegated to component maintainers preserving
RM's right to veto any opinion."""

> While distutils is now frozen, I agree that we can add it in 3.2

It would be nice if Python process could allow me to maintain my own
patched version of Python stdlibs so that I can use it instead of main
stdlib and quickly switch between them. It would be nice to be able to
share such patches and see in which versions (or forks) they were
integrated. I wonder if PSF license allows that?

> by the way, I am not sure what you call a binary sirting of zip files

I am not sure where you saw me mention "binary sirting" either. =)

> (since two equivalent zip files can have different metadata) but if you mean comparing a unzip -l output, you could use zipinfo instead, to sort the output.

I use well-defined development toolchain for working with binary files
that can detect insignificant change in some kind of binary data like
timestamps in .zip archive, but comparing moving blocks is a disaster.
I need to analyze exact binary copies for troubleshooting issue8871,
closely related to issue8870, to exclude any chance that the binary .exe
generated by distutils on a non-MS filesystem differs from the one
generated on an MS FS. Even if it seems such a minor issue, believe me
that you do not want to meet any other minor issues when investigating
a 12-point checklist for some distutils bug that could actually be a
well-known MS problem, when the problem you need to solve is a
misbehaving SCons installer that needs to install a couple of files in
somehow "seems to be protected" Windows directories in the Python
installation.

> Overall, you need to compare the size and CRC of each file. I don't know if zipinfo does this.
>
> Maybe this could be a feature in the zipfile module in python. a same_archive() function.

No. The archives should be generated consistently, but it is
impossible to create perfectly matching bdist_wininst archive anyway,
because timestamps will differ.

> FTR, this function has been moved to shutil, still with the zip shadowing and without the sorting.

Tarek, are you going to deal with shadowing?
msg107036 - (view) Author: Tarek Ziadé (tarek) * (Python committer) Date: 2010-06-04 12:03
"""
I use well-defined development toolchain for working with binary files
that can detect insignificant change in some kind of binary data like
timestamps in .zip archive, but comparing moving blocks is a disaster.
"""

Please explain to us how you compare the content of two zip archives.
msg107040 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-06-04 12:10
> It would be nice if Python process could allow me to maintain my own
> patched version of Python stdlibs so that I can use it instead of main
> stdlib and quickly switch between them.
It’s free software, you have the right to copy, edit and release it.
As for the technical aspect of easy switching, editing sys.path seems the way to go, or use PYTHONPATH to give your custom stdlib modules precedence over the real stdlib. I’ll stop being off-topic now. :)
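
A minimal sketch of the sys.path approach, assuming a directory that holds the patched modules; the function name `prefer_patched_modules` and any directory passed to it are hypothetical.

```python
import sys


def prefer_patched_modules(directory):
    """Put directory at the front of sys.path so it is searched first.

    Hypothetical helper. Modules that were already imported stay cached in
    sys.modules and are not replaced by this call.
    """
    if directory in sys.path:
        sys.path.remove(directory)
    sys.path.insert(0, directory)
```

Setting the PYTHONPATH environment variable before starting the interpreter has a similar effect without any code changes.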

Tarek, seen my question about distutils2?
msg107047 - (view) Author: anatoly techtonik (techtonik) Date: 2010-06-04 13:23
> """
> I use well-defined development toolchain for working with binary files
> that can detect insignificant change in some kind of binary data like
> timestamps in .zip archive, but comparing moving blocks is a disaster.
> """
>
> Please explain to us how you compare the content of two zip archives.

Here is the open source approach. I use the soviet "swiss army knife" Far
Manager tool from http://www.farmanager.com/ that some years ago
became open source under the revised BSD license. It cannot compare
files itself, but allows you to switch back and forth between two
dumps of files in hex view with the Ctrl-Tab / Ctrl-Shift-Tab shortcuts.
The comparison is done with the standard Windows command line tool "fc".
It is better to explain by example - I will list the keys you need to
type with an explanation below each - keyboard shortcuts are in square
brackets. The right panel is C:\Downloads\python-wget\dist, the left panel is
M:\, and the cursor is placed on the file wget-0.6.win32.force.exe, which is
present in both panels and is the subject of the comparison.

1. fc /b [Ctrl-Enter] [Ctrl-]][Ctrl-Enter]
  this will give you the command line `fc /b wget-0.6.win32.force.exe
M:\wget-0.6.win32.force.exe`

2. [Ctrl-Home]edit:<
  this will give you the command line `edit:<fc /b
wget-0.6.win32.force.exe M:\wget-0.6.win32.force.exe`

3. [Enter]
  this will execute the output and open embedded editor with the
results. You will see hex offsets of differences

4. [Ctrl-Tab]
  you are back at the file panels, but the editor stays in the background -
notice the [0+1] marker in the top left corner - it says that 0 viewers
and 1 editor window are available.

5. F3
  you've opened current wget-0.6.win32.force.exe in embedded viewer

6. F4
  you've opened hex view for this file

7. [Ctrl-Shift-Tab]
  you're back at the fc output, copy the hex offset into clipboard
with [Ctrl-Ins]

8. [Ctrl-Tab]
  you're again at the hex view of the subject file

9. [Alt-F8][Shift-Ins][Enter]
  you're at the offset where the differences start

10. [Ctrl-Tab]
  you're back at file panels

11. [Tab]
  switch to passive panel, repeat 5,6 and 9 for file from passive panel

12.
  now you can switch back and forth between differences in files with
[Ctrl-Tab]/[Ctrl-Shift-Tab]. You may want to switch to text view with
F4 for convenience.

I make all 12 steps in less than 20 seconds without using any plugins or macros.
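
For readers without Far Manager or `fc`, the byte-level comparison can be approximated in Python. This is a sketch under the assumption that both files fit in memory; the function name `differing_offsets` and the `limit` parameter are hypothetical, and the return format is not the output format of `fc /b`.

```python
def differing_offsets(path_a, path_b, limit=20):
    """Return (offsets, same_length) for two files compared byte by byte.

    offsets is a list of (offset, byte_a, byte_b) tuples for the first
    `limit` positions where the files differ within their common length.
    """
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        data_a, data_b = file_a.read(), file_b.read()

    offsets = []
    for offset, (byte_a, byte_b) in enumerate(zip(data_a, data_b)):
        if byte_a != byte_b:
            offsets.append((offset, byte_a, byte_b))
            if len(offsets) >= limit:
                break
    return offsets, len(data_a) == len(data_b)
```

Printing each offset with `hex(offset)` gives values that can be typed into a hex viewer's "go to offset" prompt.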
msg107048 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-06-04 13:25
I think Tarek asked more for a description of the diff algo (e.g.
“compare the CRC”, “compare all files”, etc.), not the UI of one tool.
msg107049 - (view) Author: anatoly techtonik (techtonik) Date: 2010-06-04 13:28
On Fri, Jun 4, 2010 at 3:10 PM, Éric Araujo <report@bugs.python.org> wrote:
>
>> It would be nice if Python process could allow me to maintain my own
>> patched version of Python stdlibs so that I can use it instead of main
>> stdlib and quickly switch between them.
> It’s free software, you have the right to copy, edit and release it.

Sounds good, but in reality I still have to get back to the rest of the quote:
"""I wonder if PSF license allows that?"""

> As for the technical aspect of easy switching, editing sys.path seems the way to go, or use PYTHONPATH to give your custom stdlib modules precedence over the real stdlib.

Good starting point. The sync with stdlib and patch management is a
bigger issue though.

> I’ll stop being off-topic now. :)

Yep. In Google Wave we could split it earlier and clean up this one. =)
msg415994 - (view) Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2022-03-25 11:33
distutils is deprecated now, so there won't be any more enhancements to it.
History
Date User Action Args
2022-04-11 14:57:01 admin set github: 53137
2022-03-25 11:33:38 iritkatriel set nosy: + dstufft
    components: + Distutils, - Library (Lib)
2022-03-25 11:33:25 iritkatriel set status: open -> closed
    nosy: + iritkatriel
    messages: + msg415994
    resolution: wont fix
    stage: needs patch -> resolved
2011-09-19 14:42:10 eric.araujo set title: sort files before archiving for consistency -> shutil: sort files before archiving for consistency
    stage: test needed -> needs patch
    components: + Library (Lib), - Distutils2
    versions: + Python 3.3, - 3rd party
2010-11-26 04:44:42 eric.araujo set nosy: techtonik, tarek, eric.araujo
    versions: + 3rd party, - Python 3.2
    type: enhancement
    components: + Distutils2, - Distutils
    stage: test needed
2010-06-04 13:28:07 techtonik set messages: + msg107049
2010-06-04 13:25:51 eric.araujo set messages: + msg107048
2010-06-04 13:23:49 techtonik set messages: + msg107047
2010-06-04 12:11:03 eric.araujo set messages: - msg107039
2010-06-04 12:10:10 eric.araujo set messages: + msg107040
2010-06-04 12:10:07 eric.araujo set messages: + msg107039
2010-06-04 12:03:46 tarek set messages: + msg107036
2010-06-04 11:50:55 techtonik set messages: + msg107030
2010-06-04 11:21:34 eric.araujo set messages: + msg107023
2010-06-04 10:12:51 tarek set messages: + msg107022
2010-06-04 10:02:38 tarek set priority: normal -> low
    messages: + msg107021
    versions: - Python 2.7
2010-06-04 09:47:39 techtonik set messages: + msg107020
2010-06-03 22:10:59 eric.araujo set messages: + msg106990
2010-06-03 22:05:51 eric.araujo set messages: + msg106989
2010-06-03 21:40:28 techtonik set messages: + msg106986
2010-06-03 21:32:18 eric.araujo set nosy: + eric.araujo
    messages: + msg106984
    versions: + Python 3.2, - Python 2.6
2010-06-03 21:14:04 techtonik set files: + sort_files_in_zip.27.patch
2010-06-03 21:13:35 techtonik create