classification
Title: Distutils filelist selects too many files on Windows
Type: Stage: resolved
Components: Distutils, Windows Versions: Python 3.3, Python 3.2, Python 2.7
process
Status: closed Resolution: duplicate
Dependencies: Superseder: Impossible to include file in sdist that starts with 'build' on Win32
View: 6884
Assigned To: eric.araujo Nosy List: eric.araujo, jason.coombs, nadeem.vawda, python-dev, tarek
Priority: normal Keywords: easy

Created on 2012-02-13 20:57 by jason.coombs, last changed 2012-03-18 19:40 by python-dev. This issue is now closed.

Files
File name Uploaded Description Edit
smime.p7s jason.coombs, 2012-02-15 15:34
Messages (14)
msg153302 - (view) Author: Jason R. Coombs (jason.coombs) * (Python committer) Date: 2012-02-13 20:57
When using a MANIFEST.in with only "include *.txt", on Windows, distutils grabs too many files. I set DISTUTILS_DEBUG=1 and ran ./setup.py sdist on the keyring library and it included this output:

    include *.txt
    include_pattern: applying regex r'^[^/]*\.txt\Z(?ms)'
     adding CHANGES.txt
     adding CONTRIBUTORS.txt
     adding .hg\last-message.txt
     adding keyring.egg-info\dependency_links.txt
     adding keyring.egg-info\requires.txt
     adding keyring.egg-info\SOURCES.txt
     adding keyring.egg-info\top_level.txt

As you can see, this is not a recursive include, but it's matching several files not in the supplied pattern (files in .hg/ and keyring.egg-info/).

It's obvious from the regular expression that the regex doesn't properly account for os.sep.
msg153341 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-02-14 15:38
This code just got changed (#13193) to use '/' instead of os.path.join, as it is documented that paths in MANIFEST.in must use only '/'.  Can you build an up-to-date Python 2.7 from Mercurial and check again?

Your report comes timely, as there was some doubt about what the correct fix was.  Even if your bug is now gone, I’ll want to add a regression test for it.
msg153371 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2012-02-14 21:32
I've been able to reproduce this on current builds of 2.7, 3.2 and 3.3.

The problem seems to be that distutils.filelist pathnames using the OS
directory separator, but the regexps used for matching paths are written
to always assume "/"-based paths.

I've been able to get the correct behaviour in this case by making the
module build paths using "/" regardless of the OS directory separator:

    --- a/Lib/distutils/filelist.py
    +++ b/Lib/distutils/filelist.py
    @@ -55,10 +55,10 @@

         def sort(self):
             # Not a strict lexical sort!
    -        sortable_files = sorted(map(os.path.split, self.files))
    +        sortable_files = sorted(f.split("/") for f in self.files)
             self.files = []
             for sort_tuple in sortable_files:
    -            self.files.append(os.path.join(*sort_tuple))
    +            self.files.append("/".join(sort_tuple))


         # -- Other miscellaneous utility methods ---------------------------
    @@ -258,7 +258,7 @@

             for name in names:
                 if dir != os.curdir:        # avoid the dreaded "./" syndrome
    -                fullname = os.path.join(dir, name)
    +                fullname = dir + "/" + name
                 else:
                     fullname = name

I'm not entirely comfortable with this fix, though - it seems like it
would be better to produce paths using the OS directory separator at the
end of the process. Maybe it's possible to add a translation step after
the file list is built and de-duplicated? I don't know much about how
the rest of distutils uses this module.

(Random aside: do we support any platforms that don't support "/" as a
directory separator (even if it isn't the preferred one)? Hmm...)
msg153378 - (view) Author: Jason R. Coombs (jason.coombs) * (Python committer) Date: 2012-02-15 00:04
The bug is indeed fixed in the latest 2.7 tip:

    PS C:\Users\jaraco\projects\public\keyring> C:\Users\jaraco\projects\public\cpython\PCbuild\amd64\python.exe setup.py sdist 2> NULL | findstr .hg
    (produces no output)
msg153379 - (view) Author: Jason R. Coombs (jason.coombs) * (Python committer) Date: 2012-02-15 00:06
If always using a posix path separator, I recommend using posixpath.join instead of hard-coding the path separator.
msg153391 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2012-02-15 07:24
> The bug is indeed fixed in the latest 2.7 tip:
>
>     PS C:\Users\jaraco\projects\public\keyring> C:\Users\jaraco\projects\public\cpython\PCbuild\amd64\python.exe setup.py sdist 2> NULL | findstr .hg
>    (produces no output)

I suspect you forgot to set DISTUTILS_DEBUG before running this -
otherwise that command should at least output a line like this:

    exclude_pattern: applying regex r'(^|/|\\)(RCS|CVS|\.svn|\.hg|\.git|\.bzr|_darcs)(/|\\).*'


> If always using a posix path separator, I recommend using posixpath.join instead of hard-coding the path separator.

Interesting. I had thought that posixpath and ntpath were each only
available on their respective platforms, but looking at the source,
I see that I was mistaken.
msg153403 - (view) Author: Jason R. Coombs (jason.coombs) * (Python committer) Date: 2012-02-15 15:34
It's not that I forgot to set DISTUTILS_DEBUG, I simply did not set it. If the 
bug were still present, I would have seen a line indicating that 
.hg/last-message.txt was being copied.

For completeness, here's the output with the DEBUG setting:

    PS C:\Users\jaraco\projects\public\keyring> $env:DISTUTILS_DEBUG=1
    PS C:\Users\jaraco\projects\public\keyring> 
..\cpython\PCbuild\amd64\python.exe setup.py sdist 2> NULL | findstr hg
     adding .hg\last-message.txt
    exclude_pattern: applying regex 
r'(^|/|\\)(RCS|CVS|\.svn|\.hg|\.git|\.bzr|_darcs)(/|\\).*'
     removing .hg\last-message.txt
msg153404 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-02-15 15:42
So it looks like that even if the exclusion of .hg removes .hg/last-message.txt, it should not have been added in the first place, as the command was include and not recursive-include.

At first glance, Nadeem’s proposed fix is not right: paths in MANIFEST.in use '/', but then filelist produces paths using os.sep, so that the MANIFEST file and other operations done by the sdist command use native paths.  So even though the currently supported OSes all accept '/', I think the right thing is to use os.sep.

(About posixpath: It is always available and can be used e.g. to manipulate URI paths.)
msg153428 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2012-02-15 19:26
> At first glance, Nadeem’s proposed fix is not right: paths in MANIFEST.in use '/', but then filelist produces paths using os.sep, so that the MANIFEST file and other operations done by the sdist command use native paths.  So even though the currently supported OSes all accept '/', I think the right thing is to use os.sep.

Yes, that sounds like a better solution. So the solution then is to fix
the regexps to use os.sep instead of "/" (and ensure that FileList uses
native paths throughout).

As an aside, it seems that the failing test from issue 13193 was
actually correct, and that the library itself was broken. I suppose all
of the tests will need to be changed to use native paths when this issue
is fixed.
msg153624 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-02-18 01:49
This regex bug was actually already known.  Please follow up on the other report.

> As an aside, it seems that the failing test from issue 13193 was
> actually correct, and that the library itself was broken.
Yes.  filelist operates in this way: it builds a list of all files in a tree (creating allfiles, which thus uses native path separators), then edits this list according to the MANIFEST.in commands (where the patterns must be transformed to a regex that uses native separators).  So the test was right in using os.path.join, and the regex was not right to always use '/'.
msg154258 - (view) Author: Roundup Robot (python-dev) Date: 2012-02-25 15:14
New changeset 47788c90f80b by Éric Araujo in branch '2.7':
Fix long-standing bugs with MANIFEST.in parsing on Windows (#6884).
http://hg.python.org/cpython/rev/47788c90f80b
msg154262 - (view) Author: Roundup Robot (python-dev) Date: 2012-02-25 15:32
New changeset 73aa4c9305b3 by Éric Araujo in branch '3.2':
Fix long-standing bugs with MANIFEST.in parsing on Windows (#6884).
http://hg.python.org/cpython/rev/73aa4c9305b3
msg154457 - (view) Author: Roundup Robot (python-dev) Date: 2012-02-27 11:29
New changeset 4d6a9f287543 by Éric Araujo in branch 'default':
Fix bugs with MANIFEST.in parsing on Windows (#6884).
http://hg.python.org/distutils2/rev/4d6a9f287543
msg156266 - (view) Author: Roundup Robot (python-dev) Date: 2012-03-18 19:40
New changeset edcdef70c44e by Éric Araujo in branch '3.2':
Fix long-standing bugs with MANIFEST.in parsing on Windows (#6884).
http://hg.python.org/cpython/rev/edcdef70c44e
History
Date User Action Args
2012-03-18 19:40:09python-devsetmessages: + msg156266
2012-02-27 11:29:46python-devsetmessages: + msg154457
2012-02-25 15:32:25python-devsetmessages: + msg154262
2012-02-25 15:14:01python-devsetnosy: + python-dev
messages: + msg154258
2012-02-18 01:49:16eric.araujosetstatus: open -> closed
resolution: duplicate
messages: + msg153624

superseder: Impossible to include file in sdist that starts with 'build' on Win32
stage: test needed -> resolved
2012-02-15 19:26:00nadeem.vawdasetmessages: + msg153428
2012-02-15 15:42:49eric.araujosetmessages: + msg153404
2012-02-15 15:34:59jason.coombssetfiles: + smime.p7s

messages: + msg153403
2012-02-15 07:24:10nadeem.vawdasetmessages: + msg153391
2012-02-15 00:06:10jason.coombssetmessages: + msg153379
2012-02-15 00:04:26jason.coombssetmessages: + msg153378
2012-02-14 21:32:31nadeem.vawdasetmessages: + msg153371
2012-02-14 15:38:12eric.araujosetversions: + Python 3.2, Python 3.3
messages: + msg153341

assignee: tarek -> eric.araujo
keywords: + easy
stage: test needed
2012-02-13 21:01:17nadeem.vawdasetnosy: + nadeem.vawda
2012-02-13 20:57:51jason.coombssetcomponents: + Windows
2012-02-13 20:57:07jason.coombscreate