classification
Title: add filter to zipapp
Type: enhancement
Stage: resolved
Components: Library (Lib)
Versions: Python 3.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: paul.moore
Nosy List: brett.cannon, irmen, paul.moore, serhiy.storchaka
Priority: normal
Keywords: easy

Created on 2017-07-28 16:50 by irmen, last changed 2017-08-26 17:04 by paul.moore. This issue is now closed.

Pull Requests
URL Status Linked
PR 3021 merged jeffreyrack, 2017-08-08 04:36
PR 3049 merged paul.moore, 2017-08-09 19:38
Messages (15)
msg299409 - (view) Author: Irmen de Jong (irmen) Date: 2017-07-28 16:50
As briefly discussed on comp.lang.python, I propose to add an optional filter callback function to zipapp.create_archive.


The function could perhaps work like the os.walk generator, or maybe just let you return a simple boolean for every folder/file to say whether it should be included in the zip.

My use case is that I sometimes don't want to include every file in the root folder in the zip file (I want to be able to skip temporary or irrelevant folders such as .git/.svn, .tox, .tmp and sometimes want to avoid including *.pyc/*.pyo files). Right now, I first have to manually clean up the folder before I can use zipapp.create_archive.


(Instead of providing a filter callback function, another approach may be to let you provide your own dir/file generator that fully replaces the internal file listing logic of zipapp.create_archive?)
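For illustration, the use case above might look roughly like this on Python 3.7 or later, where zipapp.create_archive accepts a filter callable. This is a sketch under assumptions: the names EXCLUDED_DIRS, EXCLUDED_SUFFIXES, wanted and build_demo_archive are hypothetical, and the temporary directory layout exists only for the demonstration.

```python
import pathlib
import tempfile
import zipapp
import zipfile

# Hypothetical exclusion lists, taken from the use case described above.
EXCLUDED_DIRS = {".git", ".svn", ".tox", ".tmp"}
EXCLUDED_SUFFIXES = {".pyc", ".pyo"}


def wanted(path: pathlib.Path) -> bool:
    """Return True if a path (relative to the source root) should be archived."""
    if any(part in EXCLUDED_DIRS for part in path.parts):
        return False
    return path.suffix not in EXCLUDED_SUFFIXES


def build_demo_archive() -> list:
    """Build a throwaway app directory, archive it, and list the archive."""
    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp) / "app"
        (src / ".git").mkdir(parents=True)
        (src / ".git" / "config").write_text("")
        (src / "__main__.py").write_text("print('hello')\n")
        (src / "stale.pyc").write_bytes(b"")

        # The target lives outside the source directory.
        target = pathlib.Path(tmp) / "app.pyz"
        zipapp.create_archive(src, target, filter=wanted)

        with zipfile.ZipFile(target) as zf:
            return sorted(zf.namelist())
```

With a filter like this, no manual cleanup of the source folder should be needed before building the archive.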
msg299410 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2017-07-28 17:06
I'd propose an extra argument to zipapp.create_archive, include_file=None (feel free to bikeshed on the name). If the argument is not None, then it should be a callable which will be called with a pathlib.Path object for each file that's selected for inclusion in the archive. The function should return a boolean - False means don't include this file.

Because the create_archive function only gets a list of files internally (it uses Path.rglob()), the callable won't get passed directories, only the actual files (but it can of course check the full path to see what directory the file is in).

The include_file argument is ignored when copying anything other than a filesystem directory (i.e., when the source argument is a filename or an open file object).
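The proposed behaviour could be sketched roughly as follows. This is an illustration only, not the code that was merged: select_files and demo are hypothetical helpers, and include_file is the provisional argument name from the proposal above.

```python
import pathlib
import tempfile


def select_files(source, include_file=None):
    """Return the relative paths of files under source that pass include_file.

    Hypothetical helper: it mirrors the proposal (walk with Path.rglob, pass
    each file to the callable, keep it unless the callable returns False).
    """
    source = pathlib.Path(source)
    selected = []
    for child in sorted(source.rglob("*")):
        if not child.is_file():
            continue  # only actual files are offered to the callable
        relative = child.relative_to(source)
        if include_file is None or include_file(relative):
            selected.append(relative.as_posix())
    return selected


def demo():
    """Run select_files on a small temporary tree, without and with a filter."""
    with tempfile.TemporaryDirectory() as tmp:
        root = pathlib.Path(tmp)
        (root / "pkg").mkdir()
        (root / "a.py").write_text("")
        (root / "pkg" / "b.py").write_text("")
        (root / "pkg" / "b.pyc").write_bytes(b"")
        everything = select_files(root)
        no_bytecode = select_files(root, lambda p: p.suffix != ".pyc")
    return everything, no_bytecode
```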
msg299417 - (view) Author: Irmen de Jong (irmen) Date: 2017-07-28 17:56
That sounds fine to me. I guess the paths passed to the function should be relative to the root folder being zipped?
msg299423 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2017-07-28 19:17
Yes, they can be.
msg300004 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2017-08-09 13:37
New changeset b811d664defed085d16951088afb579fb649c58d by Paul Moore (Jeffrey Rackauckas) in branch 'master':
bpo-31072: Add filter to zipapp (#3021)
https://github.com/python/cpython/commit/b811d664defed085d16951088afb579fb649c58d
msg300006 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2017-08-09 13:39
Thanks to Jeffrey Rackauckas for the implementation of this feature.
msg300025 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-08-09 16:28
Wouldn't it be better to name the parameter filterfunc for conformity with PyZipFile?

I think the new feature needs at least the versionadded directive in the module documentation. And maybe an entry in the What's New document.
msg300028 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2017-08-09 19:07
Good point - I wasn't even aware of the filterfunc argument in PyZipFile. I'll rename the argument.

I wasn't initially sure about a What's New entry. I'll add one - and thanks for the reminder about versionadded.
msg300031 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2017-08-09 19:40
I've created a new PR 3049 adding the fixes you suggested (and tightening up the tests, as I noticed an untested aspect of the change while editing).
msg300157 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-08-11 08:55
There are two differences between filterfunc arguments in zipapp and PyZipFile:

1. The argument passed to filterfunc is a string in PyZipFile and a Path object in zipapp.

2. The path in zipapp is relative to the root of the archive. The path in PyZipFile is a path on the filesystem (absolute or relative to the current working directory).

I'm afraid that these differences can cause confusion.
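The two differences can be observed by recording what each callback receives. The sketch below assumes Python 3.7 or later (it uses the filter name that zipapp eventually adopted); record_filter_arguments and the two recording callbacks are hypothetical names, and the in-memory targets are used only to avoid leaving files behind.

```python
import io
import pathlib
import tempfile
import zipapp
import zipfile


def record_filter_arguments():
    """Return the arguments seen by a PyZipFile filterfunc and a zipapp filter."""
    seen_pyzipfile, seen_zipapp = [], []

    def zipapp_filter(path):
        seen_zipapp.append(path)
        return True

    def pyzip_filter(path):
        seen_pyzipfile.append(path)
        return True

    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp) / "app"
        src.mkdir()
        (src / "__main__.py").write_text("print('hi')\n")

        # zipapp runs first, before writepy compiles bytecode into the tree.
        zipapp.create_archive(src, io.BytesIO(), filter=zipapp_filter)

        with zipfile.PyZipFile(io.BytesIO(), "w") as zf:
            zf.writepy(str(src), filterfunc=pyzip_filter)

    return seen_pyzipfile, seen_zipapp
```

In this sketch the PyZipFile callback sees plain strings that include the source directory, while the zipapp callback sees Path objects relative to the source directory.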
msg300160 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2017-08-11 09:41
Zipapp uses path objects throughout, so making the filter function take a path object is consistent with that. I guess modifying PyZipFile to take either a string or a path object would be possible.

As for the relative path, that's deliberate as I expect that a common use case will be to exclude a directory, and doing that via

    lambda pth: pth.parts[0] != 'dir_to_exclude'

is a simple possibility. Having to do this with an absolute path would just require the user to make the path relative, and we've already done that in zipapp so why duplicate the work?

So I guess I'm saying I want to keep both those choices. Do you think this is a sufficient problem that we should *not* use the same name? Any suggestions on a better name (or should we stick with the original ``include_file``)?
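To make the relative-path argument concrete, here is a small sketch of how the lambda above behaves on relative and absolute paths. The function name keep and the example paths are hypothetical; PurePosixPath is used so the result does not depend on the platform.

```python
from pathlib import PurePosixPath


def keep(pth):
    """Same test as the lambda above: drop anything under dir_to_exclude."""
    return pth.parts[0] != "dir_to_exclude"


# Hypothetical source root and file, for illustration only.
root = PurePosixPath("/home/user/app")
absolute = root / "dir_to_exclude" / "module.py"
relative = absolute.relative_to(root)

# With a relative path, parts[0] is the top-level directory in the archive.
excluded_when_relative = not keep(relative)

# With an absolute path, parts[0] is "/", so the same test keeps the file
# unless the caller first makes the path relative to the root.
kept_when_absolute = keep(absolute)
```

Because zipapp already computes the relative path, the one-line test works without any extra handling by the user.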
msg300179 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2017-08-11 21:18
What about simply 'filter' as a name? or 'path_filter'?
msg300180 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2017-08-11 21:38
Sounds reasonable :-) I'm not going to be checking mails for a week or so, so I'll revisit this once I get back.
msg300870 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2017-08-26 11:27
OK. There have been no further comments, and I think the differences from PyZipFile's filterfunc are sufficient to warrant using a different name. I'm going to go with "filter". It's short, and says what it means.
msg300886 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2017-08-26 17:04
New changeset 0780bf7578dc4c9c3852dc5e869aba515a2c65b1 by Paul Moore in branch 'master':
bpo-31072: Rename the new filter argument for zipapp.create_archive. (#3049)
https://github.com/python/cpython/commit/0780bf7578dc4c9c3852dc5e869aba515a2c65b1
History
Date                 User              Action  Args
2017-08-26 17:04:14  paul.moore        set     messages: + msg300886
2017-08-26 11:27:50  paul.moore        set     messages: + msg300870
2017-08-11 21:38:45  paul.moore        set     messages: + msg300180
2017-08-11 21:18:11  brett.cannon      set     nosy: + brett.cannon; messages: + msg300179
2017-08-11 09:41:42  paul.moore        set     messages: + msg300160
2017-08-11 08:55:42  serhiy.storchaka  set     messages: + msg300157
2017-08-09 19:40:50  paul.moore        set     messages: + msg300031
2017-08-09 19:38:22  paul.moore        set     pull_requests: + pull_request3084
2017-08-09 19:07:35  paul.moore        set     messages: + msg300028
2017-08-09 16:28:14  serhiy.storchaka  set     nosy: + serhiy.storchaka; messages: + msg300025; versions: + Python 3.7
2017-08-09 13:39:28  paul.moore        set     status: open -> closed; resolution: fixed; messages: + msg300006; stage: resolved
2017-08-09 13:37:22  paul.moore        set     messages: + msg300004
2017-08-08 04:36:32  jeffreyrack       set     pull_requests: + pull_request3054
2017-07-28 19:17:24  paul.moore        set     messages: + msg299423
2017-07-28 17:56:50  irmen             set     messages: + msg299417
2017-07-28 17:06:29  paul.moore        set     messages: + msg299410
2017-07-28 16:50:52  irmen             create