Author ncoghlan
Recipients ncoghlan
Date 2017-06-14.01:25:38
Message-id <1497403539.2.0.760105950979.issue30661@psf.upfronthosting.co.za>
In-reply-to
Content
shutil.make_archive currently just uses the default tar format, which is GNU_FORMAT.

This format doesn't ensure that paths containing non-ASCII characters are encoded as UTF-8, and hence may end up embedding platform-specific encoding assumptions into the generated tarball.
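A minimal sketch of the difference, assuming an in-memory archive and a made-up member name. The encoding is passed explicitly here to stand in for a platform whose default is not UTF-8; normally tarfile picks a platform-derived default.

```python
import io
import tarfile

NAME = "caf\u00e9.txt"  # hypothetical member name with one non-ASCII character


def archive_bytes(fmt, encoding):
    """Return the raw bytes of an in-memory tar holding one empty member."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt, encoding=encoding) as tf:
        tf.addfile(tarfile.TarInfo(NAME))
    return buf.getvalue()


# Simulate a platform whose filesystem encoding is Latin-1.
gnu_latin1 = archive_bytes(tarfile.GNU_FORMAT, "latin-1")
pax_latin1 = archive_bytes(tarfile.PAX_FORMAT, "latin-1")

# GNU_FORMAT writes the name in whatever encoding was in effect.
print("GNU has Latin-1 name:", NAME.encode("latin-1") in gnu_latin1)
print("GNU has UTF-8 name:  ", NAME.encode("utf-8") in gnu_latin1)
# PAX_FORMAT records the non-ASCII path in an extended header as UTF-8.
print("PAX has UTF-8 name:  ", NAME.encode("utf-8") in pax_latin1)
```

A reader of the GNU archive has to guess the writer's encoding, while the PAX archive carries the UTF-8 path regardless of the encoding argument.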

I see a few possible ways of resolving this:

1. Change the default tar format to PAX_FORMAT. It's been 16 years since that format was defined, and Python itself has supported it since 2.6 was released in 2008, so perhaps we can rely on other tools supporting it now? (My main open question on that front would be: what happens if you specify "format=GNU_FORMAT" when attempting to read a PAX-formatted archive?)

2. Add new shutil-level "pax", "gzpax", "bzpax", "xzpax" format definitions to explicitly request PAX_FORMAT.

3. Add a mechanism to shutil.make_archive that allows format-dependent settings to be passed down to the underlying archive creation functions (e.g. "format=tarfile.PAX_FORMAT").
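The open question under option 1 can be probed with a short experiment. This is only a sketch using Python's own tarfile module on an in-memory archive with a made-up member name; it says nothing about how other tools cope with PAX archives.

```python
import io
import tarfile

NAME = "caf\u00e9.txt"  # hypothetical member name with one non-ASCII character

# Write a PAX archive containing a single empty member.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tf:
    tf.addfile(tarfile.TarInfo(NAME))

# Read it back while explicitly asking for GNU_FORMAT.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:", format=tarfile.GNU_FORMAT) as tf:
    names = tf.getnames()

print(names)
```

If the member name comes back intact, that suggests the format argument only matters when writing, and that tarfile recognises PAX headers from the archive contents when reading.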
History
Date User Action Args
2017-06-14 01:25:39 ncoghlan set recipients: + ncoghlan
2017-06-14 01:25:39 ncoghlan set messageid: <1497403539.2.0.760105950979.issue30661@psf.upfronthosting.co.za>
2017-06-14 01:25:39 ncoghlan link issue30661 messages
2017-06-14 01:25:38 ncoghlan create