classification
Title: distutils dereferences symlinks for zip but not for bztar/gztar target
Type: behavior
Stage:
Components: Distutils, Distutils2, Documentation
Versions: Python 3.2, Python 3.3, Python 2.7, 3rd party
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: eric.araujo
Nosy List: alexis, epu, eric.araujo, fberger, tarek
Priority: normal Keywords: easy

Created on 2011-07-19 14:13 by fberger, last changed 2013-11-19 16:18 by epu.

Messages (6)
msg140669 - Author: Florian Berger (fberger) Date: 2011-07-19 14:13
When creating a source distribution, formats=zip will dereference symbolic links while formats=bztar,gztar will not.

Example:

$ ls -l
drwxr-xr-x 3 4096 19. Jul 15:44 dist
-rw-r--r-- 1   53 19. Jul 15:15 foo.py
-rw-r--r-- 1   42 19. Jul 15:39 MANIFEST
-rw-r--r-- 1   42 19. Jul 15:39 MANIFEST.in
-rw-r--r-- 1  167 19. Jul 15:29 setup.py
-rw-r--r-- 1    5 19. Jul 15:16 test.dat
lrwxrwxrwx 1    8 19. Jul 15:16 test.symlink.dat -> test.dat

$ cat setup.py 
from distutils.core import setup
setup(name = 'foo',
      version = '0.1.0',
      py_modules = ['foo'],
      data_files = [("", ["test.dat", "test.symlink.dat"])])

$ python setup.py sdist --formats=gztar,zip

dist/foo-0.1.0.tar.gz does preserve the symbolic link test.symlink.dat -> test.dat, while dist/foo-0.1.0.zip does not.

This can lead to unexpected behaviour when a symlink points to a file outside the source tree. In the .zip file everything will be fine, while the .tar.* file will contain a broken link.
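Such links can be spotted before building. The following is only a sketch, assuming a POSIX filesystem; `find_symlinks` is a hypothetical helper and not part of distutils. It lists every symlink among the given files and flags those whose resolved target lies outside the source tree:

```python
import os


def find_symlinks(root, filenames):
    """Return (name, link_target, escapes_root) for every symlink in filenames.

    Hypothetical helper for illustration; distutils offers nothing like it.
    """
    root = os.path.realpath(root)
    found = []
    for name in filenames:
        path = os.path.join(root, name)
        if not os.path.islink(path):
            continue
        resolved = os.path.realpath(path)
        # A link escapes the tree when its resolved target is not under root.
        escapes = os.path.commonpath([root, resolved]) != root
        found.append((name, os.readlink(path), escapes))
    return found
```

For the example above, `find_symlinks(".", ["test.dat", "test.symlink.dat"])` would report `test.symlink.dat` as a link that stays inside the tree.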

Actual behaviour: storing of symbolic links depends on the target format.

Expected behaviour: storing of symbolic links should not depend on the target format, i.e. format switches should be transparent. Since the zipfile module apparently does not support symbolic links, symlinks should be dereferenced for formats=gztar,bztar using the dereference=True parameter of tarfile.TarFile() in archive_util.py.
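The difference can be reproduced with the tarfile and zipfile modules alone. This is a minimal sketch, assuming a POSIX system where `os.symlink` works; `archive_symlink_demo` is a hypothetical name, and the code does not show what archive_util.py actually does:

```python
import os
import tarfile
import zipfile


def archive_symlink_demo(workdir):
    """Archive a symlink three ways and report how each archive stored it."""
    src = os.path.join(workdir, "src")
    os.mkdir(src)
    with open(os.path.join(src, "test.dat"), "w") as f:
        f.write("data\n")
    link = os.path.join(src, "test.symlink.dat")
    os.symlink("test.dat", link)

    report = {}
    for label, deref in (("tar_default", False), ("tar_dereference", True)):
        path = os.path.join(workdir, label + ".tar.gz")
        # dereference is forwarded to the TarFile constructor.
        with tarfile.open(path, "w:gz", dereference=deref) as tar:
            tar.add(link, arcname="test.symlink.dat")
        with tarfile.open(path) as tar:
            member = tar.getmember("test.symlink.dat")
            report[label] = "symlink" if member.issym() else "regular file"

    zip_path = os.path.join(workdir, "demo.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        # ZipFile.write() reads through the link and stores the target's content.
        zf.write(link, arcname="test.symlink.dat")
    with zipfile.ZipFile(zip_path) as zf:
        report["zip_content"] = zf.read("test.symlink.dat")
    return report
```

Called on an empty scratch directory (for instance one from `tempfile.mkdtemp()`), the report should show the default tar entry as a symlink, and both the dereferenced tar entry and the zip entry carrying the content of test.dat.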
msg140670 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-07-19 14:33
Thanks for such a good report.  Symlink handling in distutils is under-specified; this question showed up a few months ago on the distutils-sig mailing list, with no good answer.

distutils is a special part of the standard library: as it spent a long time without a dedicated maintainer, people came to rely on undocumented behavior and bugs, so when Tarek took over maintenance and started to improve and fix things, a lot of third-party code broke.  That’s why it was decided to put distutils under a feature freeze, fixing only clear bugs, and moving efforts for new development and cleanups into the distutils2 fork (also called packaging in the Python 3.3 standard library).

Because of the fragility of distutils, we have to be careful when dealing with bug reports.  Our process is that a bug is a behavior that contradicts the documentation, otherwise it’s classified as a new feature.  For this report, I’ve found only two mentions of symlinks in the distutils docs, the first one in a support function (Doc/distutils/apiref.rst) and the second one in the docs about the MANIFEST file (Doc/distutils/sourcedist.rst).  So the only promise that the docs make is that MANIFEST entries that are symlinks are supported, but nothing is said about what will end up in the sdist.

I hope that this explanation will let you see why I’m reluctant to change distutils: we don’t know what code we will break if we improve symlink handling.  So, do you think adding a warning about symlink handling issues in the docs would be enough?

For distutils2 however, compatibility concerns do not apply yet, so we’re free to fix and document symlink handling.  If you would like to work on a patch, here are some guidelines: <http://wiki.python.org/moin/Distutils/Contributing>.  If you can’t, then thanks again for your report, which will be a good starting point.
msg140671 - Author: Florian Berger (fberger) Date: 2011-07-19 14:53
Hi,

thanks for the reply. I see your point with the legacy distutils.

> I hope that this explanation will let you see why I’m reluctant to
> change distutils: we don’t know what code we will break if we improve
> symlink handling.  So, do you think adding a warning about symlink
> handling issues in the docs would be enough?

Given the constraints, yes, it would be good to have that warning in the docs. Even better would be a runtime hint like

Notice: gztar target will preserve symbolic links.

or

Notice: zip target will dereference symbolic links.

> For distutils2 however, compatibility concerns do not apply yet,
> so we’re free to fix and document symlink handling.

That would be very welcome. I am afraid I will not be able to contribute code anytime soon, but it would be great if the regular developers could keep an eye on this inconsistency.
msg144292 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-09-19 15:55
>> So, do you think adding a warning about symlink
>> handling issues in the docs would be enough?
> Given the constraints, yes, it would be good to have that warning in
> the docs.

Okay.  Adding the easy keyword to lure contributors.

> Even better would be a runtime hint like

I’m not a fan of the idea.  You have to keep warnings few if you want people to heed them.

>> For distutils2 however, compatibility concerns do not apply yet,
>> so we’re free to fix and document symlink handling.

Thinking again about that, I wonder if there is something to fix at all; tar is a smart container format whereas zip is simpler, so I would not be surprised if the source of the difference is just that zip cannot contain links.
msg144295 - Author: Florian Berger (fberger) Date: 2011-09-19 16:15
> Okay.  Adding the easy keyword to lure contributors.

Thanks.

> I wonder if there is something to fix at all; tar is a smart container
> format whereas zip is simpler, so I would not be surprised if the
> source of the difference is just that zip cannot contain links.

In my mind, that is no excuse for inconsistent behaviour on the part of the wrapper. If the *user* (i.e. package creator) had any choice between storing symlinks "as is" and storing the linked file, I would agree; but there is no option, no parameter, in fact not even a hint at the behaviour. On the contrary, the *wrapper* (i.e. distutils) does have a choice of dereferencing symlinks in a tar file or not.

So, from my point of view: surprises == bad, options/parameters == good, transparent consistency == best.

P.S. "Explicit is better than implicit" may also apply here. ;-)
msg203398 - Author: Erik Purins (epu) Date: 2013-11-19 16:18
Note that the zipfile module does not include a dereference option, but tarfile does.

The following links to python examples show that users are writing zipfiles with symlinks, so it is possible to preserve them in a zip archive.

https://gist.github.com/kgn/610907
http://doeidoei.wordpress.com/2010/11/23/compressing-files-with-python-symlink-trouble/

Maybe the right start is to add a dereference option to zipfile module?
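The technique those examples rely on can be sketched roughly as follows. `write_symlink` and `is_symlink_entry` are hypothetical helpers, not zipfile API. The entry is marked as created on Unix, and the upper 16 bits of `external_attr` hold a mode with the symlink file type. Whether an extraction tool recreates the link from this is an assumption to verify per tool:

```python
import os
import stat
import zipfile


def write_symlink(zf, link_path, arcname):
    """Store link_path in the open ZipFile `zf` as a symlink entry.

    Hypothetical helper; zipfile has no built-in option for this.
    """
    info = zipfile.ZipInfo(arcname)
    info.create_system = 3  # 3 = Unix, so external_attr holds a Unix mode
    # The upper 16 bits of external_attr carry the st_mode value.
    info.external_attr = (stat.S_IFLNK | 0o777) << 16
    # The entry's data is the link target path, not the target's content.
    zf.writestr(info, os.readlink(link_path))


def is_symlink_entry(info):
    """Return True if a ZipInfo entry is flagged as a Unix symlink."""
    return stat.S_ISLNK(info.external_attr >> 16)
```

A dereference option for zipfile could then choose between `ZipFile.write()`, which stores the target's content, and something like `write_symlink()` for each link.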
History
Date                 User         Action  Args
2013-11-19 16:18:58  epu          set     nosy: + epu; messages: + msg203398
2011-09-19 16:15:14  fberger      set     messages: + msg144295
2011-09-19 15:55:35  eric.araujo  set     keywords: + easy; messages: + msg144292; components: + Documentation; versions: + 3rd party
2011-07-19 14:53:07  fberger      set     messages: + msg140671
2011-07-19 14:33:02  eric.araujo  set     versions: + Python 2.7, Python 3.3, - Python 2.6; nosy: + alexis; messages: + msg140670; assignee: tarek -> eric.araujo; components: + Distutils2
2011-07-19 14:13:56  fberger      create