classification
Title: relative symlinks in tarfile.extract broken (windows)
Type: behavior Stage: needs patch
Components: Library (Lib), Windows Versions: Python 3.10, Python 3.9, Python 3.8
process
Status: open Resolution:
Dependencies: 12926 Superseder:
Assigned To: lars.gustaebel Nosy List: Andreas.Gäer, Patrick.von.Reth, brian.curtin, eryksun, lars.gustaebel, paul.moore, steve.dower, tim.golden, zach.ware
Priority: normal Keywords:

Created on 2012-01-03 16:42 by Patrick.von.Reth, last changed 2020-05-30 17:07 by eryksun.

Messages (8)
msg150512 - Author: Patrick von Reth (Patrick.von.Reth) Date: 2012-01-03 16:42
When extracting http://www.openssl.org/source/openssl-1.0.0d.tar.gz with Python 3.2 on Windows 7, extraction fails with:

  File "C:\python32\lib\tarfile.py", line 2175, in extract
    set_attrs=set_attrs)
  File "C:\python32\lib\tarfile.py", line 2259, in _extract_member
    self.makelink(tarinfo, targetpath)
  File "C:\python32\lib\tarfile.py", line 2359, in makelink
    targetpath)
  File "C:\python32\lib\tarfile.py", line 2251, in _extract_member
    self.makefile(tarinfo, targetpath)
  File "C:\python32\lib\tarfile.py", line 2292, in makefile
    target = bltn_open(targetpath, "wb")
IOError: [Errno 22] Invalid argument: 'R:\\tmp\\os\\openssl-1.0.0d\\apps\\md4.c'

The reason is that the symlink is broken:

R:\>dir R:\tmp\os\openssl-1.0.0d\apps\md4.c
 Volume in drive R has no label.
 Volume Serial Number is E8F0-7223
 Directory of R:\tmp\os\openssl-1.0.0d\apps
02.01.2012  20:13    <SYMLINK>      md4.c [../crypto/md4/md4.c]

The target must use backslashes instead of forward slashes; that's why Python can't access the file the symlink points to.
msg150671 - Author: Lars Gustäbel (lars.gustaebel) * (Python committer) Date: 2012-01-05 16:47
You actually hit two bugs at the same time here. First, the target of the created symlink was not translated from Unix to Windows path delimiters and is therefore broken. The second bug is issue12926, which leads to the error in TarFile.makefile().
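A minimal sketch of the tarfile-side workaround mentioned here, assuming you control the extraction call yourself: rewrite the `linkname` of every symlink member to use backslashes before extracting on Windows. The helper names `translate_symlink_targets` and `make_demo_archive` are hypothetical; this is not a patch to tarfile, and it does nothing about issue12926.

```python
import io
import tarfile


def translate_symlink_targets(tar: tarfile.TarFile) -> None:
    """Rewrite '/' to '\\' in the target of every symlink member, in place.

    Hypothetical workaround: call it before extracting on Windows so that
    os.symlink() receives a target that uses native separators.
    """
    for member in tar.getmembers():
        if member.issym():
            member.linkname = member.linkname.replace("/", "\\")


def make_demo_archive() -> io.BytesIO:
    """Build an in-memory tar holding one relative symlink, like apps/md4.c."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo("openssl-1.0.0d/apps/md4.c")
        info.type = tarfile.SYMTYPE
        info.linkname = "../crypto/md4/md4.c"
        tar.addfile(info)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    with tarfile.open(fileobj=make_demo_archive(), mode="r") as tar:
        translate_symlink_targets(tar)
        print(tar.getmember("openssl-1.0.0d/apps/md4.c").linkname)
        # On Windows you would now call tar.extractall(some_directory).
```

Because `getmembers()` returns the same `TarInfo` objects that extraction later uses, the rewritten targets are what reach `os.symlink()`.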

Brian, AFAIK all file-specific functions on Windows accept forward slashes in pathnames, right? Has this been discussed in the course of the Windows implementation of os.symlink()? I could certainly fix the slash translation in tarfile.py, but maybe it's os.symlink() that should be fixed.
msg150672 - Author: Patrick von Reth (Patrick.von.Reth) Date: 2012-01-05 17:21
To work around the bug, I also tried dereference=True, but it looks like Python 3 ignores it during extraction.
Is this the normal behavior or just another bug?
msg150673 - Author: Lars Gustäbel (lars.gustaebel) * (Python committer) Date: 2012-01-05 17:24
The dereference option is only used for archive creation: it makes tarfile add the contents of the file a symbolic link points to instead of the symbolic link itself.
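To see what `dereference` does at creation time, here is a small sketch that archives the same directory twice and compares the member types. It is POSIX-oriented as written, since it creates a symlink with `os.symlink()` (which may need extra privileges on Windows); the helper names are illustrative only.

```python
import io
import os
import tarfile
import tempfile


def member_types(src_dir: str, dereference: bool) -> dict:
    """Archive src_dir in memory and map each member name to its tar type."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", dereference=dereference) as tar:
        tar.add(src_dir, arcname="pkg")
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        return {m.name: m.type for m in tar.getmembers()}


def compare_dereference() -> tuple:
    """Archive a directory holding a file and a symlink to it, both ways."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "target.txt"), "w") as f:
            f.write("hello\n")
        os.symlink("target.txt", os.path.join(tmp, "link.txt"))
        return member_types(tmp, False), member_types(tmp, True)


if __name__ == "__main__":
    plain, deref = compare_dereference()
    print(plain["pkg/link.txt"] == tarfile.SYMTYPE)  # stored as a symlink
    print(deref["pkg/link.txt"] == tarfile.REGTYPE)  # stored as file contents
```

The flag only changes how members are stored when the archive is written; as Lars says, it plays no part in extraction.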
msg217989 - Author: Andreas Gäer (Andreas.Gäer) Date: 2014-05-06 15:20
Is there any progress on the question of whether the problem should be fixed in os.symlink or in tarfile?

This currently seems to break installing source packages that contain symlinks with pip on Windows.

For example, try: "pip install networkx==1.8.1"
msg218029 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2014-05-06 23:52
This should be fixed in os.symlink. The Windows CreateSymbolicLink function can't be relied on to translate slash to backslash. It only normalizes an absolute link or a path that's relative to the current working directory on a drive (e.g. "R:../crypto"), since the latter is stored as an absolute link.

For example:

    >>> os.symlink('C:/Program Files/Python34', 'Python34')
    >>> os.system('fsutil reparsepoint query Python34')
    Reparse Tag Value : 0xa000000c
    Tag value: Microsoft
    Tag value: Name Surrogate
    Tag value: Symbolic Link

    Reparse Data Length: 0x00000078
    Reparse Data:
    0000:  32 00 3a 00 00 00 32 00  00 00 00 00 43 00 3a 00  2.:...2.....C.:.
    0010:  2f 00 50 00 72 00 6f 00  67 00 72 00 61 00 6d 00  /.P.r.o.g.r.a.m.
    0020:  20 00 46 00 69 00 6c 00  65 00 73 00 2f 00 50 00   .F.i.l.e.s./.P.
    0030:  79 00 74 00 68 00 6f 00  6e 00 33 00 34 00 5c 00  y.t.h.o.n.3.4.\.
    0040:  3f 00 3f 00 5c 00 43 00  3a 00 5c 00 50 00 72 00  ?.?.\.C.:.\.P.r.
    0050:  6f 00 67 00 72 00 61 00  6d 00 20 00 46 00 69 00  o.g.r.a.m. .F.i.
    0060:  6c 00 65 00 73 00 5c 00  50 00 79 00 74 00 68 00  l.e.s.\.P.y.t.h.
    0070:  6f 00 6e 00 33 00 34 00                           o.n.3.4.

The print name uses forward slash, but the NT substitute name uses backslash. In this case, GetFinalPathNameByHandle works fine ("\??" is the NT DosDevices directory in which "C:" is a symbolic link to something like "\Device\HarddiskVolume1"):

    >>> print(os.path._getfinalpathname('Python34'))
    \\?\C:\Program Files\Python34

OTOH, forward slashes aren't translated in a relative link:

    >>> os.remove('Python34')
    >>> os.symlink('/Program Files/Python34', 'Python34')  
    >>> os.system('fsutil reparsepoint query Python34')
    Reparse Tag Value : 0xa000000c
    Tag value: Microsoft
    Tag value: Name Surrogate
    Tag value: Symbolic Link

    Reparse Data Length: 0x00000068
    Reparse Data:
    0000:  2e 00 2e 00 00 00 2e 00  01 00 00 00 2f 00 50 00  ............/.P.
    0010:  72 00 6f 00 67 00 72 00  61 00 6d 00 20 00 46 00  r.o.g.r.a.m. .F.
    0020:  69 00 6c 00 65 00 73 00  2f 00 50 00 79 00 74 00  i.l.e.s./.P.y.t.
    0030:  68 00 6f 00 6e 00 33 00  34 00 2f 00 50 00 72 00  h.o.n.3.4./.P.r.
    0040:  6f 00 67 00 72 00 61 00  6d 00 20 00 46 00 69 00  o.g.r.a.m. .F.i.
    0050:  6c 00 65 00 73 00 2f 00  50 00 79 00 74 00 68 00  l.e.s./.P.y.t.h.
    0060:  6f 00 6e 00 33 00 34 00                           o.n.3.4.

In this case GetFinalPathNameByHandle fails because the NT executive doesn't interpret forward slash as a path delimiter:

    >>> os.path._getfinalpathname('Python34')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OSError: [WinError 123] The filename, directory name, or volume label 
    syntax is incorrect: 'Python34'

I think this is a bug in CreateSymbolicLink, but os.symlink should work around it by first normalizing the target path to use os.sep.
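The two fsutil dumps above can be decoded to confirm the diagnosis. This sketch assumes the documented SymbolicLinkReparseBuffer layout: four little-endian USHORT fields (SubstituteNameOffset, SubstituteNameLength, PrintNameOffset, PrintNameLength), a ULONG Flags field in which SYMLINK_FLAG_RELATIVE (0x1) marks a relative link, then the UTF-16-LE path buffer that the offsets index into. fsutil's "Reparse Data" starts at the first of those fields. The byte strings are transcribed from the dumps; `parse_symlink_reparse_data` is a hypothetical name.

```python
import struct

SYMLINK_FLAG_RELATIVE = 0x1  # Flags bit set when the link target is relative
HEADER = "<4HI"  # 4 x USHORT offset/length fields, then ULONG Flags

# "Reparse Data" from the first dump: os.symlink('C:/Program Files/Python34', ...)
ABSOLUTE_LINK = bytes.fromhex(
    "32 00 3a 00 00 00 32 00 00 00 00 00 43 00 3a 00 "
    "2f 00 50 00 72 00 6f 00 67 00 72 00 61 00 6d 00 "
    "20 00 46 00 69 00 6c 00 65 00 73 00 2f 00 50 00 "
    "79 00 74 00 68 00 6f 00 6e 00 33 00 34 00 5c 00 "
    "3f 00 3f 00 5c 00 43 00 3a 00 5c 00 50 00 72 00 "
    "6f 00 67 00 72 00 61 00 6d 00 20 00 46 00 69 00 "
    "6c 00 65 00 73 00 5c 00 50 00 79 00 74 00 68 00 "
    "6f 00 6e 00 33 00 34 00"
)

# "Reparse Data" from the second dump: os.symlink('/Program Files/Python34', ...)
RELATIVE_LINK = bytes.fromhex(
    "2e 00 2e 00 00 00 2e 00 01 00 00 00 2f 00 50 00 "
    "72 00 6f 00 67 00 72 00 61 00 6d 00 20 00 46 00 "
    "69 00 6c 00 65 00 73 00 2f 00 50 00 79 00 74 00 "
    "68 00 6f 00 6e 00 33 00 34 00 2f 00 50 00 72 00 "
    "6f 00 67 00 72 00 61 00 6d 00 20 00 46 00 69 00 "
    "6c 00 65 00 73 00 2f 00 50 00 79 00 74 00 68 00 "
    "6f 00 6e 00 33 00 34 00"
)


def parse_symlink_reparse_data(data: bytes) -> dict:
    """Split a symbolic-link reparse buffer into its two names and its flag."""
    sub_off, sub_len, print_off, print_len, flags = struct.unpack_from(HEADER, data)
    path_buffer = data[struct.calcsize(HEADER):]  # offsets are relative to here

    def name(offset: int, length: int) -> str:
        return path_buffer[offset:offset + length].decode("utf-16-le")

    return {
        "substitute": name(sub_off, sub_len),
        "print": name(print_off, print_len),
        "relative": bool(flags & SYMLINK_FLAG_RELATIVE),
    }


if __name__ == "__main__":
    for label, data in (("absolute", ABSOLUTE_LINK), ("relative", RELATIVE_LINK)):
        print(label, parse_symlink_reparse_data(data))
```

For the absolute link the substitute name is the backslash form "\??\C:\Program Files\Python34"; for the relative link both names keep the forward slashes, which is the string the NT executive later fails to parse.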
msg218049 - Author: Tim Golden (tim.golden) * (Python committer) Date: 2014-05-07 11:49
eryksun: could you essay a patch? I'd be happy to review & apply it.
msg370391 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2020-05-30 17:07
This is still a problem with WinAPI CreateSymbolicLinkW. If the substitute path is relative, it fails to replace slashes with backslashes, which creates a broken link. As a workaround, os.symlink should replace slashes with backslashes in relative target paths. Drive-relative targets such as "C:spam" are the exception and can be left alone, since CreateSymbolicLinkW is forced to normalize them as fully-qualified paths.
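A minimal sketch of that workaround, written with `ntpath` so it can be tried on any platform. The function name is hypothetical and this is not CPython's actual os.symlink code. It assumes, per this thread, that any target with a drive (fully qualified "C:/x", UNC "//server/share/x", or drive-relative "C:x") is already normalized by CreateSymbolicLinkW, and converts slashes in everything else, including rooted paths such as "/Program Files/Python38".

```python
import ntpath


def normalize_symlink_target(target: str) -> str:
    """Hypothetical pre-processing step for a Windows symlink target."""
    drive, _ = ntpath.splitdrive(target)
    if drive:
        # "C:/x", "//server/share/x" and drive-relative "C:x": per the
        # discussion above, CreateSymbolicLinkW normalizes these itself.
        return target
    # Relative ("../crypto/md4/md4.c") or rooted ("/Program Files/Python38"):
    # these are stored verbatim, so convert the separators ourselves.
    return target.replace("/", "\\")


if __name__ == "__main__":
    for t in ["../crypto/md4/md4.c", "/Program Files/Python38",
              "C:spam/eggs", "C:/Program Files/Python34"]:
        print(repr(t), "->", repr(normalize_symlink_target(t)))
```

Keying on `splitdrive()` rather than `ntpath.isabs()` sidesteps the isabs misclassification of rooted paths described in the next paragraph.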

Non-UNC rooted paths such as "/Program Files/Python38" are also relative paths. (ntpath.isabs incorrectly classifies them as absolute.) A relative target path gets resolved against the parsed, opened path of the symlink. For example, consider a symlink on a volume at r"Eggs\spam.txt" that targets r"\spam.txt". If the volume is mounted at "W:\\", then accessing r"W:\Eggs\spam.txt" resolves to r"W:\spam.txt". But if the volume is mounted at r"C:\Mount\Work", then accessing r"C:\Mount\Work\Eggs\spam.txt" resolves to r"C:\spam.txt".
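The mount-point example can be modelled lexically with `ntpath`: joining the symlink's directory with a rooted target keeps only the drive of the opened path, while a relative target is joined and then normalized. This is only an approximation of what the NT object manager does with the opened path (it ignores nested links and the volume-versus-drive distinction), and `resolve_link_target` is a hypothetical name.

```python
import ntpath


def resolve_link_target(link_path: str, target: str) -> str:
    """Lexically resolve a relative symlink target against the link's own path."""
    # ntpath.join() keeps the drive of link_path but discards its directories
    # when target is rooted (starts with a backslash).
    return ntpath.normpath(ntpath.join(ntpath.dirname(link_path), target))


if __name__ == "__main__":
    # Same link on the same volume, opened through two different mount points.
    print(resolve_link_target(r"W:\Eggs\spam.txt", r"\spam.txt"))
    print(resolve_link_target(r"C:\Mount\Work\Eggs\spam.txt", r"\spam.txt"))
    # The openssl case from the original report, with backslashes.
    print(resolve_link_target(r"R:\tmp\os\openssl-1.0.0d\apps\md4.c",
                              r"..\crypto\md4\md4.c"))
```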
History
Date User Action Args
2020-05-30 17:07:34  eryksun  set
    versions: + Python 3.8, Python 3.9, Python 3.10, - Python 3.2, Python 3.3
    nosy: + paul.moore, zach.ware, steve.dower
    messages: + msg370391
    components: + Library (Lib)
    stage: test needed -> needs patch
2014-05-07 11:49:09  tim.golden  set
    nosy: + tim.golden
    messages: + msg218049
2014-05-06 23:52:18  eryksun  set
    nosy: + eryksun
    messages: + msg218029
2014-05-06 15:20:16  Andreas.Gäer  set
    nosy: + Andreas.Gäer
    messages: + msg217989
2012-01-05 17:24:12  lars.gustaebel  set
    messages: + msg150673
2012-01-05 17:21:00  Patrick.von.Reth  set
    messages: + msg150672
2012-01-05 16:47:31  lars.gustaebel  set
    dependencies: + tarfile tarinfo.extract*() broken with symlinks
    messages: + msg150671
2012-01-04 06:14:34  lars.gustaebel  set
    assignee: lars.gustaebel
    nosy: + lars.gustaebel
    versions: + Python 3.3
2012-01-04 04:32:57  brian.curtin  set
    nosy: + brian.curtin
    stage: test needed
2012-01-03 16:43:28  Patrick.von.Reth  set
    title: relative symlinks in tarfile.extract broken -> relative symlinks in tarfile.extract broken (windows)
2012-01-03 16:42:43  Patrick.von.Reth  create