classification
Title: Type: listdir() doesn't work with non-trivial symlinks behavior resolved Library (Lib), Windows Python 3.2, Python 3.3
process
Status: Resolution: closed fixed amaury.forgeotdarc, brian.curtin, jaraco, pitrou, python-dev, santoso.wijaya, tim.golden, vstinner normal patch

Created on 2012-01-11 20:01 by pitrou, last changed 2013-06-05 00:08 by python-dev. This issue is now closed.

Files
8bf88b31ebb7.diff jaraco, 2013-02-26 13:10
8bf88b31ebb7.diff jaraco, 2013-02-26 13:24 review
b744517b90bc.diff jaraco, 2013-02-26 13:51 review
Repositories containing patches
http://bitbucket.org/jaraco/cpython-issue13772#3.2
Messages (26)
msg151087 - (view) Author: Antoine Pitrou (pitrou) * Date: 2012-01-11 20:01
Note how _getfinalpathname works and calling listdir on the final path name also works:

>>> os.listdir("Lib\\bar")[:4]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NotADirectoryError: [Error 267] The directory name is invalid: 'Lib\\bar\\*.*'

>>> nt._getfinalpathname("Lib\\bar")
'\\\\?\\C:\\t\\pathlib\\Lib\\test'
>>> os.listdir(nt._getfinalpathname("Lib\\bar"))[:4]

Calling listdir on the non-final extended path doesn't work:

>>> os.listdir('\\\\?\\C:\\t\\pathlib\\Lib\\bar')[:4]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NotADirectoryError: [Error 267] The directory name is invalid: '\\\\?\\C:\\t\\pa
thlib\\Lib\\bar\\*.*'

But open() works:

>>> open('Lib\\bar\\regrtest.py')
<_io.TextIOWrapper name='Lib\\bar\\regrtest.py' mode='r' encoding='cp1252'>
msg151103 - (view) Author: Santoso Wijaya (santoso.wijaya) * Date: 2012-01-12 01:11
I think this is because "Lib\\bar" is NOT being created as a directory symlink, but rather as a regular one.

As such, the documentation for symlink where it states the optional target_is_directory=False argument should be automatically detect whether the source is a file or directory does not hold true.
msg151105 - (view) Author: Santoso Wijaya (santoso.wijaya) * Date: 2012-01-12 01:13
Confirmed (on latest py33 build):

>>> os.listdir('Lib\\bar')[:4]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NotADirectoryError: [Error 267] The directory name is invalid: 'Lib\\bar\\*.*'
[61658 refs]

... after manually deleting the file-symlink bar ...

[61658 refs]
>>> os.listdir('Lib\\bar')[:4]
[61666 refs]
msg151106 - (view) Author: Antoine Pitrou (pitrou) * Date: 2012-01-12 01:43
> I think this is because "Lib\\bar" is NOT being created as a directory
> symlink, but rather as a regular one.

Ha! I didn't even know about that option. Thanks for noticing.

> As such, the documentation for symlink where it states the optional
> target_is_directory=False argument should be automatically detect
> whether the source is a file or directory does not hold true.

I don't know if auto-detection is a good idea. Of a course, from a Unix
user's perspective, Windows' behaviour doesn't make a lot of sense.
Especially when functions other than listdir() work fine anyway.
msg151114 - (view) Author: Santoso Wijaya (santoso.wijaya) * Date: 2012-01-12 06:42
I agree that it's probably not a good idea. Windows users should be aware of symlink limitation on the platform to begin with, IMHO.

Besides, keeping the detection behaviour in light of this defect would be adding unnecessary complexity to the code. Currently, the src argument (=".\\test") is simply given to a Windows function to see if it is a directory, from the currently working directory.

To make it aware of its path relativity, in the context of the target, is much harder than just changing the documentation. ;-)
msg151115 - (view) Author: Antoine Pitrou (pitrou) * Date: 2012-01-12 06:42
> > As such, the documentation for symlink where it states the optional
> > target_is_directory=False argument should be automatically detect
> > whether the source is a file or directory does not hold true.
>
> I don't know if auto-detection is a good idea. Of a course, from a Unix
> user's perspective, Windows' behaviour doesn't make a lot of sense.
> Especially when functions other than listdir() work fine anyway.

*already* tries to autodetect the target's file type, but the detection
is broken with relative symlinks: it doesn't try to join them with the
link's directory and so it thinks the target doesn't exist.
msg151785 - (view) Author: Antoine Pitrou (pitrou) * Date: 2012-01-22 18:14
Another issue with the detection scheme is that it's not atomic: there can be a race condition between the GetFileAttributesExW() and CreateSymbolicLinkW() calls.

I propose applying the following patch to 3.2 and 3.3 (+ doc fix, not included in the patch).
msg151841 - (view) Author: Brian Curtin (brian.curtin) * Date: 2012-01-23 19:49
Looks good to me.
msg151884 - (view) Author: Roundup Robot (python-dev) Date: 2012-01-24 08:13
New changeset 839fa289e226 by Antoine Pitrou in branch '3.2':
Issue #13772: In os.symlink() under Windows, do not try to guess the link
http://hg.python.org/cpython/rev/839fa289e226

New changeset a7406565ef1c by Antoine Pitrou in branch 'default':
Issue #13772: In os.symlink() under Windows, do not try to guess the link
http://hg.python.org/cpython/rev/a7406565ef1c
msg151885 - (view) Author: Antoine Pitrou (pitrou) * Date: 2012-01-24 08:14
Should be fixed now!
msg166500 - (view) Author: Jason R. Coombs (jaraco) * Date: 2012-07-26 19:12
I apologize I missed this issue when it arose, so my comments come late.

The reason for inferring the directory status of the targets was so that use of os.symlink(link, target) would behave much the same on Unix as on Windows for the most common use cases. By requiring the directory status to be supplied when calling os.symlink, it makes the function barely portable (it becomes only portable for symlinks to files, not directories).

As was indicated by Larry Hastings in issue14917, there's an expectation that Python could easily detect the directory status of the target. There was apparently a bug in the earlier detection code for directories, which doesn't exist in the [reference implementation](https://bitbucket.org/jaraco/jaraco.windows/src/2.7/jaraco/windows/filesystem/__init__.py#cl-46) (done in ctypes):

PS C:\Users\jaraco\projects\jaraco.windows> mkdir -p Lib/test/foo
PS C:\Users\jaraco\projects\jaraco.windows> python
Python 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit (AMD64)] on win32
>>> from jaraco.windows import filesystem as fs
>>> import os
>>> os.listdir('Lib/bar')
['foo']

I suspect this bug crept in as we worked through the various challenges with directory detection at that low level of Python. As you can see, the reference implementation is clean an straightforward and correct.

In my opinion, it would be worthwhile restoring the directory detection and creating tests to capture cases where detection fails, rather than removing the implementation.

I've been using the ctypes implementation for some time, and having automatic directory detection is a huge benefit to portability and simplicity of use.

It was pointed out that there is a race condition, and theoretically, that is true, but I believe this not to be a problem because the automatic detection is a best-effort. It's used to guess the best possible directory status for the symlink. If there's a directory there one millisecond, then the target is then removed, and the symlink is created as a directory symlink, that's most probably still what the user would have wanted.

As a heavy user of symlinks in Windows, I would strongly prefer inference of directory status. Of course, I can always continue to use the ctypes implementation in my environments, but I imagine that other potential users would feel the same as I do and would appreciate a more portable implementation.

Without the directory feature, many uses of os.symlink are not portable and will fail (with ugly results) on Windows. With the directory detection feature, most (if not all) will simply work as expected. Removal of this feature is a regression of the intended functionality, and I'd like it to be considered to be restored.
msg166529 - (view) Author: Antoine Pitrou (pitrou) * Date: 2012-07-26 23:34
Le jeudi 26 juillet 2012 à 19:12 +0000, Jason R. Coombs a écrit :
> Without the directory feature, many uses of os.symlink are not
> portable and will fail (with ugly results) on Windows.

The target_is_directory argument is supposed to be "supported" (i.e.
ignored) on other platforms, so you should be able to write portable
code easily.

I'm not against restoring detection, but it should be non-buggy and
correctly unit-tested :) That said it's probably too late for 3.3.
msg182523 - (view) Author: Jason R. Coombs (jaraco) * Date: 2013-02-20 15:47
I've started work on restoring the directory detection in my bitbucket repo (https://bitbucket.org/jaraco/cpython-issue13772). I have added a regression test to capture the basic failure (where the link is not created in the current working directory) as well as a fix. The fix uses the [Microsoft Shell Lightweight Utility Functions](http://msdn.microsoft.com/en-us/library/bb759844%28v=vs.85%29.aspx) which has one benefit of being tested and robust, but has two disadvantages, namely:

- it doesn't support forward-slashes, which the reference implementation does, and which I believe the CPython implementation should.

Interestingly, it does detect the '/' character as a separator - it just doesn't treat it as one when stripping the trailing path.

Given these disadvantages, I'm inclined to write custom functions to support the directory detection. Any suggestions are appreciated.
msg183054 - (view) Author: Jason R. Coombs (jaraco) * Date: 2013-02-26 13:24
I tried the create-patch function, but it didn't work against bitbucket (produced an empty diff). So I'm attaching a diff explicitly.
msg183057 - (view) Author: Jason R. Coombs (jaraco) * Date: 2013-02-26 13:51
Please disregard the patch 8bf88b31ebb7.

Antoine, Brian, or anyone else - would you please review the attached patch (b744517b90bc.diff)? It adds a test for creating directory symlinks that are non-local (don't depend on the current directory for reference), backs out Antoine's changes, and replaces them with directory detection that fixes this issue while maintaining directory detection.

I'm not super-confident about the implementation of the path-manipulation functions, and I would prefer to use the Python implementations of the path-manipulation (dirname and join) instead. If there are any suggestions in this regard, I'd appreciate them.

That said, the tests now pass for me on Windows (and I don't see why the tests wouldn't also pass on Linux).
msg183059 - (view) Author: Antoine Pitrou (pitrou) * Date: 2013-02-26 14:11
> I'm not super-confident about the implementation of the
> path-manipulation functions, and I would prefer to use the Python
> implementations of the path-manipulation (dirname and join) instead.
> If there are any suggestions in this regard, I'd appreciate them.

From an implementation standpoint, I would indeed prefer the path handling
functions to be written in Python.

From a principle standpoint, I'm not sure it's a good idea for os.symlink()
to be non-atomic (there's a small race condition between reading the target's
attributes and creating the actual symlink).

Also, since in general you always know whether you're making a link to a
directory or a file, I'm not sure auto-detection is really a plus (except
that it makes things more familiar for Unix developers).
msg183062 - (view) Author: Jason R. Coombs (jaraco) * Date: 2013-02-26 14:57
Thanks for the suggestions.

> Antoine Pitrou added the comment:
>
> From an implementation standpoint, I would indeed prefer the path handling
> functions to be written in Python.

Is there an example of how a function like win32_symlink could call into ntpath.dirname()? I believe the reason we did not invoke ntpath was due to a possible race condition where the Python interpreter might not be fully started before os.symlink is invoked. Perhaps there are mitigating factors, but I'm not familiar enough with the Python internals to know what's safe and what is not (or even how to invoke that behavior).

> From a principle standpoint, I'm not sure it's a good idea for os.symlink()
> to be non-atomic (there's a small race condition between reading the target's
> attributes and creating the actual symlink).

I agree there exists a small possibility of a race condition here. I believe this race condition is of little concern to the vast majority of use-cases. For those use cases where a race condition might be a factor, the target_is_directory parameter may be supplied by that code to overcome the condition.

Furthermore, since the Windows APIs don't provide a way to match the symlink creation with its target, it's not possible to avoid this race condition.

> Also, since in general you always know whether you're making a link to a
> directory or a file, I'm not sure auto-detection is really a plus (except that it
> makes things more familiar for Unix developers).

For the vast majority of use cases, target detection will behave much like symlinks do in Unix, which allows os.symlink to be used portably without specifically accounting for Windows behavior. I'm not as concerned about Windows programmers using Python (who might not mind specifying the directory-ness of their targets), but Python programs written by Unix developers running on Windows. In the latter use-case, it's much harder to have those programs adopt an additional parameter to support Windows. I could imagine some library maintainers rejecting a patch that includes target_is_directory just because it's not elegant and seems unnecessary from a Unix programmer's perspective. For that use case and in order to support the general expectation that os.symlink should provide the most uniform behavior it can, the directory detection is highly valuable.

Indeed, both your initial usage of the function and Larry Hastings' initial reaction betray the expectation that os.symlink should create a directory-type symlink when the target is a directory. I don't see how it's not a huge plus to support a developer's expectations when possible.

Personally, I would prefer to not have the parameter at all. Unfortunately, to support cases where one might need to create directory symlinks to non-existent targets, the parameter is necessary. However, I'd like to limit its required usage to only the cases where it's necessary.

> I'm not against restoring detection, but it should be non-buggy and
> correctly unit-tested :)

I'd like to focus on restoring detection, which was vetted and approved in the original implementation. Obviously, I can't guarantee that it doesn't have additional bugs. Provably-correct code is expensive and way outside the scope of this effort. As you could see from your patch, there were tests capturing the desired behavior. That test has been restored. Additionally, this patch includes a new test to capture the case of a non-local link (a behavior that was never tested for Unix symlinks).
msg190092 - (view) Author: Jason R. Coombs (jaraco) * Date: 2013-05-26 14:44
Since there's been no additional criticism or discussion on this matter, I'll plan to commit the proposed patch to restore target directory detection. Since the functionality was removed in a bugfix release, it should be acceptable to restore it in a bugfix release. I know 3.2 no longer gets non-security releases, so I'll port the patch to 3.3 and apply it to 3.3 and default.

I'll plan to do this tomorrow unless there's further discussion.
msg190168 - (view) Author: Jason R. Coombs (jaraco) * Date: 2013-05-28 02:07
Ugh. So it seems the 3.3 implementation of win_symlink, which is now been renamed to posix_symlink, is now much more complicated. Not only does it handle symlink arguments on Unix platforms, providing abstraction for a couple of underlying system calls, but the Windows implementation also supports once again wide and narrow characters for the path, which means that the directory detection functions I created are no longer suitable as they also need wide and narrow versions.
msg190172 - (view) Author: Roundup Robot (python-dev) Date: 2013-05-28 03:28
New changeset 29a2557d693e by Jason R. Coombs in branch '3.3':
Issue #13772: Restored directory detection of targets in os.symlink on Windows, which was temporarily removed in Python 3.2.3 due to an incomplete implementation. The implementation now works even if the symlink is created in a location other than the current directory.
http://hg.python.org/cpython/rev/29a2557d693e
msg190625 - (view) Author: Roundup Robot (python-dev) Date: 2013-06-04 22:36
New changeset f431cd0edd85 by Victor Stinner in branch 'default':
Issue #13772: Fix compiler warnings on Windows
http://hg.python.org/cpython/rev/f431cd0edd85
msg190626 - (view) Author: Roundup Robot (python-dev) Date: 2013-06-04 22:37
New changeset c351591f1f63 by Victor Stinner in branch 'default':
Issue #13772: fix _check_dirA(): call *A() functions, not *W() functions
http://hg.python.org/cpython/rev/c351591f1f63
msg190627 - (view) Author: STINNER Victor (vstinner) * Date: 2013-06-04 22:38
@Jason R. Coombs: Can you please review my following commit? It looks like you made a copy/paste failure in _check_dirA() :-)

> New changeset c351591f1f63 by Victor Stinner in branch 'default':
> Issue #13772: fix _check_dirA(): call *A() functions, not *W() functions
> http://hg.python.org/cpython/rev/c351591f1f63
msg190631 - (view) Author: Jason R. Coombs (jaraco) * Date: 2013-06-04 23:22
@Victor Stinner

Thanks for fixing this. You're absolutely right.
msg190632 - (view) Author: Roundup Robot (python-dev) Date: 2013-06-04 23:59
New changeset e024236ea253 by Victor Stinner in branch 'default':
Issue #13772: Fix a compiler warning on Windows
http://hg.python.org/cpython/rev/e024236ea253

New changeset d9f3ea27f826 by Victor Stinner in branch 'default':
Issue #13772: Mark helper functions as private (static)
http://hg.python.org/cpython/rev/d9f3ea27f826
msg190633 - (view) Author: Roundup Robot (python-dev) Date: 2013-06-05 00:08
New changeset c8212fca8747 by Victor Stinner in branch 'default':
Issue #13772: Use syntax for literal wchar_t character
http://hg.python.org/cpython/rev/c8212fca8747
History
Date User Action Args
2013-06-05 00:08:01python-devsetmessages: + msg190633
2013-06-04 23:59:30python-devsetmessages: + msg190632
2013-06-04 23:22:55jaracosetmessages: + msg190631
2013-06-04 22:38:36vstinnersetnosy: + vstinner
messages: + msg190627
2013-06-04 22:37:25python-devsetmessages: + msg190626
2013-06-04 22:36:04python-devsetmessages: + msg190625
2013-05-28 03:28:39python-devsetmessages: + msg190172
2013-05-28 02:07:28jaracosetmessages: + msg190168
2013-05-26 14:44:28jaracosetmessages: + msg190092
2013-02-26 14:57:19jaracosetmessages: + msg183062
2013-02-26 14:11:53pitrousetmessages: + msg183059
2013-02-26 13:51:40jaracosetfiles: + b744517b90bc.diff

messages: + msg183057
2013-02-26 13:24:25jaracosetfiles: + 8bf88b31ebb7.diff

messages: + msg183054
2013-02-26 13:10:00jaracosetfiles: + 8bf88b31ebb7.diff
2013-02-26 13:09:01jaracosethgrepos: + hgrepo177
2013-02-26 13:08:51jaracosethgrepos: - hgrepo176
2013-02-26 13:08:12jaracosethgrepos: + hgrepo176
2013-02-20 15:47:36jaracosetmessages: + msg182523
2012-07-26 23:34:37pitrousetmessages: + msg166529
2012-07-26 19:12:29jaracosetnosy: + jaraco
messages: + msg166500
2012-01-24 08:14:37pitrousetstatus: open -> closed
resolution: fixed
messages: + msg151885

stage: patch review -> resolved
2012-01-24 08:13:48python-devsetnosy: + python-dev
messages: + msg151884
2012-01-23 19:49:34brian.curtinsetmessages: + msg151841