classification
Title: Type: Change ntpath functions to implicitly support UNC paths enhancement patch review Windows Python 3.1
process
Status: Resolution: closed fixed mhammond amaury.forgeotdarc, eric.smith, larry, mhammond, paul.moore normal needs review, patch

Created on 2009-04-20 11:07 by larry, last changed 2009-05-06 08:51 by eric.smith. This issue is now closed.

Files
lch.ntpath.r71757.diff larry, 2009-04-20 11:07 Patch against py3k/trunk r71757.
lch.ntpath.r72242.diff larry, 2009-05-04 02:17 Patch against py3k/trunk r72242.
lch.ntpath.r72309.diff larry, 2009-05-05 03:19 Patch against py3k/trunk r72309.
lch.ntpath.r72309.diff.2 larry, 2009-05-05 03:24 Updated patch against py3k/trunk r72309.
lch.ntpath.r72345.diff larry, 2009-05-05 17:49 Patch against py3k/trunk r72345.
markh.ntpath.r72374.diff mhammond, 2009-05-06 04:43 minor changes; news entry, warning.
Messages (21)
msg86195 - (view) Author: Larry Hastings (larry) * Date: 2009-04-20 11:07
This patch changes "ntpath" so all functions handle UNC paths.

In a Windows path string, a UNC path functions *exactly* like a drive
letter.  This patch means that the Python path split/join functions
treats them as if they were.

For instance:
>>> splitdrive("A:\\FOO\\BAR.TXT")
("A:", "\\FOO\\BAR.TXT")

With this patch applied:
>>> splitdrive("\\\\HOSTNAME\\SHARE\\FOO\\BAR.TXT")
("\\\\HOSTNAME\\SHARE", "\\FOO\\BAR.TXT")

This methodology only breaks down in one place: there is no "default
directory" for a UNC share point.  E.g. you can say
>>> os.chdir("c:")
or
>>> os.chdir("c:foo\\bar")
but you can't say
>>> os.chdir("\\\\hostname\\share")

The attached patch changes:
* Modify join, split, splitdrive, and ismount to add explicit support
for UNC paths.  (The other functions pick up support from these four.)
* Simplify isabs and normpath, now that they don't need to be delicate
* Modify existing unit tests and add new ones.
* Document the changes to the API.
* Deprecate splitunc, with a warning and a documentation remark.

This patch adds one subtle change I hadn't expected.  If you call
split() with a drive letter followed by a trailing slash, it returns the
trailing slash as part of the "head" returned.  E.g.
>>> os.path.split("\\")
("\\", "")
>>> os.path.split("A:\\")
("A:\\", "")
This is mentioned in the documentation, as follows:
Trailing slashes are stripped from head unless it is the root
(one or more slashes only).

For some reason, when os.path.split was called with a UNC path with only
a trailing slash, it stripped the trailing slash:
>>> os.path.split("\\\\hostname\\share\\")
("\\\\hostname\\share", "")
My patch changes this behavior; you would now see:
>>> os.path.split("\\\\hostname\\share\\")
("\\\\hostname\\share\\", "")
I think it's an improvement--this is more consistent.  Note that this
does *not* break the documented requirement that
os.path.join(os.path.split(path)) == path; that continues to work fine.

In the interests of full disclosure: I submitted a patch providing this
exact behavior just over ten years ago.  GvR accepted it into Python
1.5.2b2 (marked "*EXPERIMENTAL*") and removed it from 1.5.2c1.

Misc/HISTORY dated "Tue Apr  6 19:38:18 1999".  If memory serves
correctly, the "problems" cited were only on Cygwin.  At the time Cygwin
used "ntpath", and it supported "//a/foo" as an alias for "A:\\FOO".
You can see how this would cause Cygwin problems.

In the intervening decade, two highly relevant things have happened:
* Python no longer uses ntpath for os.path on Cygwin.  It instead uses
posixpath.
* Cygwin removed the "//a/foo" drive letter hack.  In fact, I believe it
now support UNC paths.
Therefore this patch will have no effect on Cygwin users.

p.s. I discussed this patch with Mark Hammond at the CPython sprint at
PyCon 2008.  I therefore added him to the nosy list.  I have no idea if
this is proper etiquette; I apologize in advance if it is not.
msg86281 - (view) Author: Paul Moore (paul.moore) * Date: 2009-04-22 08:45
This sounds reasonable to me. I would like to see this patch applied.

Note - someone involved with the cygwin port should confirm that the
patch does indeed no longer cause issues for cygwin.
msg86285 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * Date: 2009-04-22 11:24
I confirm that the cygwin port uses posixpath, and that the current
patch has no impact there, except for users which explicitly import
ntpath to get nt semantics.
msg86860 - (view) Author: Eric V. Smith (eric.smith) * Date: 2009-05-01 00:33
The doc patch doesn't apply cleanly for me.

There are a number of code cleanups in the patch, like 0->False,
1->True, the improvement of the params to path(), improvement in
isabs(), etc. I think these cleanups should be made in a separate patch,
so that it's easier to evaluate what's really involved in this change.
I'd be happy to do that, and produce the patches, if you could produce a
clean patch.

The docstring for splitdrive() should be indented, see PEP 257. Also,
the code snippet in that docstring is wrong, it uses 'split' where it
should be 'result', I think.

I haven't worked my way through all of the code and tests, yet.
msg87096 - (view) Author: Larry Hastings (larry) * Date: 2009-05-04 02:17
I've generated a new patch, attached.  I don't know why you had trouble
applying.  I tested this one with a clean tree and "patch -p1 < ..." and
it applied cleanly.  If it fails again, how about I upload the three
modified files?

I removed the extraneous changes.  I claim that the changes that remain
in ntpath are salient to the UNC changes, but feel free to call me on it
if I got that wrong.  (Or make a new patch as you have graciously
volunteered to do.)

I also amended splitdrive's UNC handling slightly; it now rejects
UNC-like paths that start with three slashes or have two slashes between
the host and mount point.  Thus neither "///computer/mountpoint/x" nor
"//computer//mountpoint/x" will be parsed as UNC paths.
msg87197 - (view) Author: Eric V. Smith (eric.smith) * Date: 2009-05-05 00:50
> I've generated a new patch, attached.  I don't know why you had trouble
> applying.

Yeah, I'm not sure what that was about. Part of the patch applied, but
not the rest. In any event, the current one applied cleanly.

I'm not sure this part of the patch is correct (in relpath):
for i in range(min(len(start_list), len(path_list))):
-        if start_list[i].lower() != path_list[i].lower():
+        if start_list[i] != path_list[i]:
break
-    else:
i += 1

I think removing the "else:" is likely an error. It probably also means
that this code isn't tested.

Other than that, this looks reasonable enough to me. I'm hoping someone
else can give it a thorough review, too. I haven't had time to look
through all of the cases in join() to verify that they're correct and
complete.
msg87204 - (view) Author: Larry Hastings (larry) * Date: 2009-05-05 02:34
Sweet jumping rhubarb!  I didn't know about this "for ...: else:"
construct in Python.  When I was editing it I thought it was my editor
stumbling over a tab, and the "else" went with the "if" *inside* the
"for".  You're absolutely right that the current code is utter nonsense;
the "i += 1" gets overwritten immediately by the for loop.

On the other hand: the original code is wrong too.
for i in whatever:
something()
else:
i += 1
If the for loop is not executed, "i" is undefined, so adding 1 to it is
a runtime error.

Then again!  In the existing code the loop always executes at least
once, because it's iterating over the split of the absolute path, rather
than starting at the root of the current drive.  So the "else:" is
unnecessary.

What the "i += 1" is *probably* trying to do is fix this bug:
>>> ntpath.relpath("/foo/bar/bat", "/")
"../foo/bar/bat"
If the lists the for loop was examining skipped the drive letter (or UNC
path), the for loop wouldn't exit.  Maybe they used to have "i = 0"
above the loop or something, in which case this would probably have worked.

Sadly this bug is in both my version and the current version.

I will fix this bug, add a test case, and file a new patch, hopefully
tonight.
msg87205 - (view) Author: Larry Hastings (larry) * Date: 2009-05-05 03:19
As promised, a fresh patch.  Note that the current formulation of that
troublesome loop in relpath() is *very* much on purpose.  The removal of
the "else" is no longer a bug :)
msg87206 - (view) Author: Larry Hastings (larry) * Date: 2009-05-05 03:24
Sorry for the quick double-patch, but I realized that there was no test
for the obverse case, where "path" is / and "curdir" is non-empty.  So I
threw in some more tests.
msg87211 - (view) Author: Larry Hastings (larry) * Date: 2009-05-05 05:38
I stared at it some more.  Now I understand what "for ... : else:" was
for in the original.  The "else: i += 1" ensures that if the "for" loop
runs to completion "i" is set to the length of the shorter of the two
lists.  Anyway, my reimplemented loop accomplishes this in a slightly
easier-to-read style.
msg87232 - (view) Author: Eric V. Smith (eric.smith) * Date: 2009-05-05 09:09
This looks okay to me. (The itertools import isn't needed, but easy
enough to fix on checkin.)

I'd still like someone else to look it over, but if no one does before
the beta, I'll check it in.
msg87259 - (view) Author: Larry Hastings (larry) * Date: 2009-05-05 17:13
Whoops, yeah, that was me forgetting that in py3k itertools.izip became
just "zip".  I'll remove it in my branch in case I generate another patch.
msg87263 - (view) Author: Larry Hastings (larry) * Date: 2009-05-05 17:49
Just 'cause I like you, here's an updated patch.  The only change is the
removal of "import itertools" and the update of the base version.
msg87284 - (view) Author: Eric V. Smith (eric.smith) * Date: 2009-05-05 21:21
Looking at Guido's removal of this back in 1999, he says:
"""
* Lib/ntpath.py:
Withdraw the UNC support from splitdrive().  Instead, a new function
splitunc() parses UNC paths.  The contributor of the UNC parsing in
splitdrive() doesn't like it, but I haven't heard a good reason to
keep it, and it causes some problems.  (I think there's a
philosophical problem -- to me, the split*() functions are purely
syntactical, and the fact that \\foo is not a valid path doesn't mean
that it shouldn't be considered an absolute path.)
"""

Do you have any comment on his philosophical problem?
msg87289 - (view) Author: Larry Hastings (larry) * Date: 2009-05-05 21:35
I've never understood what is the "philosophical problem" per se.  It's
clear from his implementation--Guido created "splitunc" when he removed
my patch--that he thinks these should be precise string operations.
Whereas I propose making these slightly higher-level abstract operations
on "paths" with "drives" (really "mount points").  I think my approach
is more pleasant to use.

As for whether or not "\\foo" should be considered an absolute
path--perhaps my 1999 patch changed this behavior, but this patch does
not.  This:
>>> import ntpath
>>> ntpath.isabs("//foo")
True
is unchanged by the attached patch.  (Or, at the very least, the latest
patch.)

At the time I guess I did a poor job of demonstrating the use case, but
I think the chorus of +1s shows there is one.

On the other hand, I'm not a Windows programmer anymore, and I barely
touch Windows boxes.  I'm doing this out of the kindness of my heart ;)
msg87303 - (view) Author: Mark Hammond (mhammond) * Date: 2009-05-06 00:07
Should the DeprecationWarning for splitunc be a PendingDeprectionWarning
for 3.1 and get 'upgraded' in 3.2?
msg87304 - (view) Author: Larry Hastings (larry) * Date: 2009-05-06 00:17
I don't know what would be proper, but I'm happy to change it if you
think that's best.
msg87306 - (view) Author: Mark Hammond (mhammond) * Date: 2009-05-06 04:43
This looks good to me.

My take on Guido's earlier notes are that they caused a problem in
practice, and the philosophical concerns added justification for
removing it.  I'm yet to meet a Windows user with a philosophical
objection to this.

I'm attaching a new patch; I've added a news entry, changed to a
PendingDeprecation warning and moved the import of the warnings module
to the splitunc function to avoid extra import overhead, even if tiny.
Could anyone still awake and available please have a quick review of the
news entry and my change?
msg87311 - (view) Author: Larry Hastings (larry) * Date: 2009-05-06 07:32
I tested your patch and it looks good to me.  I didn't get the
deprecation warning on ntpath.splitunc unless I disabled all warnings.
msg87317 - (view) Author: Mark Hammond (mhammond) * Date: 2009-05-06 08:06
Checked into the trunk as r72387 (after normalizing whitespace in
ntpath.py after the precommit-hook failed).   Thanks Larry and Eric!
msg87325 - (view) Author: Eric V. Smith (eric.smith) * Date: 2009-05-06 08:51
Thanks for looking at this, Mark. If we could only assign issues to
Python 3.2 and 3.3 to change the pending deprecation warning to a real
one, and to remove the function entirely, we'd be all set! I'm always
worried we'll forget these things.
History
Date User Action Args
2009-05-06 08:51:16eric.smithsetmessages: + msg87325
2009-05-06 08:06:11mhammondsetstatus: open -> closed
resolution: fixed
messages: + msg87317
2009-05-06 07:32:31larrysetmessages: + msg87311
2009-05-06 04:43:52mhammondsetfiles: + markh.ntpath.r72374.diff
assignee: mhammond
messages: + msg87306
2009-05-06 00:17:53larrysetmessages: + msg87304
2009-05-06 00:07:19mhammondsetmessages: + msg87303
2009-05-05 21:59:55eric.smithsetassignee: eric.smith -> (no value)
2009-05-05 21:35:09larrysetmessages: + msg87289
2009-05-05 21:21:32eric.smithsetassignee: eric.smith
messages: + msg87284
2009-05-05 17:49:14larrysetfiles: + lch.ntpath.r72345.diff

messages: + msg87263
2009-05-05 17:13:22larrysetmessages: + msg87259
2009-05-05 09:09:33eric.smithsetkeywords: + needs review

messages: + msg87232
stage: patch review
2009-05-05 05:39:27larrysetmessages: + msg87211
2009-05-05 03:24:13larrysetfiles: + lch.ntpath.r72309.diff.2

messages: + msg87206
2009-05-05 03:19:25larrysetfiles: + lch.ntpath.r72309.diff

messages: + msg87205
2009-05-05 02:34:52larrysetmessages: + msg87204
2009-05-05 00:50:08eric.smithsetmessages: + msg87197
2009-05-04 02:17:55larrysetfiles: + lch.ntpath.r72242.diff

messages: + msg87096
2009-05-01 00:33:24eric.smithsetnosy: + eric.smith
messages: + msg86860
2009-04-22 11:24:03amaury.forgeotdarcsetnosy: + amaury.forgeotdarc
messages: + msg86285
2009-04-22 08:45:40paul.mooresetnosy: + paul.moore
messages: + msg86281
2009-04-20 11:07:13larrycreate