classification
Title: Add function to get common path prefix
Type: enhancement
Stage: resolved
Components: Library (Lib)
Versions: Python 3.3
process
Status: closed
Resolution: duplicate
Dependencies:
Superseder: new os.path function to extract common prefix based on path components
View: 10395
Assigned To:
Nosy List: cmcqueen1975, eric.araujo, laxrulz777, loewis, martin.panter, ncoghlan, serhiy.storchaka, skip.montanaro, techtonik
Priority: normal
Keywords: needs review, patch

Created on 2008-12-27 04:00 by skip.montanaro, last changed 2017-05-05 21:53 by martin.panter. This issue is now closed.

Files
File name Uploaded Description Edit
cpp.diff skip.montanaro, 2008-12-27 04:00 review
Messages (10)
msg78338 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2008-12-27 04:00
os.path.commonprefix returns the common prefix of a list of paths taken character-by-character.  This can 
return invalid paths.  For example, os.path.commonprefix(["/export/home/dave", "/etc/passwd"]) will return "/e", which likely has no meaning as a path, at least in the context of the input list.
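
The behaviour is easy to reproduce. A minimal check, using posixpath directly so that the result does not depend on the host platform:

```python
import posixpath

paths = ["/export/home/dave", "/etc/passwd"]

# commonprefix compares the strings character by character, so the
# result stops in the middle of a path component.
prefix = posixpath.commonprefix(paths)
print(prefix)  # /e
```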

Ideally, os.path.commonprefix would operate component-by-component, but people rely on the existing 
character-by-character operation, so it has so far been impossible to change the semantics.  There are several 
possible ways to solve this problem.  One, change how commonprefix behaves.  Two, add a flag to 
commonprefix to allow it to operate component-by-component if desired.  Three, add a new function to 
os.path.

I personally prefer the first option.  Aside from the semantic change though, it presents the problem of 
where to put the old definition of commonprefix.  It's clearly of some use or people wouldn't have 
co-opted it for non-filesystem use.  It could go in the string module, but that's been living a life in limbo 
since the creation of string methods.  People have been loath to add new functionality there.  The second 
option seems to me like it would just be a hack on top of already broken behavior and would probably require the 
currently slightly broken behavior as the default to boot, so I won't go there.  Since option one is 
perhaps not going to be available to me, I've implemented the third option as a new function, 
commonpathprefix.  See the attached patch.  It includes test cases and documentation changes.
msg78339 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-12-27 04:24
A new function sounds like a good solution to me. How about just calling
it "os.path.commonpath" though?

I agree having a path component based prefix function in os.path is
highly desirable, particularly since the addition of relpath in 2.6:

base_dir = os.path.commonpath(paths)
rel_paths = [os.path.relpath(p, base_dir) for p in paths]
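
For illustration, here is that pattern as a runnable sketch. It assumes Python 3.5 or later, where the function exists as os.path.commonpath (see the closing message in this thread). The sample paths are made up, and posixpath is used so that the output is the same on every platform.

```python
import posixpath

# Hypothetical sample paths, chosen only for illustration.
paths = ["/usr/lib/python3", "/usr/local/bin"]

# Longest common sub-path, compared component by component.
base_dir = posixpath.commonpath(paths)
# Each path expressed relative to that common base directory.
rel_paths = [posixpath.relpath(p, base_dir) for p in paths]

print(base_dir)   # /usr
print(rel_paths)  # ['lib/python3', 'local/bin']
```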
msg78529 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-12-30 13:24
The documentation should explain what a "common path prefix" is. It
can't be the path to a common parent directory, since the new function
doesn't allow mixing absolute and relative directories. As Phillip Eby
points out, it also doesn't account for case-insensitivity that some
file systems or operating systems implement, nor does it take into
account short file names on Windows.
msg78530 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2008-12-30 13:51
I think we need to recognize the inherent limitations of what we can expect
to do.  It is perfectly reasonable for a user on Windows to import posixpath
and call posixpath.commonpathprefix.  The function won't have access to the
actual filesystems being manipulated.  Same for Unix folks importing ntpath
and manipulating Windows paths.  While we can make it handle
case-insensitivity, I'm not sure we can do much, if anything, about shortened
filenames.
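
To illustrate the point about working purely on strings: ntpath and posixpath can both be imported on any platform, and their normcase functions never consult a real filesystem. A small sketch (the sample path is made up):

```python
import ntpath
import posixpath

# Hypothetical sample path, used only for illustration.
sample = "C:/Users/Dave/Notes.TXT"

# ntpath.normcase lowercases and converts forward slashes to backslashes.
windows_style = ntpath.normcase(sample)
# posixpath.normcase returns the path unchanged.
posix_style = posixpath.normcase(sample)

print(windows_style)  # c:\users\dave\notes.txt
print(posix_style)    # C:/Users/Dave/Notes.TXT
```

Neither call can know about short file names, since resolving those requires asking the filesystem.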

Also, as long as we are considering case sensitivity, what about HFS on Mac
OS X?

Skip
msg78532 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-12-30 13:55
1. The discussion on python-dev shows that the current documentation of
os.path.commonprefix is incorrect - it technically works element by
element rather than character by character (since it will handle
sequences other than strings, such as lists of path components)

2. Splitting on os.sep is not the correct way to break a string into
path components. Instead, os.path.split needs to be applied repeatedly
until "head" is a single character (a single occurrence of os.sep or
os.altsep for an absolute path) or empty (for a relative path).
(Alternatively, but with additional effects on the result, the
separators can be normalised first with os.path.normpath or
os.path.normcase)

  For Windows, os.path.splitunc and os.path.splitdrive should also be
invoked first, and if either returns a non-empty string, that should
become the first path component (with the remaining components filled in
as above)

3. Calling any or all of
abspath/expanduser/expandvars/normcase/normpath/realpath is the
responsibility of the library user as far as os.path.commonprefix is
concerned. Should that behaviour be retained for an os.path.commonpath
function, or should some of them (such as os.path.abspath) be called
automatically?
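
Point 2 can be sketched as follows. This is only an illustration of repeated os.path.split, not the code from the attached patch; split_components and its pathmod parameter are hypothetical names. os.path.splitunc no longer exists in current Python (it was removed in 3.7), so the sketch relies on splitdrive alone. Per point 1, commonprefix is then applied to the component lists rather than to the strings.

```python
import ntpath
import posixpath


def split_components(path, pathmod=posixpath):
    """Break a path into components by applying pathmod.split repeatedly."""
    drive, rest = pathmod.splitdrive(path)
    parts = []
    while True:
        head, tail = pathmod.split(rest)
        if head == rest:
            # Nothing more to strip: head is a root such as "/" or is empty.
            if head:
                parts.append(head)
            break
        if tail:
            parts.append(tail)
        if not head:
            break
        rest = head
    if drive:
        # A non-empty drive becomes the first component.
        parts.append(drive)
    parts.reverse()
    return parts


paths = ["/export/home/dave", "/etc/passwd"]
components = [split_components(p) for p in paths]

# commonprefix accepts lists, so it compares whole components here.
common = posixpath.commonprefix(components)

print(components[0])  # ['/', 'export', 'home', 'dave']
print(common)         # ['/']
print(split_components("C:\\foo\\bar", ntpath))  # ['C:', '\\', 'foo', 'bar']
```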
msg78533 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-12-30 14:05
The regex based approach to the component splitting when os.altsep is
defined obviously works as well. Duplicating the values of sep and
altsep in the default regex that way grates a little though...
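
The patch itself is not reproduced in this thread, so the following is only a guess at what a regex-based split could look like. It builds the character class from the module's own sep and altsep instead of hard-coding them, which sidesteps the duplication; split_on_separators is a hypothetical name.

```python
import ntpath
import posixpath
import re


def split_on_separators(path, pathmod=posixpath):
    """Split path on every separator that pathmod recognises."""
    # altsep is None for posixpath, so fall back to an empty string.
    seps = pathmod.sep + (pathmod.altsep or "")
    pattern = "[" + re.escape(seps) + "]"
    return re.split(pattern, path)


print(split_on_separators("/usr/lib"))             # ['', 'usr', 'lib']
print(split_on_separators("C:\\foo/bar", ntpath))  # ['C:', 'foo', 'bar']
```

The leading empty string marks an absolute path. As noted in point 2 of the previous message, plain separator splitting does not treat drives specially.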
msg111589 - (view) Author: Craig McQueen (cmcqueen1975) Date: 2010-07-26 02:28
http://code.activestate.com/recipes/577016-path-entire-split-commonprefix/
msg227699 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-09-27 16:45
There is a more developed patch in issue10395.
msg227707 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2014-09-27 18:28
Feel free to close this ticket. I long ago gave up on it.
msg293143 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2017-05-05 21:53
Issue 10395 added “os.path.commonpath” in 3.5.
History
Date User Action Args
2017-05-05 21:53:15 martin.panter set status: languishing -> closed
    superseder: new os.path function to extract common prefix based on path components
    nosy: + martin.panter
    messages: + msg293143
    resolution: duplicate
    stage: patch review -> resolved
2014-09-27 18:28:06 skip.montanaro set messages: + msg227707
2014-09-27 16:45:11 serhiy.storchaka set nosy: + serhiy.storchaka
    messages: + msg227699
2012-05-23 08:27:08 techtonik set nosy: + techtonik
2012-01-03 16:17:58 eric.araujo set nosy: + eric.araujo
    title: Common path prefix -> Add function to get common path prefix
    type: behavior -> enhancement
    versions: + Python 3.3, - Python 3.1
2010-07-26 02:28:33 cmcqueen1975 set nosy: + cmcqueen1975
    messages: + msg111589
2010-02-25 17:54:47 akuchling set status: open -> languishing
    keywords: patch, patch, needs review
2008-12-30 14:05:12 ncoghlan set keywords: patch, patch, needs review
    messages: + msg78533
2008-12-30 13:55:41 ncoghlan set keywords: patch, patch, needs review
    messages: + msg78532
2008-12-30 13:51:26 skip.montanaro set messages: + msg78530
2008-12-30 13:24:15 loewis set keywords: patch, patch, needs review
    nosy: + loewis
    messages: + msg78529
2008-12-29 22:10:44 laxrulz777 set nosy: + laxrulz777
2008-12-27 04:24:11 ncoghlan set keywords: patch, patch, needs review
    nosy: + ncoghlan
    messages: + msg78339
2008-12-27 04:00:34 skip.montanaro create