Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new os.path function to extract common prefix based on path components #54604

Closed
ronaldoussoren opened this issue Nov 12, 2010 · 18 comments
Closed
Assignees
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@ronaldoussoren
Copy link
Contributor

BPO 10395
Nosy @loewis, @rhettinger, @pfmoore, @ronaldoussoren, @ericvsmith, @ezio-melotti, @merwok, @bitdancer, @ericsnowcurrently, @Fak3, @serhiy-storchaka
Files
  • patch10395: Implementation of os.path.commonpath
  • patch10395-2: Updated implementation of {posix,nt}path.commonpath
  • patch10395-3: New version of ntpath.commonpath
  • ospath_commonpath.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2015-04-02.12:42:22.211>
    created_at = <Date 2010-11-12.15:14:04.644>
    labels = ['type-feature', 'library']
    title = 'new os.path function to extract common prefix based on path components'
    updated_at = <Date 2015-04-02.12:42:22.210>
    user = 'https://github.com/ronaldoussoren'

    bugs.python.org fields:

    activity = <Date 2015-04-02.12:42:22.210>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2015-04-02.12:42:22.211>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)']
    creation = <Date 2010-11-12.15:14:04.644>
    creator = 'ronaldoussoren'
    dependencies = []
    files = ['27883', '27899', '27974', '35942']
    hgrepos = []
    issue_num = 10395
    keywords = ['patch']
    message_count = 18.0
    messages = ['121038', '121039', '121040', '121043', '121063', '121106', '141535', '174663', '174818', '174819', '174941', '175493', '222966', '222986', '238766', '239675', '239691', '239694']
    nosy_count = 15.0
    nosy_names = ['loewis', 'rhettinger', 'paul.moore', 'ronaldoussoren', 'eric.smith', 'ezio.melotti', 'eric.araujo', 'r.david.murray', 'santoso.wijaya', 'python-dev', 'eric.snow', 'Roman.Evstifeev', 'serhiy.storchaka', 'rafik', 'Paddy McCarthy']
    pr_nums = []
    priority = 'low'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue10395'
    versions = ['Python 3.5']

    @ronaldoussoren
    Copy link
    Contributor Author

    The documentation for os.path.commonprefix notes:

    os.path.commonprefix(list)
    Return the longest path prefix (taken character-by-character) that is a prefix of all paths in list. If list is empty, return the empty string (''). Note that this may return invalid paths because it works a character at a time.

    And indeed:

    >>> os.path.commonprefix(['/usr/bin', '/usr/bicycle'])
    '/usr/bi'

    This is IMHO useless behaviour for a function in the os.path namespace, I'd expect that os.path.commonprefix works with path elements (e.g. that the call above would have returned '/usr').

    @ronaldoussoren ronaldoussoren added the stdlib Python modules in the Lib dir label Nov 12, 2010
    @ericvsmith
    Copy link
    Member

    Indeed, that behavior seems completely useless.

    I've verified that it works the same in 2.5.1.

    @ericvsmith
    Copy link
    Member

    Although there are test cases in test_genericpath that verify this behavior, so apparently it's intentional.

    @ronaldoussoren
    Copy link
    Contributor Author

    That's why I write 'broken by design' in the title.

    A "fix" for this will have to a new function, if any get added (I've written a unix implementation that finds the longest shared path several times and can provide an implementation and tests when others agree that this would be useful)

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Nov 12, 2010

    This goes back to bpo-400788 and

    http://mail.python.org/pipermail/python-dev/2000-July/005897.html
    http://mail.python.org/pipermail/python-dev/2000-August/008385.html

    Skip changed it to do something meaningful (more than ten years ago), Mark Hammond complained that it was backwards incompatible, Tim Peters argued that you shouldn't change a function if the documented behavior matches the implementation, and Skip reverted the change and added more documentation to make the actual behavior more explicit.

    It may be useless, but it's certainly not broken. In addition, it's very likely that applications of it rely on the very semantics that it has.

    In any case, anybody proposing a change should go back and re-read the old threads.

    @bitdancer
    Copy link
    Member

    Indeed, as I remember it there are people using commonprefix as a string function in situations having nothing to do with os paths.

    I'm changing the title to reflect the fact that this is really a feature request for a new function. IMO it is a reasonable feature request. Finding a name for it ought to be an interesting exercise.

    I think that this should only be accepted if there is also a windows implementation.

    @bitdancer bitdancer changed the title os.path.commonprefix broken by design new os.path function to extract common prefix based on path components Nov 13, 2010
    @bitdancer bitdancer added the type-feature A feature request or enhancement label Nov 13, 2010
    @ericsnowcurrently
    Copy link
    Member

    You can already get the better prefix using os.path, albeit less efficiently. Here's an example:

    def commondirname(paths):
        subpath = os.path.commonprefix(paths)
        for path in paths:
            if path == subpath:
                return subpath
        else:
            return os.path.join(os.path.split(subpath)[0], "")

    However, would it be better to implicitly normalize paths first rather than doing a character-by-character comparison? Here is an unoptimized demonstration of what I mean:

    def commondirname(paths):
        result = ""
        for path in paths:
            path = os.path.normcase(os.path.abspath(path))
            if not result:
                result = path
            else:
                while not path.startswith(result + os.path.sep):
                    result, _ = os.path.split(result)
                    if os.path.splitdrive(result)[1] == os.path.sep:
                        return result
        return result

    @merwok
    Copy link
    Member

    merwok commented Nov 3, 2012

    Rafik is working on os.path.commonpath for the bug day.

    @rafik
    Copy link
    Mannequin

    rafik mannequin commented Nov 4, 2012

    Here is a patch with an implementation of os.path.commonpath, along with tests and documentation. At the moment, this is only implemented for POSIX, as I don't feel like I know enough about Windows to tackle drive letters and UNC in paths without spending some more time on it.

    This probably needs more tests for corner cases.

    @serhiy-storchaka
    Copy link
    Member

    At the moment, this is only implemented for POSIX, as I don't feel like I know enough about Windows to tackle drive letters and UNC in paths without spending some more time on it.

    Just use splitdrive() and first ensure that all drivespecs are same, then find common prefix for pathspecs.

    @rafik
    Copy link
    Mannequin

    rafik mannequin commented Nov 5, 2012

    Here is a new patch addressing some of storchaka review comments, and implementing a version in ntpath.

    For the Windows version, I did as proposed in msg174819, but as I am not familiar with the semantics and subtleties of paths in Windows maybe this version of ntpath.commonpath is too simplistic and would return wrong results in some cases. I would like someone more knowledgeable in Windows to take care of it, or maybe just provide a test suite with lots of different corner cases that I could use to provide a better implementation.

    @serhiy-storchaka
    Copy link
    Member

    Some conclusions of discussion at Python-ideas (http://comments.gmane.org/gmane.comp.python.ideas/17719):

    1. commonpath() should eat double slashes in input (['/usr/bin', '/usr//bin'] -> '/usr/bin'). In any case the current implementation eats slashes on output (['/usr//bin', '/usr//bin'] -> '/usr/bin', not '/usr//bin').

    2. commonpath() should raise an exception instead of returning None on incompatible input.

    3. May be commonpath() should eat also '.' components and return '.' instead of '' when relative paths have no common prefix. I am not sure.

    In general the current patch looks good enough.

    @serhiy-storchaka serhiy-storchaka self-assigned this Dec 29, 2012
    @serhiy-storchaka
    Copy link
    Member

    Here is revised patch. The behavior is changed in correspondence with results of Python-ideas discussion, extended tests, fixed several bugs.

    @rhettinger
    Copy link
    Contributor

    This patch looks reasonable except for the doc change to os.path.commonprefix(). Remember, that function IS working as documented and that our policy is to document in an affirmative manner (here is what the function does and how to use it versus being preachy about "broken-by-design" etc.)

    @PaddyMcCarthy
    Copy link
    Mannequin

    PaddyMcCarthy mannequin commented Mar 21, 2015

    Can we now:

    1. Move os.path.commonprefix to str.commonprefix or string.commonprefix
    2. Deprecate the use of os.path.commonprefix
    3. Add os.path.commonpath
    4. Update the documentation.

    This seems to have lingered for too long and yet people have been willing to do the work it seems (from 1999).

    @pfmoore
    Copy link
    Member

    pfmoore commented Mar 31, 2015

    The patch looks good to me.

    rhettinger: I'm not sure I see a problem with the doc changes in the latest patch - noting that commonprefix may return an invalid path is fine, and what the current docs say. Directing people to commonpath if they don't want invalid paths also seems fine.

    Paddy McCarthy: I don't think that the backward compatibility cost of moving os.path.commonprefix is worth it.

    @serhiy-storchaka
    Copy link
    Member

    The patch only adds a reference to commonpath() in commonprefix() documentation. The note about invalid paths already was here.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Mar 31, 2015

    New changeset ec6c812fbc1f by Serhiy Storchaka in branch 'default':
    Issue bpo-10395: Added os.path.commonpath(). Implemented in posixpath and ntpath.
    https://hg.python.org/cpython/rev/ec6c812fbc1f

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    8 participants