Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add function to get common path prefix #49005

Closed
smontanaro opened this issue Dec 27, 2008 · 10 comments
Closed

Add function to get common path prefix #49005

smontanaro opened this issue Dec 27, 2008 · 10 comments
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@smontanaro
Copy link
Contributor

BPO 4755
Nosy @loewis, @smontanaro, @ncoghlan, @merwok, @vadmium, @serhiy-storchaka
Superseder
  • bpo-10395: new os.path function to extract common prefix based on path components
  • Files
  • cpp.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2017-05-05.21:53:15.529>
    created_at = <Date 2008-12-27.04:00:34.880>
    labels = ['type-feature', 'library']
    title = 'Add function to get common path prefix'
    updated_at = <Date 2017-05-05.21:53:15.522>
    user = 'https://github.com/smontanaro'

    bugs.python.org fields:

    activity = <Date 2017-05-05.21:53:15.522>
    actor = 'martin.panter'
    assignee = 'none'
    closed = True
    closed_date = <Date 2017-05-05.21:53:15.529>
    closer = 'martin.panter'
    components = ['Library (Lib)']
    creation = <Date 2008-12-27.04:00:34.880>
    creator = 'skip.montanaro'
    dependencies = []
    files = ['12469']
    hgrepos = []
    issue_num = 4755
    keywords = ['patch', 'needs review']
    message_count = 10.0
    messages = ['78338', '78339', '78529', '78530', '78532', '78533', '111589', '227699', '227707', '293143']
    nosy_count = 9.0
    nosy_names = ['loewis', 'skip.montanaro', 'ncoghlan', 'techtonik', 'laxrulz777', 'eric.araujo', 'cmcqueen1975', 'martin.panter', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'duplicate'
    stage = 'resolved'
    status = 'closed'
    superseder = '10395'
    type = 'enhancement'
    url = 'https://bugs.python.org/issue4755'
    versions = ['Python 3.3']

    @smontanaro
    Copy link
    Contributor Author

    os.path.commonprefix returns the common prefix of a list of paths taken character-by-character. This can
    return invalid paths. For example, os.path.commonprefix(["/export/home/dave", "/etc/passwd"]) will return "/e", which likely has no meaning as a path, at least in the context of the input list.

    Ideally, os.path.commonprefix would operate component-by-component, but people rely on the existing
    character-by-character operation, so it has been so far impossible to change semantics. There are several
    possible ways to solve this problem. One, change how commonprefix behaves. Two, add a flag to
    commonprefix to allow it to operate component-by-component if desired. Three, add a new function to
    os.path.

    I personally prefer the first option. Aside from the semantic change though, it presents the problem of
    where to put the old definition of commonprefix. It's clearly of some use or people wouldn't have co-
    opted it for non-filesystem use. It could go in the string module, but that's been living a life in limbo
    since the creation of string methods. People have been loathe to add new functionality there. The second
    option seems to me like would just be a hack on top of already broken behavior and probably require the
    currently slightly broken behavior as the default to boot, so I won't go there. Since option one is
    perhaps not going to be available to me, I've implemented the third option as a new function,
    commonpathprefix. See the attached patch. It includes test cases and documentation changes.

    @smontanaro smontanaro added type-bug An unexpected behavior, bug, or error stdlib Python modules in the Lib dir labels Dec 27, 2008
    @ncoghlan
    Copy link
    Contributor

    A new function sounds like a good solution to me. How about just calling
    it "os.path.commonpath" though?

    I agree having a path component based prefix function in os.path is
    highly desirable, particularly since the addition of relpath in 2.6:

    base_dir = os.path.commonpath(paths)
    rel_paths = [os.path.relpath(p, base_dir) for p in paths]

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Dec 30, 2008

    The documentation should explain what a "common path prefix" is. It
    can't be the path to a common parent directory, since the new function
    doesn't allow mixing absolute and relative directories. As Phillip Eby
    points out, it also doesn't account for case-insensitivity that some
    file systems or operating systems implement, nor does it take into
    account short file names on Windows.

    @smontanaro
    Copy link
    Contributor Author

    I think we need to recognize the inherent limitations of what we can expect
    to do. It is perfectly reasonable for a user on Windows to import posixpath
    and call posixpath.commonpathprefix. The function won't have access to the
    actual filesystems being manipulated. Same for Unix folks importing ntpath
    and manipulating Windows paths. While we can make it handle
    case-insensitivity, I'm no sure we can do much, if anything, about shortened
    filenames.

    Also, as long as we are considering case sensitivity, what about HFS on Mac
    OS X?

    Skip

    @ncoghlan
    Copy link
    Contributor

    1. The discussion on python-dev shows that the current documentation of
      os.path.commonprefix is incorrect - it technically works element by
      element rather than character by character (since it will handle
      sequences other than strings, such as lists of path components)

    2. Splitting on os.sep is not the correct way to break a string into
      path components. Instead, os.path.split needs to be applied repeatedly
      until "head" is a single character (a single occurrence of os.sep or
      os.altsep for an absolute path) or empty (for a relative path).
      (Alternatively, but with additional effects on the result, the
      separators can be normalised first with os.path.normpath or
      os.path.normcase)

    For Windows, os.path.splitunc and os.path.splitdrive should also be
    invoked first, and if either returns a non-empty string, that should
    become the first path component (with the remaining components filled in
    as above)

    1. Calling any or all of
      abspath/expanduser/expandvars/normcase/normpath/realpath is the
      responsibility of the library user as far as os.path.commonprefix is
      concerned. Should that behaviour be retained for an os.path.commonpath
      function, or should some of them (such as os.path.abspath) be called
      automatically?

    @ncoghlan
    Copy link
    Contributor

    The regex based approach to the component splitting when os.altsep is
    defined obviously works as well. Duplicating the values of sep and
    altsep in the default regex that way grates a little though...

    @akuchling akuchling added the stale Stale PR or inactive for long period of time. label Feb 25, 2010
    @cmcqueen1975
    Copy link
    Mannequin

    cmcqueen1975 mannequin commented Jul 26, 2010

    @merwok merwok changed the title Common path prefix Add function to get common path prefix Jan 3, 2012
    @merwok merwok added type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error labels Jan 3, 2012
    @serhiy-storchaka
    Copy link
    Member

    There is more developed patch in bpo-10395.

    @smontanaro
    Copy link
    Contributor Author

    Feel free to close this ticket. I long ago gave up on it.

    @vadmium
    Copy link
    Member

    vadmium commented May 5, 2017

    bpo-10395 added “os.path.commonpath” in 3.5.

    @vadmium vadmium closed this as completed May 5, 2017
    @vadmium vadmium removed the stale Stale PR or inactive for long period of time. label May 5, 2017
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants