Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os module: use keyword-only arguments for dir_fd and nofollow to reduce function count #58831

Closed
larryhastings opened this issue Apr 19, 2012 · 33 comments
Assignees
Labels
release-blocker stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@larryhastings
Copy link
Contributor

BPO 14626
Nosy @gvanrossum, @birkenfeld, @pitrou, @larryhastings, @benjaminp, @bitdancer, @serhiy-storchaka
Files
  • larry.os.keyword.arguments.collapse.1.diff
  • larry.os.keyword.arguments.collapse.2.diff
  • larry.os.keyword.arguments.collapse.3.diff: My third patch implementing all this mess.
  • larry.os.keyword.arguments.collapse.4.diff: Fourth patch.
  • larry.os.keyword.arguments.collapse.5.diff: Fifth patch.
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/larryhastings'
    closed_at = <Date 2012-06-23.20:41:18.101>
    created_at = <Date 2012-04-19.23:52:38.460>
    labels = ['type-feature', 'library', 'release-blocker']
    title = 'os module: use keyword-only arguments for dir_fd and nofollow to reduce function count'
    updated_at = <Date 2015-09-21.23:29:45.974>
    user = 'https://github.com/larryhastings'

    bugs.python.org fields:

    activity = <Date 2015-09-21.23:29:45.974>
    actor = 'python-dev'
    assignee = 'larry'
    closed = True
    closed_date = <Date 2012-06-23.20:41:18.101>
    closer = 'georg.brandl'
    components = ['Library (Lib)']
    creation = <Date 2012-04-19.23:52:38.460>
    creator = 'larry'
    dependencies = []
    files = ['25716', '25818', '25863', '26063', '26080']
    hgrepos = []
    issue_num = 14626
    keywords = ['patch']
    message_count = 33.0
    messages = ['158777', '158779', '158788', '158795', '158796', '158799', '158801', '158802', '159518', '159626', '159772', '161646', '162271', '162274', '162275', '162514', '162548', '162553', '162554', '162555', '163311', '163393', '163464', '163503', '163506', '163507', '163525', '163576', '163643', '163648', '163669', '251254', '251279']
    nosy_count = 12.0
    nosy_names = ['gvanrossum', 'georg.brandl', 'pitrou', 'larry', 'benjamin.peterson', 'Arfrever', 'r.david.murray', 'neologix', 'Yury.Selivanov', 'rosslagerwall', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'release blocker'
    resolution = 'fixed'
    stage = 'patch review'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue14626'
    versions = ['Python 3.3']

    @larryhastings
    Copy link
    Contributor Author

    There are some functions in the os module that do much the same thing but differ only in minor ways, like

    • whether or not they follow symbolic links (stat vs lstat)
    • taking an extra "dir_fd" parameter (chmod vs fchmodat(3.3))

    It would be better if we consolidated these variations into one function which took keyword-only parameters that allow access to the alternate functionality.

    Here is a sample of the new possible function signatures, as originally proposed by Serhiy Storchaka:

    access(path, mode, *, followlinks=True, dirfd=None, eaccess=False)
    chmod(path, mode, *, followlinks=True, dirfd=None)
    chown(path, uid, gid, *, followlinks=True, dirfd=None)
    link(srcpath, dstpath, *, followlinks=True, srcdirfd=None, dstdirfd=None)
    mkdir(path, mode=0o777, *, dirfd=None)
    mknod(path, mode=0o600, device=0, *, dirfd=None)
    open(path, flag, mode=0o777, *, dirfd=None)
    readlink(path, *, dirfd=None)
    rename(oldpath, newpath, *, olddirfd=None, newdirfd=None)
    stat(path, *, followlinks=True, dirfd=None)
    symlink(src, dst, *, dirfd=None)
    unlink(path, *, removedir=False, dirfd=None)
    utimes(path[, (atime, mtime)], *, ns=False, dirfd=None)
    mkfifoat(path, mode=0o666, *, followlinks=True, dirfd=None)

    My opinion about naming: PEP-8 suggests underscores to separate words in non-class identifiers. So I'd spell those "dir_fd", "src_dir_fd" (etc), "remove_dir", and "follow_symlinks".

    (While thinking about this, I remembered the infrequently-cited maxim "no boolean parameters". But that's more a guideline than a rule, and it tends to be a complaint about how the meaning of the parameter is unclear at a call site. As these will be keyword-only parameters I think their meanings will be perfectly clear, so I shan't worry about it.)

    @larryhastings
    Copy link
    Contributor Author

    storchaka: sorry for the long delay, somehow I missed your reply in python-ideas.)

    You said you envision this as a big patch. Could I convince you to try and make a series of smaller patches? It should be easy to break up into small pieces--do one patch adding dir_fd, the next with follow_symlinks, etc.

    @benjaminp
    Copy link
    Contributor

    I think it most (all?) of these cases, it's better to mirror the os level APIs, since many people already know them. Such fanciness is better left to high level wrappers (like path objects).

    @bitdancer
    Copy link
    Member

    No, Guido's boolean keyword dislike is not about things being unclear at the call site. It's about boolean parameters instead of separate functions. That is (I believe) he prefers lstat/stat to having stat take a boolean keyword, keyword-only or not.

    Did he weigh in on the python-ideas thread?

    @bitdancer bitdancer added the type-feature A feature request or enhancement label Apr 20, 2012
    @larryhastings
    Copy link
    Contributor Author

    r.david.murray said:

    No, Guido's boolean keyword dislike is not about things
    being unclear at the call site.

    I wasn't referring to Guido specifically, just general code smell complaints about boolean parameters.

    That is (I believe) he prefers lstat/stat to having stat take
    a boolean keyword, keyword-only or not.

    Did he weigh in on the python-ideas thread?

    Not per se. But I ran into him when he was leaving PyCon 2012 for good (second night of the sprints iirc), and he steered me to the original python-ideas thread and asked me to follow up on it / work with the poster. (Probably because of my doing the nanosecond-precision os.stat / utime work.) If there's genuine opposition to the patch being generated here I'll ping him.

    Oh, btw storchaka, in the email thread you asked

    how the user code will check the availability of
    dirfd functionality. Special global flags in os or sys?

    No--it'll just be part of a release, and you'll check which version of Python 3 you have before using it.

        if sys.version_info.major >= 3 or sys.version_info.minor >= 3:
           # use extra flags

    p.s. http://mail.python.org/pipermail/python-ideas/2012-March/014482.html

    @serhiy-storchaka
    Copy link
    Member

    Thank you, Larry. I was going to do it, but got stuck with other things.

    The main objective of this proposal is to get rid of litter os module by dozen rarely popular and non-portable functions (introduced by bpo-4761). Moreover, the descriptions are given in alphabetical order and related functions are located far from each other.

    So I'd spell those "dir_fd", "src_dir_fd" (etc), "remove_dir", and "follow_symlinks".

    I follow the common style. followlinks is already used in other functions.

    No--it'll just be part of a release, and you'll check which version of Python 3 you have before using it.

    Presence of function depends not only on the version, but also from the platform.

    Benjamin, therefore I believe it is critically important to do this work before the release of Python 3.3. To people and have not started to use the new features. Otherwise, get rid of them will be more difficult. But I have nothing against to put "at"-functions in a separate submodule (os.posix?).

    David, let lstat remains. fstatat include functionality both stat and lstat (but has a different interface). I suggest to unite fstatat and stat, while maintaining backward compatibility and using a more user-friendly interface.

    In the C extension of the functions is impossible, that is why were introduced new functions with other names.

    @serhiy-storchaka serhiy-storchaka added the stdlib Python modules in the Lib dir label Apr 20, 2012
    @serhiy-storchaka
    Copy link
    Member

    Before starting to code, it is necessary to solve the problem of interface.

    With the majority of the functions all is good, but the link and rename have two dirfd parameters (even with different names). So I suggest two more alternatives.

    One is the filename and dirfd are combined in a tuple. Instead of a tuple, you can specify only the name.

    link ((srcpath, srcdirfd), (dstpath, dstdirfd), *, followlinks=True)

    The other -- dirfds are combined in a tuple. You can specify a number, then dirfd is the same for both filenames.

    link (srcpath, dstpath, *, followlinks=True, dirfd=(srcdirfd, dstdirfd))

    Which of these options (or the original, with different keywords) is preferable?

    @larryhastings
    Copy link
    Contributor Author

    It's true that, for example, dir_fd parameters won't work on Windows. The solution is to always accept the parameters and throw NotImplementedError on platforms where the functionality isn't available.

    Here are my thoughts on the interface for link(). Since the two dir_fd parameters are independent--you might specify none, one, or both--I think the dir_fd=(src,dst) form would be obnoxious. And polymorphic parameters (accept a string or a tuple of string and fd) are nearly always a bad idea; the % operator on strings is a good example of what can go wrong. So I think you should stick with the original interface, with independent explicit keyword arguments.

    I'd prefer that we did this everywhere it made sense, including adding a follow_symlinks parameter to stat(). But obviously you should prioritize places where we want to get rid of functions that are only in trunk (3.3) so far.

    I suppose there's precedent for "followlinks"; it's very old, predating PEP-8. Personally I'd overlook it if I were doing the implementation and spell the new parameters "follow_symlinks"--or at least I'd try it and see what the community thought. Anyway, there's no established precedent for "dir_fd" and "remove_dir". So I'd certainly prefer PEP-8 spellings for those.

    @larryhastings
    Copy link
    Contributor Author

    Here my first stab at a comprehensive proposal. Each section represents a specific new function argument, and a list of functions that the argument be added to.

    All new arguments are keyword-only and optional.

    All functions mentioned are in the os module.


    This annotation:
    foo [X bar]
    means "if we add this argument to function foo, we can remove function bar". This is because bar is a new function only in trunk and has no installed base.

    This annotation:
    foo [- bar]
    means "if we add this argument to function foo, we could theoretically start deprecating function bar". bar shipped in a previous version of Python and we can't simply remove it.

    However! I am *not* proposing doing *any* of these deprecations--I suspect the right thing to do is to leave all the existing functions in.


    follow_symlinks=bool (default True)

    Allow functions to either follow symlinks (the default) or examine symlinks.

    access              [X faccessat]
    chflags             [- lchflags]
    chmod               [- lchmod]
    chown               [- lchown]
    getxattr            [X lgetxattr]
    link                [X linkat]
    listxattr           [X llistxattr]
    removexattr         [X lremovexattr]
    setxattr            [X lsetxattr]
    stat                [- lstat]
    utime               [X lutimes]
    

    effective_ids=bool (default False)

    For functions that take the AT_EACCESS flag or otherwise operate on effective uid/gid.

    access              [X faccessat]
        (this also lets us skip ever adding euidaccess!)
    

    dir_fd=int (default None)

    For functions that take a "dir_fd" parameter instead of / in addition to a "path" parameter.

    access              [X faccessat]
    chmod               [X fchmodat]
    chown               [X fchownat]
    getxattr            [X fgetxattr]
    link                [X linkat] (note! need two parameters here.)
    mkdir               [X mkdirat]
    mkfifo              [X mkfifoat]
    mknod               [X mknodat]
    open                [X openat]
    readlink            [X readlinkat]
    rename              [X renameat]
    stat                [X fstatat]
    symlink             [X symlinkat]
    unlink              [X unlinkat]
    utime               [X futimesat utimensat]
    

    fd=int (default None)

    For functions that take a "path" parameter and have an "f"-equivalent that take an "fd" instead. The "path" parameter and "fd" parameters would be exclusive; you must specify exactly one, never both. Both parameters would accept "None" as equivalent to being unspecified. For the functions that only take one parameter (chdir, listdir, stat, statvfs) I propose we give that parameter a default of None.

    chdir               [- fchmod]
    chown               [- fchown]
    execve              [X fexecve]
    getxattr            [X fgetxattr]
    listdir             [X flistdir]
    listxattr           [X flistxattr]
    removexattr         [X fremovexattr]
    setxattr            [X fsetxattr]
    stat                [- fstat]
    statvfs             [- fstatvfs]
    utime               [X futimes futimens]
    

    remove_dir=bool (default False)

    Allows unlink to behave like rmdir.

    unlink              [X unlinkat] [- rmdir]
    

    One question:

    If we do as I propose, and dir_fd is always a parameter to a pre-existing function, can we drop support for AT_FDCWD? It seems to me that
    funlink("../foo", dir_fd=AT_FDCWD)
    is equivalent to
    unlink("../foo")
    but I fear I'm missing something important.

    @neologix
    Copy link
    Mannequin

    neologix mannequin commented Apr 29, 2012

    Well, it always boils down to the same problem: should we offer thin wrappers around underlying syscalls or offer higher-level functions?
    I personally think that offering mere wrappers around syscalls doesn't make much sense: Python is a very-high level language, and so should its library be.

    Now, I'm in favor of adding optional argument to tune the behavior with regard to symlinks, however I'm skeptical about merging regular syscalls and *at() syscalls, for the following reasons:

    • *at() syscalls are not portable: if it's not supported, what's remove(path='toto', dir_fd=fd) supposed to do? Throw an exception?
    • it's not clear how one's supposed to check that the functions are available: right now, one can do
      if hasattr(os, 'unlinkat'):
      # use unlinkat()
      else:
      # fallback to unlink()
      How would that work with dir_fd argument?
    • some functions have actually really different prototypes:
      for example:
      rename(<old path>, <new path>) -> renameat(<old dir_fd>, <old path>, <new dir_fd>, <new path>)
      trying to coalesce them into a single function will lead to awkward and ugly API: some arguments will be exclusive (which is almost always a sign of a bad API), and the order sounds really wrong:
    stat(path, *, followlinks=True, dirfd=None)
    is backwards, it should be
    stat(dirfd, path, *, followlinks=True)

    since, path is relative to dirfd.

    Actually, the proper way to do this would be to use overloading, but Python doesn't support it (and even if it did, I think overloading would probably be a bad idea anyway).

    • taking for a second the point of view of a user that has never heard of the *at() family, if I come across such a function:
      chmod(path, mode, *, followlinks=True, dir_fd=None)
      I'll probably have to think some time to understand what the hell is the last dir_fd argument. Renaming, removing a file is simple and direct, so should the API be.
    • finally, although useful, the *at() family is a really poor and confusing API, which will likely be used by less than 0.1% of the overall Python users. Complicating the whole posix module API just to shave off a few functions is IMHO a bad idea.

    @larryhastings
    Copy link
    Contributor Author

    I personally think that offering mere wrappers around syscalls doesn't
    make much sense: Python is a very-high level language, and so should
    its library be.

    I very much agree. I suggest anyone who thinks the os module is a thin
    veneer over syscalls needs to examine the Win32 implementation of os.stat.

    • *at() syscalls are not portable: if it's not supported, what's
      remove(path='toto', dir_fd=fd) supposed to do? Throw an exception?

    NotImplementedError.

    • it's not clear how one's supposed to check that the functions are
      available: right now, one can do [a hasattr check]
      How would that work with dir_fd argument?

    try/except. If someone (maybe even me) championed PEP-362 (Function
    Signature Object, introspection on functions) then we could LBYL,
    but that's only a theory right now.

    stat(path, *, followlinks=True, dirfd=None)
    is backwards, it should be
    stat(dirfd, path, *, followlinks=True)
    since, path is relative to dirfd.

    You could hypothetically rearrange all the arguments
    at the call site by using 100% keyword arguments:
    stat(dir_fd=whatever, path=thingy, follow_symlinks=True)

    I admit it's unlikely anyone will bother.

    Actually, the proper way to do this would be to use overloading, but
    Python doesn't support it (and even if it did, I think overloading
    would probably be a bad idea anyway).

    Speaking *purely hypothetically*, we could actually overload it. It'd
    be at runtime, as what in Python is not. Since stat is implemented in
    C it would look like:

      if (-1 == PyArg_ParseTupleAndKeyword(arguments-with-dir_fd)) {
        PyErr_Clear();
        if (-1 == PyArg_ParseTupleAndKeyword(arguments-without-dir_fd)) {
          /* etc. */

    However, Python doesn't have overloading because we shouldn't do it.
    I'm against overloading here, for the same reason that I'm against
    argument polymorphism (fold faccess into access by allowing the first
    argument to be either an int or a string). This sort of polymorphism
    leads to lack of clarity and awkward problems.

    I'll probably have to think some time to understand what the hell is
    the last dir_fd argument. Renaming, removing a file is simple and
    direct, so should the API be.

    I counter with TOOWWTDI. There are currently four chmod functions in
    os: chmod, fchmod, fchmodat, and lchmod. If you want to change the
    mode of a file, which do you use?

    You may also face the problem that you don't want chmod to follow
    symbolic links, yet you've never heard of lchmod.

    I feel the sad truth of the situation is, it is desirable that Python
    support all this functionality, but there is no simple and
    obviously-right way to do so. So which alternative is less undesirable:
    a combinatoric explosion of functions, or a combinatoric explosion of
    arguments to existing functions? I suggest the latter is less
    undesirable because it is marginally clearer: at least you always
    know which function to call, and the documentation on how to use it
    in its multiple forms will be all in one place.

    (Though this would not help the hapless programmer who suspected maybe
    there *was* a second function somewhere, searching forever for a
    function that doesn't actually exist.)

    Also, I suggest that the programmer reading the documentation for os
    would see the same parameters (dir_fd, follow_symlinks) as part of many
    different functions and would quickly learn their semantics.

    • finally, although useful, the *at() family is a really poor and
      confusing API, which will likely be used by less than 0.1% of the
      overall Python users. Complicating the whole posix module API just
      to shave off a few functions is IMHO a bad idea.

    I understand your critique. In defense of the proposal, I will say that
    when Guido asked me to connect with the storchaka and try to shepherd
    the process, he specifically cited the *at() functions as examples of
    new functions that could be obviated by folding the functionality
    into the existing APIs.

    Not that I claim this gives me carte blanche to check an implementation
    in. Simply that I believe Guido liked the idea in the abstract. It
    could be that he wouldn't like the result, or that my proposal (which
    purposely carried out the idea to its logical extreme) goes too far.

    In the absence of a BDFL ruling on the proposal, I think I should
    produce a patch implementing the full proposal, then bring it up in
    c.l.p-d and get a community vote. How's that sound?

    @larryhastings
    Copy link
    Contributor Author

    Here's my first pass at a patch. For this patch,
    I took the proposal to its logical extreme: I removed
    every function in os that was both mildly redundant
    with an existing function *and* has been added since
    3.2, and moved that functionality to the equivalent
    existing function, making it accessible with the use
    of keyword-only parameters.

    Specifically:

    This function has been removed, and instead
    | this parameter has been added to
    | | this function
    | | |
    v v v
    -------------------------------------
    faccessat dir_fd access
    faccessat effective_ids access
    faccessat follow_symlinks access
    fchmodat dir_fd chmod
    fchmodat follow_symlinks chmod
    fchownat dir_fd chown
    fchownat follow_symlinks chown
    fexecve fd execve
    fgetxattr fd getxattr
    flistdir fd listdir
    flistxattr fd listxattr
    fremovexattr fd removexattr
    fsetxattr fd setxattr
    fstatat dir_fd stat
    futimens fd utime
    futimes fd utime
    futimesat dir_fd utime
    lgetxattr follow_symlinks getxattr
    linkat dst_dir_fd link
    linkat src_dir_fd link
    linkat follow_symlinks link
    llistxattr follow_symlinks listxattr
    lremovexattr follow_symlinks removexattr
    lsetxattr follow_symlinks setxattr
    lutimes follow_symlinks utime
    mkdirat dir_fd mkdir
    mkfifoat dir_fd mkfifoat
    mknodat dir_fd mknod
    open dir_fd openat
    readlinkat dir_fd readlink
    renameat dst_dir_fd rename
    renameat src_dir_fd rename
    symlinkat dir_fd symlink
    unlinkat dir_fd unlink
    unlinkat remove_directory unlink
    utimensat dir_fd utime
    utimensat follow_symlinks utime

    Additionally, we *could* deprecate this function,
    | as I have added this parameter
    | | to this function:
    | | |
    v v v
    --------------------------------------
    fchdir fd chdir
    fchmod fd chmod
    fstat fd stat
    fstatvfs fd statvfs
    lchflags follow_symlinks chflags
    lchmod follow_symlinks chmod
    fchown fd chown
    lchown follow_symlinks chown
    lstat follow_symlinks stat

    I doubt we'll ever deprecate those functions.
    This patch does not deprecate those functions.
    I don't propose deprecating those functions.

    Notes:

    • What do you do on platforms where the functionality isn't available?
      I believe it's better to always accept parameters, but throw a
      NotImplementedError if the functionality they represent is
      unavailable on the current platform. Adding and removing
      parameters based on the current platform... that way lies madness.
      (It's like scrollbars that pop in and out of existance depending
      on whether or not you need them. Better that the scrollbars are
      always there, and simply disabled when the content fits in the
      current window. Users like a stable foundation under their feet.)

    • The patch is... pretty big. But you can divide-and-conquer it
      into a series of self-contained parts. First I add path_converter,
      then I modify existing functions / remove new functions. Each of
      those can be reviewed in isolation.

      Also, the new implementations generally follow the same pattern:
      initialize
      call PyArg_ParseTupleAndKeywords
      error out early if user asks for functionality
      unavailable on the current platform
      ensure that combinations of parameters
      (dir_fd, fd, follow_symlinks) are permitted
      do actual work, turning on/off advanced functionality
      based on configure ifdefs (HAVE_FCHOWNAT etc)
      if error, raise exception
      compute return value
      exit:
      cleanup all path variables
      return return_value

      Here's a list of all the functions I added arguments to:
      access chdir chflags chmod chown execve getxattr link listdir
      listxattr mkdir mkfifo mknod open readlink removexattr rename
      setxattr stat statvfs symlink unlink utime

    • The one new bit of technology: a PyArg_ParseTuple "converter"
      function called path_converter. Here's its documentation:

      /*
       * A PyArg_ParseTuple "converter" function
       * that handles filesystem paths in the manner
       * preferred by the os module.
       *
       * path_converter accepts (Unicode) strings and their
       * subclasses, and bytes and their subclasses.  What
       * it does with the argument depends on the platform:
       *
       *   * On Windows, if we get a (Unicode) string we
       *     extract the wchar_t * and return it; if we get
       *     bytes we extract the char * and return that.
       *
       *   * On all other platforms, strings are encoded
       *     to bytes using PyUnicode_FSConverter, then we
       *     extract the char * from the bytes object and
       *     return that.
       *
       * Input fields:
       *   path.nullable
       *     If nonzero, the path is permitted to be None.
       *   path.function_name
       *     If non-NULL, path_converter will use that as the name
       *     of the function in error messages.
       *     (If path.argument_name is NULL it omits the function name.)
       *   path.argument_name
       *     If non-NULL, path_converter will use that as the name
       *     of the parameter in error messages.
       *     (If path.argument_name is NULL it uses "path".)
       *
       * Output fields:
       *   path.wide
       *     Points to the path if it was expressed as Unicode
       *     and was not encoded.  (Only used on Windows.)
       *   path.narrow
       *     Points to the path if it was expressed as bytes,
       *     or it was Unicode and was encoded to bytes.
       *   path.length
       *     The length of the path in characters.
       *   path.object
       *     The original object passed in.
       *   path.cleanup
       *     For internal use only.  May point to a temporary object.
       *     (Pay no attention to the man behind the curtain.)
       *
       *   At most one of path.wide or path.narrow will be non-NULL.
       *   If path was None and path.nullable was set,
       *     both path.wide and path.narrow will be NULL,
       *     and path.length will be 0.
       *   
       *   path_converter takes care to not write to the path_t
       *   unless it's successful.  However it must reset the
       *   "cleanup" field each time it's called.
       *
       * Use as follows:
       *      path_t path;
       *      memset(&path, 0, sizeof(path));
       *      PyArg_ParseTuple(args, "O&", path_converter, &path);
       *      // ... use values from path ...
       *      path_cleanup(&path);
       *
       * (Note that if PyArg_Parse fails you don't need to call
       * path_cleanup().  However it is safe to do so.)
       */
      typedef struct {
          char *function_name;
          char *argument_name;
          int nullable;
          wchar_t *wide;
          char *narrow;
          Py_ssize_t length;
          PyObject *object;
          PyObject *cleanup;
      } path_t;

    I assert path_converter is Very Useful. It nearly always
    reduced the argument processing from three code paths
    (two for Windows, one for everyone else) to one. Even if
    the rest of the patch isn't accepted I'm sure we'll keep
    path_converter.

    • In a lot of places I combined together several functions, or
      several large blobs of #ifdef'd code, into one large
      slightly-shaggy function. Like, Windows code is now often
      streamlined in, instead of being a big separate #ifdef. And
      instead of three passes at decoding arguments (one for Windows
      wide, one for Windows narrow, one for everyone else) there is
      generally just one. (Thanks, path_converter!) Again, even
      if we don't keep the extra keyword arguments / functionality,
      I'm hoping at least some of that cleanup will survive.

    • utime is the whale. It obviates *five* other functions
      (futimens, futimes, futimesat, lutimes, and utimensat).
      I spent a fair amount of time refactoring it. It's so complex,
      it was hard to make it anything like readable. I like to
      think I reached a local maximum of readability given what I had
      to work with, and I like to think that the resulting dog's
      breakfast is less unreadable than the previous iteration. Anyway
      I'm definitely open to suggestions on how to restructure it
      further to enhance readability.

    • The tests used to use e.g. hasattr(os, 'futimes') and the like to
      determine what functionality was available. In order to LBYL,
      I had to resort to e.g. sysconfig.get_config_var('HAVE_FUTIMES').
      If users need to LBYL here too, well, it's kind of a problem.
      (I hope function signature objects solve this--we're talking
      about it.)

    • On a very related topic: os.openat() is one of the functions I
      removed. And then we have this in Lib/os.py:
      if _exists('openat'):
      def fwalk(...)
      Since I removed os.openat completely, we could longer tell
      whether or not the functionality is available. And I couldn't
      use my sysconfig.get_config_var trick because it's os.py and
      we're bootstrapping the interpreter. So I went with a dirty
      hack--uh, I mean, practicality beats purity. posixmodule.c
      conditionally exposes a symbol called _HAVE_OPENAT if openat()
      is available, and fwalk() is now gated on that.

    • A minor semantic change: symlink() now accepts the same arguments
      everywhere. Previously it had an extra optional argument only on
      Windows ("target_is_directory"). Non-Windows now accepts that too
      and ignores it, in much the same way that os.mkdir ignores the mode
      on Windows. Also, os.symlink now accepts byte strings for the
      paths on Windows. (Even though we'll remove byte string paths
      in 3.4, yes?)

    • The patch is still sloppy:

      • The docstrings are incomplete.
      • I haven't touched the docs at all.
      • There are > 80 col lines.
        Clearly I'll clean up this stuff if the patch has a shot at
        going in.

    I do think it's an improvement. But I won't check any part of it
    in without some consensus (or BFDL ruling).

    I look forward to your feedback!

    @larryhastings larryhastings self-assigned this May 26, 2012
    @larryhastings
    Copy link
    Contributor Author

    Second pass at my patch. Incorporates suggestions from Serhiy's review--thanks, Serhiy!

    Still not ready for checkin. > 80 col lines, no docs, docstrings are messy. But code is ready for (further) review. Code passes regression test suite without errors.

    @serhiy-storchaka
    Copy link
    Member

    Well, I'm going to ignore the long lines and documentation. The patch is
    really big and impressive.

    @larryhastings
    Copy link
    Contributor Author

    I'm not sure that "long" and "impressive" are words that go together when describing a patch ;-)

    @larryhastings
    Copy link
    Contributor Author

    Here's a nice fresh minor update.

    Notes on this third patch:

    • The docstrings are now done.

    • I discovered that fchmodat() doesn't actually support
      AT_SYMLINK_NOFOLLOW! glibc documents using the flag, then
      comically notes that it doesn't actually work. Specifying
      it results in ENOTSUP every time. os.chmod() now accomodates
      this; it throws NotImplementedError if it detects this situation.
      However it should also Just Work once glibc supports it.

    • While editing the docstrings, I noticed that several of them
      had old-style octal constants ;-) I fixed 'em.

    • I added support for the remove_dir argument to unlink/remove
      on Windows. I also made it work on non-Windows even when
      unlinkat() is not available.

    @larryhastings
    Copy link
    Contributor Author

    BTW: If PEP-362 is accepted, and this patch makes it for 3.3 (both of which I think will happen), I'll hand-code signatures for the functions that may throw NotImplementedError so users can use "is_implemented" to LBYL.

    @gvanrossum
    Copy link
    Member

    I haven't read the code, but from Larry's description this looks great to me. It's amazing how many extra functions were added to the os module since 3.2! I also agree that the redundant functions that existed in 3.2 should stay and I don't see it's fair to deprecate them. I do hope that not too many people have written code based on the 3.3 alphas using all those extra functions, but I suppose they will get what they paid for.

    Everything else Larry wrote also sounds reasonable to me.

    @Arfrever
    Copy link
    Mannequin

    Arfrever mannequin commented Jun 9, 2012

    Previously existing redundant functions could be deprecated in documentation.

    @larryhastings
    Copy link
    Contributor Author

    Previously existing redundant functions could be
    deprecated in documentation.

    As in, don't start a "deprecation cycle" (warning in 3.3, deprecated in 3.4, gone in 3.5), just document "consider using this other function instead"? That's probably worth doing. I wouldn't use the word "deprecated" though, I'd just suggest a "see also".

    Maybe we could remove the redundant functions in 4.0. I'll put it on my wishlist :)

    @larryhastings
    Copy link
    Contributor Author

    New patch! What's new:

    • The big change: I removed the "fd=" parameters. Now, functions that
      accept either a path or a file descriptor simply take either as their
      "path" argument. I ran it by Guido and he thought it sounded fine,
      so I tried it--and I think it's a definite improvement. (Thanks to
      Jim Jewett for suggesting it--tbh I'd considered it before, but
      looking at it through fresh eyes helped!)

    • Also new in this patch: you can now LBYL for the fd, dir_fd, and
      follow_symlinks parameters. Just check to see if the function is
      in os.supports_<name of parameter>. For example:
      if os.chown in os.supports_dir_fd:
      os.chown(path, dir_fd=whatnot)

    • The third big bit of news: the patch works under Windows!

    • I attempted to support Mac OS X 10.3, specifically the weak linking
      to statvfs, fstatvfs, and lchown. However I don't have a Mac (much
      less one running 10.3) so I can't test this.

    I *think* the docstrings are all fixed. The only thing I know that
    needs to be done are the docs (and Misc/NEWS).

    I really wanna get this in before the feature freeze. I promise to
    support it through the betas... can I puh-leez check it in?

    @larryhastings
    Copy link
    Contributor Author

    Fifth iteration of my patch. Everything is done, and I really think it's ready to be checked in.

    • The documentation is done, including Misc/NEWS.
    • All the code is now < 80 columns.
    • The docstrings have been double- and triple-checked.
    • It passes all regression tests on Linux 64-bit; it processes the
      regression test suite identically to trunk on Windows 32-bit.

    @serhiy-storchaka
    Copy link
    Member

    > http://bugs.python.org/review/14626/diff/5182/Doc/library/os.rst#newcode1210
    Doc/library/os.rst:1210: using it will raise a
    :exc:`NotImplementedError`.
    Same as above: would it make sense to ignore the arg if follow_symlinks
    is not supported, but the fact is irrelevant because the system doesn't
    support symlinks at all?

    And what if the system supports symlink and doesn't support
    at-functions, follow_symlinks=False and file is not symlink. Should it
    be NotImplementedError?

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jun 22, 2012

    New changeset 27f9c26fdd8b by Larry Hastings in branch 'default':
    Issue bpo-14626: Large refactoring of functions / parameters in the os module.
    http://hg.python.org/cpython/rev/27f9c26fdd8b

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jun 23, 2012

    New changeset 04fd8f77a58e by Larry Hastings in branch 'default':
    Issue bpo-14626: Fix buildbot issues on FreeBSD (AMD64). (Fingers crossed.)
    http://hg.python.org/cpython/rev/04fd8f77a58e

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jun 23, 2012

    New changeset e1e0eeb07398 by Larry Hastings in branch 'default':
    Issue bpo-14626: Fix buildbot issue on x86 Tiger 3.x.
    http://hg.python.org/cpython/rev/e1e0eeb07398

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jun 23, 2012

    New changeset 66f7377547d5 by Larry Hastings in branch 'default':
    Issue bpo-14626: Fix buildbot issue on OpenIndiana 3.x machines. (Hopefully.)
    http://hg.python.org/cpython/rev/66f7377547d5

    @pitrou
    Copy link
    Member

    pitrou commented Jun 23, 2012

    27f9c26fdd8b broke test_shutil on the Windows buildbots:

    ======================================================================
    FAIL: test_basic (test.test_shutil.TestWhich)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\test\test_shutil.py", line 1146, in test_basic
        self.assertEqual(rv, self.temp_file.name)
    AssertionError: None != 'c:\\users\\db3l\\appdata\\local\\temp\\tmpxqw4gu\\tmp7ugfmm.exe'

    ======================================================================
    FAIL: test_full_path_short_circuit (test.test_shutil.TestWhich)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\test\test_shutil.py", line 1152, in test_full_path_short_circuit
        self.assertEqual(self.temp_file.name, rv)
    AssertionError: 'c:\\users\\db3l\\appdata\\local\\temp\\tmpmwer14\\tmpeacfbz.exe' != None

    ======================================================================
    FAIL: test_non_matching_mode (test.test_shutil.TestWhich)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\test\test_shutil.py", line 1158, in test_non_matching_mode
        self.assertIsNone(rv)
    AssertionError: 'c:\\users\\db3l\\appdata\\local\\temp\\tmp7n6ojp\\tmp5tt9pa.exe' is not None

    ======================================================================
    FAIL: test_pathext_checking (test.test_shutil.TestWhich)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\test\test_shutil.py", line 1181, in test_pathext_checking
        self.assertEqual(self.temp_file.name, rv)
    AssertionError: 'c:\\users\\db3l\\appdata\\local\\temp\\tmpipmbe3\\tmpx43hex.exe' != None

    ======================================================================
    FAIL: test_relative (test.test_shutil.TestWhich)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\test\test_shutil.py", line 1166, in test_relative
        self.assertEqual(rv, os.path.join(tail_dir, self.file))
    AssertionError: None != 'tmpcluw7l\\tmp6sy_py.exe'

    @pitrou
    Copy link
    Member

    pitrou commented Jun 23, 2012

    The rmdir argument to unlink() looks completely crazy to me. Instead rmdir() should be called when one wants to rmdir() a fd.

    @birkenfeld
    Copy link
    Member

    I agree with Antoine. I've opened bpo-15154 to track this.

    @birkenfeld
    Copy link
    Member

    The Windows buildbots are now content; closing.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Sep 21, 2015

    New changeset a138a1131bae by Victor Stinner in branch 'default':
    Issue bpo-25207, bpo-14626: Fix ICC compiler warnings in posixmodule.c
    https://hg.python.org/cpython/rev/a138a1131bae

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Sep 21, 2015

    New changeset 170cd0104267 by Victor Stinner in branch 'default':
    Issue bpo-25207, bpo-14626: Fix my commit.
    https://hg.python.org/cpython/rev/170cd0104267

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    release-blocker stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    7 participants