classification
Title: Optimize construction of Path from other Paths by just returning the same object?
Type: Stage: patch review
Components: Library (Lib) Versions: Python 3.10
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: barneygale, brett.cannon, pitrou, remi.lapeyre, rhettinger
Priority: normal Keywords: patch

Created on 2020-02-28 11:36 by Antony.Lee, last changed 2020-06-21 09:44 by Antony.Lee.

Pull Requests
URL Status Linked Edit
PR 20288 open remi.lapeyre, 2020-05-21 12:00
Messages (6)
msg362874 - (view) Author: Antony Lee (Antony.Lee) * Date: 2020-02-28 11:36
Many functions which take a path-like object typically also accept strings (sorry, no hard numbers here).  This means that if the function plans to call Path methods on the object, it needs to first call Path() on the arguments to convert them, well, to Paths.  This adds an unnecessary cost in the case where the argument is *already* a Path object (which should become more and more common as the use of pathlib spreads), as Path instantiation is not exactly cheap (it's on the order of microseconds).

Instead, given that Paths are immutable, `Path(path)` could just return the exact same path instance, completely bypassing instance creation (after checking that the argument's type exactly matches whatever we need and is not, say, PureWindowsPath when we want to instantiate a PosixPath, etc.).  Note that there is prior art for doing so in CPython: creating a frozenset from another frozenset just returns the same instance:
```
In [1]: s = frozenset({1}); id(s) == id(frozenset(s)) == id(s.copy())
Out[1]: True
```
msg362884 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-02-28 16:07
Possibly this could be done just by adding @lru_cache to the __new__() method.
msg362886 - (view) Author: Rémi Lapeyre (remi.lapeyre) * Date: 2020-02-28 16:23
There is the _closed attribute thought that is mutated by some methods.
msg362887 - (view) Author: Antony Lee (Antony.Lee) * Date: 2020-02-28 17:43
Decorating __new__ with lru_cache would likely run into memory leakage problems? (one would need a "weak lru_cache", I guess).

I didn't know about the _closed attribute. From a quick look it appears to only be settable by using the path (not the actual file) as a context manager, and only serves to block further filesystem methods, but I'm not even sure why? (for example, it doesn't block fspath, so it won't prevent operations via os.path functions anyways...)
msg362919 - (view) Author: Rémi Lapeyre (remi.lapeyre) * Date: 2020-02-28 21:41
> Decorating __new__ with lru_cache would likely run into memory leakage problems?

I think the LRU cache would be for returning the same instance when called with the same string. I don't think it would be needed to return the same instance when called with a Path instance.

> From a quick look it appears to only be settable by using the path (not the actual file) as a context manager, and only serves to block further filesystem methods, but I'm not even sure why?

This came up in bpo-39682 recently, I'm not sure what this flag is supposed to mean and I could not find it in the documentation.

Since it's uncorrelated to the file object and the physical file, I think it may actually be dangerous to rely on this flag. I don't know why is the purpose of using Path as a context manager.
msg362940 - (view) Author: Barney Gale (barneygale) * Date: 2020-02-28 23:23
Attempted fix shown here: https://github.com/barneygale/cpython/commit/784630ef6ad05031abdefa523e61e0629b15e201

Note that I've already removed context manager support (and hence `_closed`) in this branch.
History
Date User Action Args
2020-06-21 09:44:05Antony.Leesetnosy: - Antony.Lee
2020-06-21 00:46:22remi.lapeyresetversions: + Python 3.10, - Python 3.9
2020-05-21 12:00:05remi.lapeyresetkeywords: + patch
stage: patch review
pull_requests: + pull_request19563
2020-02-28 23:23:32barneygalesetnosy: + barneygale
messages: + msg362940
2020-02-28 21:41:52remi.lapeyresetmessages: + msg362919
2020-02-28 17:43:18Antony.Leesetmessages: + msg362887
2020-02-28 16:23:17remi.lapeyresetnosy: + remi.lapeyre
messages: + msg362886
2020-02-28 16:07:14rhettingersetnosy: + rhettinger
messages: + msg362884
2020-02-28 14:18:38xtreaksetnosy: + brett.cannon, pitrou
2020-02-28 11:36:37Antony.Leecreate