classification
Title: pathlib paths .normalize()
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.9
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: brett.cannon, iciocirlan, pitrou, serhiy.storchaka, veky, xtreak
Priority: normal Keywords:

Created on 2019-11-26 22:23 by iciocirlan, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (11)
msg357534 - (view) Author: Ionuț Ciocîrlan (iciocirlan) Date: 2019-11-26 22:23
pathlib paths should expose a `.normalize()` method. This is highly useful, especially in web-related scenarios.

On `PurePath` its usefulness is obvious, but it's debatable for `Path`, as it would yield different results from `.resolve()` in case of symlinks (which resolves them before normalizing).
msg357557 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2019-11-27 05:28
Can you please add an example of how normalize() should behave? I assume you want the same behaviour as os.path.normpath which already accepts a pathlike object to be added to pathlib.
msg357575 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2019-11-27 17:28
Do note that Path inherits from PurePath, so providing a normalize() method on the latter means it will end up on the former.
msg357633 - (view) Author: Ionuț Ciocîrlan (iciocirlan) Date: 2019-11-29 04:04
> Can you please add an example of how normalize() should behave?

```
>>> mypath = PurePosixPath("foo/bar/bzz")
>>> mypath /= "../../"
>>> mypath
PurePosixPath('foo/bar/bzz/../..')
>>> mypath = mypath.normalize()
>>> mypath
PurePosixPath('foo')
>>> mypath /= "../../../there"
>>> mypath
PurePosixPath('foo/../../../there')
>>> mypath = mypath.normalize()
>>> mypath
PurePosixPath('../../there')
>>> mypath /= "../../and/back/again"
>>> mypath
PurePosixPath('../../there/../../and/back/again')
>>> mypath = mypath.normalize()
>>> mypath
PurePosixPath('../../../and/back/again')
```

> I assume you want the same behaviour as os.path.normpath which already accepts a pathlike object to be added to pathlib.

Yes, exactly the same behaviour, but arguing that normpath() can take a pathlib object is just saying that it saves you from doing an intermediate str(), which is, well, nice, but still not pretty. Consider `mypath = mypath.normalize()` vs. `mypath = PurePosixPath(normpath(mypath))`.
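For reference, the `PurePosixPath(normpath(mypath))` spelling mentioned above already reproduces the session from the example. This is only a sketch of the existing workaround, not of the proposed method; the variable names `first`, `second`, and `third` are made up for illustration.

```python
from pathlib import PurePosixPath
from posixpath import normpath

# Replay the three steps of the example, using the existing
# normpath() round trip in place of the proposed .normalize().
mypath = PurePosixPath("foo/bar/bzz") / "../../"
first = PurePosixPath(normpath(mypath))

second = PurePosixPath(normpath(first / "../../../there"))

third = PurePosixPath(normpath(second / "../../and/back/again"))

print(first)   # foo
print(second)  # ../../there
print(third)   # ../../../and/back/again
```

Each step needs both the `normpath()` call and the explicit constructor, which is the verbosity being objected to.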

> Do note that Path inherits from PurePath, so providing a normalize() method on the latter means it will end up on the former.

That could be "circumvented" with a bit of code shuffling, e.g. moving everything from `PurePath` to a `PathBase` or `_Path` or somesuch, and forking the inheritance from there. On the other hand, it might be useful. I personally can't think of a scenario, but the GNU folk certainly think so, see `realpath --logical`: https://www.gnu.org/software/coreutils/manual/html_node/realpath-invocation.html
msg357634 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2019-11-29 05:05
> Yes, exactly the same behaviour, but arguing that normpath() can take a pathlib object is just saying that it saves you from doing an intermediate str(), which is, well, nice, but still not pretty. Consider `mypath = mypath.normalize()` vs. `mypath = PurePosixPath(normpath(mypath))`.

From my experience in the past the intention has been to keep the API minimal, and below are some recent additions. Many discussions conclude that it is better to use an existing function that already accepts a pathlike object (or to add pathlike support to it if it lacks it) than to add a new API to pathlib itself. I will leave it to the experts on this.

readlink : issue30618
link_to : issue26978
msg357635 - (view) Author: Vedran Čačić (veky) * Date: 2019-11-29 07:23
I think the real issue here

> mypath = PurePosixPath(normpath(mypath))

is the PurePosixPath wrapper. It is nice that normpath _accepts_ pathlike objects, but it should then not return a generic str. It should try to return an object of the same type.

Of course it's harder to do, especially in presence of pathlike objects of unknown classes, but with some reasonable assumptions on the constructors, it can be done---and it's much more useful. The similar debate, with similar conclusions, has already happened with datetime-like objects.
msg357650 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2019-11-29 18:47
> From my experience in the past the intention has been to keep the API minimal

Correct, all of os.path and shutil does not need to end up in pathlib. :) Hence why the request to add things is always tough as we have to try and strike a balance of useful but not overwhelming/overdone (and what is "useful" varies from person to person).

> It is nice that normpath _accepts_ pathlike objects, but it should then not return a generic str. It should try to return an object of the same type.

It's an interesting idea, but it's also difficult to get right, even with assumptions, as things that represent a path are nowhere near as unified as dates. There would also be a ton of converting back and forth in os.path as functions call other functions to get the path, manipulate it, and then wrap it back up.

But if someone can come up with a concrete proposal with some example implementation and brings it to python-ideas it could be discussed.
msg357653 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-11-29 20:33
There were reasons why something like PurePath.normalize() was not added in the first place. os.path.normpath() is broken by design. It does not work as you expect when the .. component is preceded by a symlink. Its behavior can lead to bugs and maybe even security issues. We did not want to add something so dubious in the pathlib module. Path.resolve() is the correct way.

So I suggest closing this issue.
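The symlink case described above can be reproduced with a small, hedged sketch. It assumes a POSIX-like system where the current user may create symlinks; the directory names (`real`, `deep`, `link`) and the function name `compare_normpath_and_resolve` are invented for this illustration.

```python
import os
import tempfile
from pathlib import Path


def compare_normpath_and_resolve():
    """Return (root, lexical, physical) for the path 'link/..' under a temp dir."""
    with tempfile.TemporaryDirectory() as tmp:
        # Resolve the temp dir itself so a symlinked /tmp does not skew the comparison.
        root = Path(tmp).resolve()
        (root / "real" / "deep").mkdir(parents=True)
        # 'link' points at 'real/deep', so the parent of 'link' on disk is 'real'.
        (root / "link").symlink_to(root / "real" / "deep", target_is_directory=True)

        path = root / "link" / ".."
        lexical = Path(os.path.normpath(path))  # drops 'link/..' textually
        physical = path.resolve()               # follows the symlink, then applies '..'
        return root, lexical, physical


root, lexical, physical = compare_normpath_and_resolve()
print(lexical == root)            # True: normpath() never looked at the filesystem
print(physical == root / "real")  # True: resolve() went through the symlink
```

The two results name different directories, which is the discrepancy at issue for concrete `Path` objects. For a `PurePath` there is no filesystem to consult, so only the lexical answer is available.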
msg357721 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2019-12-02 17:47
I'm going with Serhiy's recommendation and closing this. Sorry, Ionuț.
msg357724 - (view) Author: Ionuț Ciocîrlan (iciocirlan) Date: 2019-12-02 18:03
Brett and Serhiy, you do realise there are no symlinks to resolve on PurePaths, right?


> os.path.normpath() is broken by design.

Why don't you deprecate it then? Sounds like the reasonable thing to do, no? So many innocent souls endangered by this evil function...

It's broken by design if you use it to shoot yourself in the foot. If you want however to normalize an abstract path, an absolutely reasonable thing to do, it does the right and very useful thing. Because, well, the filesystem isn't the only thing that has paths and other things don't have symlinks. Also, this lib is called pathlib, not fspathlib, *and* someone had the foresight of separating filesystem paths from abstract paths. Quite a strange series of coincidences, no?

Let me quote the initial comment for this issue, which apparently no one read:

> On `PurePath` its usefulness is obvious, but it's debatable for `Path`, as it would yield different results from `.resolve()` in case of symlinks (which resolves them before normalizing).
msg357758 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2019-12-03 18:37
While I understand you're disappointed, do realize that the tone of your response isn't necessary. I'm going to assume you didn't mean for it to come off as confrontational and still provide a reply.

> you do realise there are no symlinks to resolve on PurePaths, right?

Yes.

> Why don't you deprecate it then?

Because the amount of code that would break for those that are willing to deal with its drawbacks is way too vast. But just because we keep that function around even with its drawbacks doesn't mean we want to propagate that in newer code.

> Let me quote the initial comment for this issue, which apparently no one read

We read it, but as I said in response, "Path inherits from PurePath, so providing a normalize() method on the latter means it will end up on the former". Now I know you suggested putting in code to somehow hide it from Path, but we try to avoid being so magical in the stdlib, especially when it would require some careful explanation in the docs that for some reason a method on an inherited class wasn't available.

Please note you can also use your own subclass or function to get the functionality you are after. There is nothing special to what you are asking for that requires inclusion in the stdlib.
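As a rough sketch of what such a function or subclass might look like (the names `lexical_normalize` and `NormalizablePosixPath` are hypothetical and not part of pathlib; only POSIX-flavoured pure paths are covered here):

```python
import posixpath
from pathlib import PurePosixPath


def lexical_normalize(path):
    """Collapse '.' and '..' components textually and return the same path class.

    This is purely lexical: like posixpath.normpath(), it never consults the
    filesystem, so it is not equivalent to Path.resolve() when symlinks exist.
    """
    return type(path)(posixpath.normpath(path))


class NormalizablePosixPath(PurePosixPath):
    """Hypothetical subclass exposing the helper as a method."""

    def normalize(self):
        return lexical_normalize(self)


print(lexical_normalize(PurePosixPath("foo/bar/bzz/../..")))         # foo
print(NormalizablePosixPath("foo/../../../there").normalize())       # ../../there
```

Subclassing the flavour-specific `PurePosixPath` rather than `PurePath` itself keeps the sketch working on older Python versions, where subclassing `PurePath` directly was awkward.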
History
Date | User | Action | Args
2022-04-11 14:59:23 | admin | set | github: 83105
2019-12-03 18:37:12 | brett.cannon | set | messages: + msg357758
2019-12-02 18:03:22 | iciocirlan | set | messages: + msg357724
2019-12-02 17:47:39 | brett.cannon | set | status: open -> closed; resolution: rejected; messages: + msg357721; stage: resolved
2019-11-29 20:33:21 | serhiy.storchaka | set | messages: + msg357653
2019-11-29 18:47:22 | brett.cannon | set | messages: + msg357650
2019-11-29 07:23:34 | veky | set | nosy: + veky; messages: + msg357635
2019-11-29 05:05:11 | xtreak | set | nosy: + pitrou, serhiy.storchaka; messages: + msg357634
2019-11-29 04:04:27 | iciocirlan | set | messages: + msg357633
2019-11-27 17:28:57 | brett.cannon | set | nosy: + brett.cannon; messages: + msg357575
2019-11-27 05:28:04 | xtreak | set | nosy: + xtreak; messages: + msg357557
2019-11-26 22:23:45 | iciocirlan | create |