classification
Title: pathlib.Path.with_name() handles '.' and '..' inconsistently
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: brett.cannon Nosy List: Jeffrey.Kintscher, Nophke, brett.cannon, lucas.steinmann, pitrou, terry.reedy
Priority: normal Keywords: patch

Created on 2019-06-02 01:18 by Nophke, last changed 2019-07-26 22:31 by Jeffrey.Kintscher.

Pull Requests
URL Status Linked
PR 14022 open Nophke, 2019-06-12 17:10
Messages (15)
msg344250 - Author: N.P. Khelili (Nophke) * Date: 2019-06-02 01:18
Hi guys!

I'm new to Python and working on my first real project with it.
I'm sorry if this is not the right place to post this.

I noticed that the pathlib with_name() method refuses to give a name to a path that does not already have one.

That seems a bit inconsistent, given that the Path constructor does not require one:

>>> Path()
PosixPath('.')

>>> Path().resolve()
PosixPath('/home/nono')

but:
 
>>> Path().with_name('dodo')

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.7/pathlib.py", line 819, in with_name
    raise ValueError("%r has an empty name" % (self,))
ValueError: PosixPath('.') has an empty name

whereas if you do:

>>> Path().resolve().with_name('dodo')
PosixPath('/home/dodo')

At first I thought "explicit is better than implicit", so why not always call resolve() first? That was not a big deal, but then I tried:

>>> Path('..').with_name('dudu').resolve()
PosixPath('/home/nono/dudu')

( ! )

>>> Path('..').resolve().with_name('dudu')
PosixPath('/dudu')

It seems that the dots and slashes are in fact not really interpreted, which leads to:

>>> Path('../..').with_name('dudu')
PosixPath('../dudu')

>>> Path('../..').with_name('dudu').resolve()
PosixPath('/home/dudu')

( ! )
 
>>> Path('../..').resolve().with_name('dudu')

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.7/pathlib.py", line 819, in with_name
    raise ValueError("%r has an empty name" % (self,))
ValueError: PosixPath('/') has an empty name

Even though the docs briefly mention this, I found this behavior quite disturbing.

I don't know what the correct answer would be:
maybe making Path('..') as invalid as Path('.'),
or adding a few more lines to the docs.

Sincerely yours,
msg344462 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2019-06-03 18:58
The inconsistency is a bit weird. Looking at https://github.com/python/cpython/blob/master/Lib/pathlib.py#L825 the question is why is self.name not being set for '.' but it is for '..'. I suspect there's special-casing for '.' somewhere that sets self.name to '' for '.' but leaves it alone in all other instances.

Based on what with_name() is supposed to do I would argue that '..' shouldn't work since without a filename the with_name() method makes no sense.
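
For reference, the asymmetry discussed here can be reproduced without touching the filesystem. The following is a minimal sketch using PurePosixPath so that it behaves the same on every platform; the helper name `describe` is only for illustration and is not part of pathlib.

```python
from pathlib import PurePosixPath


def describe(raw):
    """Return (parts, name) for a raw path string (illustrative helper)."""
    p = PurePosixPath(raw)
    return p.parts, p.name


# '.' is dropped while parsing, so the path has no parts and an empty name.
dot_parts, dot_name = describe('.')

# '..' is kept as a regular part, so it ends up as the name.
dotdot_parts, dotdot_name = describe('..')

print(dot_parts, repr(dot_name))
print(dotdot_parts, repr(dotdot_name))

# with_name() only refuses paths whose name is empty.
try:
    PurePosixPath('.').with_name('dodo')
except ValueError as exc:
    print('ValueError:', exc)

print(PurePosixPath('..').with_name('dudu'))
```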
msg344640 - Author: N.P. Khelili (Nophke) * Date: 2019-06-04 20:58
In the definition of the name property (https://github.com/python/cpython/blob/9ab2fb1c68a75115da92d51b8c40b74f60f88561/Lib/pathlib.py#L792):

if len(parts) == (1 if (self._drv or self._root) else 0):
    return ''

could also become

if self.parent == self:
    return ''   # why not None btw...

As I said, I'm new to Python, and I'll try a few things once I have built the test suite.
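
As a rough check of the alternative condition proposed above, the sketch below compares it with the current result of .name, using only the public API. The function name `name_is_empty_by_parent` is hypothetical and the sample paths are arbitrary; this is not the actual pathlib implementation.

```python
from pathlib import PurePosixPath


def name_is_empty_by_parent(path):
    """Hypothetical test: a path has no name when it is its own parent."""
    return path.parent == path


samples = ['.', '..', '/', 'foo', '/foo', '../..', 'a/b']

# For each sample, record whether the two ways of detecting an empty name agree.
agreement = {
    raw: name_is_empty_by_parent(PurePosixPath(raw)) == (PurePosixPath(raw).name == '')
    for raw in samples
}
print(agreement)
```

On these samples the two conditions agree, so the rewrite alone would not change how '..' is treated: Path('..').parent is Path('.'), not Path('..').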
msg344770 - Author: N.P. Khelili (Nophke) * Date: 2019-06-05 20:57
The idea in my last post was quite bad:
setting name to None breaks a lot of functions
that expect name to be a string.

Path('.').parent and Path('..').parent both return Path('.').

That is not absurd (if you regard them as special entries
that point somewhere else but still live inside the directory),

but I don't know why anyone would rely on such behaviour.

The tests expect Path('..').stem to be '..'
and expect Path('.').stem to be ''.

Once again, I don't know why anyone would rely on it, but
I fear I can't do much without breaking this part of the tests.

I'm working on it and will post something by the end of the week.
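
The .parent and .stem results mentioned above can be checked with a short sketch. It uses PurePosixPath to stay independent of the platform and the filesystem; the variable names are only illustrative.

```python
from pathlib import PurePosixPath

dot = PurePosixPath('.')
dotdot = PurePosixPath('..')

# Both special paths report '.' as their parent.
print(dot.parent, dotdot.parent)

# .stem is a property derived from .name, so it is empty for '.' only.
print(repr(dot.stem), repr(dotdot.stem))
```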
msg345001 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2019-06-07 19:14
Welcome to Python.  If you end up proposing a change to code or doc, this will end up being the right place.  A change to a documented behavior would be called an 'enhancement' and only applied to the next version.  A doc change will likely be backported.
msg345129 - Author: N.P. Khelili (Nophke) * Date: 2019-06-10 13:51
First, there is no real special-casing of the '.' path. The parse_args() method simply removes '.' components during __new__() (around line 80), as they are not needed. Double dots, which have to be kept, are later treated as a valid name by the name property.

In test_pathlib.py, '..' is simply ignored in a lot of tests. In my previous post, I pointed out the behaviour of .stem and .parent, but we should also consider .anchor. The docstring says it should return """The concatenation of the drive and root, or ''."""

but the code is:

anchor = self._drv + self._root
return anchor

leading to:

>>> Path('.').anchor
''
>>> Path('..').anchor
''

and:

>>> Path('foo').anchor
''

when one would probably expect '.', './..' and './foo'.

I also found:

>>> Path('*').match('.')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/nono/BUILD/cpython/Lib/pathlib.py", line 956, in match
    raise ValueError("empty pattern")
ValueError: empty pattern


>>> Path('*').match('..')
False

Since the behaviour of .stem (see msg344770 above) dates from the initial commit of test_pathlib, breaking it may be a bad idea (I really don't know).

I have working code that sets name to '' for '..' and adds a special case in .stem, so that we do not remove any lines from test_pathlib but only add some.

Anyway, I think it is too soon for a pull request. The open questions are:

- Should .name always return a string, even if empty?
- Should Path('..').parent really return '.'?
- Is it OK that .match() treats '.' and '..' so differently?
- Am I correct in my expectations about .anchor?
- May the behaviour of .stem be changed?
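
On the .anchor question, a small sketch may help show what the documented "drive and root, or ''" means in practice. It uses the pure path classes only; the variable names are illustrative.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Relative paths have neither drive nor root, so the anchor is empty.
relative_anchors = [PurePosixPath(raw).anchor for raw in ('.', '..', 'foo')]

# Absolute paths carry a root (and, on Windows, possibly a drive).
posix_anchor = PurePosixPath('/foo').anchor
windows_anchor = PureWindowsPath('c:/foo').anchor

print(relative_anchors, repr(posix_anchor), repr(windows_anchor))
```

Under that reading, an empty anchor for '.', '..' and 'foo' matches the documentation, since none of them has a drive or a root.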
msg345386 - Author: N.P. Khelili (Nophke) * Date: 2019-06-12 17:13
After digging into the question, I'd rather go for a minimal change:

- set .name to '' for '..'
- document this
- special-case Path('..').stem (to keep the old behaviour)
- update the tests

More could be done, but I don't feel like rewriting too much of it.

A global design change would give '..' special treatment, just like '.'.

But I'd rather see small steps being accepted than big ones rejected.
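
The minimal change listed above could be sketched roughly as follows. These are hypothetical stand-alone functions, not the code from PR 14022; in particular, treating a trailing '..' in a longer path such as '../..' the same way is an assumption.

```python
from pathlib import PurePosixPath


def proposed_name(path):
    """Hypothetical .name: treat a trailing '..' like '.', i.e. no name."""
    return '' if path.name == '..' else path.name


def proposed_stem(path):
    """Hypothetical .stem: special-case '..' so the old result is kept."""
    # In pathlib, .stem derives from .name; once .name becomes '' for '..',
    # keeping the old '..' result needs an explicit branch like this one.
    if path.name == '..':
        return '..'
    return path.stem


for raw in ('.', '..', '../..', 'foo.txt'):
    p = PurePosixPath(raw)
    print(raw, repr(proposed_name(p)), repr(proposed_stem(p)))
```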
msg345395 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2019-06-12 18:06
@Antoine: was there a design reason behind setting 'name' to '' when a Path object is initialized with '.'? Is it to implicitly represent the current directory?

@N.P.: we will have to think through the implications of this as I don't know if normalizing to 'name' being '' is the best way to resolve this inconsistency.
msg345406 - Author: N.P. Khelili (Nophke) * Date: 2019-06-12 20:52
@Brett: Honestly, I don't think it is the best way either. But the fact is:

nono@ACER ~ % cd /

nono@ACER / % python
Python 3.7.3 (default, Mar 26 2019, 21:43:19) 
[GCC 8.2.1 20181127] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> from pathlib import Path
>>> Path('.') == Path('/')
False

In my humble and *very personal* opinion, this result is understandable in the case of
a PurePath (or any kind of system-call-free object), but it is not what people expect
in the case of a 'normal' path.

I also think that one day we may see the rise of a new OS that doesn't use '.' and '..'.
The first Unix used 'd' for directory and 'dd' for directory's directory!
Those were hand-made links in a file system that otherwise had no concept of hierarchy.

I think each flavour should have a special_dirs variable (the fact that Linux and Windows
share the same ones being an accident), and that the concept of a system-call-free path
implementation is fragile.
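
The difference between lexical and filesystem-aware comparison can be sketched as below. It assumes the current working directory exists and is readable; the variable names are only illustrative.

```python
from pathlib import Path

relative = Path('.')
cwd = Path.cwd()

# Lexical comparison: a relative path never equals an absolute one.
lexically_equal = relative == cwd

# After resolving against the filesystem, both name the same directory.
resolved_equal = relative.resolve() == cwd.resolve()

print(lexically_equal, resolved_equal)
```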
msg345424 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2019-06-12 22:33
I'm sorry, but I don't understand the issue here.

Instead of posting isolated snippets, could you explain what you are trying to do?
msg345536 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2019-06-13 19:15
@Antoine: Basically Path.with_name() fails under '.' but works with '..' although with a somewhat odd result. And then after that is the fact that Path('.').name is the empty string but for Path('..').name it's '..' (which is what causes Path('.').with_name() to fail).
msg347391 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2019-07-05 21:34
Thinking about this a bit, my gut says having Path('.').name == '.' makes more sense than returning ''. My reasoning is that in any case where there's a single value -- e.g. Path('spam') -- you end up with the part returned in `name`. That suggests to me that '.' isn't any more special or ambiguous than 'spam' or '..'.

Antoine, what do you think?
msg347495 - Author: Steinmann (lucas.steinmann) Date: 2019-07-08 13:47
@Brett: I also think making Path('.').name evaluate to '.' would be the most logical thing. Even more so since the documentation says PurePath.name is equivalent to os.path.basename() [1], but:

>>> Path('.').name
''
>>> os.path.basename('.')
'.'

Though I'm not sure whether it is OK to change this behaviour, or whether people already rely on it.

No matter which decision is made, I would say the documentation should be improved.
If it ends up matching basename, document that here as well [2].
Otherwise add a note to [1], maybe in the same format and position as was done for os.path.relpath().

[1] https://docs.python.org/3/library/pathlib.html#correspondence-to-tools-in-the-os-module
[2] https://docs.python.org/3/library/pathlib.html#pathlib.PurePath.name
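
To see where the two actually diverge, a small sketch can compare them over a handful of inputs. The sample list is arbitrary, and posixpath is used instead of os.path so the result does not depend on the platform.

```python
import posixpath
from pathlib import PurePosixPath

samples = ['.', '..', 'foo', 'foo/', '/', 'a/b.txt', '../..']

# Collect the inputs where PurePosixPath.name and basename() disagree.
mismatches = [
    raw for raw in samples
    if PurePosixPath(raw).name != posixpath.basename(raw)
]
print(mismatches)
```

Besides '.', a trailing slash is another case where they differ: posixpath.basename('foo/') is '' while PurePosixPath('foo/').name is 'foo'.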
msg348530 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2019-07-26 21:50
Anyone up for doing a PR that makes Path('.').name == '.'? It would be good to see if that would break the stdlib, as a proxy for how many people might be relying on these semantics. My hope is nothing breaks, in which case we can make the change in 3.9 and notify people in What's New of the new, consistent semantics.
msg348534 - Author: Jeffrey Kintscher (Jeffrey.Kintscher) * Date: 2019-07-26 22:31
I'll take a crack at adding support for Path('.').name == '.'.
History
Date User Action Args
2019-07-26 22:31:13 Jeffrey.Kintscher set messages: + msg348534
2019-07-26 21:50:34 brett.cannon set messages: + msg348530
2019-07-08 13:47:26 lucas.steinmann set nosy: + lucas.steinmann
messages: + msg347495
2019-07-05 21:34:39 brett.cannon set messages: + msg347391
2019-06-13 19:15:00 brett.cannon set messages: + msg345536
title: pathlib does not handle '..' directory -> pathlib.Path.with_name() handles '.' and '..' inconsistently
2019-06-12 22:33:27 pitrou set messages: + msg345424
2019-06-12 20:52:08 Nophke set messages: + msg345406
2019-06-12 18:06:27 brett.cannon set nosy: + pitrou
messages: + msg345395
2019-06-12 17:13:29 Nophke set messages: + msg345386
2019-06-12 17:10:51 Nophke set keywords: + patch
stage: test needed -> patch review
pull_requests: + pull_request13887
2019-06-10 18:49:58 brett.cannon set assignee: brett.cannon
2019-06-10 13:51:32 Nophke set messages: + msg345129
title: pathlib.with_name() doesn't like unnamed files. -> pathlib does not handle '..' directory
2019-06-07 19:14:24 terry.reedy set nosy: + terry.reedy
messages: + msg345001
versions: + Python 3.9, - Python 3.7
2019-06-05 20:57:56 Nophke set messages: + msg344770
2019-06-04 20:58:40 Nophke set messages: + msg344640
2019-06-04 02:45:52 Jeffrey.Kintscher set nosy: + Jeffrey.Kintscher
2019-06-03 18:59:07 brett.cannon set stage: test needed
2019-06-03 18:58:59 brett.cannon set nosy: + brett.cannon
messages: + msg344462
2019-06-02 01:18:12 Nophke create