This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: subclasses of pathlib.PurePosixPath never call __init__ or __new__
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 3.10, Python 3.9, Python 3.8, Python 3.7, Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Jeffrey.Kintscher, conchylicultor, louis-vincent.boudre, pitrou
Priority: normal Keywords: patch

Created on 2020-06-25 02:51 by conchylicultor, last changed 2022-04-11 14:59 by admin.

Pull Requests
URL Status Linked Edit
PR 21920 open Jeffrey.Kintscher, 2020-08-19 05:31
Messages (12)
msg372297 - (view) Author: Etienne POT (conchylicultor) * Date: 2020-06-25 02:51
I have a subclass GithubPath of PurePosixPath.

```
class GithubPath(pathlib.PurePosixPath):

  def __new__(cls, *args, **kwargs):
    print('New')
    return super().__new__(cls, *args, **kwargs)

  def __init__(self, *args, **kwargs):
    print('Init')
    super().__init__()
```

Calling `child.parent` create a new GithubPath but without ever calling __new__ nor __init__. So my subclass is never notified it is created.

```
p = GithubPath()  # Print "New", "Init"

p.parent  # Create a new GithubPath but bypass the constructors
```

The reason seems to be that parent calls _from_parts which create a new object through `object.__new__(cls)`: https://github.com/python/cpython/blob/cf18c9e9d4d44f6671a3fe6011bb53d8ee9bd92b/Lib/pathlib.py#L689


A hack is to subclass `_init` but it seems hacky as it relies on internal implementation detail.
msg372298 - (view) Author: Etienne POT (conchylicultor) * Date: 2020-06-25 03:03
Note that this likely affect all methods which returns new Path by calling `_from_parts` or `_from_parsed_parts`, like `.absolute`, `.resolve`,...
msg375030 - (view) Author: Jeffrey Kintscher (Jeffrey.Kintscher) * Date: 2020-08-08 01:36
The workaround is to override _init(), which I agree is not desirable.

This is the relevant code in PurePath, which is the super class of PurePosixPath:

    @classmethod
    def _from_parsed_parts(cls, drv, root, parts, init=True):
        self = object.__new__(cls)
        self._drv = drv
        self._root = root
        self._parts = parts
        if init:
            self._init()
        return self

...

    def _init(self):
        # Overridden in concrete Path
        pass

To me, the clean way to get the desired behavior seems like it would be to have _init() call self.__init__().

    def _init(self):
        # Overridden in concrete Path
        self.__init__()

This fixes p.parent, but GithubPath() ends up calling GithubPath.__init__() twice – the first time by PurePath.__new__() calling PurePath._init() and the second time by the GithubPath object creation.
msg375031 - (view) Author: Jeffrey Kintscher (Jeffrey.Kintscher) * Date: 2020-08-08 02:05
Clarification:

PurePath.__new__() calls PurePath._from_parts(), which then calls PurePath._init()
msg375041 - (view) Author: Etienne POT (conchylicultor) * Date: 2020-08-08 10:34
Before solving this issue, I think it would be best to think on a more generic solution on how to make Pathlib more extensible. Related to: https://discuss.python.org/t/make-pathlib-extensible/3428

For instance, if childs created with `p.parent()`, `p / 'subdir'` need to forward some state (e.g. `RemotePath(path, password=, user=)`).

Rather than __init__, maybe there should be some __post_init__ like dataclasses.
msg375047 - (view) Author: Louis-Vincent Boudreault (louis-vincent.boudre) Date: 2020-08-08 13:47
Path.__new__ should not call _from_parts because it breaks the specialization by directly using object.__new__.
This is why `_from_parts` has to be duplicated in each subclass's `__new__`.

My suggestion (first draft) is to do the parsing of arguments inside an `__init__` in the Path class hierarchy
and deprecate  `_from_parts`.

```
class PurePath:
   
     def __new__(cls, *args):
         if cls is PurePath:
             cls = PureWindowsPath if os.name == 'nt' else PurePosixPath
         super().__new__(cls, *args) # Here we remove call to from_parts

     def __init__(self, *args):
         # We should have an __init__ in the hierarchy.
         drv, root, parts = self._parse_args(args)  # this would get the proper _flavour.
         self._drv = drv
         self._root = root
         self._parts = parts

     ...

class Path(PurePath):
     def __new__(cls, *args, **kwargs):
        if cls is Path:
            cls = WindowsPath if os.name == 'nt' else PosixPath

        # REMOVE THIS LINE: self = cls._from_parts(args, init=False) #

        if not self._flavour.is_supported:
            raise NotImplementedError("cannot instantiate %r on your system"
                                      % (cls.__name__,))
        return super().__new__(cls, *args, **kwargs) # Use super

```


I don't know what is the purpose of `_init` and if it could be replaced by an extra keyword argument in __init__.

The class method `_from_parts` would become deprecated since the argument parsing would be now handled by `__init__`.

By using __init__ and avoid the direct call to `object.__new__` we would respect class specialization and custom class could be implemented intuitively.
msg375162 - (view) Author: Jeffrey Kintscher (Jeffrey.Kintscher) * Date: 2020-08-11 03:07
The purpose of the _init() function in PurePath is to allow PurePath methods to call the Path subclass override _init() function when initializing a Path object.
msg375332 - (view) Author: Jeffrey Kintscher (Jeffrey.Kintscher) * Date: 2020-08-13 19:27
Adding __init__() to PurePath complicates things and doesn't provide any benefit.  A subclass that calls super.__init__() ends up invoking object.__init__(), which is perfectly fine.

I was able to find a solution by calling type(self)() instead of object.__new__() in most cases.  I am working on a PR.
msg375333 - (view) Author: Louis-Vincent Boudreault (louis-vincent.boudre) Date: 2020-08-13 19:31
This is not true, because the classmethod use the library shortcuts the
class mro order, to prevent infinite loop inthe __new__. However, it was
using __init__ before hand, we would
Not have this issue

Le jeu. 13 août 2020 à 15:27, Jeffrey Kintscher <report@bugs.python.org> a
écrit :

>
> Jeffrey Kintscher <websurfer@surf2c.net> added the comment:
>
> Adding __init__() to PurePath complicates things and doesn't provide any
> benefit.  A subclass that calls super.__init__() ends up invoking
> object.__init__(), which is perfectly fine.
>
> I was able to find a solution by calling type(self)() instead of
> object.__new__() in most cases.  I am working on a PR.
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <https://bugs.python.org/issue41109>
> _______________________________________
>
msg375334 - (view) Author: Louis-Vincent Boudreault (louis-vincent.boudre) Date: 2020-08-13 19:33
This is not true, because the classmethod used the library shortcuts the
class mro order, this is to prevent infinite loop inthe __new__. However,
If it was using __init__ before hand, we would
Not have this issue

Le jeu. 13 août 2020 à 15:31, Louis-Vincent Boudreault <
lv.boudreault95@gmail.com> a écrit :

> This is not true, because the classmethod use the library shortcuts the
> class mro order, to prevent infinite loop inthe __new__. However, it was
> using __init__ before hand, we would
> Not have this issue
>
>
> Le jeu. 13 août 2020 à 15:27, Jeffrey Kintscher <report@bugs.python.org>
> a écrit :
>
>>
>> Jeffrey Kintscher <websurfer@surf2c.net> added the comment:
>>
>> Adding __init__() to PurePath complicates things and doesn't provide any
>> benefit.  A subclass that calls super.__init__() ends up invoking
>> object.__init__(), which is perfectly fine.
>>
>> I was able to find a solution by calling type(self)() instead of
>> object.__new__() in most cases.  I am working on a PR.
>>
>> ----------
>>
>> _______________________________________
>> Python tracker <report@bugs.python.org>
>> <https://bugs.python.org/issue41109>
>> _______________________________________
>>
>
msg375364 - (view) Author: Jeffrey Kintscher (Jeffrey.Kintscher) * Date: 2020-08-14 02:12
The current implementation calls object.__new__(cls), where cls is the child class type, from within a class method (@classmethod).  This is fine for Path.__new__() and PurePath.__new__(), which are called by the child class's __new__(), because we don't want them to recursively call the child class's __new__() when the child class is created.  This all works as expected when the child class is instantiated outside of Path and PurePath, and the child's __init__() gets called as expected.  I don't see any point in making changes to this behavior.

When one of approximately 20 PurePath and Path functions and properties instantiate a new child class object the same way PurePath.__new__() and Path.__new__() do, the child class's __new__() and __init__() functions are not called.  This is the problem we are trying to solve.

My fix is to add normal functions (i.e. not decorated with @classmethod) to instantiate child class objects using

obj = type(self)()

This creates a child class instance, and the child's __new__() and __init__() functions get called.

Once I have finished re-plumbing Path and PurePath to use the new functions and created the necessary unit tests (to make sure I didn't break anything), I will also look at adding 
a proper __init__() function to the two classes instead of having __new__() initialize the member variables.  I didn't mean to imply that __init__() isn't useful.  It is required to allow the child class to initialize its own variable.  I just meant it isn't required to force calling __init__() and __new__() in the child class.
msg375638 - (view) Author: Jeffrey Kintscher (Jeffrey.Kintscher) * Date: 2020-08-19 06:25
I created a PR that should provide the desired behavior:  __init__() and __new__() get called in subclass objects that are created by Path and PurePath.  Also, Path and PurePath now have __init__() functions, and the __new__() functions only return new objects and rely upon __init__() to perform the object initialization.
History
Date User Action Args
2022-04-11 14:59:32adminsetgithub: 85281
2020-08-19 06:37:57Jeffrey.Kintschersetcomponents: + Library (Lib)
2020-08-19 06:25:13Jeffrey.Kintschersetmessages: + msg375638
2020-08-19 05:31:42Jeffrey.Kintschersetkeywords: + patch
stage: patch review
pull_requests: + pull_request21034
2020-08-14 02:12:51Jeffrey.Kintschersetmessages: + msg375364
2020-08-13 19:33:20louis-vincent.boudresetmessages: + msg375334
2020-08-13 19:31:25louis-vincent.boudresetmessages: + msg375333
2020-08-13 19:27:23Jeffrey.Kintschersetmessages: + msg375332
2020-08-11 03:07:33Jeffrey.Kintschersetmessages: + msg375162
2020-08-08 13:47:37louis-vincent.boudresetnosy: + louis-vincent.boudre
messages: + msg375047
2020-08-08 10:34:16conchylicultorsetmessages: + msg375041
2020-08-08 02:05:42Jeffrey.Kintschersetmessages: + msg375031
2020-08-08 01:36:03Jeffrey.Kintschersetmessages: + msg375030
2020-08-07 20:38:30Jeffrey.Kintschersetnosy: + Jeffrey.Kintscher
2020-06-25 03:08:40xtreaksetnosy: + pitrou
2020-06-25 03:03:34conchylicultorsetmessages: + msg372298
2020-06-25 02:51:23conchylicultorcreate