classification
Title: Trouble with dir_util created dir cache
Type: behavior
Stage: needs patch
Components: Documentation
Versions: Python 3.8, Python 3.7, Python 2.7
process
Status: pending
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Malcolm Smith, diegoqueiroz, eric.araujo, ivtashev, tarek
Priority: normal
Keywords:

Created on 2011-01-19 17:07 by diegoqueiroz, last changed 2019-03-14 00:58 by eric.araujo.

Messages (13)
msg126540 - (view) Author: Diego Queiroz (diegoqueiroz) Date: 2011-01-19 17:07
There is a problem with dir_util cache (defined by "_path_created" global variable).

It appears to be useful, but it isn't. Just repeat these steps to see the problem I'm facing:

1) Use mkpath to create any path (e.g. /home/user/a/b/c)
2) Open the terminal and manually delete the directory "/home/user/a" and its contents
3) Try to create "/home/user/a/b/c" again using mkpath

Expected behavior:
mkpath should create the folder tree again.

What happens:
Nothing: mkpath "thinks" the folder already exists because its creation was cached. Moreover, if you try to create one more folder level (e.g. /home/user/a/b/c/d), it raises an exception, because it assumes the cached part of the tree still exists and then fails to create the last folder.
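
The steps above can be reproduced with a small self-contained sketch. It does not import distutils; `cached_mkpath` and its module-level `_path_created` dict are simplified stand-ins written for illustration, modelled only on the behavior described in this report, so details of the real `mkpath` may differ.

```python
import os
import shutil
import tempfile

# Stand-in for the module-level cache named in the report (assumption:
# a dict keyed by absolute path, filled in as directories are created).
_path_created = {}


def cached_mkpath(name):
    """Create `name` and any missing parents, remembering each directory made."""
    name = os.path.normpath(name)
    if os.path.isdir(name):
        return []
    if _path_created.get(os.path.abspath(name)):
        return []  # cache hit: assumed to exist, filesystem is not checked

    # Walk up until an existing ancestor is found.
    head, tail = os.path.split(name)
    tails = [tail]
    while head and tail and not os.path.isdir(head):
        head, tail = os.path.split(head)
        tails.insert(0, tail)

    created = []
    for part in tails:
        head = os.path.join(head, part)
        abs_head = os.path.abspath(head)
        if _path_created.get(abs_head):
            continue  # skipped even if the directory has since been deleted
        os.mkdir(head)
        _path_created[abs_head] = True
        created.append(head)
    return created


root = tempfile.mkdtemp()
target = os.path.join(root, "a", "b", "c")

first = cached_mkpath(target)            # step 1: creates a, a/b, a/b/c
shutil.rmtree(os.path.join(root, "a"))   # step 2: delete the tree by hand
second = cached_mkpath(target)           # step 3: stale cache hit, no-op
exists_after_second = os.path.isdir(target)

try:
    # One level deeper: a, b and c are skipped as "already created",
    # so creating d fails because its parent is missing.
    cached_mkpath(os.path.join(target, "d"))
    deeper_error = None
except OSError as exc:
    deeper_error = exc

shutil.rmtree(root)
```

In this sketch the second call returns without touching the disk, and the deeper call fails on the final `os.mkdir`, which matches both symptoms described above.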


I'm working with parallel applications that deal with files asynchronously, and this problem gave me a headache.

Anyway, the solution is easy: remove the cache.
msg126548 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-01-19 17:50
Thanks for the report and diagnosis.  Why does your application randomly remove files created by distutils?
msg126550 - (view) Author: Diego Queiroz (diegoqueiroz) Date: 2011-01-19 18:07
Well. My application does not actually randomly remove the folders, it just can't guarantee for a given process how the folder it created will be deleted.

I have many tasks running on a cluster using the same disk. Some tasks create the folders/files and some of them remove them after processing. What each task will do depends on the availability of computational resources.

The application is also aware of possible user interaction, that is, I need to be able to manipulate folders manually (adding or removing) without crashing the application or corrupting data.
msg126551 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-01-19 18:12
Maybe I’m tired, but I don’t understand why your application would remove directories that distutils creates.  We’ve fixed a bug related to a race condition when *creating* directories (#9281), but behaving sanely on an unstable tree seems something different to me.
msg126554 - (view) Author: Diego Queiroz (diegoqueiroz) Date: 2011-01-19 18:31
Suppose the application creates one folder and adds some data to it:

- /scratch/a/b/c

While the application is still running (it is not using the folder anymore), you see the data, copy it somewhere and delete everything manually using the terminal.

After some time (maybe a week or a month later, it doesn't really matter), the application wants to write to that folder again, but oops, the folder was removed. As the application is very well coded :-), it checks for that folder and notes that it doesn't exist anymore and needs to be recreated.

But when the application tries to do so, nothing happens, because the cache is not updated. ;/

Maybe distutils package was not designed for the purpose I am using it (I am not using it to install python modules or anything), but this behavior is not well documented anyway.

If you really think the cache is important, two things need to be done:
1) Implement a way to update/clear the cache
2) Include details about the cache and its implications in the distutils documentation
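
As a rough illustration of point 1, a cache with an explicit reset could look like the sketch below. `DirCache`, `mkpath` and `clear` here are hypothetical names invented for this example; nothing in this thread says distutils offers such an interface.

```python
import os
import shutil
import tempfile


class DirCache:
    """Hypothetical directory-creation cache with an explicit reset."""

    def __init__(self):
        self._created = set()

    def mkpath(self, name):
        """Create `name` unless this cache already recorded creating it."""
        path = os.path.abspath(name)
        if path in self._created:
            return False  # trusted cache hit: the filesystem is not consulted
        os.makedirs(path, exist_ok=True)
        self._created.add(path)
        return True

    def clear(self):
        """Forget everything, so later calls check the filesystem again."""
        self._created.clear()


root = tempfile.mkdtemp()
target = os.path.join(root, "a", "b", "c")
cache = DirCache()

made_first = cache.mkpath(target)
shutil.rmtree(os.path.join(root, "a"))   # removed behind the cache's back
made_stale = cache.mkpath(target)        # stale hit, nothing is created
missing_after_stale = not os.path.isdir(target)

cache.clear()
made_after_clear = cache.mkpath(target)  # tree is created again
recreated = os.path.isdir(target)

shutil.rmtree(root)
```

The caller still has to know when to call `clear()`, which is why documenting the cache (point 2) matters either way.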
msg126564 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-01-19 20:20
“Maybe distutils package was not designed for the purpose I am using it (I am not using it to install python modules or anything), but this behavior is not well documented anyway.”  Aaaah, I had no idea you were using the function directly for something unrelated to distutils’s purpose.  There is no clear distinction between public and private functions in distutils, so I understand how you could find this seemingly useful function and use it in your code.

The solution is to use a public function like os.makedirs.  For distutils, I don’t think a doc change is needed: the cache is an implementation detail.
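
A minimal sketch of that suggestion, using a temporary directory in place of the reporter's paths. Note that the `exist_ok` parameter requires Python 3.2 or later; on older versions the call would need to be wrapped in an existence check or a `try`/`except OSError`.

```python
import os
import shutil
import tempfile

root = tempfile.mkdtemp()
target = os.path.join(root, "a", "b", "c")

os.makedirs(target, exist_ok=True)       # create the tree
shutil.rmtree(os.path.join(root, "a"))   # delete it manually
os.makedirs(target, exist_ok=True)       # no cache involved: created again
recreated = os.path.isdir(target)

os.makedirs(target, exist_ok=True)       # already present: no error raised

shutil.rmtree(root)
```

Because `os.makedirs` keeps no state between calls, each call reflects what is on disk at that moment, which suits a tree that other tasks or users may remove.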
msg126568 - (view) Author: Diego Queiroz (diegoqueiroz) Date: 2011-01-19 21:30
You were right, "os.makedirs" fits my needs. :-)

Anyway, I still think the change in the documentation is needed.
This is not an implementation detail, it is part of the way the function works.

The user should be aware of the behavior when he calls this function twice. In my opinion, the documentation should be clear about everything. We could call this an implementation detail iff it does not affect anything externally, but this is not the case (it affects subsequent calls).

This function does exactly the same as "os.makedirs", but the reason why is described only in a comment inside the code. We know this is a poor programming style. This information needs to be available in the documentation too.
msg126569 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-01-19 21:33
“This is not an implementation detail, it is part of the way the function works.  The user should be aware of the behavior when [they] call this function twice.”

I would agree if mkpath were a public function.  I think it’s an implementation detail used by other distutils code, especially commands.  Considering that dir_util is gone in distutils2, I see no benefit in editing the doc.
msg126625 - (view) Author: Diego Queiroz (diegoqueiroz) Date: 2011-01-20 15:44
"I would agree if mkpath were a public function."
So it is better to define what a "public function" is. Any function in any module of any project, if it is intended to be used by other modules, is public by definition.

If new people get involved in distutils development they will need to read all the code, line by line and every comment, because the old developers decided not to document the inner workings of its functions.

"Considering that dir_util is gone in distutils2, I see no benefit in editing the doc."
Well, I know nothing about this. However, if you tell me that distutils2 will replace distutils, I may agree with you and distutils just needs to be deprecated. Otherwise, I keep my opinion.
msg126773 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-01-21 19:14
> So it is better to define what a "public function" is.
That is no easy task.  See #10894 for a general discussion.  For the particular case of distutils, there is no distinction between internal helpers that we should be free to change and public functions provided to third-party code.  That’s one of the reasons we had to fork under a new name to have a chance to clean things up (i.e. make nearly everything private).

> If new people get involved in distutils development they will need to
> read all the code, line by line and every comment, because the old
> developers decided not to document the inner workings of its functions.
A lot of people have been learning distutils internals in recent years: Tarek, the current maintainer; hackers from Montreal; Google Summer of Code students like me.  So it is possible to get involved with distutils, starting with one area (network code, or versions and dependencies, or commands, or compilers...).  That said, I agree the doc is very lacking, and improving it is one of my big goals for distutils2 in Python 3.3.  (I will give priority to important user-facing functions and classes over helpers like mkpath, however.)

> Well, I know nothing about this. However, if you tell me that
> distutils2 will replace distutils, I may agree with you and distutils
> just needs to be deprecated. Otherwise, I keep my opinion.
distutils is frozen and only gets bug fixes; distutils2 is a fork where we can break compatibility to fix the design and behavior.  More information on http://tarekziade.wordpress.com/2010/03/03/the-fate-of-distutils-pycon-summit-packaging-sprint-detailed-report/

I am now closing this issue.  If I have misunderstood your last message and you’re not satisfied with that, please reopen.  Thanks again for your report, and don’t hesitate to report any bug you may find in the future.
msg332972 - (view) Author: Malcolm Smith (Malcolm Smith) Date: 2019-01-04 11:57
Please reopen this issue. The distutils2 project has now been abandoned, so that's no longer a justification for taking no action. 

At the very least, the documentation should be fixed to either warn about this surprising behavior, or make it clear that the dir_util functions are for distutils internal use only.
msg337779 - (view) Author: Ivan Tashev (ivtashev) Date: 2019-03-12 17:16
distutils.dir_util is easily found in the documentation. If this behaviour is not fixed, at least the docs should state dir_util is not recommended for public use.
msg337892 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2019-03-14 00:58
Agreed, a doc PR to warn against using any of the distutils *util modules would be useful.
History
Date User Action Args
2019-03-14 00:58:58 eric.araujo set status: closed -> pending
assignee: eric.araujo ->
components: + Documentation, - Distutils
versions: + Python 3.8, - Python 3.1, Python 3.2
resolution: works for me ->
messages: + msg337892
stage: resolved -> needs patch
2019-03-12 17:16:37 ivtashev set nosy: + ivtashev
messages: + msg337779
2019-01-04 11:57:56 Malcolm Smith set nosy: + Malcolm Smith
messages: + msg332972
versions: + Python 3.7
2011-01-21 19:14:49 eric.araujo set status: open -> closed
nosy: tarek, eric.araujo, diegoqueiroz
messages: + msg126773
resolution: works for me
stage: resolved
2011-01-20 15:44:41 diegoqueiroz set nosy: tarek, eric.araujo, diegoqueiroz
messages: + msg126625
2011-01-19 21:33:35 eric.araujo set nosy: tarek, eric.araujo, diegoqueiroz
messages: + msg126569
2011-01-19 21:30:12 diegoqueiroz set nosy: tarek, eric.araujo, diegoqueiroz
messages: + msg126568
2011-01-19 20:20:34 eric.araujo set nosy: tarek, eric.araujo, diegoqueiroz
messages: + msg126564
2011-01-19 18:31:30 diegoqueiroz set nosy: tarek, eric.araujo, diegoqueiroz
messages: + msg126554
2011-01-19 18:12:12 eric.araujo set nosy: tarek, eric.araujo, diegoqueiroz
messages: + msg126551
2011-01-19 18:07:13 diegoqueiroz set nosy: tarek, eric.araujo, diegoqueiroz
messages: + msg126550
2011-01-19 17:50:32 eric.araujo set assignee: tarek -> eric.araujo
versions: + Python 3.2, - Python 2.6, Python 2.5
messages: + msg126548
nosy: tarek, eric.araujo, diegoqueiroz
2011-01-19 17:07:10 diegoqueiroz create