classification
Title: race condition in pathlib mkdir with flags parents=True
Type: behavior Stage: test needed
Components: Library (Lib) Versions: Python 3.7, Python 3.6, Python 3.5
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: arigo, berker.peksag, serhiy.storchaka, whitespacer
Priority: normal Keywords: patch

Created on 2017-03-02 13:53 by whitespacer, last changed 2017-03-28 14:27 by arigo.

Files
File name Uploaded Description Edit
x1.diff arigo, 2017-03-07 06:39 review
x2.diff arigo, 2017-03-28 14:27
Messages (6)
msg288801 - (view) Author: whitespacer (whitespacer) Date: 2017-03-02 13:53
When pathlib mkdir is called with parents=True and some parent doesn't exists it recursively calls self.parent.mkdir(parents=True) after catching OSError. However after catching of OSError and before call to self.parent.mkdir(parents=True) somebody else can create parent dir, which will lead to FileExistsError exception.
msg289155 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2017-03-07 06:39
A different bug in the same code: if someone creates the directory itself between the two calls to ``self._accessor.mkdir(self, mode)``, then the function will fail with an exception even if ``exist_ok=True``.

Attached is a patch that fixes both cases.
msg290087 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-24 13:05
Can you provide tests?
msg290102 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2017-03-24 17:45
It's a mess to write a test, because the exact semantics of .mkdir() are not defined as far as I can tell.  This patch is a best-effort attempt at making .mkdir() work in the presence of common parallel filesystem changes, that is, other processes that would create the same directories at the same time.

This patch is by no means an attempt at being a complete solution for similar problems.  The exact semantics have probably never been discussed at all.  For example, what should occur if a parent directory is removed just after .mkdir() created it?

I'm not suggesting to discuss these issues now, but to simply leave them open.  I'm trying instead to explain why writing a test is a mess (more than "just" creating another thread and creating/removing directories very fast while the main thread calls .mkdir()), because we have no exact notion of what should work and what shouldn't.
msg290477 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-25 10:49
You can monkey-patch os.mkdir and pathlib._NormalAccessor.mkdir.

Without tests we can't be sure that the patch fixes the issue and that the bug will be not reintroduced by future changes. Tests are required for this change.
msg290723 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2017-03-28 14:27
Changes including a test.  The test should check all combinations of "concurrent" creation of the directory.  It hacks around at pathlib._normal_accessor.mkdir (patching "os.mkdir" has no effect, as the built-in function was already extracted and stored inside pathlib._normal_accessor).
History
Date User Action Args
2017-03-28 14:27:12arigosetfiles: + x2.diff

messages: + msg290723
2017-03-25 10:49:48serhiy.storchakasetmessages: + msg290477
2017-03-24 17:45:12arigosetmessages: + msg290102
2017-03-24 13:05:22serhiy.storchakasetnosy: + serhiy.storchaka

messages: + msg290087
stage: patch review -> test needed
2017-03-18 16:53:10berker.peksagsetnosy: + berker.peksag

components: + Library (Lib)
stage: patch review
2017-03-07 06:39:04arigosetfiles: + x1.diff

nosy: + arigo
messages: + msg289155

keywords: + patch
2017-03-02 13:53:33whitespacercreate