classification
Title: Approved API for creating a temporary file path
Type: behavior Stage: needs patch
Components: Library (Lib) Versions: Python 3.6, Python 3.5
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: bignose, eryksun, ethan.furman, georg.brandl, gumnos, r.david.murray, serhiy.storchaka
Priority: normal Keywords:

Created on 2016-02-15 02:11 by bignose, last changed 2017-05-01 23:04 by eryksun.

Messages (19)
msg260291 - (view) Author: Ben Finney (bignose) Date: 2016-02-15 02:11
The security issues of `tempfile.mktemp` are clear when the return value is used to create a filesystem entry. The documentation and docstrings (and even some comments on past issues) are correct o deprecate its use for that purpose.

The function has a use which doers not have security implications: generating test data. When a test case wants to generate unpredictable, unique, valid filesystem paths – and *never access those paths* on the filesystem – the `tempfile.mktemp` function is right there and is very useful.

The `tempfile._RandomNameSequence` class would also be useful, but its name also makes clear that it is not part of the library public API.

Please make that functionality available for the purpose of *only* generating filesystem paths as `tempfile._RandomNameSequence` does, in a public, supported, non-deprecated API.
msg260346 - (view) Author: Ben Finney (bignose) Date: 2016-02-16 05:00
It has been pointed out that `tempfile.mktemp` does in fact access the filesystem, to query whether the entry exists.

So this request would be best met by exposing a simple “get a new return value from the `tempfile._RandomNameSequence` instance” function.
msg260349 - (view) Author: Ben Finney (bignose) Date: 2016-02-16 05:53
An example::

    import io
    import tempfile
    names = tempfile._get_candidate_names()

    def test_frobnicates_configured_spungfile():
        """ ‘foo’ should frobnicate the configured spungfile. """

        fake_file_path = os.path.join(tempfile.gettempdir(), names.next())
        fake_file = io.BytesIO("Lorem ipsum, dolor sit amet".encode("utf-8"))

        patch_builtins_open(
                when_accessing_path=fake_file_path,
                provide_file=fake_file)

        system_under_test.config.spungfile_path = fake_file_path
        system_under_test.foo()
        assert_correctly_frobnicated(fake_file)

With a supported standard library API for this – ‘tempfile.makepath’
for example – the generation of the filesystem path would change from
four separate function calls::

    names = tempfile._get_candidate_names()
    fake_file_path = os.path.join(tempfile.gettempdir(), names.next())

to a simple function call::

    fake_file_path = tempfile.makepath()

and have the benefit of not reaching in to a private API.
msg260995 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-02-29 07:50
I have read the thread on Python-list (http://comments.gmane.org/gmane.comp.python.general/789339) and still don't understand the purpose of your idea.

If you need unique filesystem paths without any access to real filesystem, you can use simple counter. If you need unpredictable -- use random generator. For uniqueness use a set of already generated name. If you have additional requirements -- it is easy to modify you code for your needs.
msg261016 - (view) Author: Ben Finney (bignose) Date: 2016-02-29 18:39
> I have read the thread on Python-list

Thank you, and thanks for linking to that discussion.

> and still don't understand the purpose of your idea.

The purpose is to get a public API for making temporary filesystem paths with the same properties as are produced by ‘tempfile’, specifically as produced by a ‘tempfile._get_candidate_names’ generator.

The paths will be used in unit tests and will never touch the filesystem.

Some have suggested using ‘uuid’ or other random-generator APIs. Using an API that isn't explicitly designed to produce valid filesystem paths is both risky (the algorithm may not exactly produce the right behaviour), and confusing (a reader seeing use of an unrelated API will not find it obvious why that API was used for generating a filesystem path).

The unit tests need paths with properties that are already implemented in ‘tempfile’, so it seems unreasonable to suggest that functionality should be re-implemented elsewhere since it is already in the standard library.

The request is for a “get the next path generated by a ‘tempfile._get_candidate_names’ generator”, with an approved and documented public API. One suggested name is ‘tempfile.makepath’.
msg261017 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2016-02-29 18:44
> The request is for a “get the next path generated by a
> ‘tempfile._get_candidate_names’ generator”, with an approved and
> documented public API.

I don't see any problem with this.  Patches welcome.
msg261021 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-02-29 20:39
I see a problem. tempfile._get_candidate_names is implementation detail, and exposing it adds a burden for maintainers. It also adds cognitive burden for users of the tempfile module. And the idea itself looks doubtful and contradictory with the good practice of using the tempfile module.

I'm -1 for adding such function to the tempfile module.
msg261022 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2016-02-29 20:47
I don't see the problem.  Every Python platform that has `mktemp` has an implementation to generate the names, and that implementation can become the public face instead of `mktemp`.  So no more of a burden than `mktemp` is; likewise for the "cognitive burden".

As far as appropriate location -- where else would you put it?
msg261023 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-02-29 21:20
mktemp is deprecated, and I think we will get rid of it when 2.7 will be out of use. Do you want to add new deprecated from the start function?

> As far as appropriate location -- where else would you put it?

May be in third-party module on PyPI? It doesn't look as highly demanded function.
msg261025 - (view) Author: Ben Finney (bignose) Date: 2016-02-29 22:09
Serhiy Storchaka

> mktemp is deprecated, and I think we will get rid of it when 2.7 will be out of use.

That's fine. This is not a request to retain ‘tempfile.mktemp’.

I confused matters in my initial message, so your confusion is understandable on this point. To repeat:

The request is for a “get the next path generated by a ‘tempfile._get_candidate_names’ generator”, with an approved and documented public API. One suggested name is ‘tempfile.makepath’.


> Do you want to add new deprecated from the start function?

No. I'm asking for already-implemented and currently-maintained (i.e. not deprecated) standard library code, that is currently neither public nor documented, to gain a public API.


Ethan Furman:

> I don't see any problem with this.  Patches welcome.

Thank you, I'll work on a patch soon.
msg292648 - (view) Author: Tim Chase (gumnos) Date: 2017-05-01 02:48
As requested by Ben Finney[1], adding my use-case here.  I'm attempting to make a hard-link from "a" to "b", but if "b" exists, os.link() will fail with an EEXISTS.  I don't want to do

 os.unlink("b")
 # power-outage here means "b" is gone
 os.link("a", "b")

I can do something like

  temp_name = tempfile.mktemp(dir=".")
  os.link("a", temp_name)
  try:
    os.rename(temp_name, "b") # docs guarantee this is atomic
  except OSError:
    os.unlink(temp_name)

but mktemp() is marked as deprecated.

I'm okay with the stray temp-file floating around to clean up in the event of power-loss after the os.link() but before the os.unlink() call, as is new info, not disposing of existing file-names



[1] https://mail.python.org/pipermail/python-list/2017-April/721641.html
msg292657 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-05-01 11:52
You are depending on a non-portable feature of os.rename there, so I'm not convinced this makes a good use case for the Python stdlib.
msg292658 - (view) Author: Ben Finney (bignose) Date: 2017-05-01 12:27
On 01-May-2017, R. David Murray wrote:
> You are depending on a non-portable feature of os.rename there

What's the non-portable dependency?

If you mean the expectation that ‘os.rename’ will be atomic on
success: the documentation promises “If successful, the renaming will
be an atomic operation”, without any non-compatibility caveat
<URL:https://docs.python.org/3/library/os.html#os.rename>.

If you mean the ‘except OSError’ clause: that's not a dependency of
this example. It is handling a case that is documented to occur on
some platforms. On platforms that don't have that behaviour, the
clause is not a dependency of this use case.

Neither of those is germane to the use case being demonstrated: the
need for a temporary filesystem path as generated by ‘tempfile.mktemp’.

> so I'm not convinced this makes a good use case for the Python
> stdlib.

Note that Tim Chase is *not* presenting a use case for ‘os.rename’,
but rather a use case for ‘tempfile.mktemp’. For that purpose it seems
a valid use case.
msg292660 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-05-01 12:48
Yes, and I'm saying his example doesn't work on Windows (on windows, it does not accomplish his goal).  So I'm not sure it is a use case appropriate for the standard library.  I'm not saying it definitely isn't, I'm just raising a doubt.
msg292662 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2017-05-01 13:22
For Windows, use os.replace instead of os.rename, which will replace the destination file if it exists. However, a request to replace an existing file will be denied access if the file is currently open. Unlinking an open file isn't allowed, even if it's open with delete sharing.
msg292690 - (view) Author: Tim Chase (gumnos) Date: 2017-05-01 18:45
I do have a workaround thanks to Gregory Ewing
https://mail.python.org/pipermail/python-list/2017-May/721649.html
which suffices for my particular use-case (generating a link to a file with a unique filename).

As my use-case is predominantly for *nix environments, the lack of Windows' os.rename() atomicity is merely a would-be-nice-for-other-people-if-it-was-available. But I'll update my documentation to reflect the edge-case. Thanks, Eryk.
msg292697 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-05-01 19:00
>  temp_name = tempfile.mktemp(dir=".")
>  os.link("a", temp_name)

There is a race condition between generating file name and using it. tempfile.mktemp() is not much more useful that just a function that generates some names which unlikely matches the names of existing files the directory. In any case you should catch an error and repeat an attempt with different name. How much attempts to do and what additional checks to do is an application specific.
msg292705 - (view) Author: Ben Finney (bignose) Date: 2017-05-01 22:50
On 01-May-2017, Serhiy Storchaka wrote:

> tempfile.mktemp() is not much more useful that just a function that
> generates some names which unlikely matches the names of existing
> files the directory.

Yes. That is already useful enough to be in the standard library, and
it is there.

It is also maintained and implemented in one place, behind an
unpublished API (by using ‘next(tempfile._get_candidate_names())’).

Don't get distracted by the way ‘tempfile.mktemp’ creates files;
that's irrelevant to this issue. None of the use cases presented here
care at all about the file created, and never use that file. They
*only* want the name, generated by that standard-library algorithm.

The proposal in this issue is to have a public standard library API,
which I'm calling ‘tempfile.makepath’, to access that existing
maintained standard-library functionality.
msg292706 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2017-05-01 23:04
> the lack of Windows' os.rename() atomicity

rename/replace is two system calls: NtOpenFile to get a handle for the source file and NtSetInformationFile to rename it. The difference is only that replace() sets the ReplaceIfExists field in the FileRenameInformation. The rename operation should be atomic.

If the source file is already open without delete/rename sharing, opening it to rename it will fail with sharing violation. After the handle is open, another thread could set the delete disposition on the source file (i.e. flag it to be unlinked), which will cause the rename to fail with access denied. The relative timing of the threads is the difference between getting a FileNotFoundError versus a PermissionError. The other thread could also open a handle to the destination file at this point, which also causes the rename to fail with access denied, but that's no different from what would happen if it were already open.
History
Date User Action Args
2017-05-01 23:04:47eryksunsetmessages: + msg292706
2017-05-01 22:50:14bignosesetmessages: + msg292705
2017-05-01 19:00:55serhiy.storchakasetmessages: + msg292697
2017-05-01 18:45:38gumnossetmessages: + msg292690
2017-05-01 13:22:17eryksunsetnosy: + eryksun
messages: + msg292662
2017-05-01 12:48:15r.david.murraysetmessages: + msg292660
2017-05-01 12:27:48bignosesetmessages: + msg292658
2017-05-01 11:52:51r.david.murraysetnosy: + r.david.murray
messages: + msg292657
2017-05-01 02:48:05gumnossetnosy: + gumnos
messages: + msg292648
2016-02-29 22:09:26bignosesetmessages: + msg261025
2016-02-29 21:20:30serhiy.storchakasetmessages: + msg261023
2016-02-29 20:47:57ethan.furmansetmessages: + msg261022
2016-02-29 20:39:33serhiy.storchakasetmessages: + msg261021
2016-02-29 18:44:45ethan.furmansetmessages: + msg261017
stage: needs patch
2016-02-29 18:39:01bignosesetmessages: + msg261016
2016-02-29 07:50:14serhiy.storchakasetnosy: + serhiy.storchaka, georg.brandl
messages: + msg260995
2016-02-16 05:53:53bignosesetmessages: + msg260349
2016-02-16 05:00:41bignosesetmessages: + msg260346
2016-02-15 04:25:21ethan.furmansetnosy: + ethan.furman
2016-02-15 02:11:11bignosecreate