classification
Title: tempfile mixes str and bytes in an inconsistent manner
Type: behavior Stage: resolved
Components: Versions: Python 3.10
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: lukasz.langa Nosy List: ericzolf, gregory.p.smith, lukasz.langa, miss-islington, python-dev, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2020-05-20 19:20 by ericzolf, last changed 2021-04-10 17:55 by gregory.p.smith. This issue is now closed.

Files
File name Uploaded Description Edit
tempfile.py.diff ericzolf, 2020-05-21 09:36 Patching tempfile.py to properly handle tempdir set to bytes
tempfile_py_2020-05-23.diff ericzolf, 2020-05-23 07:38 Proper patch for tempfile.py for consistent bytes handling
Pull Requests
URL Status Linked Edit
PR 20442 merged python-dev, 2020-05-27 05:32
PR 24734 closed miss-islington, 2021-03-03 20:36
PR 25334 merged gregory.p.smith, 2021-04-10 17:53
Messages (18)
msg369469 - (view) Author: Eric L. (ericzolf) * Date: 2020-05-20 19:20
tempfile fails on mixed str and bytes when setting tempfile.tempdir to a non-existent bytes path but succeeds when set to an existing bytes path.

$ python3.9
Python 3.9.0a6 (default, Apr 28 2020, 00:00:00) 
[GCC 10.0.1 20200430 (Red Hat 10.0.1-0.14)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tempfile
>>> tempfile.tempdir = b'/doesntexist'
>>> tempfile.TemporaryFile()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.9/tempfile.py", line 615, in TemporaryFile
    (fd, name) = _mkstemp_inner(dir, prefix, suffix, flags, output_type)
  File "/usr/lib64/python3.9/tempfile.py", line 248, in _mkstemp_inner
    file = _os.path.join(dir, pre + name + suf)
  File "/usr/lib64/python3.9/posixpath.py", line 90, in join
    genericpath._check_arg_types('join', a, *p)
  File "/usr/lib64/python3.9/genericpath.py", line 155, in _check_arg_types
    raise TypeError("Can't mix strings and bytes in path components") from None
TypeError: Can't mix strings and bytes in path components
>>> tempfile.tempdir = b'/tmp'
>>> tempfile.TemporaryFile()
<_io.BufferedRandom name=3>
>>> tempfile.mktemp()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.9/tempfile.py", line 400, in mktemp
    file = _os.path.join(dir, prefix + name + suffix)
  File "/usr/lib64/python3.9/posixpath.py", line 90, in join
    genericpath._check_arg_types('join', a, *p)
  File "/usr/lib64/python3.9/genericpath.py", line 155, in _check_arg_types
    raise TypeError("Can't mix strings and bytes in path components") from None
TypeError: Can't mix strings and bytes in path components

It seems to me that the case of `tempfile.tempdir` being of type bytes hasn't been completely considered and is handled inconsistently. My suggestion would be to manage the paths as bytes if there is no other indication like suffix or prefix.
msg369511 - (view) Author: Eric L. (ericzolf) * Date: 2020-05-21 09:36
In the meantime, I noticed the following in addition:

[ericl@tuxedo ~]$ python3.9
Python 3.9.0a6 (default, Apr 28 2020, 00:00:00) 
[GCC 10.0.1 20200430 (Red Hat 10.0.1-0.14)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tempfile
>>> tempfile.tempdir = b'/tmp'
>>> tempfile.gettempdir()
b'/tmp'
>>> tempfile.tempdir = '/tmp'
>>> tempfile.gettempdirb()
b'/tmp'

This actually explicitly hurts the interface description which states that tempfile.gettempdir() returns a string.

"Encouraged" by this discovery, I've tried to write a patch of tempfile.py addressing the issues discovered. It's my first patch ever of Python so bare with me. The default remains string but if someone _explicitly_ sets tempdir to bytes, it'll become bytes. I've tried all the commands listed previously and it all looks consistent to me.
msg369698 - (view) Author: Eric L. (ericzolf) * Date: 2020-05-23 07:38
Sorry, I uploaded by mistake an early version of the patch. The new one is the one I had actually tested (the old one would fail with mixing bytes and string under certain circumstances, I can't remember any more).
msg369742 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2020-05-23 19:41
Could you please turn that into a Github PR?
msg369810 - (view) Author: Eric L. (ericzolf) * Date: 2020-05-24 17:15
On 23/05/2020 21:41, Gregory P. Smith wrote:
> Could you please turn that into a Github PR?
I can, if you don't tell me that I need to setup a full-blown Python
development environment to do this.
msg369817 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-05-24 18:30
Maybe just document that tempdir should be a string?
msg369850 - (view) Author: Eric L. (ericzolf) * Date: 2020-05-25 05:57
On 24/05/2020 20:30, Serhiy Storchaka wrote:
> Maybe just document that tempdir should be a string?
I would definitely prefer to have bytes paths considered as 1st class
citizen.
msg369855 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-05-25 07:35
In any case this is a new feature, so it can be added only in 3.10, and we need the documentation patch for 3.9 and older.

As a workaround you can use os.fsdecode():

tempfile.tempdir = os.fsdecode(b'/doesntexist')
msg370049 - (view) Author: Eric L. (ericzolf) * Date: 2020-05-27 05:44
Well, your decision but, as a user of the library, it didn't feel like a new feature just like a bug to be fixed, the main issue being the inconsistent handling of bytes vs. str.
msg370110 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2020-05-27 16:56
We consider it closer to new feature as it changes existing behavior in a way that people cannot _depend_ on being present in older Python releases as it'd only appear in a bugfix release, so most people could never write code depending on it while claiming to generally support 3.7-3.9.

Anyways your PR overall looks good for 3.10.  I left some comments.
msg370125 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-05-27 19:56
Well, the behavior for an existing bytes path was not unintended, but some people can depend on it.

But before making it an official feature, we should also check other cases of an unintended behavior. What if set tempfile.tempdir to a Path object or to a file descriptor of a directory? Does anything work in these cases and should we support them?
msg370126 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2020-05-27 19:59
I expect the best decision to be to get rid of tempfile.tempdir entirely.  That would need be its own issue with a deprecation period involved.

A process global that alters behavior of all calls into a module that don't explicitly opt-out is a bad API.
msg387922 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-03-02 12:30
> A process global that alters behavior of all calls into a module that don't explicitly opt-out is a bad API.

I don't think that it is so bad. The behavior depends on environment variables TMPDIR, TEMP, TMP. The tempdir variable is just a cache for them. As sys.path is a cache for PYTHONPATH. We need just document that it should be a string if not None. Nobody expects bytes paths be valid in sys.path.

On other hand, there is gettempdir(), so we have two different ways to get the value of tempfile.tempdir.
msg387930 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2021-03-02 14:24
I clarified the documentation in the PR and added a regression test.

I chose to explicitly document that tempfile.tempdir may only be str or bytes and cannot be a path-like object.

We already document that people really should not set it and instead pass dir= to their APIs.  Now the docs make that more clear when it comes to setting it to a bytes object due to the global API return type change consequences.

I view this PR as cleaning up a partial misbehavior while preserving an API wart of it ever working in any manner when set to a bytes value.

getting rid of tempfile.tempdir entirely or preventing it from being a bytes value at all would be much more of a breaking change, even though desirable.  Not a bugfix.  Now with the PR as is, we at least document that people should avoid doing that and make it clear what consistent behavior happens when they do it anyways.  With a test.
msg388056 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2021-03-03 20:36
New changeset 9c7927400cd8f1d283bf7915b6b33fea81b8655d by Eric L in branch 'master':
bpo-40701: tempfile mixes str and bytes in an inconsistent manner (GH-20442)
https://github.com/python/cpython/commit/9c7927400cd8f1d283bf7915b6b33fea81b8655d
msg390684 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2021-04-10 01:27
Fixed for 3.10.  Marking as a release blocker for 3.9 and assigning for a release manager decision on if they accept this change (the PR) as a bugfix in 3.9 or not.  (see the PR)

It is late enough in 3.8's lifetime I wouldn't touch that one.
msg390702 - (view) Author: Ɓukasz Langa (lukasz.langa) * (Python committer) Date: 2021-04-10 08:16
Sorry, I decided not to take this. We're four bugfix releases into 3.9 and there is non-zero chance somebody relies on existing behavior somehow. Marking this as changing with 3.10.0 is cleaner from an end user standpoint.
msg390727 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2021-04-10 17:55
New changeset e05a703848473b0365886dcc593cbddc46609f29 by Gregory P. Smith in branch 'master':
bpo-40701: doc typo historcal -> historical (GH-25334)
https://github.com/python/cpython/commit/e05a703848473b0365886dcc593cbddc46609f29
History
Date User Action Args
2021-04-10 17:55:21gregory.p.smithsetmessages: + msg390727
2021-04-10 17:53:49gregory.p.smithsetpull_requests: + pull_request24068
2021-04-10 08:16:15lukasz.langasetstatus: open -> closed
priority: release blocker -> normal
versions: + Python 3.10, - Python 3.9
messages: + msg390702

resolution: fixed
stage: patch review -> resolved
2021-04-10 01:27:18gregory.p.smithsetpriority: normal -> release blocker

nosy: + lukasz.langa
versions: - Python 3.7, Python 3.8, Python 3.10
messages: + msg390684

assignee: gregory.p.smith -> lukasz.langa
2021-03-03 20:36:42miss-islingtonsetnosy: + miss-islington
pull_requests: + pull_request23505
2021-03-03 20:36:30gregory.p.smithsetmessages: + msg388056
2021-03-02 14:24:14gregory.p.smithsetmessages: + msg387930
2021-03-02 12:30:30serhiy.storchakasetmessages: + msg387922
2021-03-02 11:55:46gregory.p.smithsetassignee: gregory.p.smith
2020-05-27 19:59:51gregory.p.smithsetmessages: + msg370126
2020-05-27 19:56:18serhiy.storchakasetmessages: + msg370125
2020-05-27 16:56:07gregory.p.smithsetmessages: + msg370110
2020-05-27 05:44:50ericzolfsetmessages: + msg370049
2020-05-27 05:32:05python-devsetnosy: + python-dev

pull_requests: + pull_request19698
stage: patch review
2020-05-25 07:35:41serhiy.storchakasetmessages: + msg369855
2020-05-25 05:57:46ericzolfsetmessages: + msg369850
2020-05-24 18:30:40serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg369817
2020-05-24 17:15:19ericzolfsetmessages: + msg369810
2020-05-23 19:41:55gregory.p.smithsetmessages: + msg369742
2020-05-23 16:26:56DahlitzFloriansetnosy: - DahlitzFlorian
2020-05-23 07:38:11ericzolfsetfiles: + tempfile_py_2020-05-23.diff

messages: + msg369698
2020-05-23 07:29:16terry.reedysetversions: + Python 3.10, - Python 3.5, Python 3.6
2020-05-21 10:33:37serhiy.storchakasetnosy: + gregory.p.smith
2020-05-21 09:36:26ericzolfsetfiles: + tempfile.py.diff
keywords: + patch
messages: + msg369511
2020-05-21 08:34:26DahlitzFloriansetnosy: + DahlitzFlorian
2020-05-20 19:20:24ericzolfcreate