classification
Title: tempfile mixes str and bytes in an inconsistent manner
Type: behavior Stage: patch review
Components: Versions: Python 3.10, Python 3.9, Python 3.8, Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: ericzolf, gregory.p.smith, python-dev, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2020-05-20 19:20 by ericzolf, last changed 2020-05-27 19:59 by gregory.p.smith.

Files
File name Uploaded Description Edit
tempfile.py.diff ericzolf, 2020-05-21 09:36 Patching tempfile.py to properly handle tempdir set to bytes
tempfile_py_2020-05-23.diff ericzolf, 2020-05-23 07:38 Proper patch for tempfile.py for consistent bytes handling
Pull Requests
URL Status Linked Edit
PR 20442 open python-dev, 2020-05-27 05:32
Messages (12)
msg369469 - (view) Author: Eric L. (ericzolf) * Date: 2020-05-20 19:20
tempfile fails on mixed str and bytes when setting tempfile.tempdir to a non-existent bytes path but succeeds when set to an existing bytes path.

$ python3.9
Python 3.9.0a6 (default, Apr 28 2020, 00:00:00) 
[GCC 10.0.1 20200430 (Red Hat 10.0.1-0.14)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tempfile
>>> tempfile.tempdir = b'/doesntexist'
>>> tempfile.TemporaryFile()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.9/tempfile.py", line 615, in TemporaryFile
    (fd, name) = _mkstemp_inner(dir, prefix, suffix, flags, output_type)
  File "/usr/lib64/python3.9/tempfile.py", line 248, in _mkstemp_inner
    file = _os.path.join(dir, pre + name + suf)
  File "/usr/lib64/python3.9/posixpath.py", line 90, in join
    genericpath._check_arg_types('join', a, *p)
  File "/usr/lib64/python3.9/genericpath.py", line 155, in _check_arg_types
    raise TypeError("Can't mix strings and bytes in path components") from None
TypeError: Can't mix strings and bytes in path components
>>> tempfile.tempdir = b'/tmp'
>>> tempfile.TemporaryFile()
<_io.BufferedRandom name=3>
>>> tempfile.mktemp()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.9/tempfile.py", line 400, in mktemp
    file = _os.path.join(dir, prefix + name + suffix)
  File "/usr/lib64/python3.9/posixpath.py", line 90, in join
    genericpath._check_arg_types('join', a, *p)
  File "/usr/lib64/python3.9/genericpath.py", line 155, in _check_arg_types
    raise TypeError("Can't mix strings and bytes in path components") from None
TypeError: Can't mix strings and bytes in path components

It seems to me that the case of `tempfile.tempdir` being of type bytes hasn't been completely considered and is handled inconsistently. My suggestion would be to manage the paths as bytes if there is no other indication like suffix or prefix.
msg369511 - (view) Author: Eric L. (ericzolf) * Date: 2020-05-21 09:36
In the meantime, I noticed the following in addition:

[ericl@tuxedo ~]$ python3.9
Python 3.9.0a6 (default, Apr 28 2020, 00:00:00) 
[GCC 10.0.1 20200430 (Red Hat 10.0.1-0.14)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tempfile
>>> tempfile.tempdir = b'/tmp'
>>> tempfile.gettempdir()
b'/tmp'
>>> tempfile.tempdir = '/tmp'
>>> tempfile.gettempdirb()
b'/tmp'

This actually explicitly hurts the interface description which states that tempfile.gettempdir() returns a string.

"Encouraged" by this discovery, I've tried to write a patch of tempfile.py addressing the issues discovered. It's my first patch ever of Python so bare with me. The default remains string but if someone _explicitly_ sets tempdir to bytes, it'll become bytes. I've tried all the commands listed previously and it all looks consistent to me.
msg369698 - (view) Author: Eric L. (ericzolf) * Date: 2020-05-23 07:38
Sorry, I uploaded by mistake an early version of the patch. The new one is the one I had actually tested (the old one would fail with mixing bytes and string under certain circumstances, I can't remember any more).
msg369742 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2020-05-23 19:41
Could you please turn that into a Github PR?
msg369810 - (view) Author: Eric L. (ericzolf) * Date: 2020-05-24 17:15
On 23/05/2020 21:41, Gregory P. Smith wrote:
> Could you please turn that into a Github PR?
I can, if you don't tell me that I need to setup a full-blown Python
development environment to do this.
msg369817 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-05-24 18:30
Maybe just document that tempdir should be a string?
msg369850 - (view) Author: Eric L. (ericzolf) * Date: 2020-05-25 05:57
On 24/05/2020 20:30, Serhiy Storchaka wrote:
> Maybe just document that tempdir should be a string?
I would definitely prefer to have bytes paths considered as 1st class
citizen.
msg369855 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-05-25 07:35
In any case this is a new feature, so it can be added only in 3.10, and we need the documentation patch for 3.9 and older.

As a workaround you can use os.fsdecode():

tempfile.tempdir = os.fsdecode(b'/doesntexist')
msg370049 - (view) Author: Eric L. (ericzolf) * Date: 2020-05-27 05:44
Well, your decision but, as a user of the library, it didn't feel like a new feature just like a bug to be fixed, the main issue being the inconsistent handling of bytes vs. str.
msg370110 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2020-05-27 16:56
We consider it closer to new feature as it changes existing behavior in a way that people cannot _depend_ on being present in older Python releases as it'd only appear in a bugfix release, so most people could never write code depending on it while claiming to generally support 3.7-3.9.

Anyways your PR overall looks good for 3.10.  I left some comments.
msg370125 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-05-27 19:56
Well, the behavior for an existing bytes path was not unintended, but some people can depend on it.

But before making it an official feature, we should also check other cases of an unintended behavior. What if set tempfile.tempdir to a Path object or to a file descriptor of a directory? Does anything work in these cases and should we support them?
msg370126 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2020-05-27 19:59
I expect the best decision to be to get rid of tempfile.tempdir entirely.  That would need be its own issue with a deprecation period involved.

A process global that alters behavior of all calls into a module that don't explicitly opt-out is a bad API.
History
Date User Action Args
2020-05-27 19:59:51gregory.p.smithsetmessages: + msg370126
2020-05-27 19:56:18serhiy.storchakasetmessages: + msg370125
2020-05-27 16:56:07gregory.p.smithsetmessages: + msg370110
2020-05-27 05:44:50ericzolfsetmessages: + msg370049
2020-05-27 05:32:05python-devsetnosy: + python-dev

pull_requests: + pull_request19698
stage: patch review
2020-05-25 07:35:41serhiy.storchakasetmessages: + msg369855
2020-05-25 05:57:46ericzolfsetmessages: + msg369850
2020-05-24 18:30:40serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg369817
2020-05-24 17:15:19ericzolfsetmessages: + msg369810
2020-05-23 19:41:55gregory.p.smithsetmessages: + msg369742
2020-05-23 16:26:56DahlitzFloriansetnosy: - DahlitzFlorian
2020-05-23 07:38:11ericzolfsetfiles: + tempfile_py_2020-05-23.diff

messages: + msg369698
2020-05-23 07:29:16terry.reedysetversions: + Python 3.10, - Python 3.5, Python 3.6
2020-05-21 10:33:37serhiy.storchakasetnosy: + gregory.p.smith
2020-05-21 09:36:26ericzolfsetfiles: + tempfile.py.diff
keywords: + patch
messages: + msg369511
2020-05-21 08:34:26DahlitzFloriansetnosy: + DahlitzFlorian
2020-05-20 19:20:24ericzolfcreate