classification
Title: tempfile.mkdtemp() doesn't work with bytes paths
Type: enhancement Stage: commit review
Components: Library (Lib) Versions: Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: gregory.p.smith Nosy List: Matt.Mackall, durin42, georg.brandl, gregory.p.smith, haypo, martin.panter, ncoghlan, pitrou, python-dev, r.david.murray, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2015-05-18 17:04 by durin42, last changed 2016-03-15 11:46 by haypo. This issue is now closed.

Files
File name Uploaded Description Edit
issue24230-gps01.patch gregory.p.smith, 2015-05-20 05:09 review
issue24230-gps02.diff gregory.p.smith, 2015-05-22 06:17 review
issue24230-gps03.diff gregory.p.smith, 2015-05-22 19:12 review
Messages (23)
msg243498 - (view) Author: Augie Fackler (durin42) * Date: 2015-05-18 17:04
Many things in the OS module work with bytestrings for paths in Python 3, but tempfile isn't so happy. It'd be very useful to be able to have the same bytes-in-bytes-out behavior in tempfile as for the os.path.* methods.

This is something we're working around in hg, but I figured I'd file a bug anyway in case we can improve the situation.
msg243500 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-05-18 17:21
Can you explain what you are looking for in more detail?  It isn't obvious to me what "bytes in bytes out" means in the context of the tempfile API.
msg243502 - (view) Author: Augie Fackler (durin42) * Date: 2015-05-18 17:24
Today we're doing something like this:

tmpdir = tempfile.mkdtemp('', 'hgtests.', d and d.decode('utf-8')).encode('utf-8')

but would rather have

tmpdir = tempfile.mkdtemp(b'', b'hgtests.', d)
# and now tmpdir is a bytestring containing the path to the tmpdir

basically, we want to maintain our pedantry around paths being sequences of bytes.
msg243505 - (view) Author: Matt Mackall (Matt.Mackall) Date: 2015-05-18 17:31
Another way of putting it is:

os.listdir(<bytes>) -> [<bytes>,...]
os.listdir(<unicode>) -> [<unicode>,...]

is the usual pattern, and tempfile isn't following it.
msg243507 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-05-18 17:33
I understand now.  The problem is that this would be a backward incompatible change.
msg243508 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-05-18 17:35
I'm sorry, hit send before I finished thinking.  You are saying bytes input are rejected, so yes this could be fixed.
msg243509 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-05-18 17:37
Would you care to supply a patch?  The beta deadline is this coming weekend.
msg243552 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2015-05-19 01:25
it seems a little messy to code this up the way Lib/tempfile.py is written but i'll take a stab at it tonight.  Probably via parallel implementations of the methods internally rather than conditional logic throughout given how they work.
msg243643 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2015-05-20 05:09
Attached is a patch with implementation, tests and docs.  I didn't have to duplicate too much thankfully.  Just figure out where to put the type conversions.  Review would be nice, but I'll err on the side of getting this in before beta 1 (May 24 per https://www.python.org/dev/peps/pep-0478/) so any corrections can be dealt with during the beta cycle.

Augie & Matt, can you confirm that this patch does what y'all need for Mercurial?
msg243648 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-05-20 06:36
Doing a quick review, expect a few comments on Rietveld
msg243649 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-20 06:57
Added comments on Rietveld. But I'm not sure that this change is worth to do. Only low-level functions in os.path and os modules (and few other) support bytes names (and only bytes, not bytearray!). High level interfaces, such as in tarfile or zipfile, work only with str paths. The question is what level the tempfile module?

In any way Mercurial people need their own wrapper if they want to support Python <3.5.
msg243654 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-05-20 12:18
We have a number of other places in the stdlib where bytes-in-bytes-out is observed, at various levels of abstraction.  I think this is reasonable.  To answer your direct question, I think tempfile is a convenience-and-do-it-right wrapper around what is really a pretty low level operation (creating a temporary file/directory securely).
msg243672 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-20 14:24
I was vague about this feature, but if other core developer found it useful, I have no objections. Only few nitpicks to the implementation.
msg243674 - (view) Author: Augie Fackler (durin42) * Date: 2015-05-20 15:23
> In any way Mercurial people need their own wrapper if they want to support Python <3.5.

We need format operations on bytes objects, so we have no plans to support Python3 for any version before 3.5.
msg243675 - (view) Author: Augie Fackler (durin42) * Date: 2015-05-20 15:24
I'll build a patched Python3.5 tomorrow (I'm on the road today) and let you know if this does everything we need. I'd be shocked if it wasn't plenty.
msg243809 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2015-05-22 06:17
code review comments addressed.
msg243821 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-05-22 08:22
Looks good in general. I think the signatures in Doc/library/tempfile.rst need updating for the following functions (possibly also version changed notices added somewhere):

* NamedTemporaryFile
* TemporaryFile
* SpooledTemporaryFile
* TemporaryDirectory
msg243847 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2015-05-22 19:12
updated, thanks for the great reviews.
msg243865 - (view) Author: Roundup Robot (python-dev) Date: 2015-05-22 23:18
New changeset 870899ce71f4 by Gregory P. Smith in branch 'default':
Issue 24230: The tempfile module now accepts bytes for prefix, suffix and dir
https://hg.python.org/cpython/rev/870899ce71f4
msg243866 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2015-05-22 23:21
that should take care of it.  if you see any other issues with this post commit, either raise them here or fix them directly.  the reviews were helpful.
msg243876 - (view) Author: Augie Fackler (durin42) * Date: 2015-05-23 01:20
I just removed my hack that gave us unicode to hand to mkdtemp() and everything still passes with your new commit. Thanks much for the quick response!
msg261810 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2016-03-15 11:44
I don't understand the rationale to add new functions using bytes filenames.

IMHO it's a bad practice to pass bytes filenames to write portable code. On Windows, bytes lead to strange bugs: os.listdir(bytes) gives you filenames which don't exist when a filename cannot be encoded to the ANSI code page.

On UNIX, os.fsencode() and os.fsdecode() allows you to convert filenames between unicode and bytes. It provides nice properties like os.fsencode(os.fsdecode(value))==value where value is bytes. No need to add new APIs, use existing APIs an use os.fsencode() and os.fsdecode() on top of them.

Bytes filenames are deprecated on Windows since Python 3.2. The new os.scandir() function doesn't support bytes for example, it's a deliberate choice.

Maybe I missed an important technical issue?
msg261811 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2016-03-15 11:46
"""
Today we're doing something like this:

tmpdir = tempfile.mkdtemp('', 'hgtests.', d and d.decode('utf-8')).encode('utf-8')
"""

Don't do that. UTF-8 is not the right encoding.

Use os.fsencode() and os.fsdecode(). Internally, Python uses sys.getfilesystemencoding() with 'surrogateescape' error handler (but 'strict' error handler on Windows).

If you use UTF-8, you can get indirectly mojibake.
History
Date User Action Args
2016-03-15 11:46:23hayposetmessages: + msg261811
2016-03-15 11:44:01hayposetnosy: + haypo
messages: + msg261810
2015-05-23 01:20:59durin42setmessages: + msg243876
2015-05-22 23:21:09gregory.p.smithsetstatus: open -> closed
resolution: fixed
messages: + msg243866

stage: patch review -> commit review
2015-05-22 23:18:32python-devsetnosy: + python-dev
messages: + msg243865
2015-05-22 19:12:38gregory.p.smithsetfiles: + issue24230-gps03.diff

messages: + msg243847
2015-05-22 08:22:56martin.pantersetmessages: + msg243821
2015-05-22 06:18:00gregory.p.smithsetfiles: + issue24230-gps02.diff

messages: + msg243809
2015-05-20 15:24:48durin42setmessages: + msg243675
2015-05-20 15:23:41durin42setmessages: + msg243674
2015-05-20 14:24:19serhiy.storchakasetnosy: + georg.brandl, ncoghlan, pitrou
messages: + msg243672
2015-05-20 12:18:32r.david.murraysetmessages: + msg243654
2015-05-20 06:57:15serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg243649
2015-05-20 06:36:40martin.pantersetnosy: + martin.panter
messages: + msg243648
2015-05-20 05:09:56gregory.p.smithsetfiles: + issue24230-gps01.patch
keywords: + patch
messages: + msg243643

stage: needs patch -> patch review
2015-05-19 01:25:16gregory.p.smithsetmessages: + msg243552
2015-05-19 01:08:12gregory.p.smithsetassignee: gregory.p.smith

nosy: + gregory.p.smith
2015-05-18 17:37:50r.david.murraysettype: enhancement
messages: + msg243509
stage: needs patch
2015-05-18 17:35:13r.david.murraysetmessages: + msg243508
2015-05-18 17:33:38r.david.murraysetmessages: + msg243507
2015-05-18 17:31:07Matt.Mackallsetnosy: + Matt.Mackall
messages: + msg243505
2015-05-18 17:24:35durin42setmessages: + msg243502
2015-05-18 17:21:01r.david.murraysetnosy: + r.david.murray
messages: + msg243500
2015-05-18 17:04:53durin42create