msg243498 - (view) |
Author: Augie Fackler (durin42) * |
Date: 2015-05-18 17:04 |
Many things in the OS module work with bytestrings for paths in Python 3, but tempfile isn't so happy. It'd be very useful to be able to have the same bytes-in-bytes-out behavior in tempfile as for the os.path.* methods.
This is something we're working around in hg, but I figured I'd file a bug anyway in case we can improve the situation.
|
msg243500 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2015-05-18 17:21 |
Can you explain what you are looking for in more detail? It isn't obvious to me what "bytes in bytes out" means in the context of the tempfile API.
|
msg243502 - (view) |
Author: Augie Fackler (durin42) * |
Date: 2015-05-18 17:24 |
Today we're doing something like this:
tmpdir = tempfile.mkdtemp('', 'hgtests.', d and d.decode('utf-8')).encode('utf-8')
but would rather have
tmpdir = tempfile.mkdtemp(b'', b'hgtests.', d)
# and now tmpdir is a bytestring containing the path to the tmpdir
basically, we want to maintain our pedantry around paths being sequences of bytes.
|
msg243505 - (view) |
Author: Matt Mackall (Matt.Mackall) |
Date: 2015-05-18 17:31 |
Another way of putting it is:
os.listdir(<bytes>) -> [<bytes>,...]
os.listdir(<unicode>) -> [<unicode>,...]
is the usual pattern, and tempfile isn't following it.
|
msg243507 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2015-05-18 17:33 |
I understand now. The problem is that this would be a backward incompatible change.
|
msg243508 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2015-05-18 17:35 |
I'm sorry, hit send before I finished thinking. You are saying bytes input are rejected, so yes this could be fixed.
|
msg243509 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2015-05-18 17:37 |
Would you care to supply a patch? The beta deadline is this coming weekend.
|
msg243552 - (view) |
Author: Gregory P. Smith (gregory.p.smith) *  |
Date: 2015-05-19 01:25 |
it seems a little messy to code this up the way Lib/tempfile.py is written but i'll take a stab at it tonight. Probably via parallel implementations of the methods internally rather than conditional logic throughout given how they work.
|
msg243643 - (view) |
Author: Gregory P. Smith (gregory.p.smith) *  |
Date: 2015-05-20 05:09 |
Attached is a patch with implementation, tests and docs. I didn't have to duplicate too much thankfully. Just figure out where to put the type conversions. Review would be nice, but I'll err on the side of getting this in before beta 1 (May 24 per https://www.python.org/dev/peps/pep-0478/) so any corrections can be dealt with during the beta cycle.
Augie & Matt, can you confirm that this patch does what y'all need for Mercurial?
|
msg243648 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2015-05-20 06:36 |
Doing a quick review, expect a few comments on Rietveld
|
msg243649 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2015-05-20 06:57 |
Added comments on Rietveld. But I'm not sure that this change is worth to do. Only low-level functions in os.path and os modules (and few other) support bytes names (and only bytes, not bytearray!). High level interfaces, such as in tarfile or zipfile, work only with str paths. The question is what level the tempfile module?
In any way Mercurial people need their own wrapper if they want to support Python <3.5.
|
msg243654 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2015-05-20 12:18 |
We have a number of other places in the stdlib where bytes-in-bytes-out is observed, at various levels of abstraction. I think this is reasonable. To answer your direct question, I think tempfile is a convenience-and-do-it-right wrapper around what is really a pretty low level operation (creating a temporary file/directory securely).
|
msg243672 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2015-05-20 14:24 |
I was vague about this feature, but if other core developer found it useful, I have no objections. Only few nitpicks to the implementation.
|
msg243674 - (view) |
Author: Augie Fackler (durin42) * |
Date: 2015-05-20 15:23 |
> In any way Mercurial people need their own wrapper if they want to support Python <3.5.
We need format operations on bytes objects, so we have no plans to support Python3 for any version before 3.5.
|
msg243675 - (view) |
Author: Augie Fackler (durin42) * |
Date: 2015-05-20 15:24 |
I'll build a patched Python3.5 tomorrow (I'm on the road today) and let you know if this does everything we need. I'd be shocked if it wasn't plenty.
|
msg243809 - (view) |
Author: Gregory P. Smith (gregory.p.smith) *  |
Date: 2015-05-22 06:17 |
code review comments addressed.
|
msg243821 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2015-05-22 08:22 |
Looks good in general. I think the signatures in Doc/library/tempfile.rst need updating for the following functions (possibly also version changed notices added somewhere):
* NamedTemporaryFile
* TemporaryFile
* SpooledTemporaryFile
* TemporaryDirectory
|
msg243847 - (view) |
Author: Gregory P. Smith (gregory.p.smith) *  |
Date: 2015-05-22 19:12 |
updated, thanks for the great reviews.
|
msg243865 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2015-05-22 23:18 |
New changeset 870899ce71f4 by Gregory P. Smith in branch 'default':
Issue 24230: The tempfile module now accepts bytes for prefix, suffix and dir
https://hg.python.org/cpython/rev/870899ce71f4
|
msg243866 - (view) |
Author: Gregory P. Smith (gregory.p.smith) *  |
Date: 2015-05-22 23:21 |
that should take care of it. if you see any other issues with this post commit, either raise them here or fix them directly. the reviews were helpful.
|
msg243876 - (view) |
Author: Augie Fackler (durin42) * |
Date: 2015-05-23 01:20 |
I just removed my hack that gave us unicode to hand to mkdtemp() and everything still passes with your new commit. Thanks much for the quick response!
|
msg261810 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2016-03-15 11:44 |
I don't understand the rationale to add new functions using bytes filenames.
IMHO it's a bad practice to pass bytes filenames to write portable code. On Windows, bytes lead to strange bugs: os.listdir(bytes) gives you filenames which don't exist when a filename cannot be encoded to the ANSI code page.
On UNIX, os.fsencode() and os.fsdecode() allows you to convert filenames between unicode and bytes. It provides nice properties like os.fsencode(os.fsdecode(value))==value where value is bytes. No need to add new APIs, use existing APIs an use os.fsencode() and os.fsdecode() on top of them.
Bytes filenames are deprecated on Windows since Python 3.2. The new os.scandir() function doesn't support bytes for example, it's a deliberate choice.
Maybe I missed an important technical issue?
|
msg261811 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2016-03-15 11:46 |
"""
Today we're doing something like this:
tmpdir = tempfile.mkdtemp('', 'hgtests.', d and d.decode('utf-8')).encode('utf-8')
"""
Don't do that. UTF-8 is not the right encoding.
Use os.fsencode() and os.fsdecode(). Internally, Python uses sys.getfilesystemencoding() with 'surrogateescape' error handler (but 'strict' error handler on Windows).
If you use UTF-8, you can get indirectly mojibake.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:58:17 | admin | set | github: 68418 |
2016-03-15 11:46:23 | vstinner | set | messages:
+ msg261811 |
2016-03-15 11:44:01 | vstinner | set | nosy:
+ vstinner messages:
+ msg261810
|
2015-05-23 01:20:59 | durin42 | set | messages:
+ msg243876 |
2015-05-22 23:21:09 | gregory.p.smith | set | status: open -> closed resolution: fixed messages:
+ msg243866
stage: patch review -> commit review |
2015-05-22 23:18:32 | python-dev | set | nosy:
+ python-dev messages:
+ msg243865
|
2015-05-22 19:12:38 | gregory.p.smith | set | files:
+ issue24230-gps03.diff
messages:
+ msg243847 |
2015-05-22 08:22:56 | martin.panter | set | messages:
+ msg243821 |
2015-05-22 06:18:00 | gregory.p.smith | set | files:
+ issue24230-gps02.diff
messages:
+ msg243809 |
2015-05-20 15:24:48 | durin42 | set | messages:
+ msg243675 |
2015-05-20 15:23:41 | durin42 | set | messages:
+ msg243674 |
2015-05-20 14:24:19 | serhiy.storchaka | set | nosy:
+ georg.brandl, ncoghlan, pitrou messages:
+ msg243672
|
2015-05-20 12:18:32 | r.david.murray | set | messages:
+ msg243654 |
2015-05-20 06:57:15 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages:
+ msg243649
|
2015-05-20 06:36:40 | martin.panter | set | nosy:
+ martin.panter messages:
+ msg243648
|
2015-05-20 05:09:56 | gregory.p.smith | set | files:
+ issue24230-gps01.patch keywords:
+ patch messages:
+ msg243643
stage: needs patch -> patch review |
2015-05-19 01:25:16 | gregory.p.smith | set | messages:
+ msg243552 |
2015-05-19 01:08:12 | gregory.p.smith | set | assignee: gregory.p.smith
nosy:
+ gregory.p.smith |
2015-05-18 17:37:50 | r.david.murray | set | type: enhancement messages:
+ msg243509 stage: needs patch |
2015-05-18 17:35:13 | r.david.murray | set | messages:
+ msg243508 |
2015-05-18 17:33:38 | r.david.murray | set | messages:
+ msg243507 |
2015-05-18 17:31:07 | Matt.Mackall | set | nosy:
+ Matt.Mackall messages:
+ msg243505
|
2015-05-18 17:24:35 | durin42 | set | messages:
+ msg243502 |
2015-05-18 17:21:01 | r.david.murray | set | nosy:
+ r.david.murray messages:
+ msg243500
|
2015-05-18 17:04:53 | durin42 | create | |