New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tempfile.mkdtemp() doesn't work with bytes paths #68418
Comments
Many things in the OS module work with bytestrings for paths in Python 3, but tempfile isn't so happy. It'd be very useful to be able to have the same bytes-in-bytes-out behavior in tempfile as for the os.path.* methods. This is something we're working around in hg, but I figured I'd file a bug anyway in case we can improve the situation. |
Can you explain what you are looking for in more detail? It isn't obvious to me what "bytes in bytes out" means in the context of the tempfile API. |
Today we're doing something like this: tmpdir = tempfile.mkdtemp('', 'hgtests.', d and d.decode('utf-8')).encode('utf-8') but would rather have tmpdir = tempfile.mkdtemp(b'', b'hgtests.', d)
# and now tmpdir is a bytestring containing the path to the tmpdir basically, we want to maintain our pedantry around paths being sequences of bytes. |
Another way of putting it is: os.listdir(<bytes>) -> [<bytes>,...] is the usual pattern, and tempfile isn't following it. |
I understand now. The problem is that this would be a backward incompatible change. |
I'm sorry, hit send before I finished thinking. You are saying bytes input are rejected, so yes this could be fixed. |
Would you care to supply a patch? The beta deadline is this coming weekend. |
it seems a little messy to code this up the way Lib/tempfile.py is written but i'll take a stab at it tonight. Probably via parallel implementations of the methods internally rather than conditional logic throughout given how they work. |
Attached is a patch with implementation, tests and docs. I didn't have to duplicate too much thankfully. Just figure out where to put the type conversions. Review would be nice, but I'll err on the side of getting this in before beta 1 (May 24 per https://www.python.org/dev/peps/pep-0478/) so any corrections can be dealt with during the beta cycle. Augie & Matt, can you confirm that this patch does what y'all need for Mercurial? |
Doing a quick review, expect a few comments on Rietveld |
Added comments on Rietveld. But I'm not sure that this change is worth to do. Only low-level functions in os.path and os modules (and few other) support bytes names (and only bytes, not bytearray!). High level interfaces, such as in tarfile or zipfile, work only with str paths. The question is what level the tempfile module? In any way Mercurial people need their own wrapper if they want to support Python <3.5. |
We have a number of other places in the stdlib where bytes-in-bytes-out is observed, at various levels of abstraction. I think this is reasonable. To answer your direct question, I think tempfile is a convenience-and-do-it-right wrapper around what is really a pretty low level operation (creating a temporary file/directory securely). |
I was vague about this feature, but if other core developer found it useful, I have no objections. Only few nitpicks to the implementation. |
We need format operations on bytes objects, so we have no plans to support Python3 for any version before 3.5. |
I'll build a patched Python3.5 tomorrow (I'm on the road today) and let you know if this does everything we need. I'd be shocked if it wasn't plenty. |
code review comments addressed. |
Looks good in general. I think the signatures in Doc/library/tempfile.rst need updating for the following functions (possibly also version changed notices added somewhere):
|
updated, thanks for the great reviews. |
New changeset 870899ce71f4 by Gregory P. Smith in branch 'default': |
that should take care of it. if you see any other issues with this post commit, either raise them here or fix them directly. the reviews were helpful. |
I just removed my hack that gave us unicode to hand to mkdtemp() and everything still passes with your new commit. Thanks much for the quick response! |
I don't understand the rationale to add new functions using bytes filenames. IMHO it's a bad practice to pass bytes filenames to write portable code. On Windows, bytes lead to strange bugs: os.listdir(bytes) gives you filenames which don't exist when a filename cannot be encoded to the ANSI code page. On UNIX, os.fsencode() and os.fsdecode() allows you to convert filenames between unicode and bytes. It provides nice properties like os.fsencode(os.fsdecode(value))==value where value is bytes. No need to add new APIs, use existing APIs an use os.fsencode() and os.fsdecode() on top of them. Bytes filenames are deprecated on Windows since Python 3.2. The new os.scandir() function doesn't support bytes for example, it's a deliberate choice. Maybe I missed an important technical issue? |
""" tmpdir = tempfile.mkdtemp('', 'hgtests.', d and d.decode('utf-8')).encode('utf-8')
""" Don't do that. UTF-8 is not the right encoding. Use os.fsencode() and os.fsdecode(). Internally, Python uses sys.getfilesystemencoding() with 'surrogateescape' error handler (but 'strict' error handler on Windows). If you use UTF-8, you can get indirectly mojibake. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: