Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tempfile.mkdtemp() doesn't work with bytes paths #68418

Closed
durin42 mannequin opened this issue May 18, 2015 · 23 comments
Closed

tempfile.mkdtemp() doesn't work with bytes paths #68418

durin42 mannequin opened this issue May 18, 2015 · 23 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@durin42
Copy link
Mannequin

durin42 mannequin commented May 18, 2015

BPO 24230
Nosy @birkenfeld, @gpshead, @ncoghlan, @pitrou, @vstinner, @bitdancer, @vadmium, @serhiy-storchaka
Files
  • issue24230-gps01.patch
  • issue24230-gps02.diff
  • issue24230-gps03.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/gpshead'
    closed_at = <Date 2015-05-22.23:21:09.422>
    created_at = <Date 2015-05-18.17:04:53.985>
    labels = ['type-feature', 'library']
    title = "tempfile.mkdtemp() doesn't work with bytes paths"
    updated_at = <Date 2016-03-15.11:46:23.756>
    user = 'https://bugs.python.org/durin42'

    bugs.python.org fields:

    activity = <Date 2016-03-15.11:46:23.756>
    actor = 'vstinner'
    assignee = 'gregory.p.smith'
    closed = True
    closed_date = <Date 2015-05-22.23:21:09.422>
    closer = 'gregory.p.smith'
    components = ['Library (Lib)']
    creation = <Date 2015-05-18.17:04:53.985>
    creator = 'durin42'
    dependencies = []
    files = ['39436', '39464', '39469']
    hgrepos = []
    issue_num = 24230
    keywords = ['patch']
    message_count = 23.0
    messages = ['243498', '243500', '243502', '243505', '243507', '243508', '243509', '243552', '243643', '243648', '243649', '243654', '243672', '243674', '243675', '243809', '243821', '243847', '243865', '243866', '243876', '261810', '261811']
    nosy_count = 11.0
    nosy_names = ['georg.brandl', 'gregory.p.smith', 'ncoghlan', 'pitrou', 'vstinner', 'durin42', 'r.david.murray', 'python-dev', 'martin.panter', 'Matt.Mackall', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'commit review'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue24230'
    versions = ['Python 3.5']

    @durin42
    Copy link
    Mannequin Author

    durin42 mannequin commented May 18, 2015

    Many things in the OS module work with bytestrings for paths in Python 3, but tempfile isn't so happy. It'd be very useful to be able to have the same bytes-in-bytes-out behavior in tempfile as for the os.path.* methods.

    This is something we're working around in hg, but I figured I'd file a bug anyway in case we can improve the situation.

    @durin42 durin42 mannequin added the stdlib Python modules in the Lib dir label May 18, 2015
    @bitdancer
    Copy link
    Member

    Can you explain what you are looking for in more detail? It isn't obvious to me what "bytes in bytes out" means in the context of the tempfile API.

    @durin42
    Copy link
    Mannequin Author

    durin42 mannequin commented May 18, 2015

    Today we're doing something like this:

    tmpdir = tempfile.mkdtemp('', 'hgtests.', d and d.decode('utf-8')).encode('utf-8')

    but would rather have

    tmpdir = tempfile.mkdtemp(b'', b'hgtests.', d)
    # and now tmpdir is a bytestring containing the path to the tmpdir

    basically, we want to maintain our pedantry around paths being sequences of bytes.

    @MattMackall
    Copy link
    Mannequin

    MattMackall mannequin commented May 18, 2015

    Another way of putting it is:

    os.listdir(<bytes>) -> [<bytes>,...]
    os.listdir(<unicode>) -> [<unicode>,...]

    is the usual pattern, and tempfile isn't following it.

    @bitdancer
    Copy link
    Member

    I understand now. The problem is that this would be a backward incompatible change.

    @bitdancer
    Copy link
    Member

    I'm sorry, hit send before I finished thinking. You are saying bytes input are rejected, so yes this could be fixed.

    @bitdancer
    Copy link
    Member

    Would you care to supply a patch? The beta deadline is this coming weekend.

    @bitdancer bitdancer added the type-feature A feature request or enhancement label May 18, 2015
    @gpshead gpshead self-assigned this May 19, 2015
    @gpshead
    Copy link
    Member

    gpshead commented May 19, 2015

    it seems a little messy to code this up the way Lib/tempfile.py is written but i'll take a stab at it tonight. Probably via parallel implementations of the methods internally rather than conditional logic throughout given how they work.

    @gpshead
    Copy link
    Member

    gpshead commented May 20, 2015

    Attached is a patch with implementation, tests and docs. I didn't have to duplicate too much thankfully. Just figure out where to put the type conversions. Review would be nice, but I'll err on the side of getting this in before beta 1 (May 24 per https://www.python.org/dev/peps/pep-0478/) so any corrections can be dealt with during the beta cycle.

    Augie & Matt, can you confirm that this patch does what y'all need for Mercurial?

    @vadmium
    Copy link
    Member

    vadmium commented May 20, 2015

    Doing a quick review, expect a few comments on Rietveld

    @serhiy-storchaka
    Copy link
    Member

    Added comments on Rietveld. But I'm not sure that this change is worth to do. Only low-level functions in os.path and os modules (and few other) support bytes names (and only bytes, not bytearray!). High level interfaces, such as in tarfile or zipfile, work only with str paths. The question is what level the tempfile module?

    In any way Mercurial people need their own wrapper if they want to support Python <3.5.

    @bitdancer
    Copy link
    Member

    We have a number of other places in the stdlib where bytes-in-bytes-out is observed, at various levels of abstraction. I think this is reasonable. To answer your direct question, I think tempfile is a convenience-and-do-it-right wrapper around what is really a pretty low level operation (creating a temporary file/directory securely).

    @serhiy-storchaka
    Copy link
    Member

    I was vague about this feature, but if other core developer found it useful, I have no objections. Only few nitpicks to the implementation.

    @durin42
    Copy link
    Mannequin Author

    durin42 mannequin commented May 20, 2015

    In any way Mercurial people need their own wrapper if they want to support Python <3.5.

    We need format operations on bytes objects, so we have no plans to support Python3 for any version before 3.5.

    @durin42
    Copy link
    Mannequin Author

    durin42 mannequin commented May 20, 2015

    I'll build a patched Python3.5 tomorrow (I'm on the road today) and let you know if this does everything we need. I'd be shocked if it wasn't plenty.

    @gpshead
    Copy link
    Member

    gpshead commented May 22, 2015

    code review comments addressed.

    @vadmium
    Copy link
    Member

    vadmium commented May 22, 2015

    Looks good in general. I think the signatures in Doc/library/tempfile.rst need updating for the following functions (possibly also version changed notices added somewhere):

    • NamedTemporaryFile
    • TemporaryFile
    • SpooledTemporaryFile
    • TemporaryDirectory

    @gpshead
    Copy link
    Member

    gpshead commented May 22, 2015

    updated, thanks for the great reviews.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 22, 2015

    New changeset 870899ce71f4 by Gregory P. Smith in branch 'default':
    bpo-24230: The tempfile module now accepts bytes for prefix, suffix and dir
    https://hg.python.org/cpython/rev/870899ce71f4

    @gpshead
    Copy link
    Member

    gpshead commented May 22, 2015

    that should take care of it. if you see any other issues with this post commit, either raise them here or fix them directly. the reviews were helpful.

    @gpshead gpshead closed this as completed May 22, 2015
    @durin42
    Copy link
    Mannequin Author

    durin42 mannequin commented May 23, 2015

    I just removed my hack that gave us unicode to hand to mkdtemp() and everything still passes with your new commit. Thanks much for the quick response!

    @vstinner
    Copy link
    Member

    I don't understand the rationale to add new functions using bytes filenames.

    IMHO it's a bad practice to pass bytes filenames to write portable code. On Windows, bytes lead to strange bugs: os.listdir(bytes) gives you filenames which don't exist when a filename cannot be encoded to the ANSI code page.

    On UNIX, os.fsencode() and os.fsdecode() allows you to convert filenames between unicode and bytes. It provides nice properties like os.fsencode(os.fsdecode(value))==value where value is bytes. No need to add new APIs, use existing APIs an use os.fsencode() and os.fsdecode() on top of them.

    Bytes filenames are deprecated on Windows since Python 3.2. The new os.scandir() function doesn't support bytes for example, it's a deliberate choice.

    Maybe I missed an important technical issue?

    @vstinner
    Copy link
    Member

    """
    Today we're doing something like this:

    tmpdir = tempfile.mkdtemp('', 'hgtests.', d and d.decode('utf-8')).encode('utf-8')
    """

    Don't do that. UTF-8 is not the right encoding.

    Use os.fsencode() and os.fsdecode(). Internally, Python uses sys.getfilesystemencoding() with 'surrogateescape' error handler (but 'strict' error handler on Windows).

    If you use UTF-8, you can get indirectly mojibake.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants