Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't accept bytearray as filenames part 2 #70987

Closed
pjenvey opened this issue Apr 18, 2016 · 19 comments
Closed

Don't accept bytearray as filenames part 2 #70987

pjenvey opened this issue Apr 18, 2016 · 19 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@pjenvey
Copy link
Member

pjenvey commented Apr 18, 2016

BPO 26800
Nosy @gvanrossum, @brettcannon, @pitrou, @vstinner, @larryhastings, @pjenvey, @serhiy-storchaka, @rlamy
Files
  • path_converter-deprecate-buffer.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2016-08-14.06:30:18.009>
    created_at = <Date 2016-04-18.21:34:55.352>
    labels = ['interpreter-core', 'type-bug']
    title = "Don't accept bytearray as filenames part 2"
    updated_at = <Date 2016-08-14.06:30:18.008>
    user = 'https://github.com/pjenvey'

    bugs.python.org fields:

    activity = <Date 2016-08-14.06:30:18.008>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2016-08-14.06:30:18.009>
    closer = 'serhiy.storchaka'
    components = ['Interpreter Core']
    creation = <Date 2016-04-18.21:34:55.352>
    creator = 'pjenvey'
    dependencies = []
    files = ['43336']
    hgrepos = []
    issue_num = 26800
    keywords = ['patch', '3.3regression']
    message_count = 18.0
    messages = ['263694', '263805', '263807', '263943', '263944', '264063', '264065', '264113', '264118', '264124', '264136', '264138', '264139', '268156', '269731', '270516', '272054', '272106']
    nosy_count = 9.0
    nosy_names = ['gvanrossum', 'brett.cannon', 'pitrou', 'vstinner', 'larry', 'pjenvey', 'python-dev', 'serhiy.storchaka', 'Ronan.Lamy']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue26800'
    versions = ['Python 3.6']

    @pjenvey
    Copy link
    Member Author

    pjenvey commented Apr 18, 2016

    Basically a reopen of the older bpo-8485 with the same name. It was decided there to drop support for bytearray filenames -- partly because of the complexity of handling buffers but it was also deemed to just not make much sense.

    This regressed or crept back into the posix module with the big path_converter changes for 3.3:

    https://hg.python.org/cpython/file/ee9921b29fd8/Lib/test/test_posix.py#l411

    IMHO this functionality should be deprecated/removed per the original discussion, or does someone want to reopen the debate?

    The os module docs (and path_converter's own docs) explicitly advertise handling of str or bytes, not bytearrays or buffers. Even os.fsencode rejects bytearrays/buffers.

    Related to bpo-26754 -- further inconsistencies around filename handling

    @pjenvey pjenvey added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error labels Apr 18, 2016
    @serhiy-storchaka
    Copy link
    Member

    In the light of bpo-8485 this looks as a bug. Thank you for investigating this Philip.

    I already raised this issue on Python-Dev less than a week ago [1], and only Victor had answered, thus we can assume that the consensus about bytes-like paths is not changed.

    The only question is whether we need the deprecation period before removing the support of non-bytes bytes-like paths. Since they are supported during two major releases, and there are even tests for this support, I think that the deprecation period is needed. Third-party code written in last 3.5 years can use this feature.

    [1] http://comments.gmane.org/gmane.comp.python.devel/157296

    @pjenvey
    Copy link
    Member Author

    pjenvey commented Apr 20, 2016

    Thanks Serhiy, I did not see the python-dev thread. This coincidentally came up recently in pypy3.

    +1 for some kind of deprecation period needed

    @larryhastings
    Copy link
    Contributor

    I did the path_converter change. IIRC some functions supported bytearray, some did not, and in my quest for consistency I took the superset of functionality and supported bytearray for everything.

    Sounds to me like bytearray support should be dropped, but ultimately I don't have a strong opinion. I'm happy to sit back and let the more opinionated decide the matter.

    @vstinner
    Copy link
    Member

    bytearray is designed for performance. I don't think that the code
    creating a filename can be a bottleneck in an application.

    @gvanrossum
    Copy link
    Member

    I'm a little bit worried about removing support for bytearray in these cases. It used to be quite easy to support bytes and bytearray (just say you support the buffer API), and I even added language to PEP-484 suggesting that type checkers should consider bytearray (and memoryview) as valid arguments to anything that's declared as taking bytes. I know this is not universally true (a trivial example is a dict with bytes keys) but the existence of the buffer API makes it easy to do in most cases. Why are we moving away from this? If we do, we should probably remove that language from PEP-484 (and the code from mypy, but that's a separate thing).

    @serhiy-storchaka
    Copy link
    Member

    Side comment about "bytes-like" and "byte string". As for language from PEP-484, I think it is too permissive. bytes and bytearray have are "bytes strings" because they are not only sequences of bytes, but have a lot of str-like methods: lower(), split(), startswith(), strip(), etc. Many Python functions that work with "bytes strings" expect the support some of these methods. memoryview() has no these methods, it is even not always the sequence of bytes. Other objects that support the buffer protocol can even be not sequences. This is a problem when we want to support all objects with the buffer protocol in functions written in Python. We need to wrap them in memoryview and cast to the 'B' format. But in many cases the term "byte-like" means that bytes and bytearray are accepted. There are different levels of "byte-likelity", and unfortunately there is no good terminology.

    As for moving away from accepting non-bytes paths, I think the arguments are similar to arguments about why Path is not str subclass. Or why we don't convert any path argument to string by calling str(). Because it can hide errors and cause unexpected behavior instead of exception. For example on Windows array('u', '扡摣晥') represents not Unicode name '扡摣晥', but bytes name b'abcdef'.

    @gvanrossum
    Copy link
    Member

    OK, I actually agree with you on memoryview and and other types that support the buffer view. I will update PEP-484 to exclude memoryview. But I think bytearray has a special role and purose, and as much as possible it needs to be accepted in places that support bytes. So IMO as long as the os module supports bytes at all, I see no reason not to support bytearray. (It even has the extra trailing \0 needed to pass it to a typical C function.)

    So as long as we support bytes (and on Linux, and probably some other Unixoid systems, bytes are the *correct* type for paths) we should also support bytearray.

    The issue that Path doesn't support bytes is separate. I think it's the right choice. But that doesn't mean e.g. os.listdir() shouldn't support bytes.

    @serhiy-storchaka
    Copy link
    Member

    The os.path module and os.fsdecode() don't support bytearray paths. Should we add bytearray support? This will complicate and slowdown os.path functions which often are used multiple times in tight loops.

    Correspondingly, many Python implemented functions in the os module, such as makedirs() or walk(), doesn't support bytearray paths.

    In 3.2 almost all functions used PyUnicode_FSConverter() and supported only str and bytes paths. Few functions accidentally supported other types (on Windows only!): chdir(), rmdir() and unlink() supported read-only buffers (i.e. bytearray was not supported), and rename() and symplink() supported any buffers, including bytearray. Now all these functions emit a warning since non-string paths are deprecated on Windows. It looks to me that the support of non-bytes paths was added by accident.

    @vstinner
    Copy link
    Member

    I'm really sceptical that anyone really use bytearray for filenames in Python. I'm quite sure that most functions fail with bytearray. Do you have an example of application using bytearray? What's the point of using bytearray? To limit memory copy and optimize the memory usage?

    The two most common functions for filenames are os.path.join() and open(). I just tried:

    >>> os.path.join(bytearray(b"a"), b"b")
    Traceback (most recent call last):
      ...
    TypeError: startswith first arg must be bytes or a tuple of bytes, not str
    
    >>> open(bytearray(b"/etc/issue"))
    Traceback (most recent call last):
      ...
    TypeError: invalid file: bytearray(b'/etc/issue')

    Hum, so bytearray are not accepted for open() nor os.path.join(), right?

    @gvanrossum
    Copy link
    Member

    OK, it isn't worth fighting for. I will update PEP-484. It seems mypy
    doesn't support the special treatment bytearray anyway.

    For posterity, the reason I thought that bytearray should be accepted is
    not that bytearray is better in any way, just that sometimes you might read
    a filename off a bytearray that you read off a file or a socket. But it's
    clearly not a strong use case (or we had gotten complaints long ago) so
    I'll let it go. Sorry!! Thanks for the feedback.

    @gvanrossum
    Copy link
    Member

    Sigh, I was wrong, not only does mypy promote bytearray to bytes, it
    actually depends on this in its own code. Not for filenames, but for a
    regex match. It reads the source into a bytearray, and then passes that to
    something that uses regex matches (which *do* support bytearrays, and
    rightly so). If I remove the bytearray promotion I run into some type
    errors, and if I fix those (without copying data) I run into more type
    errors. I guess we could fix it by carefully marking up everything that
    takes bytearrays in the stubs, but it's surely inconvenient. So maybe I'll
    leave this in mypy for now.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Apr 25, 2016

    New changeset 3fb0f9a0b195 by Guido van Rossum in branch 'default':
    Rip out the promotion from bytearray/memoryview to bytes. See http://bugs.python.org/issue26800.
    https://hg.python.org/peps/rev/3fb0f9a0b195

    @serhiy-storchaka
    Copy link
    Member

    Here is a patch that deprecates support of general bytes-like objects in os functions.

    @serhiy-storchaka
    Copy link
    Member

    Ping.

    @brettcannon
    Copy link
    Member

    Serhiy's patch LGTM.

    @brettcannon
    Copy link
    Member

    Assigning to Serhiy to apply since Larry doesn't care and I already reviewed the patch.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Aug 6, 2016

    New changeset 1e01ca34a42c by Serhiy Storchaka in branch 'default':
    Issue bpo-26800: Undocumented support of general bytes-like objects
    https://hg.python.org/cpython/rev/1e01ca34a42c

    @vstinner
    Copy link
    Member

    Follow-up issue: #98393 "os module: reject bytes-like strings for paths, only accept the exact bytes type".

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants