classification
Title: Don't accept bytearray as filenames part 2
Type: behavior Stage: resolved
Components: Interpreter Core Versions: Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: Ronan.Lamy, brett.cannon, gvanrossum, larry, pitrou, pjenvey, python-dev, serhiy.storchaka, vstinner
Priority: normal Keywords: 3.3regression, patch

Created on 2016-04-18 21:34 by pjenvey, last changed 2016-08-14 06:30 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description Edit
path_converter-deprecate-buffer.patch serhiy.storchaka, 2016-06-10 21:00 review
Messages (18)
msg263694 - (view) Author: Philip Jenvey (pjenvey) * (Python committer) Date: 2016-04-18 21:34
Basically a reopen of the older issue8485 with the same name. It was decided there to drop support for bytearray filenames -- partly because of the complexity of handling buffers but it was also deemed to just not make much sense.

This regressed or crept back into the posix module with the big path_converter changes for 3.3:

https://hg.python.org/cpython/file/ee9921b29fd8/Lib/test/test_posix.py#l411

IMHO this functionality should be deprecated/removed per the original discussion, or does someone want to reopen the debate?

The os module docs (and path_converter's own docs) explicitly advertise handling of str or bytes, not bytearrays or buffers. Even os.fsencode rejects bytearrays/buffers.

Related to issue26754 -- further inconsistencies around filename handling
msg263805 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-04-20 07:33
In the light of issue8485 this looks as a bug. Thank you for investigating this Philip.

I already raised this issue on Python-Dev less than a week ago [1], and only Victor had answered, thus we can assume that the consensus about bytes-like paths is not changed.

The only question is whether we need the deprecation period before removing the support of non-bytes bytes-like paths. Since they are supported during two major releases, and there are even tests for this support, I think that the deprecation period is needed. Third-party code written in last 3.5 years can use this feature.

[1] http://comments.gmane.org/gmane.comp.python.devel/157296
msg263807 - (view) Author: Philip Jenvey (pjenvey) * (Python committer) Date: 2016-04-20 07:41
Thanks Serhiy, I did not see the python-dev thread. This coincidentally came up recently in pypy3.

+1 for some kind of deprecation period needed
msg263943 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2016-04-22 00:06
I did the path_converter change.  IIRC some functions supported bytearray, some did not, and in my quest for consistency I took the superset of functionality and supported bytearray for everything.

Sounds to me like bytearray support should be dropped, but ultimately I don't have a strong opinion.  I'm happy to sit back and let the more opinionated decide the matter.
msg263944 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-04-22 00:19
bytearray is designed for performance. I don't think that the code
creating a filename can be a bottleneck in an application.
msg264063 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-04-23 17:21
I'm a little bit worried about removing support for bytearray in these cases. It used to be quite easy to support bytes and bytearray (just say you support the buffer API), and I even added language to PEP 484 suggesting that type checkers should consider bytearray (and memoryview) as valid arguments to anything that's declared as taking bytes. I know this is not universally true (a trivial example is a dict with bytes keys) but the existence of the buffer API makes it easy to do in most cases. Why are we moving away from this? If we do, we should probably remove that language from PEP 484 (and the code from mypy, but that's a separate thing).
msg264065 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-04-23 18:15
Side comment about "bytes-like" and "byte string". As for language from PEP 484, I think it is too permissive. bytes and bytearray have are "bytes strings" because they are not only sequences of bytes, but have a lot of str-like methods: lower(), split(), startswith(), strip(), etc. Many Python functions that work with "bytes strings" expect the support some of these methods. memoryview() has no these methods, it is even not always the sequence of bytes. Other objects that support the buffer protocol can even be not sequences. This is a problem when we want to support all objects with the buffer protocol in functions written in Python. We need to wrap them in memoryview and cast to the 'B' format. But in many cases the term "byte-like" means that bytes and bytearray are accepted. There are different levels of "byte-likelity", and unfortunately there is no good terminology.

As for moving away from accepting non-bytes paths, I think the arguments are similar to arguments about why Path is not str subclass. Or why we don't convert any path argument to string by calling str(). Because it can hide errors and cause unexpected behavior instead of exception. For example on Windows array('u', '扡摣晥') represents not Unicode name '扡摣晥', but bytes name b'abcdef'.
msg264113 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-04-24 16:05
OK, I actually agree with you on memoryview and and other types that support the buffer view. I will update PEP 484 to exclude memoryview. But I think bytearray has a special role and purose, and as much as possible it needs to be accepted in places that support bytes. So IMO as long as the os module supports bytes at all, I see no reason not to support bytearray. (It even has the extra trailing \0 needed to pass it to a typical C function.)

So as long as we support bytes (and on Linux, and probably some other Unixoid systems, bytes are the *correct* type for paths) we should also support bytearray.

The issue that Path doesn't support bytes is separate. I think it's the right choice. But that doesn't mean e.g. os.listdir() shouldn't support bytes.
msg264118 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-04-24 18:22
The os.path module and os.fsdecode() don't support bytearray paths. Should we add bytearray support? This will complicate and slowdown os.path functions which often are used multiple times in tight loops.

Correspondingly, many Python implemented functions in the os module, such as makedirs() or walk(), doesn't support bytearray paths.

In 3.2 almost all functions used PyUnicode_FSConverter() and supported only str and bytes paths. Few functions accidentally supported other types (on Windows only!): chdir(), rmdir() and unlink() supported read-only buffers (i.e. bytearray was not supported), and rename() and symplink() supported any buffers, including bytearray. Now all these functions emit a warning since non-string paths are deprecated on Windows. It looks to me that the support of non-bytes paths was added by accident.
msg264124 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-04-24 19:30
I'm really sceptical that anyone really use bytearray for filenames in Python. I'm quite sure that most functions fail with bytearray. Do you have an example of application using bytearray? What's the point of using bytearray? To limit memory copy and optimize the memory usage?

The two most common functions for filenames are os.path.join() and open(). I just tried:

>>> os.path.join(bytearray(b"a"), b"b")
Traceback (most recent call last):
  ...
TypeError: startswith first arg must be bytes or a tuple of bytes, not str

>>> open(bytearray(b"/etc/issue"))
Traceback (most recent call last):
  ...
TypeError: invalid file: bytearray(b'/etc/issue')

Hum, so bytearray are not accepted for open() nor os.path.join(), right?
msg264136 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-04-25 00:31
OK, it isn't worth fighting for. I will update PEP 484. It seems mypy
doesn't support the special treatment bytearray anyway.

For posterity, the reason I thought that bytearray should be accepted is
not that bytearray is better in any way, just that sometimes you might read
a filename off a bytearray that you read off a file or a socket. But it's
clearly not a strong use case (or we had gotten complaints long ago) so
I'll let it go. Sorry!! Thanks for the feedback.
msg264138 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-04-25 00:53
Sigh, I was wrong, not only does mypy promote bytearray to bytes, it
actually depends on this in its own code. Not for filenames, but for a
regex match. It reads the source into a bytearray, and then passes that to
something that uses regex matches (which *do* support bytearrays, and
rightly so). If I remove the bytearray promotion I run into some type
errors, and if I fix those (without copying data) I run into more type
errors. I guess we could fix it by carefully marking up everything that
takes bytearrays in the stubs, but it's surely inconvenient. So maybe I'll
leave this in mypy for now.
msg264139 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-04-25 00:55
New changeset 3fb0f9a0b195 by Guido van Rossum in branch 'default':
Rip out the promotion from bytearray/memoryview to bytes. See http://bugs.python.org/issue26800.
https://hg.python.org/peps/rev/3fb0f9a0b195
msg268156 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-06-10 21:00
Here is a patch that deprecates support of general bytes-like objects in os functions.
msg269731 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-07-02 20:06
Ping.
msg270516 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2016-07-15 21:20
Serhiy's patch LGTM.
msg272054 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2016-08-05 19:52
Assigning to Serhiy to apply since Larry doesn't care and I already reviewed the patch.
msg272106 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-08-06 20:22
New changeset 1e01ca34a42c by Serhiy Storchaka in branch 'default':
Issue #26800: Undocumented support of general bytes-like objects
https://hg.python.org/cpython/rev/1e01ca34a42c
History
Date User Action Args
2016-08-14 06:30:18serhiy.storchakasetstatus: open -> closed
resolution: fixed
stage: commit review -> resolved
2016-08-06 20:22:53python-devsetmessages: + msg272106
2016-08-05 19:52:21brett.cannonsetassignee: larry -> serhiy.storchaka
messages: + msg272054
stage: patch review -> commit review
2016-07-15 21:20:23brett.cannonsetnosy: + brett.cannon
messages: + msg270516
2016-07-04 16:50:47serhiy.storchakalinkissue26027 dependencies
2016-07-02 20:06:24serhiy.storchakasetmessages: + msg269731
2016-06-18 11:45:35serhiy.storchakalinkissue26754 dependencies
2016-06-10 21:00:28serhiy.storchakasetfiles: + path_converter-deprecate-buffer.patch
keywords: + patch
messages: + msg268156

stage: needs patch -> patch review
2016-04-25 00:55:16python-devsetnosy: + python-dev
messages: + msg264139
2016-04-25 00:53:05gvanrossumsetmessages: + msg264138
2016-04-25 00:31:18gvanrossumsetmessages: + msg264136
2016-04-24 19:30:43vstinnersetmessages: + msg264124
2016-04-24 18:22:24serhiy.storchakasetmessages: + msg264118
2016-04-24 16:05:35gvanrossumsetmessages: + msg264113
2016-04-23 18:15:33serhiy.storchakasetmessages: + msg264065
2016-04-23 17:21:33gvanrossumsetmessages: + msg264063
2016-04-22 00:19:46vstinnersetmessages: + msg263944
2016-04-22 00:06:00larrysetmessages: + msg263943
2016-04-20 07:41:09pjenveysetmessages: + msg263807
2016-04-20 07:33:28serhiy.storchakasetnosy: + gvanrossum

messages: + msg263805
stage: needs patch
2016-04-18 21:34:55pjenveycreate