Don't accept bytearray as filenames part 2 #70987

pjenvey · 2016-04-18T21:34:55Z

BPO	26800
Nosy	@gvanrossum, @brettcannon, @pitrou, @vstinner, @larryhastings, @pjenvey, @serhiy-storchaka, @rlamy
Files	path_converter-deprecate-buffer.patch

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/serhiy-storchaka'
closed_at = <Date 2016-08-14.06:30:18.009>
created_at = <Date 2016-04-18.21:34:55.352>
labels = ['interpreter-core', 'type-bug']
title = "Don't accept bytearray as filenames part 2"
updated_at = <Date 2016-08-14.06:30:18.008>
user = 'https://github.com/pjenvey'

bugs.python.org fields:

activity = <Date 2016-08-14.06:30:18.008>
actor = 'serhiy.storchaka'
assignee = 'serhiy.storchaka'
closed = True
closed_date = <Date 2016-08-14.06:30:18.009>
closer = 'serhiy.storchaka'
components = ['Interpreter Core']
creation = <Date 2016-04-18.21:34:55.352>
creator = 'pjenvey'
dependencies = []
files = ['43336']
hgrepos = []
issue_num = 26800
keywords = ['patch', '3.3regression']
message_count = 18.0
messages = ['263694', '263805', '263807', '263943', '263944', '264063', '264065', '264113', '264118', '264124', '264136', '264138', '264139', '268156', '269731', '270516', '272054', '272106']
nosy_count = 9.0
nosy_names = ['gvanrossum', 'brett.cannon', 'pitrou', 'vstinner', 'larry', 'pjenvey', 'python-dev', 'serhiy.storchaka', 'Ronan.Lamy']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue26800'
versions = ['Python 3.6']

pjenvey · 2016-04-18T21:34:55Z

Basically a reopen of the older bpo-8485 with the same name. It was decided there to drop support for bytearray filenames -- partly because of the complexity of handling buffers but it was also deemed to just not make much sense.

This regressed or crept back into the posix module with the big path_converter changes for 3.3:

https://hg.python.org/cpython/file/ee9921b29fd8/Lib/test/test_posix.py#l411

IMHO this functionality should be deprecated/removed per the original discussion, or does someone want to reopen the debate?

The os module docs (and path_converter's own docs) explicitly advertise handling of str or bytes, not bytearrays or buffers. Even os.fsencode rejects bytearrays/buffers.

Related to bpo-26754 -- further inconsistencies around filename handling

serhiy-storchaka · 2016-04-20T07:33:28Z

In the light of bpo-8485 this looks as a bug. Thank you for investigating this Philip.

I already raised this issue on Python-Dev less than a week ago [1], and only Victor had answered, thus we can assume that the consensus about bytes-like paths is not changed.

The only question is whether we need the deprecation period before removing the support of non-bytes bytes-like paths. Since they are supported during two major releases, and there are even tests for this support, I think that the deprecation period is needed. Third-party code written in last 3.5 years can use this feature.

[1] http://comments.gmane.org/gmane.comp.python.devel/157296

pjenvey · 2016-04-20T07:41:09Z

Thanks Serhiy, I did not see the python-dev thread. This coincidentally came up recently in pypy3.

+1 for some kind of deprecation period needed

larryhastings · 2016-04-22T00:06:01Z

I did the path_converter change. IIRC some functions supported bytearray, some did not, and in my quest for consistency I took the superset of functionality and supported bytearray for everything.

Sounds to me like bytearray support should be dropped, but ultimately I don't have a strong opinion. I'm happy to sit back and let the more opinionated decide the matter.

vstinner · 2016-04-22T00:19:46Z

bytearray is designed for performance. I don't think that the code
creating a filename can be a bottleneck in an application.

gvanrossum · 2016-04-23T17:21:33Z

I'm a little bit worried about removing support for bytearray in these cases. It used to be quite easy to support bytes and bytearray (just say you support the buffer API), and I even added language to PEP-484 suggesting that type checkers should consider bytearray (and memoryview) as valid arguments to anything that's declared as taking bytes. I know this is not universally true (a trivial example is a dict with bytes keys) but the existence of the buffer API makes it easy to do in most cases. Why are we moving away from this? If we do, we should probably remove that language from PEP-484 (and the code from mypy, but that's a separate thing).

serhiy-storchaka · 2016-04-23T18:15:33Z

Side comment about "bytes-like" and "byte string". As for language from PEP-484, I think it is too permissive. bytes and bytearray have are "bytes strings" because they are not only sequences of bytes, but have a lot of str-like methods: lower(), split(), startswith(), strip(), etc. Many Python functions that work with "bytes strings" expect the support some of these methods. memoryview() has no these methods, it is even not always the sequence of bytes. Other objects that support the buffer protocol can even be not sequences. This is a problem when we want to support all objects with the buffer protocol in functions written in Python. We need to wrap them in memoryview and cast to the 'B' format. But in many cases the term "byte-like" means that bytes and bytearray are accepted. There are different levels of "byte-likelity", and unfortunately there is no good terminology.

As for moving away from accepting non-bytes paths, I think the arguments are similar to arguments about why Path is not str subclass. Or why we don't convert any path argument to string by calling str(). Because it can hide errors and cause unexpected behavior instead of exception. For example on Windows array('u', '扡摣晥') represents not Unicode name '扡摣晥', but bytes name b'abcdef'.

gvanrossum · 2016-04-24T16:05:35Z

OK, I actually agree with you on memoryview and and other types that support the buffer view. I will update PEP-484 to exclude memoryview. But I think bytearray has a special role and purose, and as much as possible it needs to be accepted in places that support bytes. So IMO as long as the os module supports bytes at all, I see no reason not to support bytearray. (It even has the extra trailing \0 needed to pass it to a typical C function.)

So as long as we support bytes (and on Linux, and probably some other Unixoid systems, bytes are the *correct* type for paths) we should also support bytearray.

The issue that Path doesn't support bytes is separate. I think it's the right choice. But that doesn't mean e.g. os.listdir() shouldn't support bytes.

serhiy-storchaka · 2016-04-24T18:22:25Z

The os.path module and os.fsdecode() don't support bytearray paths. Should we add bytearray support? This will complicate and slowdown os.path functions which often are used multiple times in tight loops.

Correspondingly, many Python implemented functions in the os module, such as makedirs() or walk(), doesn't support bytearray paths.

In 3.2 almost all functions used PyUnicode_FSConverter() and supported only str and bytes paths. Few functions accidentally supported other types (on Windows only!): chdir(), rmdir() and unlink() supported read-only buffers (i.e. bytearray was not supported), and rename() and symplink() supported any buffers, including bytearray. Now all these functions emit a warning since non-string paths are deprecated on Windows. It looks to me that the support of non-bytes paths was added by accident.

vstinner · 2016-04-24T19:30:43Z

I'm really sceptical that anyone really use bytearray for filenames in Python. I'm quite sure that most functions fail with bytearray. Do you have an example of application using bytearray? What's the point of using bytearray? To limit memory copy and optimize the memory usage?

The two most common functions for filenames are os.path.join() and open(). I just tried:

>>> os.path.join(bytearray(b"a"), b"b")
Traceback (most recent call last):
  ...
TypeError: startswith first arg must be bytes or a tuple of bytes, not str

>>> open(bytearray(b"/etc/issue"))
Traceback (most recent call last):
  ...
TypeError: invalid file: bytearray(b'/etc/issue')

Hum, so bytearray are not accepted for open() nor os.path.join(), right?

gvanrossum · 2016-04-25T00:31:18Z

OK, it isn't worth fighting for. I will update PEP-484. It seems mypy
doesn't support the special treatment bytearray anyway.

For posterity, the reason I thought that bytearray should be accepted is
not that bytearray is better in any way, just that sometimes you might read
a filename off a bytearray that you read off a file or a socket. But it's
clearly not a strong use case (or we had gotten complaints long ago) so
I'll let it go. Sorry!! Thanks for the feedback.

gvanrossum · 2016-04-25T00:53:05Z

Sigh, I was wrong, not only does mypy promote bytearray to bytes, it
actually depends on this in its own code. Not for filenames, but for a
regex match. It reads the source into a bytearray, and then passes that to
something that uses regex matches (which *do* support bytearrays, and
rightly so). If I remove the bytearray promotion I run into some type
errors, and if I fix those (without copying data) I run into more type
errors. I guess we could fix it by carefully marking up everything that
takes bytearrays in the stubs, but it's surely inconvenient. So maybe I'll
leave this in mypy for now.

python-dev · 2016-04-25T00:55:16Z

New changeset 3fb0f9a0b195 by Guido van Rossum in branch 'default':
Rip out the promotion from bytearray/memoryview to bytes. See http://bugs.python.org/issue26800.
https://hg.python.org/peps/rev/3fb0f9a0b195

serhiy-storchaka · 2016-06-10T21:00:28Z

Here is a patch that deprecates support of general bytes-like objects in os functions.

serhiy-storchaka · 2016-07-02T20:06:25Z

Ping.

brettcannon · 2016-07-15T21:20:23Z

Serhiy's patch LGTM.

brettcannon · 2016-08-05T19:52:22Z

Assigning to Serhiy to apply since Larry doesn't care and I already reviewed the patch.

python-dev · 2016-08-06T20:22:53Z

New changeset 1e01ca34a42c by Serhiy Storchaka in branch 'default':
Issue bpo-26800: Undocumented support of general bytes-like objects
https://hg.python.org/cpython/rev/1e01ca34a42c

vstinner · 2022-10-18T13:42:50Z

Follow-up issue: #98393 "os module: reject bytes-like strings for paths, only accept the exact bytes type".

pjenvey assigned larryhastings Apr 18, 2016

pjenvey added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error labels Apr 18, 2016

brettcannon assigned serhiy-storchaka and unassigned larryhastings Aug 5, 2016

serhiy-storchaka closed this as completed Aug 14, 2016

ezio-melotti transferred this issue from another repository Apr 10, 2022

vstinner mentioned this issue Oct 18, 2022

os module: reject bytes-like strings for paths, only accept the exact bytes type #98393

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Don't accept bytearray as filenames part 2 #70987

Don't accept bytearray as filenames part 2 #70987

pjenvey commented Apr 18, 2016

pjenvey commented Apr 18, 2016

serhiy-storchaka commented Apr 20, 2016

pjenvey commented Apr 20, 2016

larryhastings commented Apr 22, 2016

vstinner commented Apr 22, 2016

gvanrossum commented Apr 23, 2016

serhiy-storchaka commented Apr 23, 2016

gvanrossum commented Apr 24, 2016

serhiy-storchaka commented Apr 24, 2016

vstinner commented Apr 24, 2016

gvanrossum commented Apr 25, 2016

gvanrossum commented Apr 25, 2016

python-dev mannequin commented Apr 25, 2016

serhiy-storchaka commented Jun 10, 2016

serhiy-storchaka commented Jul 2, 2016

brettcannon commented Jul 15, 2016

brettcannon commented Aug 5, 2016

python-dev mannequin commented Aug 6, 2016

vstinner commented Oct 18, 2022

Don't accept bytearray as filenames part 2 #70987

Don't accept bytearray as filenames part 2 #70987

Comments

pjenvey commented Apr 18, 2016

pjenvey commented Apr 18, 2016

serhiy-storchaka commented Apr 20, 2016

pjenvey commented Apr 20, 2016

larryhastings commented Apr 22, 2016

vstinner commented Apr 22, 2016

gvanrossum commented Apr 23, 2016

serhiy-storchaka commented Apr 23, 2016

gvanrossum commented Apr 24, 2016

serhiy-storchaka commented Apr 24, 2016

vstinner commented Apr 24, 2016

gvanrossum commented Apr 25, 2016

gvanrossum commented Apr 25, 2016

python-dev mannequin commented Apr 25, 2016

serhiy-storchaka commented Jun 10, 2016

serhiy-storchaka commented Jul 2, 2016

brettcannon commented Jul 15, 2016

brettcannon commented Aug 5, 2016

python-dev mannequin commented Aug 6, 2016

vstinner commented Oct 18, 2022