Don't accept bytearray as filenames, or simplify the API #52731

vstinner · 2010-04-21T12:23:36Z

BPO	8485
Nosy	@malemburg, @loewis, @pitrou, @vstinner
Files	no_bytearray_filename.patch

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2010-04-22.12:11:16.769>
created_at = <Date 2010-04-21.12:23:35.733>
labels = ['interpreter-core', 'library', 'expert-unicode']
title = "Don't accept bytearray as filenames, or simplify the API"
updated_at = <Date 2010-04-22.12:11:16.767>
user = 'https://github.com/vstinner'

bugs.python.org fields:

activity = <Date 2010-04-22.12:11:16.767>
actor = 'vstinner'
assignee = 'none'
closed = True
closed_date = <Date 2010-04-22.12:11:16.769>
closer = 'vstinner'
components = ['Interpreter Core', 'Library (Lib)', 'Unicode']
creation = <Date 2010-04-21.12:23:35.733>
creator = 'vstinner'
dependencies = []
files = ['17024']
hgrepos = []
issue_num = 8485
keywords = ['patch']
message_count = 6.0
messages = ['103829', '103832', '103833', '103883', '103910', '103958']
nosy_count = 4.0
nosy_names = ['lemburg', 'loewis', 'pitrou', 'vstinner']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = None
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue8485'
versions = ['Python 3.2']

vstinner · 2010-04-21T12:23:34Z

r72313 (PEP-383) created the PyUnicode_FSConverter() function: encode an object to a byte string using the default file system encoding. PyBytes and PyByteArray are leaved unchanged (just increment the reference counter), PyUnicode is encoded to PyBytes (if the encoder produces something else, an error is raised).

In my opinion, a file name is a character string (Windows) or a byte string (POSIX) and a bytearray object is unexpected. Only few function support this type: no function of os.path accept them. In the Python test suite, no function use bytearray for filenames.

It's already complex to support 2 types (bytes and str) for filenames in os.path, I think that a third type is too much and has no real world use case (the module manipuling filenames is os.path and it doesn't support bytearray and nobody noticed that).

Suppport bytearray is complex because we need to acquire a lock (using PyObject_GetBuffer()) and release the lock (o->ob_type->tp_as_buffer->bf_releasebuffer(o, 0)... that's not really intuitive...). posixmodule.c uses functions bytes2str() and realease_bytes() to handle bytearray easily. But these functions are static and other modules have to reimplement them.

I propose the reject bytearray in PyUnicode_FSConverter(), or to simplify the API and fix Python to accept bytearray filename everywhere especially in os.path.

Reject bytearray in PyUnicode_FSConverter() is trivial only there is only one test that have to be changed in the whole Python3 test suite: test_empty_bytearray in test_bytes.py. This function shows that bytearray is complex and introduce subtle bugs (the lock/GIL issue). All code using PyUnicode_FSConverter() would become simpler.

Example:

/* Release the lock, decref the object. */
static void
release_bytes(PyObject* o)
{
        if (PyByteArray_Check(o))
                o->ob_type->tp_as_buffer->bf_releasebuffer(o, 0);
        Py_DECREF(o);
}
...
relaease_byte(path);

becomes "Py_DECREF(path);" and release_bytes() can be removed.

And

static char*
bytes2str(PyObject* o, int lock)
{
        if(PyBytes_Check(o))
                return PyBytes_AsString(o);
        else if(PyByteArray_Check(o)) {
                if (lock && PyObject_GetBuffer(o, NULL, 0) < 0)
                        /* On a bytearray, this should not fail. */
                        PyErr_BadInternalCall();
                return PyByteArray_AsString(o);
        } else {
                /* The FS converter should have verified that this
                   is either bytes or bytearray. */
                Py_FatalError("bad object passed to bytes2str");
                /* not reached. */
                return "";
        }
}

...
path = bytes2str(opath);

becomes "path = PyBytes_AS_STRING(opath);" (or maybe "path = PyBytes_AsString(opath);" if you don't trust PyUnicode_FSConverter) and bytes2str() can be removed.

Simplify the API means that bytes2str() and release_bytes() should become protected functions (not static, but prefixed by "_Py_") or maybe public ("Py_" prefix).

But the most complex part is to modify os.path to support bytearray, and this task would not be funny :-)

vstinner · 2010-04-21T12:28:16Z

Patch removing bytearray filename support: it mostly removes code.

pitrou · 2010-04-21T12:28:56Z

Or perhaps the bytearray can be converted to a bytes object. This is not optimal performance-wise but is unlikely to make a difference in real-world code (if you are passing a filename to an external API, chances are some IO will occur which will dwarf the cost of creating a separate bytes object).

But I agree that supporting bytearrays in filename-taking functions, while "nice" from a consistency point of view, isn't really useful in practice. So I would be ok to remove that support if it simplifies (or avoids complexifying) the logic for those functions.

loewis · 2010-04-21T18:43:18Z

I'm in favor of removing that support, too, but please check with python-dev whether anybody thinks a proper deprecation cycle (deprecated in 3.2, removed in 3.3) is needed.

malemburg · 2010-04-21T20:44:54Z

Antoine Pitrou wrote:

Antoine Pitrou <pitrou@free.fr> added the comment:

Or perhaps the bytearray can be converted to a bytes object. This is not optimal performance-wise but is unlikely to make a difference in real-world code (if you are passing a filename to an external API, chances are some IO will occur which will dwarf the cost of creating a separate bytes object).

But I agree that supporting bytearrays in filename-taking functions, while "nice" from a consistency point of view, isn't really useful in practice. So I would be ok to remove that support if it simplifies (or avoids complexifying) the logic for those functions.

+1

bytearrays are basically the remains of the attempt to use mutable
byte string objects in Python 3.x. They may gain some usefulness
in the future, but I doubt that this will be in the area of filenames.

vstinner · 2010-04-22T12:11:15Z

MvL, MaL and Antoine Pitrou agreed to drop support of bytearray filenames in Python. I choosed to remove it directly in Python 3.2 because it wasn't really used, open() and os.path never supported bytearray filenames, and I will simplify the other issues related to surrogates in filenames.

Fixed: r80361 (py3k), don't backport to 3.1.

vstinner added interpreter-core (Objects, Python, Grammar, and Parser dirs) stdlib Python modules in the Lib dir topic-unicode labels Apr 21, 2010

vstinner closed this as completed Apr 22, 2010

ezio-melotti transferred this issue from another repository Apr 10, 2022

vstinner mentioned this issue Oct 18, 2022

os module: reject bytes-like strings for paths, only accept the exact bytes type #98393

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Don't accept bytearray as filenames, or simplify the API #52731

Don't accept bytearray as filenames, or simplify the API #52731

vstinner commented Apr 21, 2010

vstinner commented Apr 21, 2010 •

edited

vstinner commented Apr 21, 2010

pitrou commented Apr 21, 2010

loewis mannequin commented Apr 21, 2010

malemburg commented Apr 21, 2010

vstinner commented Apr 22, 2010

Don't accept bytearray as filenames, or simplify the API #52731

Don't accept bytearray as filenames, or simplify the API #52731

Comments

vstinner commented Apr 21, 2010

vstinner commented Apr 21, 2010 • edited

vstinner commented Apr 21, 2010

pitrou commented Apr 21, 2010

loewis mannequin commented Apr 21, 2010

malemburg commented Apr 21, 2010

vstinner commented Apr 22, 2010

vstinner commented Apr 21, 2010 •

edited