This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: PyByteArray_AS_STRING used unsafely
Type: crash Stage: resolved
Components: Interpreter Core, Library (Lib) Versions: Python 3.1, Python 3.2
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: benjamin.peterson, georg.brandl, loewis, mark.dickinson, pitrou, sh, skrah
Priority: release blocker Keywords: patch

Created on 2009-12-22 07:31 by sh, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
bytearray.patch pitrou, 2010-01-17 11:59
Messages (20)
msg96795 - (view) Author: Sebastian Hagen (sh) Date: 2009-12-22 07:31
Various functions in the 'posix' module that take filename arguments
accept bytearray values for those arguments, and mishandle those objects
in a way that leads to segfaults.

Python 3.1 (r31:73572, Jul 23 2009, 23:41:26)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.mkdir(bytearray(b'/'))
Segmentation fault

There's at least two seperate problems with the way posixmodule handles
these objects. The first is that the code isn't set up to handle NULL
retvals from PyByteArray_AS_STRING(), which occur for zero-byte
bytearray objects. This causes a NULL-pointer dereference in
PyUnicode_FSConverter() if you pass a zero-length bytearray.

The second issue is that release_bytes() calls bytearray_releasebuffer()
with NULL for the first argument, which directly leads to a NULL-pointer
dereference.

I'm attaching a patch against SVN 77001 which should fix both of these
issues.
msg96799 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2009-12-22 16:48
Crash confirmed. I don't see any issue with bytes2str(), the second part
of your patch is what fixes the problem. I attach the reduced patch so
it's clear what I mean.
msg96800 - (view) Author: Sebastian Hagen (sh) Date: 2009-12-22 17:19
Not exactly. The last part fixes the second problem, which you get for
non-zero-length bytearrays. But without the first fix, zero-length
bytearrays still lead to a crash:

Python 3.2a0 (py3k:77001M, Dec 22 2009, 18:17:08)
[GCC 4.3.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import posix
>>> posix.mkdir(bytearray(0))
Segmentation fault

That's what the rest of the patch fixes.
msg96801 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2009-12-22 17:29
Correction: You are of course right about PyByteArray_AS_STRING(), but
with

os.mkdir(bytearray(b''))

I get the segfault in PyUnicodeUCS2_FSConverter():


(gdb) n
1638             size = PyByteArray_GET_SIZE(output);
(gdb) n
1639             data = PyByteArray_AS_STRING(output);
(gdb) p size
$2 = 0
(gdb) n
1641        if (size != strlen(data)) {
(gdb) p data
$3 = (void *) 0x0


Should perhaps PyByteArray_AS_STRING() be fixed?
msg96802 - (view) Author: Sebastian Hagen (sh) Date: 2009-12-22 17:33
You're correct about PyUnicode_FSConverter(), which is why the very
first part of my patch fixes that function. Only fixing that one will
get rid of the segfaults, but also lead to incorrect error reporting for
the zero-length bytearray case; the bytes2str() modification is to get
the right exceptions.

I don't know which precise semantics PyByteArray_AS_STRING() is
*supposed* to have. I assumed it returning NULL was normal for
0-byte-length arrays, and based my patch off of that.
msg96803 - (view) Author: Sebastian Hagen (sh) Date: 2009-12-22 17:49
Correction: "Only fixing that one will
get rid of the segfaults" ... well, for mkdir() on GNU/Linux, anyway.
POSIX.1-2008 doesn't specify what happens if you call mkdir() with a
NULL pointer, so I guess other conforming implementations might in fact
still segfault at that point - it just happens that the one I tested it
on is too nice to do that.

Either way, passing a NULL pointer to those functions is almost
certainly not a good idea.
msg96804 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2009-12-22 17:54
Sorry that I missed the first part of your patch. I don't know exactly
what PyByteArray_AS_STRING() is meant to do either, but I think it would
make sense to return an empty string. This here works:

>>> bytes(bytearray(b''))
b''
msg96805 - (view) Author: Sebastian Hagen (sh) Date: 2009-12-22 18:20
I've glanced at some of the other PyByteArray_AS_STRING() (and
PyByteArray_AsStr(), which inherits this behaviour) uses in the stdlib.
By far the heaviest user is bytearrayobject.c; aside from that, there's
by my count only 24 uses in current trunk. I haven't looked at all of
them in detail, but the ones I have looked at all seem to ensure that
the possible NULL retvals don't cause them problems.

Given that, and considering that bytearray itself uses it for all kinds
of operations, I'd be rather reluctant to add any additional overhead to
this macro absent some authoritative statement that the current
behaviour is bad. We'd definitely get better performance by just having
posixmodule.c pay attention to the retval it gets. [Yes, this is
probably premature optimization; but it's not as if fixing posixmodule.c
takes any massive changes either, so I'm not too worried about
additional code complexity in this particular case.]
msg96806 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-12-22 18:42
Stepping back from the issue, I wonder whether posixmodule needs to
accept bytearray objects at all.

IIRC, the rationale was to allow access to file names that are
unencodable, an issue that was mitigated by PEP 383. But even if
byte-oriented file names stay supported, it is unclear why bytearray
needs to be supported (as opposed to just supporting bytes).

I suggest to bring up the issue on python-dev.
msg96808 - (view) Author: Sebastian Hagen (sh) Date: 2009-12-22 19:23
Well, it doesn't *need* to accept them ... but it would certainly be
nice to have. If you've already got the filename in a bytearray object
for some reason, being able to pass it through directly saves you both a
copy and the explicit conversion code, which is a double-win.

From an interface POV, it'd be even better if memoryview was allowed,
too ... is there a specific reason that it's not? If one kind of simple
readable buffers work, I don't see any good reason not to support all
such objects.
msg96810 - (view) Author: Sebastian Hagen (sh) Date: 2009-12-22 19:27
Oh, and *forcing* use of the PEP 383 hack for such interfaces would
really be the Wrong Thing. Byte sequences are the natural (and most
efficient, and least prone to misunderstandings) way to store filenames
on a posix-like. Storing them as unicode-except-not-really is an
acceptable hack for interfaces that need to standardize on strings for
some reasons, but that really doesn't apply to these functions, and I'd
always store such filenames as bytes if I know I'm running on a posix-like.
msg96811 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-12-22 19:47
I doubt that buffers (including bytearray) are as useful in Python for
storing file names as they are in C.

Typically, when creating a buffer, you allocate some maximum size (in C:
MAXPATH or some such), and then you fill the buffer, and null-terminate.
In Python, you can't null-terminate; instead, the buffer must have the
right size already. So I wonder: where would you get a bytearray from
that has exactly the right size for a file name that you want to access
(and where do you get the file name from)?

If it is IAGNI, then: practicality beats purity - just because we
*could* support arbitrary byte areas, it doesn't mean we should.
msg96813 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-12-22 19:48
sh: I don't argue against supporting bytes (here), only against
supporting bytearray.
msg96838 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2009-12-23 11:57
I briefly looked at how PyByteArray_AS_STRING() is used in other places.
A similar segfault can be provoked in long_new():

int(bytearray(b''), 10)


Then, there are a couple of places where pointer arithmetic is used with
the NULL pointer. Also, memcpy(x, NULL, 0) can occur. If I'm not
mistaken, both of these work in practice but are strictly speaking
undefined behavior.
msg97001 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-12-29 20:40
Having the ob_bytes field be NULL for a bytearray seems like a strange 
choice;  does anyone know why bytearray was designed this way?  Is it 
mainly for ease of combination with the stringlib library?

There does seem to be an awful lot of code that acts as though it 
expects x->ob_bytes to always be non-NULL for a valid PyByteArrayObject* 
x.  IMO, the use of pointer arithmetic with NULL pointers that Stefan 
pointed out should definitely be fixed.

The long_new problem looks like a trivial fix:  I'd fix it already, but 
given the number of other problems with this macro I'm not sure whether 
it should be fixed in Objects/longobject.c, or in the 
PyByteArray_AS_STRING macro itself.

I guess this is up to Georg and Benjamin, but I don't think 3.2 or 3.1.2 
should go out of the door with this bug present.
msg97029 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-12-30 12:27
IMO bytearray should be fixed, for everyone else's sanity.
msg97938 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-01-17 11:59
Here is a patch. A static empty string is used in case PyByteArray_AS_STRING() is asked on an empty bytearray, instead of returning NULL.

Another possibility would be to implement tp_new for bytearray and always allocate a new 1-byte string, but this would make the patch slightly more complicated.

(the patch also removes "nullstring", which was unused since Florent's recent patches)
msg97939 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-01-17 12:20
The posixmodule was committed with a test in r77571 (py3k) and r77572 (3.1).
msg97940 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-01-17 12:21
(I meant the posixmodule fix, of course)
msg97942 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-01-17 12:44
The bytearray fix has been committed in r77573 (trunk), r77574 (2.6), r77576 (py3k), r77577 (3.1). The issue can now be closed.
History
Date User Action Args
2022-04-11 14:56:55adminsetgithub: 51810
2010-01-17 12:44:17pitrousetstatus: open -> closed
resolution: fixed
messages: + msg97942

stage: test needed -> resolved
2010-01-17 12:21:21pitrousetmessages: + msg97940
2010-01-17 12:21:06pitrousetfiles: - release_bytes.diff
2010-01-17 12:21:02pitrousetfiles: - posixmodule_fn_bytearray_fix_01.patch
2010-01-17 12:20:49pitrousetmessages: + msg97939
2010-01-17 12:00:02pitrousetfiles: + bytearray.patch

messages: + msg97938
2009-12-30 12:27:02pitrousetnosy: + pitrou, benjamin.peterson, georg.brandl
messages: + msg97029
2009-12-29 20:40:48mark.dickinsonsetpriority: critical -> release blocker

messages: + msg97001
2009-12-24 22:37:51mark.dickinsonsetnosy: + mark.dickinson
2009-12-23 13:22:51r.david.murraysetpriority: high -> critical
nosy: loewis, skrah, sh
components: + Interpreter Core
title: Filename-taking functions in posix segfault when called with a bytearray arg. -> PyByteArray_AS_STRING used unsafely
2009-12-23 11:57:23skrahsetmessages: + msg96838
2009-12-22 19:48:48loewissetmessages: - msg96814
2009-12-22 19:48:40loewissetmessages: + msg96814
2009-12-22 19:48:37loewissetmessages: + msg96813
2009-12-22 19:47:27loewissetmessages: + msg96811
2009-12-22 19:27:24shsetmessages: + msg96810
2009-12-22 19:23:25shsetmessages: + msg96808
2009-12-22 18:42:30loewissetnosy: + loewis
messages: + msg96806
2009-12-22 18:20:40shsetmessages: + msg96805
2009-12-22 17:54:23skrahsetmessages: + msg96804
2009-12-22 17:49:22shsetmessages: + msg96803
2009-12-22 17:33:35shsetmessages: + msg96802
2009-12-22 17:29:09skrahsetmessages: + msg96801
2009-12-22 17:19:16shsetmessages: + msg96800
2009-12-22 16:48:07skrahsetfiles: + release_bytes.diff
nosy: + skrah
messages: + msg96799

2009-12-22 13:06:04r.david.murraysetpriority: high
stage: test needed
2009-12-22 07:31:42shcreate