Issue 7561: PyByteArray_AS_STRING used unsafely

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/51810

classification

Title:	PyByteArray_AS_STRING used unsafely
Type:	crash	Stage:	resolved
Components:	Interpreter Core, Library (Lib)	Versions:	Python 3.1, Python 3.2

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	benjamin.peterson, georg.brandl, loewis, mark.dickinson, pitrou, sh, skrah
Priority:	release blocker	Keywords:	patch

Created on 2009-12-22 07:31 by sh, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
bytearray.patch	pitrou, 2010-01-17 11:59

Messages (20)
msg96795 - (view)	Author: Sebastian Hagen (sh)	Date: 2009-12-22 07:31
Various functions in the 'posix' module that take filename arguments accept bytearray values for those arguments, and mishandle those objects in a way that leads to segfaults. Python 3.1 (r31:73572, Jul 23 2009, 23:41:26) [GCC 4.3.3] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import os >>> os.mkdir(bytearray(b'/')) Segmentation fault There's at least two seperate problems with the way posixmodule handles these objects. The first is that the code isn't set up to handle NULL retvals from PyByteArray_AS_STRING(), which occur for zero-byte bytearray objects. This causes a NULL-pointer dereference in PyUnicode_FSConverter() if you pass a zero-length bytearray. The second issue is that release_bytes() calls bytearray_releasebuffer() with NULL for the first argument, which directly leads to a NULL-pointer dereference. I'm attaching a patch against SVN 77001 which should fix both of these issues.
msg96799 - (view)	Author: Stefan Krah (skrah) *	Date: 2009-12-22 16:48
Crash confirmed. I don't see any issue with bytes2str(), the second part of your patch is what fixes the problem. I attach the reduced patch so it's clear what I mean.
msg96800 - (view)	Author: Sebastian Hagen (sh)	Date: 2009-12-22 17:19
Not exactly. The last part fixes the second problem, which you get for non-zero-length bytearrays. But without the first fix, zero-length bytearrays still lead to a crash: Python 3.2a0 (py3k:77001M, Dec 22 2009, 18:17:08) [GCC 4.3.4] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import posix >>> posix.mkdir(bytearray(0)) Segmentation fault That's what the rest of the patch fixes.
msg96801 - (view)	Author: Stefan Krah (skrah) *	Date: 2009-12-22 17:29
Correction: You are of course right about PyByteArray_AS_STRING(), but with os.mkdir(bytearray(b'')) I get the segfault in PyUnicodeUCS2_FSConverter(): (gdb) n 1638 size = PyByteArray_GET_SIZE(output); (gdb) n 1639 data = PyByteArray_AS_STRING(output); (gdb) p size $2 = 0 (gdb) n 1641 if (size != strlen(data)) { (gdb) p data $3 = (void *) 0x0 Should perhaps PyByteArray_AS_STRING() be fixed?
msg96802 - (view)	Author: Sebastian Hagen (sh)	Date: 2009-12-22 17:33
You're correct about PyUnicode_FSConverter(), which is why the very first part of my patch fixes that function. Only fixing that one will get rid of the segfaults, but also lead to incorrect error reporting for the zero-length bytearray case; the bytes2str() modification is to get the right exceptions. I don't know which precise semantics PyByteArray_AS_STRING() is supposed to have. I assumed it returning NULL was normal for 0-byte-length arrays, and based my patch off of that.
msg96803 - (view)	Author: Sebastian Hagen (sh)	Date: 2009-12-22 17:49
Correction: "Only fixing that one will get rid of the segfaults" ... well, for mkdir() on GNU/Linux, anyway. POSIX.1-2008 doesn't specify what happens if you call mkdir() with a NULL pointer, so I guess other conforming implementations might in fact still segfault at that point - it just happens that the one I tested it on is too nice to do that. Either way, passing a NULL pointer to those functions is almost certainly not a good idea.
msg96804 - (view)	Author: Stefan Krah (skrah) *	Date: 2009-12-22 17:54
Sorry that I missed the first part of your patch. I don't know exactly what PyByteArray_AS_STRING() is meant to do either, but I think it would make sense to return an empty string. This here works: >>> bytes(bytearray(b'')) b''
msg96805 - (view)	Author: Sebastian Hagen (sh)	Date: 2009-12-22 18:20
I've glanced at some of the other PyByteArray_AS_STRING() (and PyByteArray_AsStr(), which inherits this behaviour) uses in the stdlib. By far the heaviest user is bytearrayobject.c; aside from that, there's by my count only 24 uses in current trunk. I haven't looked at all of them in detail, but the ones I have looked at all seem to ensure that the possible NULL retvals don't cause them problems. Given that, and considering that bytearray itself uses it for all kinds of operations, I'd be rather reluctant to add any additional overhead to this macro absent some authoritative statement that the current behaviour is bad. We'd definitely get better performance by just having posixmodule.c pay attention to the retval it gets. [Yes, this is probably premature optimization; but it's not as if fixing posixmodule.c takes any massive changes either, so I'm not too worried about additional code complexity in this particular case.]
msg96806 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2009-12-22 18:42
Stepping back from the issue, I wonder whether posixmodule needs to accept bytearray objects at all. IIRC, the rationale was to allow access to file names that are unencodable, an issue that was mitigated by PEP 383. But even if byte-oriented file names stay supported, it is unclear why bytearray needs to be supported (as opposed to just supporting bytes). I suggest to bring up the issue on python-dev.
msg96808 - (view)	Author: Sebastian Hagen (sh)	Date: 2009-12-22 19:23
Well, it doesn't need to accept them ... but it would certainly be nice to have. If you've already got the filename in a bytearray object for some reason, being able to pass it through directly saves you both a copy and the explicit conversion code, which is a double-win. From an interface POV, it'd be even better if memoryview was allowed, too ... is there a specific reason that it's not? If one kind of simple readable buffers work, I don't see any good reason not to support all such objects.
msg96810 - (view)	Author: Sebastian Hagen (sh)	Date: 2009-12-22 19:27
Oh, and forcing use of the PEP 383 hack for such interfaces would really be the Wrong Thing. Byte sequences are the natural (and most efficient, and least prone to misunderstandings) way to store filenames on a posix-like. Storing them as unicode-except-not-really is an acceptable hack for interfaces that need to standardize on strings for some reasons, but that really doesn't apply to these functions, and I'd always store such filenames as bytes if I know I'm running on a posix-like.
msg96811 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2009-12-22 19:47
I doubt that buffers (including bytearray) are as useful in Python for storing file names as they are in C. Typically, when creating a buffer, you allocate some maximum size (in C: MAXPATH or some such), and then you fill the buffer, and null-terminate. In Python, you can't null-terminate; instead, the buffer must have the right size already. So I wonder: where would you get a bytearray from that has exactly the right size for a file name that you want to access (and where do you get the file name from)? If it is IAGNI, then: practicality beats purity - just because we could support arbitrary byte areas, it doesn't mean we should.
msg96813 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2009-12-22 19:48
sh: I don't argue against supporting bytes (here), only against supporting bytearray.
msg96838 - (view)	Author: Stefan Krah (skrah) *	Date: 2009-12-23 11:57
I briefly looked at how PyByteArray_AS_STRING() is used in other places. A similar segfault can be provoked in long_new(): int(bytearray(b''), 10) Then, there are a couple of places where pointer arithmetic is used with the NULL pointer. Also, memcpy(x, NULL, 0) can occur. If I'm not mistaken, both of these work in practice but are strictly speaking undefined behavior.
msg97001 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2009-12-29 20:40
Having the ob_bytes field be NULL for a bytearray seems like a strange choice; does anyone know why bytearray was designed this way? Is it mainly for ease of combination with the stringlib library? There does seem to be an awful lot of code that acts as though it expects x->ob_bytes to always be non-NULL for a valid PyByteArrayObject* x. IMO, the use of pointer arithmetic with NULL pointers that Stefan pointed out should definitely be fixed. The long_new problem looks like a trivial fix: I'd fix it already, but given the number of other problems with this macro I'm not sure whether it should be fixed in Objects/longobject.c, or in the PyByteArray_AS_STRING macro itself. I guess this is up to Georg and Benjamin, but I don't think 3.2 or 3.1.2 should go out of the door with this bug present.
msg97029 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2009-12-30 12:27
IMO bytearray should be fixed, for everyone else's sanity.
msg97938 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2010-01-17 11:59
Here is a patch. A static empty string is used in case PyByteArray_AS_STRING() is asked on an empty bytearray, instead of returning NULL. Another possibility would be to implement tp_new for bytearray and always allocate a new 1-byte string, but this would make the patch slightly more complicated. (the patch also removes "nullstring", which was unused since Florent's recent patches)
msg97939 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2010-01-17 12:20
The posixmodule was committed with a test in r77571 (py3k) and r77572 (3.1).
msg97940 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2010-01-17 12:21
(I meant the posixmodule fix, of course)
msg97942 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2010-01-17 12:44
The bytearray fix has been committed in r77573 (trunk), r77574 (2.6), r77576 (py3k), r77577 (3.1). The issue can now be closed.

History
Date	User	Action	Args
2022-04-11 14:56:55	admin	set	github: 51810
2010-01-17 12:44:17	pitrou	set	status: open -> closed resolution: fixed messages: + msg97942 stage: test needed -> resolved
2010-01-17 12:21:21	pitrou	set	messages: + msg97940
2010-01-17 12:21:06	pitrou	set	files: - release_bytes.diff
2010-01-17 12:21:02	pitrou	set	files: - posixmodule_fn_bytearray_fix_01.patch
2010-01-17 12:20:49	pitrou	set	messages: + msg97939
2010-01-17 12:00:02	pitrou	set	files: + bytearray.patch messages: + msg97938
2009-12-30 12:27:02	pitrou	set	nosy: + pitrou, benjamin.peterson, georg.brandl messages: + msg97029
2009-12-29 20:40:48	mark.dickinson	set	priority: critical -> release blocker messages: + msg97001
2009-12-24 22:37:51	mark.dickinson	set	nosy: + mark.dickinson
2009-12-23 13:22:51	r.david.murray	set	priority: high -> critical nosy: loewis, skrah, sh components: + Interpreter Core title: Filename-taking functions in posix segfault when called with a bytearray arg. -> PyByteArray_AS_STRING used unsafely
2009-12-23 11:57:23	skrah	set	messages: + msg96838
2009-12-22 19:48:48	loewis	set	messages: - msg96814
2009-12-22 19:48:40	loewis	set	messages: + msg96814
2009-12-22 19:48:37	loewis	set	messages: + msg96813
2009-12-22 19:47:27	loewis	set	messages: + msg96811
2009-12-22 19:27:24	sh	set	messages: + msg96810
2009-12-22 19:23:25	sh	set	messages: + msg96808
2009-12-22 18:42:30	loewis	set	nosy: + loewis messages: + msg96806
2009-12-22 18:20:40	sh	set	messages: + msg96805
2009-12-22 17:54:23	skrah	set	messages: + msg96804
2009-12-22 17:49:22	sh	set	messages: + msg96803
2009-12-22 17:33:35	sh	set	messages: + msg96802
2009-12-22 17:29:09	skrah	set	messages: + msg96801
2009-12-22 17:19:16	sh	set	messages: + msg96800
2009-12-22 16:48:07	skrah	set	files: + release_bytes.diff nosy: + skrah messages: + msg96799
2009-12-22 13:06:04	r.david.murray	set	priority: high stage: test needed
2009-12-22 07:31:42	sh	create