classification
Title: cannot marshal objects with more than 2**31 elements
Type: behavior Stage: resolved
Components: Interpreter Core Versions: Python 3.4, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: christian.heimes, eric.snow, mark.dickinson, pitrou, python-dev, rhettinger, serhiy.storchaka, skrah
Priority: normal Keywords: patch

Created on 2009-02-18 22:45 by mark.dickinson, last changed 2013-07-14 12:39 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description Edit
marshal_overflow.patch serhiy.storchaka, 2013-02-06 15:51 Patch for 3.x review
marshal_overflow-2.7.patch serhiy.storchaka, 2013-02-06 15:52 Patch for 2.7 review
marshal_string_leak.patch serhiy.storchaka, 2013-07-01 14:23 review
Messages (22)
msg82437 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-02-18 22:45
Two closely related issues in Python/marshal.c, involving writing and 
reading of variable-length objects (lists, strings, long integers, ...)

(1) The w_object function in marshal contains many instances of code 
like the following:

else if (PyList_CheckExact(v)) {
	w_byte(TYPE_LIST, p);
	n = PyList_GET_SIZE(v);
	w_long((long)n, p);
	for (i = 0; i < n; i++) {
		w_object(PyList_GET_ITEM(v, i), p);
	}
}

On a 64-bit platform there's potential loss of information here
either in the cast "(long)n" (if sizeof(long) is 4), or in
w_long itself (if sizeof(long) is 8).  Note that w_long, despite
its name, always writes exactly 4 bytes.

There should at least be an exception raised here if n is not
in the range [-2**31, 2**31).  This would make marshalling of
large objects illegal (rather than just wrong).

A more involved fix would allow marshalling of objects of size >= 2**31.  
This would obviously involve changing the marshal format, and would make 
it impossible to marshal a large object on a 64-bit platform and then 
unmarshal it on a 32-bit platform.  The latter may not really be a 
problem, since memory considerations ought to rule that out anyway.

(2) In r_object (and possibly elsewhere) there are corresponding checks 
of the form:

case TYPE_LIST:
	n = r_long(p);
	if (n < 0 || n > INT_MAX) {
		PyErr_SetString(PyExc_ValueError, "bad marshal data");
		retval = NULL;
		break;
	}

...

if we allow marshalling of objects with more than 2**31-1 elements then 
these error checks can be relaxed.  (And as a matter of principle, 
INT_MAX isn't really right here: an int might be only 16 bits long on 
some strange platforms...).
msg82440 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-02-18 23:02
Given that marshal is primarily about supporting pyc files, do we care?
msg82570 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-02-21 17:32
It wouldn't hurt to add the overflow checks though, would it?
msg82583 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-02-21 21:08
Why not.  Besides it ought to be fun to write the test case for this one :-)
msg181535 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-02-06 15:51
Here is a patch for 3.x which adds checks for size overflow (only on 64-bit platform).
msg181536 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-02-06 15:52
And here is a patch for 2.7.
msg181537 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-02-06 15:56
> (And as a matter of principle, 
> INT_MAX isn't really right here: an int might be only 16 bits long on 
> some strange platforms...).

AFAIK Python doesn't support such platforms (and C standard requites at least 32-bit ints). However there are strange platforms with 64-bit ints. That's why I used the explicit constant 0x7FFFFFFF.
msg181554 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-02-06 18:33
Perhaps you could add a bigmem test for this?
(assuming you have enough memory somewhere to test it)
msg181560 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013-02-06 19:08
> (and C standard requites at least 32-bit ints)

No, that's not true: see Annex E of the standard, where the minimum for INT_MAX is declared to be 32767.
msg181561 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2013-02-06 19:10
Theoretically int may be 16 bits (C99):

5.2.4.2.1 Sizes of integer types <limits.h>

 -- maximum value for an object of type int
    INT_MAX +32767 // 2**15 − 1


I agree that Python wouldn't compile on such a platform anyway.
msg181562 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2013-02-06 19:11
Hmm, Mark was faster. ;)
msg181564 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-02-06 19:19
> Perhaps you could add a bigmem test for this?
> (assuming you have enough memory somewhere to test it)

I think this is possible.  I now have some experience in the writing of bigmem 
tests. ;)
msg181570 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-02-06 20:10
> No, that's not true: see Annex E of the standard, where the minimum for
> INT_MAX is declared to be 32767.

My fault. Do I have to support such a case in the code?
msg181572 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013-02-06 20:25
> Do I have to support such a case in the code?

No, I don't see any need for that:  after all, you're making the code *more* portable by replacing those occurrences of INT_MAX with 0x7fffffff. :-)
msg181573 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-02-06 20:29
Perhaps we should change signatures of w_string() and r_string() -- replace "int" by at least "long".
msg181574 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013-02-06 20:38
Hmm, yes, I think that would also make sense.  I missed those uses of int.

I'll give in, though, and accept that >= 32-bit ints are a de facto standard, even if not de jure.
msg182011 - (view) Author: Roundup Robot (python-dev) Date: 2013-02-13 10:15
New changeset 385d982ce641 by Serhiy Storchaka in branch '2.7':
Issue #5308: Raise ValueError when marshalling too large object (a sequence
http://hg.python.org/cpython/rev/385d982ce641

New changeset e0464fa28c85 by Serhiy Storchaka in branch '3.2':
Issue #5308: Raise ValueError when marshalling too large object (a sequence
http://hg.python.org/cpython/rev/e0464fa28c85

New changeset b48e1cd2d3be by Serhiy Storchaka in branch '3.3':
Issue #5308: Raise ValueError when marshalling too large object (a sequence
http://hg.python.org/cpython/rev/b48e1cd2d3be

New changeset ea36478a36ee by Serhiy Storchaka in branch 'default':
Issue #5308: Raise ValueError when marshalling too large object (a sequence
http://hg.python.org/cpython/rev/ea36478a36ee
msg182014 - (view) Author: Roundup Robot (python-dev) Date: 2013-02-13 10:34
New changeset 72e75ea25d00 by Serhiy Storchaka in branch '2.7':
Fix tests for issue #5308.
http://hg.python.org/cpython/rev/72e75ea25d00

New changeset 0407e5e5915e by Serhiy Storchaka in branch '3.3':
Cleanup a test for issue #5308.
http://hg.python.org/cpython/rev/0407e5e5915e

New changeset e45f2fcf202c by Serhiy Storchaka in branch 'default':
Cleanup a test for issue #5308.
http://hg.python.org/cpython/rev/e45f2fcf202c
msg182015 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-02-13 10:38
> Perhaps you could add a bigmem test for this?
> (assuming you have enough memory somewhere to test it)

Some tests require more than 252 GiB of memory.
msg192093 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2013-06-30 23:44
The macro W_SIZE at http://hg.python.org/cpython/file/dbdb6f7f9a1a/Python/marshal.c#l130 has introduced a reference leak at
http://hg.python.org/cpython/file/dbdb6f7f9a1a/Python/marshal.c#l390 . Because W_SIZE returns in an error case the reference count of PyObject *utf8 isn't decreased. 

CID 1040640 (#1 of 1): Resource leak (RESOURCE_LEAK)
leaked_storage: Variable "utf8" going out of scope leaks the storage it points to.
msg192131 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-07-01 14:23
Good catch. There is another resource leak for general bytes-like object (the buffer can be not released). Here is a patch.
msg192878 - (view) Author: Roundup Robot (python-dev) Date: 2013-07-11 16:21
New changeset 1cf2c42af815 by Serhiy Storchaka in branch '2.7':
Fix reference leaks introduced by the patch for issue #5308.
http://hg.python.org/cpython/rev/1cf2c42af815

New changeset 8b99f2224c3a by Serhiy Storchaka in branch '3.3':
Fix reference leaks introduced by the patch for issue #5308.
http://hg.python.org/cpython/rev/8b99f2224c3a

New changeset 19ed630d8d75 by Serhiy Storchaka in branch 'default':
Fix reference leaks introduced by the patch for issue #5308.
http://hg.python.org/cpython/rev/19ed630d8d75
History
Date User Action Args
2013-07-14 12:39:32serhiy.storchakasetstatus: pending -> closed
stage: commit review -> resolved
2013-07-11 16:23:08serhiy.storchakasetstatus: open -> pending
stage: needs patch -> commit review
resolution: fixed
versions: - Python 3.2
2013-07-11 16:21:54python-devsetmessages: + msg192878
2013-07-01 14:23:45serhiy.storchakasetfiles: + marshal_string_leak.patch

messages: + msg192131
2013-06-30 23:44:31christian.heimessetstatus: closed -> open

nosy: + christian.heimes
messages: + msg192093

resolution: fixed -> (no value)
stage: resolved -> needs patch
2013-02-15 08:25:27serhiy.storchakasetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2013-02-13 10:38:38serhiy.storchakasetmessages: + msg182015
2013-02-13 10:34:26python-devsetmessages: + msg182014
2013-02-13 10:15:18python-devsetnosy: + python-dev
messages: + msg182011
2013-02-12 23:44:44serhiy.storchakasetassignee: serhiy.storchaka
2013-02-06 20:38:01mark.dickinsonsetmessages: + msg181574
2013-02-06 20:29:13serhiy.storchakasetmessages: + msg181573
2013-02-06 20:25:23mark.dickinsonsetmessages: + msg181572
2013-02-06 20:10:10serhiy.storchakasetmessages: + msg181570
2013-02-06 19:19:39serhiy.storchakasetmessages: + msg181564
2013-02-06 19:11:51skrahsetmessages: + msg181562
2013-02-06 19:10:56skrahsetnosy: + skrah
messages: + msg181561
2013-02-06 19:09:01eric.snowsetnosy: + eric.snow
2013-02-06 19:08:42mark.dickinsonsetmessages: + msg181560
2013-02-06 18:33:29pitrousetnosy: + pitrou
messages: + msg181554
2013-02-06 15:56:03serhiy.storchakasetmessages: + msg181537
2013-02-06 15:52:09serhiy.storchakasetfiles: + marshal_overflow-2.7.patch

messages: + msg181536
2013-02-06 15:51:10serhiy.storchakasetfiles: + marshal_overflow.patch
keywords: + patch
messages: + msg181535

stage: needs patch -> patch review
2013-02-06 10:19:22serhiy.storchakasetnosy: + serhiy.storchaka

versions: + Python 3.4
2011-06-26 20:46:43terry.reedysetstage: needs patch
versions: + Python 3.2, Python 3.3, - Python 3.1
2009-06-07 14:54:00mark.dickinsonsetassignee: mark.dickinson -> (no value)
2009-02-21 21:08:12rhettingersetmessages: + msg82583
2009-02-21 17:32:40mark.dickinsonsetpriority: normal
assignee: mark.dickinson
messages: + msg82570
2009-02-18 23:02:02rhettingersetnosy: + rhettinger
messages: + msg82440
2009-02-18 22:45:03mark.dickinsoncreate