classification
Title: cannot marshal objects with more than 2**31 elements
Type: behavior
Stage: needs patch
Components: Interpreter Core
Versions: Python 3.3, Python 3.2, Python 2.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: mark.dickinson, rhettinger
Priority: normal
Keywords:

Created on 2009-02-18 22:45 by mark.dickinson, last changed 2011-06-26 20:46 by terry.reedy.

Messages (4)
msg82437 - Author: Mark Dickinson (mark.dickinson) (Python committer) Date: 2009-02-18 22:45
Two closely related issues in Python/marshal.c, involving the writing and
reading of variable-length objects (lists, strings, long integers, ...):

(1) The w_object function in marshal contains many instances of code 
like the following:

else if (PyList_CheckExact(v)) {
	w_byte(TYPE_LIST, p);
	n = PyList_GET_SIZE(v);
	w_long((long)n, p);
	for (i = 0; i < n; i++) {
		w_object(PyList_GET_ITEM(v, i), p);
	}
}

On a 64-bit platform there's potential loss of information here
either in the cast "(long)n" (if sizeof(long) is 4), or in
w_long itself (if sizeof(long) is 8).  Note that w_long, despite
its name, always writes exactly 4 bytes.
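
To make the truncation concrete, here is a minimal standalone sketch. The names write_4_bytes and read_4_bytes are hypothetical stand-ins for w_long and r_long, not the actual marshal.c code, and int64_t stands in for a 64-bit size. It only assumes what is described above: exactly 4 bytes are written, least significant byte first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for w_long: stores only the low 32 bits of x,
   least significant byte first, whatever the width of the argument. */
void write_4_bytes(int64_t x, unsigned char out[4])
{
    assert(out != NULL);
    uint64_t u = (uint64_t)x;
    out[0] = (unsigned char)(u & 0xff);
    out[1] = (unsigned char)((u >> 8) & 0xff);
    out[2] = (unsigned char)((u >> 16) & 0xff);
    out[3] = (unsigned char)((u >> 24) & 0xff);
}

/* Hypothetical stand-in for r_long: rebuilds a signed 32-bit value
   from 4 bytes, least significant byte first. */
int32_t read_4_bytes(const unsigned char in[4])
{
    assert(in != NULL);
    uint32_t u = (uint32_t)in[0]
               | ((uint32_t)in[1] << 8)
               | ((uint32_t)in[2] << 16)
               | ((uint32_t)in[3] << 24);
    /* Sign-extend by hand so the result does not depend on
       implementation-defined unsigned-to-signed conversion. */
    if (u & UINT32_C(0x80000000))
        return (int32_t)((int64_t)u - INT64_C(4294967296));
    return (int32_t)u;
}
```

With these helpers, a length of 2**32 + 5 reads back as 5, and a length of 2**31 reads back as -2**31, so the stored length silently disagrees with the number of items written after it.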

There should at least be an exception raised here if n is not
in the range [-2**31, 2**31).  This would make marshalling of
large objects illegal (rather than just wrong).
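
A rough sketch of what such a check could look like, written as a standalone helper rather than as a patch to w_object. The name size_fits_marshal_field is hypothetical, and int64_t stands in for Py_ssize_t. Since a container size is never negative, the sketch assumes the narrower range [0, 2**31) instead of the full signed 4-byte range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of marshal.c.
   Returns 0 and stores the length in *out if n fits in the 4-byte
   length field; returns -1 if the caller should raise an exception
   instead of writing a truncated length. */
int size_fits_marshal_field(int64_t n, int32_t *out)
{
    assert(out != NULL);
    if (n < 0 || n > INT32_MAX)
        return -1;          /* would not survive the 4-byte write */
    *out = (int32_t)n;      /* safe: range was checked above */
    return 0;
}
```

In w_object the -1 branch would presumably set a ValueError and abandon the write, but that wiring is not shown here.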

A more involved fix would allow marshalling of objects of size >= 2**31.  
This would obviously involve changing the marshal format, and would make 
it impossible to marshal a large object on a 64-bit platform and then 
unmarshal it on a 32-bit platform.  The latter may not really be a 
problem, since memory considerations ought to rule that out anyway.

(2) In r_object (and possibly elsewhere) there are corresponding checks 
of the form:

case TYPE_LIST:
	n = r_long(p);
	if (n < 0 || n > INT_MAX) {
		PyErr_SetString(PyExc_ValueError, "bad marshal data");
		retval = NULL;
		break;
	}

...

If we allow marshalling of objects with more than 2**31-1 elements, then
these error checks can be relaxed.  (And as a matter of principle,
INT_MAX isn't really right here: an int might be only 16 bits long on
some strange platforms.)
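
On the INT_MAX point, one possible shape for the read-side check is to compare against an explicit fixed-width bound, so the limit is the same on every platform. This is only a sketch: MARSHAL_MAX_SIZE and read_size_checked are hypothetical names, and the 4-byte, least-significant-byte-first layout is the one assumed throughout this report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant: the largest length a 4-byte signed field can
   hold, spelled out instead of relying on the width of int. */
#define MARSHAL_MAX_SIZE UINT32_C(0x7fffffff)

/* Hypothetical helper, not part of marshal.c.
   Reads a 4-byte length and returns 0 with the value in *size, or
   returns -1 if the data should be rejected as "bad marshal data". */
int read_size_checked(const unsigned char in[4], int64_t *size)
{
    assert(in != NULL && size != NULL);
    uint32_t u = (uint32_t)in[0]
               | ((uint32_t)in[1] << 8)
               | ((uint32_t)in[2] << 16)
               | ((uint32_t)in[3] << 24);
    /* A set top bit means the value is negative as a signed 32-bit
       integer, which is never a valid length. */
    if (u > MARSHAL_MAX_SIZE)
        return -1;
    *size = (int64_t)u;
    return 0;
}
```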
msg82440 - Author: Raymond Hettinger (rhettinger) (Python committer) Date: 2009-02-18 23:02
Given that marshal is primarily about supporting pyc files, do we care?
msg82570 - Author: Mark Dickinson (mark.dickinson) (Python committer) Date: 2009-02-21 17:32
It wouldn't hurt to add the overflow checks though, would it?
msg82583 - Author: Raymond Hettinger (rhettinger) (Python committer) Date: 2009-02-21 21:08
Why not.  Besides it ought to be fun to write the test case for this one :-)
History
Date                 User            Action  Args
2011-06-26 20:46:43  terry.reedy     set     stage: needs patch; versions: + Python 3.2, Python 3.3, - Python 3.1
2009-06-07 14:54:00  mark.dickinson  set     assignee: mark.dickinson ->
2009-02-21 21:08:12  rhettinger      set     messages: + msg82583
2009-02-21 17:32:40  mark.dickinson  set     priority: normal; assignee: mark.dickinson; messages: + msg82570
2009-02-18 23:02:02  rhettinger      set     nosy: + rhettinger; messages: + msg82440
2009-02-18 22:45:03  mark.dickinson  create