classification
Title: file written using marshal in 3.2 can be read by 2.7, but not 3.2 or 3.3
Type: behavior Stage: resolved
Components: Versions: Python 3.2, Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Arfrever, amaury.forgeotdarc, benjamin.peterson, djc, georg.brandl, grubert, haypo, pitrou, python-dev, vinay.sajip
Priority: release blocker Keywords: needs review, patch

Created on 2011-06-09 09:10 by vinay.sajip, last changed 2011-07-02 17:52 by georg.brandl. This issue is now closed.

Files
File name Uploaded Description Edit
data.bin vinay.sajip, 2011-06-09 09:10 Test data - objects written using marshal in Python 3.2
tried_renaming.diff vinay.sajip, 2011-06-20 06:57 review
ab1c38ffb8d4.diff vinay.sajip, 2011-06-24 14:28 review
multiple_dump_load_test.patch grubert, 2011-06-25 16:17 failing test to test_marshal.py review
multiple_dump_load_read_write_test.patch grubert, 2011-06-26 08:23 multiple dump write, load read review
0feab4e7b27f.diff vinay.sajip, 2011-06-26 19:49 Merged Engelbert's test patches into fix branch. review
from_antoine_comments.diff vinay.sajip, 2011-07-01 00:16 Updates in response to Antoine's review comments. review
update_to_test_importlib.diff vinay.sajip, 2011-07-01 01:12 Change to test_importlib added review
Repositories containing patches
http://hg.python.org/sandbox/vsajip#fix12291
Messages (40)
msg137943 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2011-06-09 09:10
The attached file 'data.bin' was written using Python 3.2. It can be read by Python 2.7, but in 3.2 and 3.3, after the first object is read, the file pointer is positioned at EOF, causing an error on subsequent reads. A simple test script 'marshtest.py' is below:

import marshal
import sys

f = open('data.bin', 'rb')
t = marshal.load(f)
print('Python version:', sys.version)
print('Object just loaded was:\n%r' % (t,))
print('File position is now at %d' % f.tell())
t = marshal.load(f)
print('Object just loaded was:\n%r' % (t,))
print('File position is now at %d' % f.tell())

Results of running it under various Python versions:

vinay@eta-natty:~/projects/scratch$ python marshtest.py 
('Python version:', '2.7.1+ (r271:86832, Apr 11 2011, 18:05:24) \n[GCC 4.5.2]')
Object just loaded was:
(u'text', u'alfa', 202, 1.0, '\x00\x00\x00\x01]q\x00K\x03a')
File position is now at 52
Object just loaded was:
(u'text', u'alfa', 212, 1.0, '\x00\x00\x00\x01]q\x00K\x03a')
File position is now at 104

vinay@eta-natty:~/projects/scratch$ python3.2 marshtest.py 
Python version: 3.2 (r32:88445, Mar 25 2011, 19:28:28) 
[GCC 4.5.2]
Object just loaded was:
('text', 'alfa', 202, 1.0, b'\x00\x00\x00\x01]q\x00K\x03a')
File position is now at 53617
Traceback (most recent call last):
  File "marshtest.py", line 9, in <module>
    t = marshal.load(f)
EOFError: EOF read where object expected

vinay@eta-natty:~/projects/scratch$ python3.3 marshtest.py 
Python version: 3.3a0 (default:8d4d87dd73ae, Apr  2 2011, 14:25:31) 
[GCC 4.5.2]
Object just loaded was:
('text', 'alfa', 202, 1.0, b'\x00\x00\x00\x01]q\x00K\x03a')
File position is now at 53617
Traceback (most recent call last):
  File "marshtest.py", line 9, in <module>
    t = marshal.load(f)
EOFError: EOF read where object expected

Note the size of the file is 53617 bytes.

vinay@eta-natty:~/projects/scratch$ ls -l data.bin
-rw------- 1 vinay vinay 53617 2011-06-09 09:33 data.bin
msg137944 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2011-06-09 09:31
Sadly, marshal.load() looks broken:

- The function starts with the comment
    /* XXX Quick hack -- need to do this differently */

- It starts by calling f.read() which consumes the whole file (and explains the issue reported here)

- The code was probably converted too quickly:
    if (PyBytes_Check(data)) {
          ...
    else if (PyBytes_Check(data)) {
msg137966 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2011-06-09 13:49
I've added a patch which I think fixes marshal.load(), and would be grateful for a review - it's my first C patch for Python. The test_marshal test passes, though I haven't yet added a test for the failing scenario.
msg137969 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2011-06-09 13:55
- Please replace tabs characters by space
- "//" comments are not accepted by some picky C89 compilers

Also, calling f.read(1) for each character may have a large performance impact.  I don't know how this can be solved though.
msg137973 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2011-06-09 14:35
It should probably be buffered at 512 bytes or some other reasonable number.
msg137987 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2011-06-09 15:24
Sounds blocker-ish enough to me.
msg138012 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2011-06-09 16:36
@Amaury: Re formatting and comment style - I will address this. The diff I posted is to the default branch rather than 3.2, though it won't change the main substance of the patch.

Re. performance and buffering: I agree this could have an impact (although strings are not actually read a byte at a time), but it's not easy to address because even if you e.g. wrapped the incoming stream with a BufferedReader, you have nowhere to keep the resulting instance in between multiple calls to marshal.load() from external code.

For best performance ISTM you'd have to implement your own buffering - but surely this is to be avoided? It can't be the only place where you need good I/O performance from file-like objects. A quick scan of the pickle module didn't show me anything I could easily use, and a key difference from pickle is that you don't have a corresponding "Unmarshaller" object which could hold the state that an Unpickler does, and which would be needed to hold the buffering state.

Of course one could implement limited buffering, i.e. for just the duration of a single marshal.load() call, but perhaps this could be done as a later step after evaluating the actual performance of this implementation. Having something that works correctly, albeit slowly, is better than having the current situation ...
msg138024 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2011-06-09 19:50
Another thought about buffering, which might be a bit of a show-stopper: if in a particular load() call we read ahead for buffering purposes, we've altered the underlying stream (which might not be seekable to allow position restoring). The next call to load() will be likely to fail, since the underlying stream has been positioned further than expected by the earlier load() call, and the buffer state has been lost. Not sure how easy it is to get around this ...
msg138070 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2011-06-10 09:59
Attached is an improved patch:

1. It's against the 3.2 branch, rather than default.
2. It has an added multiple load test in test_marshal.py.
3. There's more error checking for an EOF condition.
4. I've removed tab chars and used /* C89 comments */.

Of course it's not as fast as the 2.x version, since that had a limitation on only being able to load from file objects, and a fast path for PyFile using stdio calls directly. Hard to compete with that!

Suggestions for improvements gratefully received :-)
msg138160 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2011-06-11 15:24
Patch is now in my public sandbox on hg.python.org.
msg138162 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2011-06-11 15:38
For some reason, "Create Patch" is failing with a

[Errno 2] No such file or directory: '/home/roundup/trackers/tracker/cpython/Doc/Makefile'

I've logged an issue on the meta tracker:

http://psf.upfronthosting.co.za/roundup/meta/issue405
msg138571 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2011-06-18 08:15
Any reviewers?
msg138585 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2011-06-18 14:42
Why can't you just call fileno() on the file object?
msg138617 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2011-06-18 22:41
Sorry I'm being dense, but which file object do you mean?
msg138618 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2011-06-18 22:59
2011/6/18 Vinay Sajip <report@bugs.python.org>:
>
> Vinay Sajip <vinay_sajip@yahoo.co.uk> added the comment:
>
> Sorry I'm being dense, but which file object do you mean?

The python file object.
msg138623 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2011-06-19 00:42
> Benjamin Peterson <benjamin@python.org> added the  comment:
> >  Vinay Sajip <vinay_sajip@yahoo.co.uk> added the  comment:
> >
> > Sorry I'm being dense, but which file object do you  mean?
> 
> The python file  object.

Do you mean special-case handling of the circumstance when the file-like object 
being marshalled from is actually a file? The existing code paths (from when 
marshal.load expected only to work with Python file objects) use FILE *, and 
these are used by other code to read magic numbers etc. I believe. Or perhaps 
I'm still misunderstanding what you're getting at.
msg138624 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2011-06-19 01:02
2011/6/18 Vinay Sajip <report@bugs.python.org>:
>
> Vinay Sajip <vinay_sajip@yahoo.co.uk> added the comment:
>
>> Benjamin Peterson <benjamin@python.org> added the  comment:
>> >  Vinay Sajip <vinay_sajip@yahoo.co.uk> added the  comment:
>> >
>> > Sorry I'm being dense, but which file object do you  mean?
>>
>> The python file  object.
>
> Do you mean special-case handling of the circumstance when the file-like object
> being marshalled from is actually a file? The existing code paths (from when
> marshal.load expected only to work with Python file objects) use FILE *, and
> these are used by other code to read magic numbers etc. I believe. Or perhaps
> I'm still misunderstanding what you're getting at.

When python uses dump() or load() with a file object, you can call
fileno() (and then fopen?) to use it like a C-file object.
msg138625 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2011-06-19 11:39
The problem with calling fileno() and fdopen() is that you bypass the buffering information held in BufferedIOReader. The first call works, but the FILE * pointer is now positioned at 4K, rather than just past the end of the object just read. The next call fails.

I verified that calling f.tell() after marshal.load(f) returns 4096, rather than just the size of the object read by the load().

Just to be clear, here's what I did in marshal_load:

    int is_file = 0;
    int fd;

    data = PyObject_CallMethod(f, "fileno", "");
    if (data == NULL)
        PyErr_Clear();
    else {
        fd = PyLong_AsLong(data);
        Py_DECREF(data);
        is_file = 1;
    }
    if (is_file) {
        rf.readable = NULL;
        rf.fp = fdopen(fd, "rb");
    }
    else {
        /* what I was doing before to set up rf */
    }
    /* and on to the read_object call */
msg138631 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2011-06-19 14:54
2011/6/19 Vinay Sajip <report@bugs.python.org>:
>
> Vinay Sajip <vinay_sajip@yahoo.co.uk> added the comment:
>
> The problem with calling fileno() and fdopen() is that you bypass the buffering information held in BufferedIOReader. The first call works, but the FILE * pointer is now positioned at 4K, rather than just past the end of the object just read. The next call fails.

But presumably once you have the fd, you can llseek().
msg138636 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2011-06-19 15:43
This seems a bit hacky, and I'm not sure how reliable it is. I added this after the read_object call:

    if (is_file) {
        PyObject * newpos;
        int cp, np;

        cp = ftell(rf.fp);
        newpos = PyObject_CallMethod(f, "seek", "ii", cp, SEEK_SET);
        assert(newpos != NULL);
        np = PyLong_AsLong(newpos);
        Py_DECREF(newpos);
        assert(cp == np);
    }

When I run the code lots of times, I sometimes get assertion failures at the

assert(newpos != NULL)

line. It's not an EOF condition thing, necessarily: I do get expected behaviour at least sometimes when seeking to the end of file.
msg138637 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2011-06-19 15:45
2011/6/19 Vinay Sajip <report@bugs.python.org>:
>
> Vinay Sajip <vinay_sajip@yahoo.co.uk> added the comment:
>
> This seems a bit hacky, and I'm not sure how reliable it is. I added this after the read_object call:
>
>    if (is_file) {
>        PyObject * newpos;
>        int cp, np;
>
>        cp = ftell(rf.fp);
>        newpos = PyObject_CallMethod(f, "seek", "ii", cp, SEEK_SET);

Why not 0?

>        assert(newpos != NULL);
>        np = PyLong_AsLong(newpos);
>        Py_DECREF(newpos);
>        assert(cp == np);
>    }
>
> When I run the code lots of times, I sometimes get assertion failures at the
>
> assert(newpos != NULL)

That's because the call is failing. Why?

>
> line. It's not an EOF condition thing, necessarily: I do get expected behaviour at least sometimes when seeking to the end of file.
>
msg138661 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2011-06-19 20:48
> Benjamin Peterson <benjamin@python.org> added the  comment:

> > assert(newpos != NULL)
> 
> That's because the  call is failing. Why?
> 

It's seemingly because the Python code did a seek (in Python) which was not 
communicated to the FILE object; after reading all objects from the file, a seek 
was done to the beginning to start reading again. On the previous call, the FILE 
would have been at EOF (at a C level). So it seems as if we have to call ftell 
on the Python object at the beginning of the read, and do an fseek to sync the 
positions. It's doable, of course, so it's the next thing I'll try, but what I'm 
worried about is the thing turning into a bit of a rabbit-hole.

I haven't looked at the io implementation, but for this sort of synchronisation, 
it would be better if when based on a FILE object, the io implementation 
delegated all its operations to the FILE object - but it doesn't look like 
that's the case, and I'm uncomfortable that there may be some undesirable 
consequences of that.
msg138665 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2011-06-19 21:10
@Benjamin: I missed commenting on your "Why not 0?", but here's the reasoning: one can't assume that the file only contains one object to be read, at the beginning of the file. It may be that some data is being written to file using marshal.dump, interspersed with other data, and then the stream is being read in at a later time, with marshal.load called to load a previously saved object. In that scenario, why would 0 be always the correct offset value to pass to fseek?

I am synchronising the Python object with the FILE *, but not the other way around - in the failing case I mentioned, external Python code has positioned the io object to zero, but that of course will be overwritten to point the io object back to where the FILE object is positioned. Perhaps I just need to go the other way ...

I'm not sure how reliable a solution will be which tries to work around there apparently being two buffering implementations - at the io level and at the FILE level, and trying to keep them synced in a seemingly ad hoc fashion.

It's perfectly possible that I don't know what I'm talking about, so I'd welcome an improved patch using the ideas you've suggested. It's been a very long time since I wrote C stdio code regularly ;-)
msg138669 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2011-06-19 22:10
I think you're right about playing with the bare fd being too fragile. I think a simpler solution is to read say 1024 bytes at a time and buffer it internally.
msg138672 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2011-06-19 23:04
> Benjamin Peterson <benjamin@python.org> added the  comment:
> 
> I think you're right about playing with the bare fd being too  fragile. I think 
>a simpler solution is to read say 1024 bytes at a time and  buffer it  
>internally.

Doesn't this suffer from a similar problem? Namely, external Python code 
expecting the stream pointer to always be just past the object just read. See my 
earlier comments pointing out that there's nowhere to store the buffer state 
between successive calls to marshal.load. And the synchronising can be a problem 
to achieve with non-seekable streams (including, but not limited to, sockets).

I'm not trying to be difficult. No, really! :-)
msg139095 - (view) Author: engelbert gruber (grubert) * Date: 2011-06-25 16:17
patch to test_marshal.py that obviously fails in current implementation.

IMHO is file might not be seekable one can not cache so maybe do not do it.
msg139151 - (view) Author: engelbert gruber (grubert) * Date: 2011-06-26 08:23
add interleaved writing to the same file (which might happen or not?)

the tests pass when vinay's marshal.c is applied.

sorry for another patch
msg139204 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2011-06-26 19:53
I've incorporated Engelbert's two patches into test_marshal.py in my sandbox fix branch. I coalesced them into a single additional test in the existing BugsTestCase.
msg139247 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-06-27 08:35
Le Sun, 26 Jun 2011 19:49:03 +0000,
Vinay Sajip <report@bugs.python.org> a écrit :
> 
> Added file: http://bugs.python.org/file22487/0feab4e7b27f.diff

Just a nit, could you give descriptive file names to your patches?
Hex numbers quickly get confusing.
msg139262 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2011-06-27 12:41
> Just a nit, could you give descriptive file names to your patches?
> Hex numbers quickly get confusing.

Ok - I was under the impression that those names were generated automatically from the changeset hash, and that changing the name arbitrarily would break something. Is it not better/sufficient if I just update the description?
msg139508 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-06-30 18:22
I think the question is: will the slowdown apply to module import, or only to explicit uses of the marshal module? If the latter, then I think we can live with it - we discourage using marshal as a general-purpose serialization scheme anyway.

As for the patch:
- why do the tests have to carry a large chunk of binary data? if some data is needed, at least it would be nice if it were kept small
- not sure either why it needs zlib and base64; just use the repr() of the bytestring
- "assertEqual = self.assertEqual" is more of a nuisance than anything else; you're making the code more complicated to read just to save a few keystrokes; same for "assertIsInstance = self.assertIsInstance"
- can you add a comment next to the fields you're adding to the marshal C struct?
- if the "r_byte" macro isn't used anymore, it should be removed, not commented out
- in r_string(), what happens if PyBytes_Check(data) is false? there should probably be a TypeError of some sort
- in r_string() again, if data is NULL after the read() call, the exception shouldn't be shadowed by an EOFError
- why do you call read(1) at the beginning? it seems to make the code more complicated for no useful purpose
msg139510 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2011-06-30 19:12
> Antoine Pitrou <pitrou@free.fr> added the comment:
> 
> I  think the question is: will the slowdown apply to module import, or only to  
>explicit uses of the marshal module? If the latter, then I think we can live  
>with it - we discourage using marshal as a general-purpose serialization scheme  
>anyway.

Thanks for reviewing the patch.

As to your point  - agreed, and as the marshal code is completely broken now, 
something that works is an improvement, even if it's slower than optimal. If 
that turns out to be a problem in practice, it can be fixed.

> As for the patch:
> - why do the tests have to carry a large  chunk of binary data? if some data is 
>needed, at least it would be nice if it  were kept small

Agreed, I just used the initial file where I had the problem - didn't know where 
the problem was, initially. Will look at reducing this.

> - not sure either why it needs zlib and base64; just use the  repr() of the 
>bytestring

The original data was 53K, so just the repr of the bytestring of the raw data is 
around 150K. If I zip the data, it's about 4K, but the repr of that data is 
around 12K. The base64/zlibbed version is 5K.

This will be less of an issue when the test data size is reduced.

> - "assertEqual = self.assertEqual" is more of a  nuisance than anything else; 
>you're making the code more complicated to read  just to save a few keystrokes; 
>same for "assertIsInstance =  self.assertIsInstance"

I'm not sure how it's more complicated or harder to understand. I didn't do it 
just to save keystrokes when writing, it's also to avoid noise when reading. IMO 
this is a question of personal taste, or is this approach proscribed somewhere? 
I've certainly seen it in other Python code.

> - can you add a comment next to the fields you're  adding to the marshal C 
>struct?

Yes, will do.

> - if the "r_byte" macro isn't used anymore,  it should be removed, not 
>commented out

The commenting out is a temporary measure, the comment will be removed before 
commit.

> - in r_string(), what happens if  PyBytes_Check(data) is false? there should 
>probably be a TypeError of some  sort

There is a PyBytes_Check in marshal_load which raises a TypeError if it fails. 
The PyBytes_Check in r_string is only called in the path from marshal_load 
(because p->readable is only non-NULL in that path) - so for the failure to 
occur one would need an initial read to deliver bytes, and a subsequent read to 
deliver non-bytes. Still, I have no problem adding a TypeError raising in 
r_string if PyBytes_Check fails.

> - in r_string() again, if data is NULL after the read() call, the  exception 
>shouldn't be shadowed by an EOFError

Good point, I aim to fix this by changing the lower condition to

if (!PyErr_Occurred() && (read < n)) { ... }

which should be sufficient - do you agree?

> - why do you call read(1) at  the beginning? it seems to make the code more 
>complicated for no useful  purpose

I gave the reasoning in the comment just above the read(1). I agree it makes it 
a little more complicated, but not onerously so - and the approach fails fast as 
well as allowing operation with any stream, not just random-seekable ones.
msg139513 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-06-30 19:26
> > - "assertEqual = self.assertEqual" is more of a  nuisance than anything else; 
> >you're making the code more complicated to read  just to save a few keystrokes; 
> >same for "assertIsInstance =  self.assertIsInstance"
> 
> I'm not sure how it's more complicated or harder to understand. I didn't do it 
> just to save keystrokes when writing, it's also to avoid noise when reading. IMO 
> this is a question of personal taste, or is this approach proscribed somewhere? 

It's not proscribed, but trying to remove the "self." because it's
supposed to be more readable is a bit of a strange thing to do.
Also, people reading the test suite should be accustomed to
"self.assertEqual" anyway, so there's no point trying to hide it.

> > - in r_string(), what happens if  PyBytes_Check(data) is false? there should 
> >probably be a TypeError of some  sort
> 
> There is a PyBytes_Check in marshal_load which raises a TypeError if it fails. 
> The PyBytes_Check in r_string is only called in the path from marshal_load 
> (because p->readable is only non-NULL in that path) - so for the failure to 
> occur one would need an initial read to deliver bytes, and a subsequent read to 
> deliver non-bytes. Still, I have no problem adding a TypeError raising in 
> r_string if PyBytes_Check fails.

Error checking can't just be probabilistic. Perhaps there's a bug in the
file-like object; or perhaps it is a non-blocking IO object and read()
will return None at times.

> > - why do you call read(1) at  the beginning? it seems to make the code more 
> >complicated for no useful  purpose
> 
> I gave the reasoning in the comment just above the read(1). I agree it makes it 
> a little more complicated, but not onerously so - and the approach fails fast as 
> well as allowing operation with any stream, not just random-seekable ones.

Well, it wouldn't fail any slower if you didn't do it, since you need to
call read() very soon anyway (presumably as part of the same call to
marshal.load()). Failing "fast" doesn't seem to bring anything here. My
vote is for removing the complication.
msg139537 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2011-07-01 00:35
> Antoine Pitrou <pitrou@free.fr> added the  comment:

> It's not proscribed, but trying to remove the "self." because  it's
> supposed to be more readable is a bit of a strange thing to do.
> Also,  people reading the test suite should be accustomed to
> "self.assertEqual"  anyway, so there's no point trying to hide it.

It wasn't particularly about self - I'm not against it. Anyway, it's not a big 
deal for me, so I've added the selves back :-)

> Error checking can't  just be probabilistic. Perhaps there's a bug in the
> file-like object; or  perhaps it is a non-blocking IO object and read()
> will return None at  times.

You're right, so I've raised a TypeError if PyBytes_Check fails in r_string.

> Well, it  wouldn't fail any slower if you didn't do it, since you need to
> call read()  very soon anyway (presumably as part of the same call to
> marshal.load()).  Failing "fast" doesn't seem to bring anything here. My
> vote is for removing  the  complication.

Actually I misremembered the complete reason for the call - it was there to 
additionally check that the passed object has a read method. I also realised - 
duh - that I can read zero bytes and still get an empty bytes object back, so 
I've done that, and it does look cleaner. I've also reorganised the marshal_load 
function a little so it flows better.

Your other points have also all been addressed:

I've seen that the original test with the long binary data was exercising the 
same functionality as Engelbert Gruber's later patch was testing. So I've 
removed my earlier test entirely and added tuples and lists into the data being 
marshalled in Engelbert's version of the test. As a result of these changes, the 
zlib/base64 dependency goes away.

The new "readable" field of the C struct has a comment saying what it is.

The commented out r_byte macro has been removed.

Thanks, again, for the review!
msg139539 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2011-07-01 01:17
As a result of the changes to marshal.c, test_importlib needs a small change: the test_bad_marshal raises an EOFError now, whereas it raised ValueError before.

I think it's because the earlier code in marshal didn't properly check for EOF conditions in some places.

So I've changes the assertRaises(ValueError) to assertRaises(EOFError).

All tests now pass, other than test_ftplib (not related).
msg139588 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-07-01 16:55
Latest patch looks ok to me.
msg139628 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2011-07-02 09:10
Can we get this committed for 3.2.1 then?
msg139661 - (view) Author: Roundup Robot (python-dev) Date: 2011-07-02 15:43
New changeset edba722f3b02 by Vinay Sajip in branch '3.2':
Closes #12291: Fixed bug which was found when doing  multiple loads from one stream.
http://hg.python.org/cpython/rev/edba722f3b02
msg139662 - (view) Author: Roundup Robot (python-dev) Date: 2011-07-02 16:16
New changeset 42dd11028e94 by Vinay Sajip in branch 'default':
Closes #12291 for 3.3 - merged fix from 3.2.
http://hg.python.org/cpython/rev/42dd11028e94
msg139663 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2011-07-02 17:52
Thanks!
History
Date User Action Args
2012-03-30 01:25:24r.david.murraylinkissue14447 superseder
2011-07-02 17:52:23georg.brandlsetmessages: + msg139663
2011-07-02 16:16:11python-devsetmessages: + msg139662
2011-07-02 15:43:18python-devsetstatus: open -> closed

nosy: + python-dev
messages: + msg139661

resolution: fixed
stage: patch review -> resolved
2011-07-02 09:10:33georg.brandlsetmessages: + msg139628
2011-07-01 16:55:20pitrousetmessages: + msg139588
2011-07-01 16:25:55hayposetnosy: + haypo
2011-07-01 01:17:12vinay.sajipsetmessages: + msg139539
2011-07-01 01:12:51vinay.sajipsetfiles: + update_to_test_importlib.diff
2011-07-01 00:35:11vinay.sajipsetmessages: + msg139537
2011-07-01 00:16:05vinay.sajipsetfiles: + from_antoine_comments.diff
2011-06-30 19:26:09pitrousetmessages: + msg139513
2011-06-30 19:12:50vinay.sajipsetmessages: + msg139510
2011-06-30 18:22:22pitrousetmessages: + msg139508
2011-06-27 12:41:06vinay.sajipsetmessages: + msg139262
2011-06-27 08:48:26djcsetnosy: + djc
2011-06-27 08:35:33pitrousetmessages: + msg139247
2011-06-26 19:53:21vinay.sajipsettype: behavior
messages: + msg139204
2011-06-26 19:49:03vinay.sajipsetfiles: + 0feab4e7b27f.diff
2011-06-26 08:23:14grubertsetfiles: + multiple_dump_load_read_write_test.patch

messages: + msg139151
2011-06-25 16:17:25grubertsetfiles: + multiple_dump_load_test.patch
nosy: + grubert
messages: + msg139095

2011-06-25 00:35:35terry.reedysetversions: - Python 3.4
2011-06-24 14:28:15vinay.sajipsetfiles: + ab1c38ffb8d4.diff
2011-06-20 06:58:27vinay.sajipsetfiles: - marshal-patch2.diff
2011-06-20 06:58:15vinay.sajipsetfiles: - marshal-patch.diff
2011-06-20 06:57:27vinay.sajipsetfiles: + tried_renaming.diff
2011-06-20 03:19:02r.david.murraysetnosy: + pitrou
2011-06-19 23:04:53vinay.sajipsetmessages: + msg138672
2011-06-19 22:10:17benjamin.petersonsetmessages: + msg138669
2011-06-19 21:10:21vinay.sajipsetmessages: + msg138665
2011-06-19 20:48:40vinay.sajipsetmessages: + msg138661
2011-06-19 15:45:46benjamin.petersonsetmessages: + msg138637
2011-06-19 15:43:11vinay.sajipsetmessages: + msg138636
2011-06-19 14:54:54benjamin.petersonsetmessages: + msg138631
2011-06-19 11:39:10vinay.sajipsetmessages: + msg138625
2011-06-19 01:02:49benjamin.petersonsetmessages: + msg138624
2011-06-19 00:42:46vinay.sajipsetmessages: + msg138623
2011-06-18 22:59:22benjamin.petersonsetmessages: + msg138618
2011-06-18 22:41:57vinay.sajipsetmessages: + msg138617
2011-06-18 14:42:53benjamin.petersonsetmessages: + msg138585
2011-06-18 08:15:07georg.brandlsetmessages: + msg138571
2011-06-11 15:38:56vinay.sajipsetmessages: + msg138162
2011-06-11 15:24:36vinay.sajipsethgrepos: + hgrepo26
messages: + msg138160
2011-06-10 10:00:00vinay.sajipsetstage: patch review
2011-06-10 09:59:37vinay.sajipsetkeywords: + needs review
files: + marshal-patch2.diff
messages: + msg138070
2011-06-09 19:50:07vinay.sajipsetmessages: + msg138024
2011-06-09 16:36:30vinay.sajipsetmessages: + msg138012
2011-06-09 15:24:18georg.brandlsetmessages: + msg137987
2011-06-09 15:21:41Arfreversetnosy: + Arfrever
2011-06-09 15:00:48barrysetpriority: normal -> release blocker
nosy: + georg.brandl
2011-06-09 14:35:53benjamin.petersonsetnosy: + benjamin.peterson
messages: + msg137973
2011-06-09 13:55:08amaury.forgeotdarcsetmessages: + msg137969
2011-06-09 13:49:09vinay.sajipsetfiles: + marshal-patch.diff
keywords: + patch
messages: + msg137966
2011-06-09 09:31:47amaury.forgeotdarcsetnosy: + amaury.forgeotdarc
messages: + msg137944
2011-06-09 09:10:09vinay.sajipcreate