This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: cPickle module doesn't work with universal line endings
Type: behavior Stage: resolved
Components: Documentation, IO, Library (Lib) Versions: Python 2.7
process
Status: closed Resolution: out of date
Dependencies: 616013 Superseder:
Assigned To: Nosy List: ajaksu2, belopolsky, benjamin.peterson, ggenellina, gjb1002, jackjansen, loewis, pitrou, ronaldoussoren, serhiy.storchaka
Priority: normal Keywords:

Created on 2007-05-23 16:42 by gjb1002, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
pickletest.py gjb1002, 2007-05-23 16:42 Test case
pickletest_py3k.py ajaksu2, 2009-05-12 13:06 Example script adapted to py3k
Messages (16)
msg32115 - Author: Geoffrey Bache (gjb1002) Date: 2007-05-23 16:42
On UNIX, I cannot read pickle files created on Windows using the cPickle module, even if I open the file with universal line endings.

It works fine with the pickle module, but that is of course slower (and I have to read lots of them).

I attach a test case that pickles and unpickles an smtplib.SMTP object, converting the file to DOS format in between. There is nothing special about SMTP; you can use any object at all from a different module.

On my system (RHEL4 with Python 2.4.3) I get the following output:

portmoller : pickletest.py cPickle
unix2dos: converting file dump to DOS format ...
Traceback (most recent call last):
  File "pickletest.py", line 14, in ?
    print load(readFile)
ImportError: No module named smtplib
portmoller : pickletest.py pickle
unix2dos: converting file dump to DOS format ...
<smtplib.SMTP instance at 0xb7ea350c>
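The same failure can be reproduced in memory, without unix2dos or a second machine. This is a Python 3 adaptation, not the attached pickletest.py: it assumes fractions.Fraction as a stand-in for smtplib.SMTP (any class that the pickle references by module name would do), and the unix2dos function below is a hypothetical helper that simply turns every "\n" into "\r\n".

```python
import pickle
from fractions import Fraction


def unix2dos(data: bytes) -> bytes:
    # Hypothetical helper: mimics what the unix2dos tool does to the file.
    return data.replace(b"\n", b"\r\n")


original = pickle.dumps(Fraction(1, 3), protocol=0)
mangled = unix2dos(original)

# The untouched protocol 0 pickle loads fine.
print(pickle.loads(original))

try:
    pickle.loads(mangled)
except ImportError as exc:
    # The module name line now reads "fractions\r", so the import fails,
    # just as "smtplib\r" did in the report above.
    print(type(exc).__name__, exc)
```

Only the line terminators differ between the two byte strings, which is enough to break the class lookup.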
msg32116 - Author: Gabriel Genellina (ggenellina) Date: 2007-05-25 09:00
Please try again with this modified version. I think you will see that Python is trying to import "smtplib\r"
On Windows, trying to read a pickle file with Mac line endings gives a different error:
cPickle.UnpicklingError: pickle data was truncated

It seems that cPickle support for protocol 0 is broken. If you can, try to use the higher, binary protocols; they don't have this problem. Even if you must use protocol 0, always opening the file in binary mode should avoid this problem.
msg32117 - Author: Gabriel Genellina (ggenellina) Date: 2007-05-25 09:04
I don't see any "Attach" button...
Just add these lines near the top of the test script:

import __builtin__

original__import = __builtin__.__import__

def myimport(name, *args):
    # Log the exact module name requested, e.g. 'smtplib\r'
    print "import", repr(name)
    return original__import(name, *args)

__builtin__.__import__ = myimport
msg32118 - Author: Gabriel Genellina (ggenellina) Date: 2007-05-25 10:29
The culprit is cPickle.c; it takes certain shortcuts for read() and readline() depending on which type of file you pass in.
For a true file object, it uses its own implementation for those two methods, ignoring the file mode.

But it appears that there is NO WAY universal line endings could work if the pickle contains any unicode object. The pickle format for Unicode quotes any \n but *not* \r, so when the unpickler sees a "\r", it cannot determine whether it is a Mac end-of-line or an embedded "\r".
So, the only safe end-of-line character for a pickle using protocol 0 is "\n", and that means that the file must be written in binary mode.
(This may also indicate that you cannot read unicode objects with embedded \r on a Mac using protocol 0, but I don't have a Mac to test it.)
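The ambiguity can be checked against Python 3's universal-newline decoding. The two byte strings below are hypothetical, written in the style of a protocol 0 UNICODE record ("V" opcode, argument, "\n"): one carries an unescaped "\r" inside its argument, the other is genuinely two lines. The helper name read_universal is an assumption for illustration; it just reads the bytes through io.TextIOWrapper with newline=None.

```python
import io


def read_universal(raw: bytes) -> str:
    # newline=None enables universal newlines: "\r", "\n" and "\r\n"
    # are all translated to "\n" when reading.
    stream = io.TextIOWrapper(io.BytesIO(raw), encoding="latin-1", newline=None)
    return stream.read()


# Hypothetical record whose argument contains an embedded, unescaped "\r"...
one_record_with_cr = b"Vabc\rdef\n"
# ...versus two separate "\n"-terminated lines.
two_lines = b"Vabc\ndef\n"

print(repr(read_universal(one_record_with_cr)))
print(repr(read_universal(two_lines)))
print(read_universal(one_record_with_cr) == read_universal(two_lines))
```

Once both inputs decode to the same text, no reader working on that text can tell which one was on disk, which is why only "\n" is a safe terminator.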

So, until this is fixed (either the module or the documentation), one should forget about universal line endings and write all pickle files in binary mode. (This way ALL lines end in \n, and it should work fine on all platforms.)
msg32119 - Author: Geoffrey Bache (gjb1002) Date: 2007-05-25 17:24
Yes, I'm sure Python is trying to import "smtplib\r".

For various reasons I need to use protocol 0: not least because I use the pickle files as test data and it's much easier to administer a load of text files than a load of binary files.

I will experiment with reading the files in binary mode on Monday and get back to you. My current workaround is to do loads(file.read()) instead of load(file), which I guess carries a performance penalty. Any idea whether this is likely to be slower than just using the pickle module? (I haven't tested this.)
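A Python 3 sketch of a similar read-everything-then-loads workaround, assuming the whole file fits in memory: take the raw bytes (from a file opened with "rb"), translate "\r\n" back to "\n", and unpickle from memory. The name load_crlf_tolerant is hypothetical. Only "\r\n" pairs are rewritten, so a lone "\r" (the Mac case) is not handled, and the rewrite would corrupt a pickle whose payload legitimately ends an argument with "\r", as discussed above for protocol 0 unicode data.

```python
import pickle
from fractions import Fraction


def load_crlf_tolerant(data: bytes):
    # Hypothetical helper: undo a unix2dos-style conversion, then unpickle.
    # Unsafe if the payload itself contains a meaningful b"\r\n" pair.
    return pickle.loads(data.replace(b"\r\n", b"\n"))


# Simulate a protocol 0 pickle that went through unix2dos.
dos_pickle = pickle.dumps(Fraction(1, 3), protocol=0).replace(b"\n", b"\r\n")
print(load_crlf_tolerant(dos_pickle))
```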
msg32120 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2007-05-29 05:14
Jack, can you take a look? If not, please unassign.
msg32121 - Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2007-07-12 16:24
I can confirm that this problem is present in Python 2.5 (current svn) running on OS X 10.4.10. Given the code of cPickle, it is rather amazing that this script does work correctly on a Linux system: as ggenellina noted, cPickle shortcuts reads from real file objects and completely ignores universal newlines while doing so.

IMHO, fixing this requires replicating the universal newline code in cPickle.
msg87619 - Author: Daniel Diniz (ajaksu2) * (Python triager) Date: 2009-05-12 13:06
Confirmed in trunk and py3k.
msg87621 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-05-12 13:17
Why would you use a file in universal line endings mode for saving/loading
pickles? Pickles are binary data (even if version 0 pickles happen to
be human-readable), so you should open the files in binary mode (either
"rb" or "wb").
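A minimal Python 3 sketch of that advice: the file is opened with "wb" and "rb" even though protocol 0 is used, so no newline translation happens on any platform. The object and the temporary file are illustrative only.

```python
import os
import pickle
import tempfile

obj = {"name": "smtplib", "ports": [25, 587], "note": "line1\nline2"}

fd, path = tempfile.mkstemp(suffix=".pkl")
os.close(fd)
try:
    # Binary mode on both sides: the bytes on disk are exactly what
    # pickle produced, with every line ending in "\n".
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=0)
    with open(path, "rb") as f:
        raw = f.read()
        f.seek(0)
        loaded = pickle.load(f)
finally:
    os.remove(path)

print(loaded == obj)
print(raw.decode("ascii"))  # protocol 0 output is still human-readable
```

Binary mode does not make the file any less readable in an editor; it only stops the I/O layer from rewriting line endings.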
msg87622 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-05-12 13:20
Also, I don't understand how you confirmed this bug under py3k. Text
files under py3k forbid bytes input, which is what pickle produces:

>>> pickle.dump([], sys.stdout)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/antoine/py3k/__svn__/Lib/pickle.py", line 1333, in dump
    Pickler(file, protocol).dump(obj)
TypeError: write() argument 1 must be str, not bytes
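The same holds for any text stream in Python 3, not just sys.stdout. A small sketch using in-memory streams, with io.StringIO standing in for a text-mode file and io.BytesIO for a binary one:

```python
import io
import pickle

# A text stream rejects the bytes that pickle writes...
text_error = None
try:
    pickle.dump([], io.StringIO(), protocol=0)
except TypeError as exc:
    text_error = exc
print("text stream rejected bytes:", text_error)

# ...while a binary stream accepts them and the data loads back.
buf = io.BytesIO()
pickle.dump([], buf, protocol=0)
print("binary stream:", buf.getvalue(), pickle.loads(buf.getvalue()))
```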
msg110352 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-15 06:55
This does not look like a valid bug to me. The OP does not show that pickle files differ between systems; instead, he mangles the pickle file with unix2dos. This would certainly produce an invalid pickle, because the pickle format requires '\n', and no other character, as an opcode terminator.

If incompatible pickle files were produced on windows, most likely that was because the data was written in files opened in text rather than binary mode.
msg110355 - Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2010-07-15 07:50
Antoine, to answer your question in msg87622 about universal newlines in pickle: the pickle.py docstrings in 2.7+ contain the following text (among others):

        The optional protocol argument tells the pickler to use the
        given protocol; supported protocols are 0, 1, 2.  The default
        protocol is 0, to be backwards compatible.  (Protocol 0 is the
        only protocol that can be written to a file opened in text
        mode and read back successfully.  When using a protocol higher
        than 0, make sure the file is opened in binary mode, both when
        pickling and unpickling.)

This clearly indicates that protocol 0 is supposed to be compatible with text-mode files. That would mean this issue is probably not invalid: the documentation above implies that a pickle file written in text mode on Windows should be readable on a Unix system.
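What the docstring promises can be approximated in Python 3 by decoding a protocol 0 pickle as ASCII and passing it through text streams. This sketch assumes io.TextIOWrapper with newline="\r\n" as a stand-in for writing in Windows text mode, and newline=None (universal newlines) for reading the file back elsewhere. With a payload that contains no raw "\r", the round trip restores the original bytes.

```python
import io
import pickle

obj = {"k": [1, 2.5, "text"]}
data = pickle.dumps(obj, protocol=0)

# "Write in text mode on Windows": every "\n" becomes "\r\n" on output.
stored = io.BytesIO()
writer = io.TextIOWrapper(stored, encoding="ascii", newline="\r\n")
writer.write(data.decode("ascii"))
writer.detach()  # flushes the text layer and leaves `stored` open
on_disk = stored.getvalue()

# "Read with universal newlines elsewhere": "\r\n" becomes "\n" on input.
reader = io.TextIOWrapper(io.BytesIO(on_disk), encoding="ascii", newline=None)
restored = reader.read().encode("ascii")

print(b"\r\n" in on_disk, restored == data, pickle.loads(restored) == obj)
```

The round trip works here only because nothing in the payload is a raw "\r"; that is exactly the condition the discussion in this issue says the documentation fails to state.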

That said, I'd advise anyone to use the highest possible protocol, because higher protocol levels are more efficient and better support new-style classes.
msg110373 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-15 15:41
> The pickle.py docsstrings in 2.7+ contain the following text
> (amongst others): 
>
>       .. Protocol 0 is the
>       only protocol that can be written to a file opened in text
>       mode and read back successfully.

Hmm, indeed.  The ReST documentation also has the following note:

"""
Note: Be sure to always open pickle files created with protocols >= 1 in binary mode. For the old ASCII-based pickle protocol 0 you can use either text mode or binary mode as long as you stay consistent.
"""

but as Gabriel mentioned above, this should be qualified by at least adding "unless the pickle contains unicode strings with embedded '\r', on platforms that use '\r' as part of their end-of-line sequence".

I don't think changing the way unicode is pickled is an option. Given the number of other differences, fixing this aspect of cPickle to behave more like pickle.py does not look like a good use of developers' time.

I think this is a case where the existing behavior should just be better documented. See also issue616013.
msg220397 - Author: Mark Lawrence (BreamoreBoy) * Date: 2014-06-12 23:11
Can this be closed as issue616013 was?
msg237004 - Author: Mark Lawrence (BreamoreBoy) * Date: 2015-03-02 01:47
msg185431 from #616013 states "Three years later, I don't think anyone is interested in documenting the outdated cPickle." so I believe this should suffer the same fate.
msg370454 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-05-31 13:39
Python 2.7 is no longer supported.
History
Date                 User              Action  Args
2022-04-11 14:56:24  admin             set     github: 44989
2020-05-31 13:39:09  serhiy.storchaka  set     status: open -> closed
                                               nosy: + serhiy.storchaka
                                               messages: + msg370454
                                               resolution: out of date
                                               stage: test needed -> resolved
2019-03-15 22:35:34  BreamoreBoy       set     nosy: - BreamoreBoy
2015-03-02 01:47:14  BreamoreBoy       set     messages: + msg237004
2014-06-12 23:11:29  BreamoreBoy       set     nosy: + BreamoreBoy
                                               messages: + msg220397
2010-07-15 15:41:30  belopolsky        set     assignee: belopolsky ->
                                               dependencies: + cPickle documentation incomplete
                                               components: + Documentation
                                               versions: + Python 2.7, - Python 2.6
                                               nosy: loewis, jackjansen, ronaldoussoren, gjb1002, belopolsky, ggenellina, pitrou, ajaksu2, benjamin.peterson
                                               messages: + msg110373
                                               resolution: not a bug -> (no value)
2010-07-15 07:50:22  ronaldoussoren    set     status: pending -> open
                                               messages: + msg110355
2010-07-15 06:55:36  belopolsky        set     status: open -> pending
                                               nosy: + belopolsky
                                               messages: + msg110352
                                               assignee: belopolsky
                                               resolution: not a bug
2009-05-12 13:20:14  pitrou            set     messages: + msg87622
2009-05-12 13:17:19  pitrou            set     messages: + msg87621
                                               versions: - Python 3.1
2009-05-12 13:06:46  ajaksu2           set     files: + pickletest_py3k.py
                                               type: behavior
                                               components: + IO
                                               versions: + Python 2.6, Python 3.1, - Python 2.4
                                               nosy: + benjamin.peterson, pitrou, ajaksu2
                                               messages: + msg87619
                                               stage: test needed
2008-05-03 09:28:06  ronaldoussoren    set     assignee: jackjansen -> (no value)
2007-05-23 16:42:24  gjb1002           create