Message 167215 - Python tracker


Author	thomie
Recipients	rhettinger, thomie
Date	2012-08-02.13:44:42
Message-id	<1343915086.31.0.252328658724.issue15535@psf.upfronthosting.co.za>

Content

Pickling a namedtuple Point(x=10, y=20, z=30) in Python 2.7.2 with pickle protocol 0 produces something like the following output:

  ccopy_reg
  _reconstructor
  p0
  (c__main__
  Point
  p1
  c__builtin__
  tuple
  p2
  (I10
  I20
  I30
  tp3
  tp4
  Rp5
  .

In Python 2.7.3, the same namedtuple dumps to:

  ccopy_reg
  _reconstructor
  p0
  (c__main__
  Point
  p1
  c__builtin__
  tuple
  p2
  (I10
  I20
  I30
  tp3
  tp4
  Rp5
  ccollections
  OrderedDict
  p6
  ((lp7
  (lp8
  S'x'
  p9
  aI10
  aa(lp10
  S'y'
  p11
  aI20
  aa(lp12
  S'z'
  p13
  aI30
  aatp14
  Rp15
  b.

Note the OrderedDict at the end. All of the data, both the field names and the values, is duplicated, which can result in very large pickle files when namedtuples are nested.
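The difference between the two dumps can also be checked mechanically. The following is a minimal sketch, assuming a Python 3 interpreter with the standard pickletools module; the names DUMP_272, STATE_273, DUMP_273 and referenced_globals are made up for this illustration. It only walks the opcode stream of the two dumps quoted above and never unpickles them, so no Point class has to exist.

```python
import pickletools

# Protocol 0 dump of Point(x=10, y=20, z=30) as quoted above for 2.7.2.
DUMP_272 = (
    b"ccopy_reg\n_reconstructor\np0\n(c__main__\nPoint\np1\n"
    b"c__builtin__\ntuple\np2\n(I10\nI20\nI30\ntp3\ntp4\nRp5\n."
)

# Extra trailing state that the 2.7.3 dump quoted above carries.
STATE_273 = (
    b"ccollections\nOrderedDict\np6\n((lp7\n(lp8\nS'x'\np9\naI10\n"
    b"aa(lp10\nS'y'\np11\naI20\naa(lp12\nS'z'\np13\naI30\naatp14\nRp15\nb."
)

# The 2.7.3 dump is the 2.7.2 dump minus its STOP opcode, plus the state.
DUMP_273 = DUMP_272[:-1] + STATE_273


def referenced_globals(dump):
    """Return the 'module name' argument of every GLOBAL opcode in dump."""
    return [
        arg
        for opcode, arg, _pos in pickletools.genops(dump)
        if opcode.name == "GLOBAL"
    ]


def opcode_names(dump):
    """Return the opcode names of dump, in stream order."""
    return [opcode.name for opcode, _arg, _pos in pickletools.genops(dump)]


print(referenced_globals(DUMP_272))
print(referenced_globals(DUMP_273))
print(len(DUMP_272), len(DUMP_273))
```

Only the second dump references collections.OrderedDict and contains a BUILD opcode (the trailing "b"), which is the instruction that applies the extra state to the rebuilt object.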

Loading both dumps with CPython 2.7.3 works, which is why this bug was not noticed earlier. Loading the second dump with CPython 2.7.2 or PyPy 2.7.2, however, does not work: CPython 2.7.3 broke forward compatibility.

Attached is a patch with a fix. The patch makes pickled namedtuples forward compatible with 2.7.2. It does not break backward compatibility with 2.7.3, since the extra OrderedDict data contained the same information as the tuple.

Introduced:
http://hg.python.org/cpython/diff/26d5f022eb1a/Lib/collections.py

Also relevant:
http://bugs.python.org/issue3065

History
Date	User	Action	Args
2012-08-02 13:44:46	thomie	set	recipients: + thomie, rhettinger
2012-08-02 13:44:46	thomie	set	messageid: <1343915086.31.0.252328658724.issue15535@psf.upfronthosting.co.za>
2012-08-02 13:44:45	thomie	link	issue15535 messages
2012-08-02 13:44:44	thomie	create