
Author stw
Recipients stw
Date 2012-05-10.20:03:52
Message-id <1336680234.2.0.688892312973.issue14775@psf.upfronthosting.co.za>
In-reply-to
Content
I've found that unpickling a certain kind of dictionary is substantially slower in python 2.7 compared to python 2.6. The dictionary has keys that are tuples of strings - a 1-tuple is enough to see the effect. The problem seems to be caused by garbage collection, as turning it off eliminates the slowdown. Both pickle and cPickle modules are affected.


I've attached two files to demonstrate this. The file 'make_file.py' creates a dictionary of specified size, with keys containing 1-tuples of random strings. It then dumps the dictionary to a pickle file using a specified pickle module.

The file 'load_file.py' unpickles the file created by 'make_file.py', using a specified pickle module, and prints the time taken. The code can be run with garbage collection either on or off.
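
The attached scripts are not reproduced in this message, so the following is only a minimal sketch of what they are described as doing, not the actual attachments. It is written for Python 3, where there is no separate cPickle module; on Python 2.x one would import pickle or cPickle as required and use time.time() instead of time.perf_counter(). The names make_data and time_load, the key length, the seed, the integer values and the use of protocol 2 are all assumptions made for illustration.

```python
import gc
import pickle
import random
import string
import time


def make_data(n, key_len=10, seed=0):
    """Build a dict of n entries whose keys are 1-tuples of random strings.

    key_len, seed and the integer values are illustrative assumptions.
    """
    rng = random.Random(seed)
    data = {}
    while len(data) < n:
        text = "".join(rng.choice(string.ascii_letters) for _ in range(key_len))
        data[(text,)] = len(data)
    return data


def time_load(payload, use_gc=True):
    """Unpickle payload with GC on or off; return (seconds, result)."""
    was_enabled = gc.isenabled()
    if use_gc:
        gc.enable()
    else:
        gc.disable()
    try:
        start = time.perf_counter()
        result = pickle.loads(payload)
        elapsed = time.perf_counter() - start
    finally:
        # Restore whatever GC state the caller had.
        if was_enabled:
            gc.enable()
        else:
            gc.disable()
    return elapsed, result


if __name__ == "__main__":
    payload = pickle.dumps(make_data(200000), protocol=2)
    on, _ = time_load(payload, use_gc=True)
    off, _ = time_load(payload, use_gc=False)
    print("gc on / gc off: %.3g/%.3g s" % (on, off))
```

The sketch pickles to an in-memory byte string rather than a file, which keeps disk I/O out of the measurement. Timings from it on a current interpreter should not be expected to match the 2.6 and 2.7 figures below.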

The results below are for a dictionary of 200000 entries. Each timing entry is the time taken in seconds with garbage collection on / garbage collection off. The row headings are the module used to pickle the data, the column headings the module used to unpickle it. The size column is the size of the pickle file.


python 2.6, n = 200000

               size    pickle      cPickle
    pickle     4.3M    3.02/2.65   0.786/0.559 
    cPickle    3.4M    2.27/2.04   0.66/0.443 


python 2.7, n = 200000

               size    pickle      cPickle
    pickle     4.3M    10.5/2.67   6.62/0.563 
    cPickle    2.4M    1.45/1.39   0.362/0.325 


When pickle is used to pickle the data, there is a significant slowdown in python 2.7 compared to python 2.6 with garbage collection on: unpickling goes from 3.02s to 10.5s with pickle, and from 0.786s to 6.62s with cPickle. With garbage collection off the times in python 2.7 are essentially identical to those in python 2.6.

When cPickle is used to pickle the data, both unpicklers are faster in python 2.7 than in python 2.6. Presumably the speedup is due to the dictionary optimizations introduced in issue #5670.


Both pickle and cPickle show a slowdown when data pickled in python 2.6 is unpickled in python 2.7:


pickled in python 2.6, unpickled in python 2.7, n = 200000

                      size    pickle (2.7)    cPickle (2.7)
    pickle (2.6)      4.3M    10.4/2.66       6.64/0.56 
    cPickle (2.6)     3.4M    8.73/2.08       6.1/0.452 


I don't know enough about the internals of the pickle modules or garbage collector to offer an explanation/fix. The list of optimizations for python 2.7 indicates changes to both pickle modules (issues #5670 and #5084) and the garbage collector (issues #4074 and #4688). It seems possible that the slowdown is the result of some interaction between these changes.


Further notes:

1. System details: python 2.6.5 and python 2.7.3 on Ubuntu 10.04, 1.73GHz Pentium M processor.

2. Only pickle files created with protocols 1 and 2 are affected. Pickling with protocol 0 gives similar timings on python 2.6 and 2.7.

3. The fact that the dictionary's keys are tuples is relevant, although the length of the tuple is not. Unpickling a dictionary whose keys are strings does not show any slowdown.
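
Notes 2 and 3 can be checked with a small comparison along the following lines. This is a sketch under assumptions rather than the code used for the measurements above: it targets Python 3, uses str(i) in place of random strings, and the names timed_loads and compare are hypothetical.

```python
import gc
import pickle
import time


def timed_loads(payload):
    """Return the time in seconds taken to unpickle payload once."""
    start = time.perf_counter()
    pickle.loads(payload)
    return time.perf_counter() - start


def compare(n=200000):
    """Time unpickling for tuple and string keys under protocols 0, 1 and 2.

    Returns rows of (key type, protocol, seconds with GC on, seconds with GC off).
    Garbage collection is left enabled when the function returns.
    """
    datasets = (
        ("tuple keys", {(str(i),): i for i in range(n)}),
        ("string keys", {str(i): i for i in range(n)}),
    )
    rows = []
    for label, data in datasets:
        for protocol in (0, 1, 2):
            payload = pickle.dumps(data, protocol=protocol)
            gc.enable()
            on = timed_loads(payload)
            gc.disable()
            try:
                off = timed_loads(payload)
            finally:
                gc.enable()
            rows.append((label, protocol, on, off))
    return rows


if __name__ == "__main__":
    for label, protocol, on, off in compare():
        print("%-12s protocol %d  %.3g/%.3g" % (label, protocol, on, off))
```

Each printed row uses the same on/off convention as the tables above, so an affected combination would show a first figure much larger than the second.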

History

    Date                   User   Action   Args
    2012-05-10 20:03:54    stw    set      recipients: + stw
    2012-05-10 20:03:54    stw    set      messageid: <1336680234.2.0.688892312973.issue14775@psf.upfronthosting.co.za>
    2012-05-10 20:03:53    stw    link     issue14775 messages
    2012-05-10 20:03:53    stw    create