classification
Title: Pickling of methodcaller, attrgetter, and itemgetter
Type: enhancement
Stage: resolved
Components: Library (Lib)
Versions: Python 3.5
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: serhiy.storchaka
Nosy List: Antony.Lee, Jason Curtis, josh.r, pitrou, python-dev, rhettinger, serhiy.storchaka, zach.ware
Priority: normal
Keywords: patch

Created on 2014-11-27 06:33 by Antony.Lee, last changed 2016-05-16 19:28 by serhiy.storchaka. This issue is now closed.

Files
File name                          Uploaded                              Description
pickle_getter_and_caller.patch     josh.r, 2014-11-29 04:06
pickle_getter_and_caller2.patch    josh.r, 2014-11-29 04:32
issue22955.diff                    zach.ware, 2014-11-29 22:05           josh.r's patch with itemgetter and attrgetter reimplementations
pickle_getter_and_caller3.patch    serhiy.storchaka, 2014-12-14 16:49
pickle_getter_and_caller4.patch    serhiy.storchaka, 2015-05-16 20:12
Messages (23)
msg231752 - (view) Author: Antony Lee (Antony.Lee) * Date: 2014-11-27 06:33
methodcaller and attrgetter objects seem to be picklable, but in fact the pickling is erroneous:

>>> import operator, pickle
>>> pickle.loads(pickle.dumps(operator.methodcaller("foo")))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: methodcaller needs at least one argument, the method name
>>> pickle.loads(pickle.dumps(operator.attrgetter("foo")))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: attrgetter expected 1 arguments, got 0

When looking at the pickle disassembly, it seems that the argument to the constructor is indeed not pickled.

>>> import pickletools; pickletools.dis(pickle.dumps(operator.methodcaller("foo")))
    0: \x80 PROTO      3
    2: c    GLOBAL     'operator methodcaller'
   25: q    BINPUT     0
   27: )    EMPTY_TUPLE
   28: \x81 NEWOBJ
   29: q    BINPUT     1
   31: .    STOP
highest protocol among opcodes = 2
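
For comparison, a minimal sketch of the same round trip on an interpreter that includes the fix (Python 3.5 or later is assumed; the method, attribute, and index used here are illustrative only):

```python
import operator
import pickle

# Assumption: Python 3.5+, where these objects pickle their constructor arguments.
getters = [
    operator.methodcaller("upper"),  # calls obj.upper()
    operator.attrgetter("real"),     # reads obj.real
    operator.itemgetter(0),          # reads obj[0]
]

# Round-trip each object through pickle and keep the restored copies.
restored = [pickle.loads(pickle.dumps(g)) for g in getters]

upper_result = restored[0]("foo")
real_result = restored[1](3)
item_result = restored[2]("abc")
print(upper_result, real_result, item_result)
```

On 3.4 and earlier the `pickle.loads` call is expected to fail with the TypeError shown above.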
msg231768 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-11-27 16:19
I think this issue needs different solutions for 3.5 and for maintained releases. We can implement pickling of methodcaller, attrgetter and itemgetter in 3.5 (I agree this is a good idea). And it would be good if pickling of these types raised an exception in maintained releases.
msg231831 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2014-11-28 21:33
Note that pickling of the pure Python version of methodcaller works as expected:

Python 3.4.2 (default, Nov 20 2014, 12:40:10) 
[GCC 4.8.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.modules['_operator'] = None
>>> import operator
>>> import pickle
>>> pickle.loads(pickle.dumps(operator.methodcaller('foo')))
<operator.methodcaller object at 0x7ff869945898>

The pure Python attrgetter and itemgetter don't work due to using functions defined in __init__().

2.7 already raises TypeError on attempts to pickle any of the three.
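
A minimal sketch of why a function defined inside __init__() blocks pickling. `ClosureGetter` is a hypothetical stand-in, not the actual operator.py code; it only mimics the pattern of storing a locally defined function on the instance:

```python
import pickle


class ClosureGetter:
    """Hypothetical class that stores a function created inside __init__()."""

    def __init__(self, item):
        def func(obj):
            return obj[item]
        # pickle saves functions by qualified name, and a local function
        # cannot be looked up that way when pickling the instance state.
        self._call = func

    def __call__(self, obj):
        return self._call(obj)


getter = ClosureGetter(1)
value = getter("abc")  # the object itself works fine

error = None
try:
    pickle.dumps(getter)
except (pickle.PicklingError, AttributeError) as exc:
    # The exception type differs between Python versions, so catch both.
    error = exc
print(type(error).__name__, error)
```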
msg231841 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-11-29 00:12
+1 for adding pickling support to Python 3.5.

I don't see much of a need for any revision to 3.4.
msg231848 - (view) Author: Josh Rosenberg (josh.r) * Date: 2014-11-29 02:57
I've made a patch that I believe should cover all three cases, including tests.

In addition to the pickling behavior, I've made two other changes:

1. methodcaller verifies during construction that the name is a string (PyUnicode), and interns it; attrgetter did this already, and I tweaked methodcaller to match for correctness and performance reasons
2. I added proper repr functionality to all three objects. Partially this is just to make it look nicer, but it was also a decent way to spot verify that the pickle/unpickle sequence behaved correctly
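
A short sketch of what these two changes look like from the Python side, assuming an interpreter that includes the final patch (3.5 or later); the argument values are illustrative only:

```python
import operator

# The reprs show the constructor call that would recreate each object.
print(repr(operator.itemgetter(1)))
print(repr(operator.attrgetter("a.b")))
print(repr(operator.methodcaller("foo", 1, k=2)))

# A non-string method name is rejected at construction time.
name_error = None
try:
    operator.methodcaller(1)
except TypeError as exc:
    name_error = exc
print(name_error)
```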

Anyone care to review?
msg231849 - (view) Author: Josh Rosenberg (josh.r) * Date: 2014-11-29 03:26
Don't bother reviewing just yet. There is an issue with attrgetter's pickling (which the unit tests caught), and I need to update the pure Python modules to match.
msg231850 - (view) Author: Josh Rosenberg (josh.r) * Date: 2014-11-29 04:06
Okay, this one passes the tests for the built-in module. I'm not sure what's going wrong with the pure Python module. I'm getting the error:

    _pickle.PicklingError: Can't pickle <class 'operator.attrgetter'>: it's not the same object as operator.attrgetter

once for each of the three objects. Anyone recognize this? Is this some weird artifact of the multiple imports required to test both pure Python and C versions of the module that I need to work around, or did I make a mistake somewhere else?
msg231851 - (view) Author: Josh Rosenberg (josh.r) * Date: 2014-11-29 04:32
Ah, solved it (I think). The bootstrapper used to import the Python and C versions of the module leaves sys.modules unpopulated (might pickle itself populate it when it finds no module of that name?). I added a setUp method to the unittest class for operator that explicitly sets sys.modules['operator'] to whichever version is being tested at the time, so pickle's lookup works as expected. Is that the right solution? New patch uploaded with that change.
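
The "not the same object" error can be reproduced in isolation. The sketch below uses a made-up module name, `demo_operator`, and a made-up class, `Getter`; it is not the test-suite code, only an illustration of pickle's identity check against sys.modules:

```python
import pickle
import sys
import types


def make_module(name):
    """Build a fresh module object that defines its own Getter class."""
    mod = types.ModuleType(name)
    exec("class Getter:\n    pass\n", mod.__dict__)
    return mod


# Two independent copies of the "same" module, like the C and Python variants.
registered = make_module("demo_operator")
other = make_module("demo_operator")
sys.modules["demo_operator"] = registered

# Works: the class found through sys.modules is the class being pickled.
ok = pickle.loads(pickle.dumps(registered.Getter()))

# Fails: sys.modules points at a different module object with a different class.
mismatch = None
try:
    pickle.dumps(other.Getter())
except pickle.PicklingError as exc:
    mismatch = exc
print(mismatch)

# Registering the copy under test makes the lookup succeed again.
sys.modules["demo_operator"] = other
fixed = pickle.loads(pickle.dumps(other.Getter()))
del sys.modules["demo_operator"]
```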
msg231872 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2014-11-29 21:52
I'd prefer to just reimplement itemgetter and attrgetter to make them picklable rather than adding pickling methods to them; see attached patch.

I also posted a few comments, but I just went ahead and addressed them myself in this patch.  I'm not qualified to give the _operator.c changes a proper review, but they look good enough to me if others agree that __reduce__ is the best approach in C.
msg231873 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-11-29 22:10
operator.methodcaller is similar to functools.partial, which is pickleable and can be used as a model.

In the C implementation, some code can be shared between the __repr__ and __reduce__ methods.

As for tests, different protocols should be tested. Compatibility between the C and Python implementations should also be tested: instances pickled with one implementation should be unpickleable with the other. Move the pickle tests into a new test class.

If __repr__ methods are added, they need tests. The restriction on the method name type should be tested too.
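
A rough sketch of the per-protocol check, assuming Python 3.5 or later; `roundtrip_all_protocols` is a hypothetical helper and not part of the patch or of test_operator:

```python
import operator
import pickle


def roundtrip_all_protocols(obj):
    """Pickle and unpickle obj once per available protocol."""
    return [
        pickle.loads(pickle.dumps(obj, proto))
        for proto in range(pickle.HIGHEST_PROTOCOL + 1)
    ]


record = {"a": 1, "b": 2}
copies = roundtrip_all_protocols(operator.itemgetter("a", "b"))
# Each restored copy should behave like the original object.
results = [copy(record) for copy in copies]
print(results)
```

Checking cross-implementation compatibility would additionally need both the C and the pure Python module loaded side by side, which this sketch does not attempt.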
msg232363 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-12-09 10:35
> I'd prefer to just reimplement itemgetter and attrgetter to make 
> them picklable rather than adding pickling methods to them;
> see attached patch.

That isn't the usual approach.  The pickling methods are there for a reason.  I prefer to leave the existing code in a stable state and avoid unnecessary code churn or risk introducing bugs into code that is working correctly and as designed.
msg232364 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-12-09 10:40
Please remember that a potential new pickling feature is the least important part of the design of methodcaller, itemgetter, and attrgetter.  Pickle support should be driven by the design rather than become a predominant consideration.

One other note:  the OP's original concern has very little to do with these particular objects.  Instead, it is the pickling and unpickling tools themselves that tend to have crummy error messages when presented with objects that weren't specially designed with pickle support.
msg232370 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-12-09 12:01
> Instead, it is the pickling and unpickling tools themselves that tend to have crummy error messages when presented with objects that weren't specially designed with pickle support.

See issue22995 about this.
msg232512 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2014-12-12 06:36
Serhiy: functools.partial is a somewhat less than ideal comparison.  The pure-Python version is not picklable, the Python and C versions return different things (the Python version is a function returning a function, the C version is a regular class and returns an instance).  Also, both versions make their necessary attributes public anyway, unlike methodcaller.

Raymond: Not necessarily the usual approach, no.  However, I think my reimplementations of the pure-Python itemgetter and attrgetter have a few benefits, namely:
- they're somewhat less complex and thus a bit easier to understand
- they're slightly faster
- they don't require extra pickling methods, which to me just seem like clutter when it's so simple to not need them

Note that I have no intention of reimplementing the C versions: those are much more mature than the Python versions, and would likely require pickling methods anyway.

All that said, I'm not going to fight about it; if I'm overruled, I'm overruled.

Josh: Serhiy's points about needing more tests stand; would you like to add them?  You can use your patch or mine as a base, depending on how you feel about reimplementing the pure-Python (item|attr)getter.  If you use yours, please remember to look through my comments on it.
msg232616 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-12-13 17:54
> functools.partial is a somewhat less than ideal comparison.  The pure-Python version is not picklable, the Python and C versions return different things (the Python version is a function returning a function, the C version is a regular class and returns an instance).

It looks as though the Python version of functools.partial() needs a fix.

Reimplementing the pure-Python itemgetter and attrgetter as automatically pickleable Python classes has a disadvantage: it makes pickling incompatible between the Python and C versions. This means that an itemgetter pickled in CPython could not be unpickled on a Python implementation that doesn't use the C accelerator, and vice versa.
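
One way to see what a pickle actually records is to inspect `__reduce__()` directly. This sketch assumes Python 3.5 or later, where the reduction is a constructor plus its arguments rather than instance attributes, so it does not depend on which implementation produced it:

```python
import operator

# Assumption: Python 3.5+, where itemgetter defines __reduce__().
constructor, args = operator.itemgetter(1, 2).__reduce__()
print(constructor, args)

# Rebuilding from the reduction gives an equivalent getter.
rebuilt = constructor(*args)
picked = rebuilt("abcd")
```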
msg232617 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2014-12-13 17:58
Serhiy Storchaka added the comment:
> Reimplementing the pure-Python itemgetter and attrgetter as
> automatically pickleable Python classes has a disadvantage: it makes
> pickling incompatible between the Python and C versions. This means
> that an itemgetter pickled in CPython could not be unpickled on a Python
> implementation that doesn't use the C accelerator, and vice versa.

That's a very good point that I hadn't thought about.  Consider my
patch withdrawn.
msg232641 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-12-14 16:49
Here is a revised version of Josh's patch. I added tests for consistency between both implementations and fixed inconsistencies and bugs.

I still hesitate about the pickling format of methodcaller. First, there is an asymmetry between positional and keyword arguments. Second, methodcaller is not inheritable for now, but if it becomes so in the future (as functools.partial is), it would be harder to extend the pickling format to support instance attributes.
msg243364 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-16 20:12
A methodcaller with keyword arguments pickled with pickle_getter_and_caller3.patch needs Python 3.5 to unpickle. The following patch pickles it in a backward-compatible way.
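
A small sketch of the keyword-argument case, assuming Python 3.5 or later. It prints the reduction instead of asserting its shape, since the exact format is an implementation detail; the method name and arguments are illustrative only:

```python
import operator
import pickle

caller = operator.methodcaller("split", sep=",", maxsplit=1)

# Inspect how the object describes itself to pickle (format may vary by version).
constructor, args = caller.__reduce__()
print(type(constructor), args)

# Round-trip with protocol 2, which older Python 3 releases can also read.
restored = pickle.loads(pickle.dumps(caller, protocol=2))
result = restored("a,b,c")
print(result)
```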
msg243676 - (view) Author: Roundup Robot (python-dev) Date: 2015-05-20 15:29
New changeset 435bc22f39e3 by Serhiy Storchaka in branch 'default':
Issue #22955: attrgetter, itemgetter and methodcaller objects in the operator
https://hg.python.org/cpython/rev/435bc22f39e3
msg243687 - (view) Author: Roundup Robot (python-dev) Date: 2015-05-20 19:03
New changeset c93e5ba1cc20 by Serhiy Storchaka in branch 'default':
Issue #22955: Fixed test_operator. It left Python implementation in
https://hg.python.org/cpython/rev/c93e5ba1cc20
msg243746 - (view) Author: Roundup Robot (python-dev) Date: 2015-05-21 11:20
New changeset 2688655e431a by Serhiy Storchaka in branch 'default':
Issue #22955: Fixed reference leak in attrgetter.repr().
https://hg.python.org/cpython/rev/2688655e431a
msg265718 - (view) Author: Jason Curtis (Jason Curtis) Date: 2016-05-16 18:59
This is still an issue with operator.attrgetter in 3.4.3, even after clearing sys.modules['_operator']:

$ python3
Python 3.4.3 (default, Oct 14 2015, 20:28:29) 
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.modules['_operator'] = None
>>> import operator
>>> import pickle
>>> pickle.loads(pickle.dumps(operator.attrgetter("foo")))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
_pickle.PicklingError: Can't pickle <function attrgetter.__init__.<locals>.func at 0x7f25728d5bf8>: attribute lookup func on operator failed
msg265727 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-05-16 19:28
This is a new feature in 3.5.
History
Date  User  Action  Args
2016-05-16 19:28:48  serhiy.storchaka  set  messages: + msg265727; versions: - Python 3.4
2016-05-16 18:59:43  Jason Curtis  set  nosy: + Jason Curtis; messages: + msg265718; versions: + Python 3.4
2015-05-27 08:49:19  serhiy.storchaka  set  status: open -> closed; resolution: fixed; stage: patch review -> resolved
2015-05-21 11:20:05  python-dev  set  messages: + msg243746
2015-05-20 19:03:10  python-dev  set  messages: + msg243687
2015-05-20 15:29:47  python-dev  set  nosy: + python-dev; messages: + msg243676
2015-05-16 20:12:07  serhiy.storchaka  set  files: + pickle_getter_and_caller4.patch; messages: + msg243364
2014-12-14 16:49:30  serhiy.storchaka  set  files: + pickle_getter_and_caller3.patch; messages: + msg232641; stage: needs patch -> patch review
2014-12-13 17:58:13  zach.ware  set  messages: + msg232617
2014-12-13 17:54:41  serhiy.storchaka  set  messages: + msg232616
2014-12-12 06:36:09  zach.ware  set  messages: + msg232512
2014-12-09 12:01:46  serhiy.storchaka  set  messages: + msg232370
2014-12-09 10:40:38  rhettinger  set  messages: + msg232364
2014-12-09 10:35:27  rhettinger  set  messages: + msg232363
2014-11-29 22:10:38  serhiy.storchaka  set  messages: + msg231873
2014-11-29 22:05:20  zach.ware  set  files: - bad-issue22955.diff
2014-11-29 22:05:11  zach.ware  set  files: + issue22955.diff
2014-11-29 21:52:06  zach.ware  set  files: + bad-issue22955.diff; messages: + msg231872
2014-11-29 04:32:35  josh.r  set  files: + pickle_getter_and_caller2.patch; messages: + msg231851
2014-11-29 04:06:52  josh.r  set  files: - pickle_getter_and_caller.patch
2014-11-29 04:06:30  josh.r  set  files: + pickle_getter_and_caller.patch; messages: + msg231850
2014-11-29 03:26:12  josh.r  set  messages: + msg231849
2014-11-29 02:58:34  josh.r  set  versions: - Python 3.4
2014-11-29 02:57:36  josh.r  set  files: + pickle_getter_and_caller.patch; versions: + Python 3.4; nosy: + josh.r; messages: + msg231848; keywords: + patch
2014-11-29 00:12:43  rhettinger  set  nosy: + rhettinger; messages: + msg231841; versions: - Python 3.4
2014-11-28 21:33:51  zach.ware  set  nosy: + zach.ware; title: Pickling of methodcaller and attrgetter -> Pickling of methodcaller, attrgetter, and itemgetter; messages: + msg231831; versions: + Python 3.4
2014-11-27 19:31:46  serhiy.storchaka  set  assignee: serhiy.storchaka
2014-11-27 16:19:51  serhiy.storchaka  set  messages: + msg231768
2014-11-27 13:59:36  pitrou  set  nosy: + pitrou, serhiy.storchaka; type: enhancement; versions: + Python 3.5, - Python 3.4
2014-11-27 06:43:37  rhettinger  set  stage: needs patch
2014-11-27 06:33:03  Antony.Lee  create