classification
Title: __reduce__ not being called in derived extension class from datetime.datetime
Type: behavior Stage:
Components: IO Versions: Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Jeff Reback, belopolsky, p-ganssle, serhiy.storchaka, yselivanov
Priority: normal Keywords:

Created on 2016-11-18 01:31 by Jeff Reback, last changed 2022-04-11 14:58 by admin.

Messages (7)
msg281070 - (view) Author: Jeff Reback (Jeff Reback) Date: 2016-11-18 01:31
xref to https://github.com/pandas-dev/pandas/issues/14679.

pandas has had a Cython extension class that subclasses datetime.datetime for quite some time. A simple __reduce__ is defined.

    def __reduce__(self):
        object_state = self.value, self.freq, self.tzinfo
        print(object_state)
        return (Timestamp, object_state)

In 3.5.2:

Python 3.5.2 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:52:12) 
[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> import pickle
>>> pickle.dumps(pd.Timestamp('20130101'))
(1356998400000000000, None, None)
b'\x80\x03cpandas.tslib\nTimestamp\nq\x00\x8a\x08\x00\x00\xc6\xe8\xda\x06\xd5\x12NN\x87q\x01Rq\x02.'

But in 3.6.0b3:

Python 3.6.0b3 | packaged by conda-forge | (default, Nov  2 2016, 03:28:12) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.54)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> import pickle
>>> pickle.dumps(pd.Timestamp('20130101'))
b'\x80\x03cpandas.tslib\nTimestamp\nq\x00C\n\x07\xdd\x01\x01\x00\x00\x00\x00\x00\x00q\x01\x85q\x02Rq\x03.'


So it appears __reduce__ is no longer called at all (I tried defining __getstate__ and __getnewargs__ as well, but to no avail). Instead, it looks like datetime.datetime.__reduce__ (well, a C function) is actually called.

Link to the codebase. https://github.com/pandas-dev/pandas/blob/master/pandas/tslib.pyx#L490
msg281077 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-18 06:20
That is because some datetime classes now define __reduce_ex__ instead of __reduce__ (see issue24773). It has higher priority.
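
The lookup order described above can be reproduced without pandas. The following is a minimal sketch, assuming CPython 3.6 or later; the class names OnlyReduce and WithReduceEx are hypothetical and stand in for an extension class such as pandas' Timestamp.

```python
import pickle
from datetime import datetime

calls = []  # records which hooks pickle actually invokes


class OnlyReduce(datetime):
    """Hypothetical subclass that overrides only __reduce__."""

    def __reduce__(self):
        calls.append("OnlyReduce.__reduce__")
        return (OnlyReduce, (self.year, self.month, self.day))


class WithReduceEx(datetime):
    """Hypothetical subclass that overrides __reduce_ex__ instead."""

    def __reduce_ex__(self, protocol):
        calls.append("WithReduceEx.__reduce_ex__")
        return (WithReduceEx, (self.year, self.month, self.day))


# pickle looks up __reduce_ex__ first. On 3.6+ datetime provides its own,
# so the inherited __reduce_ex__ wins and OnlyReduce.__reduce__ is skipped.
pickle.dumps(OnlyReduce(2013, 1, 1))

# Overriding __reduce_ex__ in the subclass restores control over pickling.
restored = pickle.loads(pickle.dumps(WithReduceEx(2013, 1, 1)))

print(calls)
print(type(restored).__name__, restored)
```

On the interpreters assumed here, only the WithReduceEx hook should appear in calls, which matches the behavior reported in this issue.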
msg281099 - (view) Author: Jeff Reback (Jeff Reback) Date: 2016-11-18 11:31
OK, thanks for the info. Fixed in pandas here: https://github.com/pandas-dev/pandas/pull/14689

Is this documented in the whatsnew?
msg281104 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-18 12:04
It seems to me that this is not documented. I'm not even sure that this change is compatible. The additional bit is saved only with protocol 4. The default protocol is 3, so you have to specify protocol 4 explicitly. But protocol 4 has been supported since 3.4. It seems to me that if you pickle a datetime object with the fold bit set using protocol 4, you could get an invalid result when unpickling it in 3.4 or 3.5. Another doubtful detail is that the fold bit is added to the hour field in datetime.time, but to the month field in datetime.datetime.

In any case, __reduce_ex__ has the highest priority. After implementing it you can be sure that it will be used.
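
The protocol dependence mentioned above can be checked with a short round trip. This is a sketch assuming CPython 3.6 or later; the protocol is passed explicitly because the default was 3 when this was written and changed to 4 in Python 3.8.

```python
import pickle
from datetime import datetime

# An arbitrary naive datetime with fold=1 (PEP 495).
dt = datetime(2016, 11, 6, 1, 30, fold=1)

fold_after_roundtrip = {}
for protocol in (3, 4):
    restored = pickle.loads(pickle.dumps(dt, protocol=protocol))
    fold_after_roundtrip[protocol] = restored.fold

# Expected on the assumed interpreters: fold is dropped with protocol 3
# and preserved with protocol 4.
print(fold_after_roundtrip)
```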
msg281112 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2016-11-18 12:56
> On Nov 18, 2016, at 7:04 AM, Serhiy Storchaka <report@bugs.python.org> wrote:
> 
> Another doubtful detail is that the fold bit is added to the hour field in datetime.time, but to the month field in datetime.datetime.

The reason behind this choice is documented in PEP 495: "We picked these bytes because they are the only bytes that are checked by the current unpickle code. Thus loading post-PEP fold=1 pickles in a pre-PEP Python will result in an exception rather than an instance with out of range components."

https://www.python.org/dev/peps/pep-0495/#pickles
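
To make the byte choice concrete, the packed state can be inspected directly. This is only a sketch: it relies on the private state layout that CPython 3.6+ passes to pickle, which is an implementation detail, and state_bytes is a hypothetical helper introduced for illustration.

```python
from datetime import datetime, time


def state_bytes(obj, protocol):
    """Return the packed state bytes that __reduce_ex__ hands to pickle."""
    _cls, args = obj.__reduce_ex__(protocol)
    return args[0]


d = datetime(2016, 11, 6, 1, 30, fold=1)
t = time(1, 30, fold=1)

d_state = state_bytes(d, 4)
t_state = state_bytes(t, 4)

# datetime state starts with year (2 bytes), then month: fold is the high
# bit of byte 2, so the stored "month" exceeds 12 when fold=1.
print("datetime month byte:", d_state[2], "-> month", d_state[2] & 0x7F, "fold", d_state[2] >> 7)

# time state starts with hour: fold is the high bit of byte 0, so the
# stored "hour" exceeds 23 when fold=1.
print("time hour byte:", t_state[0], "-> hour", t_state[0] & 0x7F, "fold", t_state[0] >> 7)
```

Because the stored month or hour is out of range when fold=1, a pre-PEP unpickler that validates these bytes rejects the state, as the quoted PEP text describes.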
msg281119 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-18 13:49
Thank you for your explanation Alexander. It looks reasonable.

But there are two drawbacks.

1. By default the fold bit is ignored. That means you lose it when you use multiprocessing or any other library that uses the default pickle protocol.

2. It is not easy to make a workaround for pickles with fold=1 in pre-PEP Python.
msg281126 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2016-11-18 14:57
> On Nov 18, 2016, at 8:49 AM, Serhiy Storchaka <report@bugs.python.org> wrote:
> 
> But there are two drawbacks.

It is not too late to make improvements. If you have specific proposals, please bring them up on the mailing list.
History
Date                 User              Action  Args
2022-04-11 14:58:39  admin             set     github: 72916
2018-07-05 15:57:32  p-ganssle         set     nosy: + p-ganssle
2016-11-18 14:57:16  belopolsky        set     messages: + msg281126
2016-11-18 13:49:43  serhiy.storchaka  set     messages: + msg281119
2016-11-18 12:56:13  belopolsky        set     messages: + msg281112
2016-11-18 12:04:13  serhiy.storchaka  set     nosy: + yselivanov; messages: + msg281104
2016-11-18 11:31:12  Jeff Reback       set     messages: + msg281099
2016-11-18 06:20:05  serhiy.storchaka  set     nosy: + serhiy.storchaka, belopolsky; messages: + msg281077
2016-11-18 01:31:03  Jeff Reback       create