classification
Title: Improve pickle format for timezone aware datetime instances
Type: enhancement Stage: resolved
Components: Versions: Python 3.6
process
Status: closed Resolution: rejected
Dependencies: 9183 Superseder:
Assigned To: belopolsky Nosy List: alexandre.vassalotti, belopolsky, fdrake, mark.dickinson, ned.deily, pitrou, python-dev, serhiy.storchaka, tim.peters, vstinner
Priority: normal Keywords: patch

Created on 2010-06-21 20:15 by belopolsky, last changed 2017-03-06 17:30 by belopolsky. This issue is now closed.

Files
File name Uploaded Description Edit
issue9051.diff belopolsky, 2010-06-22 03:26 review
issue9051-utc-pickle-proto.diff belopolsky, 2014-06-30 01:26 review
timezone_test_pickle.patch serhiy.storchaka, 2015-10-24 09:58 review
Messages (19)
msg108313 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-21 20:15
>>> s = pickle.dumps(timezone.utc)
>>> pickle.loads(s)
Traceback (most recent call last):
 ..
TypeError: ("Required argument 'offset' (pos 1) not found", <class 'datetime.timezone'>, ())
msg108408 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-22 18:39
Python version is fixed in sandbox and committed in r82154.
msg108492 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-23 22:20
Committed in r82184.  Leaving the issue open pending a more thorough review of pickling in datetime module.
msg108608 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-25 15:55
The datetime module provides compact pickled representation for date, datetime, time and timedelta instances:

type: size
date: 34
datetime: 44
time: 36
timedelta: 37


On the other hand, current pickle size for timezone is 64 and the size of an aware datetime instance is 105 bytes.

Since stability is important for pickle format, the best format should be developed before the first release to include timezone class.
msg108611 - (view) Author: Fred Drake (fdrake) (Python committer) Date: 2010-06-25 16:56
As part of this, we should ensure references to common timezones, like
UTC, only create references to a single instance rather than filling
memory with multiple instances.

One consequence of this is that shared instances should probably be immutable.
msg108619 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-25 19:29
I am attaching a python prototype implementing interned UTC instance pickling.  The patch is against sandbox revision r82218 of datetime.py.

Note that the pickling protocol requires that an instance or factory function is defined at the module level.

The pickle size saving is substantial:


>>> len(dumps(datetime.now(timezone.utc)))
61
>>> len(dumps(datetime.now(timezone.min)))
163

but there is still room for improvement:

>>> len(dumps(datetime.now()))
44

I do feel, however, that further improvements will see diminishing returns.
msg221920 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-06-29 23:53
Your latest patch doesn't have a review link. Would you like to regenerate it against the latest default?
msg221933 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2014-06-30 01:26
I updated the patch.  Please note that it only includes python code, so you need to suppress _datetime acceleration to test:

>>> import sys
>>> sys.modules['_datetime'] = None
>>> from pickle import *
>>> from datetime import *
>>> dumps(timezone.utc)
b'\x80\x03cdatetime\n_utc\nq\x00.'
msg221934 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2014-06-30 01:29
Hmm.  Still no "review" link.  Is there something special that I need to do to get it?  Use different name, maybe?
msg221935 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2014-06-30 01:33
FWIW, I see review links for both of your files: all the way over to the right in the Files section.
msg221936 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2014-06-30 01:37
> I see review links for both of your files

Now I do too.  I guess they just take time to appear.  Note that I unlinked the file that Antoine complained about.
msg251342 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-09-22 19:16
For stability it is better to use public name 'timezone.utc' instead of '_utc'. Dotted names now are supported with all protocols.

> but there is still room for improvement:

Don't worry about this. When you pickle a number of datetime objects with the same timezone, the timezone is pickled only once and for all other datetime objects only a reference is used.
msg251668 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2015-09-26 22:54
> For stability it is better to use public name 'timezone.utc' instead of '_utc'.

Can you elaborate on the "stability" consideration?

I would like to revisit this issue since we will have some changes made to datetime pickles in the context of PEP 495.
msg251677 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-09-27 03:12
For now "_utc" is not a part of API. When use it in pickling, you implicitly make it a part of API, and should support it forever (and force other Python implementations to support it). As far as it looks as implementation detail, there is a risk that the "_utc" variable can be renamed and this will break pickle compatibility.
msg251812 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2015-09-29 02:23
> there is a risk that the "_utc" variable can be renamed
> and this will break pickle compatibility.

I think we will still need to maintain _utc global indefinitely to keep the old pickles readable.

On the other hand, it looks like support for qualnames is only available with pickle protocol 4, so I don't see any downside from from storing the full qualname other than an 8-byte increase in the size of the pickle.

I guess my position on this will be +0:  I'll accept a patch if someone will submit it, but I am unlikely to do it myself.
msg251814 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-09-29 02:58
> I think we will still need to maintain _utc global indefinitely to keep the old pickles readable.

We have no need to maintain _utc global indefinitely if don't add it.

> On the other hand, it looks like support for qualnames is only available with pickle protocol 4,

No, support for qualnames now is available with all pickle protocols.
msg253402 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-10-24 09:58
Following patch adds tests for timezone pickling and copying. Note, that timezone.utc identity is preserved.
msg254721 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-11-16 09:21
New changeset e48da8f01316 by Serhiy Storchaka in branch '3.4':
Issue #9051: Added tests for pickling and copying the timezone objects.
https://hg.python.org/cpython/rev/e48da8f01316

New changeset 634905e9628d by Serhiy Storchaka in branch '3.5':
Issue #9051: Added tests for pickling and copying the timezone objects.
https://hg.python.org/cpython/rev/634905e9628d

New changeset 9c31544d6a66 by Serhiy Storchaka in branch 'default':
Issue #9051: Added tests for pickling and copying the timezone objects.
https://hg.python.org/cpython/rev/9c31544d6a66
msg254724 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-16 09:53
>>> import pickle, pickletools, datetime
>>> len(pickle.dumps([datetime.datetime.utcfromtimestamp(i) for i in range(10000)], 3)) / 10000
30.9283
>>> len(pickle.dumps([datetime.datetime.fromtimestamp(i, datetime.timezone.utc) for i in range(10000)], 3)) / 10000
32.936
>>> len(pickle.dumps([datetime.datetime.utcfromtimestamp(i) for i in range(10000)], 4)) / 10000
19.0074
>>> len(pickle.dumps([datetime.datetime.fromtimestamp(i, datetime.timezone.utc) for i in range(10000)], 4)) / 10000
21.0144

In best case the UTC timezone costs only 2 bytes per datetime instance (BINGET + 1-byte index). In worst case it can cost 5 bytes (LONG_BINGET + 4-bytes index). Saving few bytes for single timezone instance has significant effect only for small pickled data. But in this case it is more advantageous to save a datetime just as a timestamp. I suggest to close this issue.
History
Date User Action Args
2017-03-06 17:30:07belopolskysetstatus: pending -> closed
resolution: rejected
stage: patch review -> resolved
2017-03-06 13:17:56serhiy.storchakasetstatus: open -> pending
2015-11-16 09:54:00serhiy.storchakasetmessages: + msg254724
2015-11-16 09:21:18python-devsetnosy: + python-dev
messages: + msg254721
2015-10-24 09:58:03serhiy.storchakasetfiles: + timezone_test_pickle.patch

messages: + msg253402
stage: needs patch -> patch review
2015-09-29 02:58:17serhiy.storchakasetmessages: + msg251814
2015-09-29 02:24:12belopolskysetnosy: + tim.peters
2015-09-29 02:23:56belopolskysetmessages: + msg251812
2015-09-27 03:12:50serhiy.storchakasetmessages: + msg251677
2015-09-26 22:54:31belopolskysetmessages: + msg251668
2015-09-22 19:16:14serhiy.storchakasetversions: + Python 3.6, - Python 3.5
nosy: + serhiy.storchaka

messages: + msg251342

type: behavior -> enhancement
2014-07-01 05:35:24rhettingersettitle: Improve pickle format for aware datetime instances -> Improve pickle format for timezone aware datetime instances
2014-06-30 01:37:16belopolskysetmessages: + msg221936
2014-06-30 01:33:55ned.deilysetnosy: + ned.deily
messages: + msg221935
2014-06-30 01:29:39belopolskysetversions: + Python 3.5, - Python 3.3
2014-06-30 01:29:20belopolskysetmessages: + msg221934
2014-06-30 01:27:52belopolskysetfiles: - issue9051-utc-pickle-proto.diff
2014-06-30 01:26:54belopolskysetfiles: + issue9051-utc-pickle-proto.diff

messages: + msg221933
2014-06-29 23:53:43pitrousetmessages: + msg221920
2014-06-29 20:08:53belopolskysetnosy: + pitrou, vstinner, alexandre.vassalotti
2011-01-11 15:16:45belopolskysetnosy: fdrake, mark.dickinson, belopolsky
versions: + Python 3.3, - Python 3.2
2010-07-28 18:01:38belopolskysetdependencies: + Intern UTC timezone
2010-06-25 19:29:17belopolskysetfiles: + issue9051-utc-pickle-proto.diff
keywords: + patch
messages: + msg108619
2010-06-25 16:56:49fdrakesetmessages: + msg108611
2010-06-25 15:55:07belopolskysetpriority: low -> normal

nosy: + fdrake
title: Cannot pickle timezone instances -> Improve pickle format for aware datetime instances
messages: + msg108608

stage: patch review -> needs patch
2010-06-23 22:20:32belopolskysetpriority: normal -> low
keywords: - patch, easy
messages: + msg108492
2010-06-22 18:39:06belopolskysetmessages: + msg108408
2010-06-22 03:26:51belopolskysetfiles: + issue9051.diff
2010-06-22 03:26:31belopolskysetfiles: - issue8455.diff
2010-06-22 03:26:11belopolskysetkeywords: + easy, patch
nosy: + mark.dickinson

files: + issue8455.diff
stage: test needed -> patch review
2010-06-21 20:15:07belopolskycreate