
Author pierreglaser
Recipients alexandre.vassalotti, pierreglaser, pitrou
Date 2019-02-05.14:40:22
Message-id <1549377622.96.0.737929222958.issue35900@roundup.psfhosted.org>
In-reply-to
Content
Pickler objects provide a dispatch_table attribute, where the user can specify
custom saving functions keyed by the type of the object to be saved. However,
for performance reasons, this table is preceded (in the C implementation only)
by a hardcoded switch that handles the saving of many built-in types without a
lookup in the dispatch_table.
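
As a minimal sketch of this behaviour (the reducer names and the `calls` list
are hypothetical, chosen only for illustration), the snippet below registers
one reducer for `fractions.Fraction` and one for `int` in a per-pickler
dispatch_table. The `Fraction` entry should be consulted, while the `int`
entry should be bypassed because integers are handled before the table is
looked up.

```python
import copyreg
import io
import pickle
from fractions import Fraction

calls = []  # records which custom reducers were actually invoked


def reduce_fraction(value):
    # Hypothetical reducer: rebuild the Fraction from its two integers.
    calls.append("Fraction")
    return Fraction, (value.numerator, value.denominator)


def reduce_int(value):
    # Registered only to show that the pickler never consults it.
    calls.append("int")
    return int, (str(value),)


buffer = io.BytesIO()
pickler = pickle.Pickler(buffer)
# Start from the global copyreg table and add per-pickler entries.
pickler.dispatch_table = copyreg.dispatch_table.copy()
pickler.dispatch_table[Fraction] = reduce_fraction
pickler.dispatch_table[int] = reduce_int
pickler.dump([Fraction(3, 4), 7])

restored = pickle.loads(buffer.getvalue())
print(calls)
print(restored)
```

Only the `Fraction` reducer is expected to appear in `calls`; the integers
inside the list and inside the reducer's argument tuple go through the
built-in saving path.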

In particular, it is not possible to define custom saving methods for functions
and classes, although the current default (save_global, which saves an object
by its module attribute path) is likely to fail at pickling or unpickling
time in many cases.
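
The sketch below illustrates both sides of this default, assuming nothing
beyond the standard library (`make_adder` is a hypothetical helper). An
importable function such as `os.path.join` is pickled as a reference to its
module and name, whereas a function defined inside another function has no
importable attribute path and is rejected at pickling time.

```python
import os.path
import pickle


def make_adder(n):
    # The inner function lives only in make_adder's local scope, so it
    # cannot be found again through a module attribute path.
    def add(x):
        return x + n
    return add


# Importable function: pickled by reference, so loading returns the same object.
payload = pickle.dumps(os.path.join)
by_reference_ok = pickle.loads(payload) is os.path.join
name_in_payload = b"join" in payload

# Nested function: the default saving path fails. The exception type has
# varied across Python versions, so both candidates are caught here.
try:
    pickle.dumps(make_adder(1))
    nested_failed = False
except (pickle.PicklingError, AttributeError):
    nested_failed = True

print(by_reference_ok, name_in_payload, nested_failed)
```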

These failures are intentional in the standard library (as a way to restrict
serialization to functions accessible from non-dynamic (*) modules). However,
there are cases where serializing functions from dynamic modules matters.
These cases are currently handled by the cloudpickle module
(https://github.com/cloudpipe/cloudpickle), which is used by many distributed
data-science frameworks such as pyspark, ray and dask. For the reasons
explained above, cloudpickle's Pickler subclass derives from the pure-Python
Pickler class instead of the C one, which severely harms its performance.

While prototyping with Antoine Pitrou, we came to the conclusion that a hook
could be added to the C Pickler class: an optional user-defined callback that,
if defined, would be invoked when saving functions and classes instead of the
traditional save_global. Here is a patch so that we have something concrete to
discuss.
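
The patch itself is not reproduced in this message. As a rough sketch of the
kind of hook described, the example below uses `reducer_override`, a method
that `pickle.Pickler` subclasses can define since Python 3.8 and that plays a
comparable role; it is not necessarily the callback from the patch. The names
`ByValuePickler`, `make_class` and `dumps_by_value` are hypothetical, and the
by-value strategy (rebuilding the class with `type(name, bases, attrs)`) only
covers plain class attributes.

```python
import io
import pickle


def make_class():
    # A class created inside a function has no importable attribute path.
    class Config:
        retries = 3
    return Config


class ByValuePickler(pickle.Pickler):
    def reducer_override(self, obj):
        # Intercept locally defined classes; defer everything else to the
        # regular pickling machinery by returning NotImplemented.
        if isinstance(obj, type) and "<locals>" in obj.__qualname__:
            attrs = {k: v for k, v in vars(obj).items()
                     if not k.startswith("__")}
            return type, (obj.__name__, obj.__bases__, attrs)
        return NotImplemented


def dumps_by_value(obj):
    buffer = io.BytesIO()
    ByValuePickler(buffer).dump(obj)
    return buffer.getvalue()


Config = make_class()

# The default pickler rejects the class; the exception type has varied
# across Python versions, so both candidates are caught.
try:
    pickle.dumps(Config)
    default_failed = False
except (pickle.PicklingError, AttributeError):
    default_failed = True

restored = pickle.loads(dumps_by_value(Config))
print(default_failed, restored.__name__, restored.retries)
```

The restored class is a new object built at load time rather than a reference
to the original, which is the trade-off of serializing by value.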

(*) Dynamic modules are modules that cannot be imported by name like
    traditional file-backed Python modules. Examples include the __main__
    module, which can be populated dynamically by running a script or by a
    user writing code in a Python shell / Jupyter notebook.
History
Date User Action Args
2019-02-05 14:40:25 pierreglaser set recipients: + pierreglaser, pitrou, alexandre.vassalotti
2019-02-05 14:40:22 pierreglaser set messageid: <1549377622.96.0.737929222958.issue35900@roundup.psfhosted.org>
2019-02-05 14:40:22 pierreglaser link issue35900 messages
2019-02-05 14:40:22 pierreglaser create