Message188889
On 5/10/2013 11:46 PM, Stefan Mihaila wrote:
> Changes by Stefan Mihaila <mstefanro@gmail.com>:
>
>
> ----------
> nosy: +mstefanro
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue17810>
> _______________________________________
>
Hello. I've worked on implementing PEP 3154 as part of GSoC 2012.
My work is available in a repo at [1].
The blog I've used to report my work is at [2] and contains some useful
information.
Here is a list of features that were implemented as part of GSoC:
* Pickling of very large bytes and strings
* Better pickling of small strings and bytes (+ tests)
* Native pickling of sets and frozensets (+ tests)
* Self-referential sets and frozensets (+ tests)
* Implicit memoization (BINPUT is implicit for certain opcodes)
  - The argument against this was that pickletools.optimize would not be
    able to prevent memoization of objects that are not referred to later.
    For such situations, a special flag could be added at the beginning of
    the pickle to indicate whether implicit BINPUT is enabled. This flag
    could be stored in one of the higher-order bits of the protocol
    version. For instance:
        PROTO \x04 + BINUNICODE ".."
    and
        PROTO \x84 + BINUNICODE ".." + BINPUT 1
    would be equivalent. Then pickletools.optimize could choose whether it
    wants implicit BINPUT or not. Sure, this would complicate matters, and
    it's not for me to decide whether it's worth it.
    In my midterm report at [3] there are some examples of what a pickled
    string looks like in v4 without implicit memoization, and some size
    comparisons to v3.
* Pickling of nested globals, methods etc. (+ tests)
* Pickling calls to __new__ with keyword args (+ tests)
* A BAIL_OUT opcode was always emitted when pickling failed, so that the
  Pickler and Unpickler can both be run at once on different ends of a
  stream. The Pickler could guarantee that it always sends a correct
  pickle on the stream, and the Unpickler would never end up hanging when
  pickling failed mid-work.
  - At the time, Alexandre suggested this would probably not be a great
    idea, because it should be the responsibility of the protocol used to
    ensure some consistency. However, this does not appear to be a trivial
    task to achieve. The size of the pickle is not known in advance, and
    waiting for the Pickler to complete before sending the data via the
    stream is not as efficient, because the Unpickler would not be able to
    run at the same time. The write and read methods of the stream would
    have to be wrapped and some escape sequence used. This would probably
    increase the size of the pickled string in the worst case of the
    escape sequence. My thought was that it would be beneficial for the
    average user to have the guarantee that the Pickler always outputs a
    correct pickle to a stream, even if it raises an exception.
* Other minor changes that I can't really remember.
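
To make the flag idea from the implicit memoization item concrete, here is
a minimal sketch of how a PROTO argument byte could be split into a version
and a memoization flag. This is only an illustration: the names
EXPLICIT_MEMO_FLAG, split_proto_arg and make_proto_arg are hypothetical and
do not exist in the pickle module, and reading the high bit as "explicit
BINPUT required" is my interpretation of the \x04 / \x84 example above.

```python
# Hypothetical sketch: none of these names exist in the pickle module.
EXPLICIT_MEMO_FLAG = 0x80  # assumed: top bit set means BINPUT must be explicit
VERSION_MASK = 0x7F        # remaining seven bits carry the protocol version


def split_proto_arg(arg):
    """Split a PROTO argument byte into (version, implicit_memo)."""
    if not 0 <= arg <= 0xFF:
        raise ValueError("PROTO argument must fit in one byte")
    version = arg & VERSION_MASK
    implicit_memo = not (arg & EXPLICIT_MEMO_FLAG)
    return version, implicit_memo


def make_proto_arg(version, implicit_memo=True):
    """Build the PROTO argument byte for a version and memoization mode."""
    if not 0 <= version <= VERSION_MASK:
        raise ValueError("version must fit in seven bits")
    return version if implicit_memo else version | EXPLICIT_MEMO_FLAG


print(split_proto_arg(0x04))
print(split_proto_arg(0x84))
print(hex(make_proto_arg(4, implicit_memo=False)))
```

With such a scheme, pickletools.optimize could rewrite a pickle to use the
flagged form and then drop the BINPUT opcodes for objects that are never
referred to again.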
Although I'm sure Alexandre had his good reasons to start the work from
scratch, it would be a shame to waste all this work. The features mentioned
above are working, and although the implementation may not be ideal (I don't
have the CPython experience of a regular dev), I'm sure useful bits can be
extracted from it.
Alexandre suggested that I extract bits and post patches, so I have
attached, for now, support for pickling methods and nested globals (+ tests).
I'm willing to do so for some or all of the remaining features, should this
be requested and should I have the necessary time to do so.
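
For readers unsure what "pickling methods and nested globals" covers, here
is a small round-trip sketch. It is not taken from the attached patch: it
assumes an interpreter whose pickle module already ships protocol 4
(Python 3.4 and later), and the helper name roundtrip is made up for the
example.

```python
import collections
import pickle

PROTOCOL = 4  # assumption: the interpreter supports protocol 4 (Python 3.4+)


def roundtrip(obj):
    """Pickle and unpickle obj with the protocol chosen above."""
    return pickle.loads(pickle.dumps(obj, protocol=PROTOCOL))


# A function defined inside a class is a "nested global": it is looked up
# by its qualified name, here "Counter.most_common" in module collections.
unbound = roundtrip(collections.Counter.most_common)

# A bound method is pickled as its instance plus the method name.
bound = roundtrip(collections.Counter("aab").most_common)

# Bound methods of built-in objects go through the same mechanism.
upper = roundtrip("abc".upper)

print(unbound is collections.Counter.most_common)
print(bound(1))
print(upper())
```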
[1] https://bitbucket.org/mstefanro/pickle4/
[2] https://pypickle4.wordpress.com/
[3] https://gist.github.com/mstefanro/3086647

Date                | User      | Action | Args
2013-05-11 00:09:08 | mstefanro | set    | recipients: + mstefanro, rhettinger, pitrou, alexandre.vassalotti, Arfrever, asvetlov, neologix, serhiy.storchaka
2013-05-11 00:09:06 | mstefanro | link   | issue17810 messages
2013-05-11 00:09:06 | mstefanro | create |