classification
Title: Integrate pickle protocol version 4 GSoC work by Stefan Mihaila
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.4
process
Status: closed Resolution: out of date
Dependencies: Superseder: Implement PEP 3154 (pickle protocol 4)
View: 17810
Assigned To: alexandre.vassalotti Nosy List: alexandre.vassalotti, asvetlov, jcea, mstefanro, pitrou, serhiy.storchaka
Priority: normal Keywords: gsoc, patch

Created on 2012-08-13 20:48 by alexandre.vassalotti, last changed 2013-05-11 11:12 by mstefanro. This issue is now closed.

Files
File name Uploaded Description Edit
pickle4.diff alexandre.vassalotti, 2012-08-14 02:42 review
pickle4-2.diff alexandre.vassalotti, 2012-08-17 22:40 review
d0c3a8d4947a.diff mstefanro, 2013-05-11 11:11 review
Repositories containing patches
https://bitbucket.org/mstefanro/pickle4
Messages (15)
msg168144 - (view) Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2012-08-13 20:48
Stefan Mihaila has been working on the implementation of PEP 3154, plus some other enhancements. His work is pretty complete and ready to be reviewed. I will do my best to finish a thorough review of his changes by the end of next week.
msg168175 - (view) Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2012-08-14 05:34
I get warnings when compiling with the patch:


/home/avassalotti/pickle4/Modules/_pickle.c: In function ‘save_global_binary’:
/home/avassalotti/pickle4/Modules/_pickle.c:2952: warning: pointer targets in passing argument 2 of ‘_Pickler_Write’ differ in signedness
/home/avassalotti/pickle4/Modules/_pickle.c:872: note: expected ‘const char *’ but argument is of type ‘unsigned char *’
/home/avassalotti/pickle4/Modules/_pickle.c:2956: warning: pointer targets in passing argument 2 of ‘_Pickler_Write’ differ in signedness
/home/avassalotti/pickle4/Modules/_pickle.c:872: note: expected ‘const char *’ but argument is of type ‘unsigned char *’
/home/avassalotti/pickle4/Modules/_pickle.c:2960: warning: pointer targets in passing argument 2 of ‘_Pickler_Write’ differ in signedness
/home/avassalotti/pickle4/Modules/_pickle.c:872: note: expected ‘const char *’ but argument is of type ‘unsigned char *’
/home/avassalotti/pickle4/Modules/_pickle.c:2975: warning: pointer targets in passing argument 2 of ‘_Pickler_Write’ differ in signedness
/home/avassalotti/pickle4/Modules/_pickle.c:872: note: expected ‘const char *’ but argument is of type ‘unsigned char *’
/home/avassalotti/pickle4/Modules/_pickle.c:2979: warning: pointer targets in passing argument 2 of ‘_Pickler_Write’ differ in signedness
/home/avassalotti/pickle4/Modules/_pickle.c:872: note: expected ‘const char *’ but argument is of type ‘unsigned char *’
/home/avassalotti/pickle4/Modules/_pickle.c:2988: warning: pointer targets in passing argument 2 of ‘_Pickler_Write’ differ in signedness
/home/avassalotti/pickle4/Modules/_pickle.c:872: note: expected ‘const char *’ but argument is of type ‘unsigned char *’
/home/avassalotti/pickle4/Modules/_pickle.c:3012: warning: pointer targets in passing argument 2 of ‘_Pickler_Write’ differ in signedness
/home/avassalotti/pickle4/Modules/_pickle.c:872: note: expected ‘const char *’ but argument is of type ‘unsigned char *’
/home/avassalotti/pickle4/Modules/_pickle.c: In function ‘save_set’:
/home/avassalotti/pickle4/Modules/_pickle.c:3368: warning: suggest parentheses around assignment used as truth value
/home/avassalotti/pickle4/Modules/_pickle.c: In function ‘save_frozenset’:
/home/avassalotti/pickle4/Modules/_pickle.c:3422: warning: suggest parentheses around assignment used as truth value


These should be fixed.
msg168176 - (view) Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2012-08-14 05:46
There are reference leaks in the _pickle.c part that will need to be fixed too. 


22:36:29 [~/pickle4]$ ./python -m test.regrtest -R :: test_pickle
[1/1] test_pickle
beginning 9 repetitions
123456789
.........
test_pickle leaked [14780, 14780, 14780, 14780] references, sum=59120
1 test failed:
    test_pickle
[319952 refs]
msg168200 - (view) Author: Stefan Mihaila (mstefanro) * Date: 2012-08-14 13:43
Maybe we could postpone the review process for a few days
until I fix some known issues
msg168482 - (view) Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2012-08-17 22:40
Oops, wrong patch. Uploading the right one.
msg168520 - (view) Author: Stefan Mihaila (mstefanro) * Date: 2012-08-18 17:14
Maybe you can set this issue as the superseder of issue9269, because the patches there have already been applied here.
msg168581 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-08-19 16:18
Is this patch stable? Or are there further changes coming?
msg168599 - (view) Author: Stefan Mihaila (mstefanro) * Date: 2012-08-19 22:23
There are still some upcoming changes.
msg168809 - (view) Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2012-08-21 21:53
Some quick thoughts about the new implicit memoization scheme in Stefan's implementation. 

- The new scheme will need to be documented in PEP 3154 before we can accept the change.
- I don't really like the idea of changing the semantics of the PUT and GET opcodes. I would prefer new opcodes if possible.
- I would like to see benchmarks for this change.
msg168894 - (view) Author: Stefan Mihaila (mstefanro) * Date: 2012-08-22 15:49
>- I don't really like the idea of changing the semantics of the PUT and GET opcodes. I would prefer new opcodes if possible.

Well, the semantics of PUT and GET haven't really changed. It's just that the PUT opcode is not generated anymore and memoization is done "in agreement" (i.e. both the pickler and the unpickler know when to memoize so their memo tables stay in sync). So, in fact, it's the semantics of the other opcodes that has slightly changed.

>- I would like to see benchmarks for this change.

I've tried the following two snippets with timeit:

    ./python3.3 -m timeit \
                -s 'from pickle import dumps' \
                -s 'd=["a"]*100'
                'dumps(d,3)' # replace 3 with 4 for comparison

    ./python3.3 -m timeit \
                -s 'from pickle import dumps' \
                -s 'd=list(map(chr,range(0,256)))' \
                'dumps(d,3)' # replace 3 with 4 for comparison
                             # you can also use loads(dumps(d,3)) here to benchmark both 
                             # operations at once


The first one generates 99 BINGET opcodes. It generates 1 BINPUT opcode in pickle3 and no BINPUT opcodes in pickle4.
There appears no noticeable speed difference.

The second one generates no BINGET opcodes. It generates no BINPUT opcodes in v4, respectively 256 BINPUT opcodes in v3. It appears the v4 one is slightly faster, but I have a hard time comparing correctly, given that the measurements seem to have a very large standard deviation (v4 gets times somewhere between 32.3 and 44.2, whereas v3 gets times between 37.7 and 52.2).

I'm not sure this is the best way to benchmark, so let me know what is usually used.
msg168897 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-08-22 16:47
Le mercredi 22 août 2012 à 15:49 +0000, Stefan Mihaila a écrit :
> I'm not sure this is the best way to benchmark, so let me know what is
> usually used.

http://hg.python.org/benchmarks/ has pickling benchmarks, you may have
to modify them to test different pickling protocols.
msg168946 - (view) Author: Stefan Mihaila (mstefanro) * Date: 2012-08-23 15:31
Are there also some known techniques on tracking down memory leaks?

I've played around with sys.gettotalrefcount to narrow down
the place where the leaks occur, but they seem to only occur in v4,
i.e. pickle.dumps(3.0+1j, 4) leaks but pickle.dumps(3.0+1j, 3) does
not.
However, there appears to be no difference in the code that gets
executed in v3 to the one executed in v4.
msg169934 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-09-06 18:07
> Are there also some known techniques on tracking down memory leaks?

Nothing more than the usual debugging techniques. It is more of a matter of taste whether you like to add print() (or printf ;-)) calls, or set breakpoints in an actual debugger.

> i.e. pickle.dumps(3.0+1j, 4) leaks but pickle.dumps(3.0+1j, 3) does
not.

Well it looks like you've narrowed things down a bit here.

> However, there appears to be no difference in the code that gets
> executed in v3 to the one executed in v4.

Even the differences in memoization?
msg187497 - (view) Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2013-04-21 06:56
I have started a new implementation of PEP 3154 since Stefan hasn't been active on his. Moving the discussion to Issue #17810.
msg187565 - (view) Author: Stefan Mihaila (mstefanro) * Date: 2013-04-22 14:22
Hello. I apologize once again for not finalizing my work, but once I have started my final year of faculty and a job, I have been busy pretty much all the time. I would really like to finish this as I've really enjoyed working on it, and everything on PEP 3154 and some other stuff has already been implemented. The only remaining part was finalizing the code review and fixing some memory leaks that gave me some headaches at the time.
I would really appreciate if you could give me a few more days before deciding to start a new implementation from scratch. I'll get to fixing those memory leaks in the next couple of days and then the code review can be finalized.
Would this be acceptable to you?
History
Date User Action Args
2013-05-11 11:12:04mstefanrosetfiles: + d0c3a8d4947a.diff
2013-04-22 14:22:15mstefanrosetmessages: + msg187565
2013-04-21 06:56:16alexandre.vassalottisetstatus: open -> closed
superseder: Implement PEP 3154 (pickle protocol 4)
messages: + msg187497

dependencies: - Unbinding of methods
resolution: out of date
stage: patch review -> resolved
2012-11-24 14:18:01daniel.urbanlinkissue12457 superseder
2012-11-18 10:07:39serhiy.storchakasetnosy: + serhiy.storchaka
2012-10-04 02:11:04jceasetnosy: + jcea
2012-09-06 18:07:02pitrousetmessages: + msg169934
2012-08-23 15:31:14mstefanrosetmessages: + msg168946
2012-08-22 16:47:52pitrousetmessages: + msg168897
2012-08-22 15:49:58mstefanrosetmessages: + msg168894
2012-08-21 21:53:00alexandre.vassalottisetmessages: + msg168809
2012-08-19 22:23:08mstefanrosetmessages: + msg168599
2012-08-19 16:18:43pitrousetmessages: + msg168581
2012-08-18 17:14:52mstefanrosetmessages: + msg168520
2012-08-17 22:40:15alexandre.vassalottisetfiles: + pickle4-2.diff

messages: + msg168482
2012-08-17 22:39:33alexandre.vassalottisetfiles: - pickle4-2.diff
2012-08-17 21:40:39alexandre.vassalottisetdependencies: + Unbinding of methods
2012-08-17 21:29:47alexandre.vassalottisetfiles: + pickle4-2.diff
2012-08-17 16:34:46asvetlovsetnosy: + asvetlov
2012-08-14 13:43:49mstefanrosetmessages: + msg168200
2012-08-14 05:46:47alexandre.vassalottisetmessages: + msg168176
2012-08-14 05:34:45alexandre.vassalottisetmessages: + msg168175
2012-08-14 02:42:43alexandre.vassalottisetfiles: + pickle4.diff
2012-08-14 02:42:07alexandre.vassalottisetfiles: - pickle4.diff
2012-08-14 02:41:10alexandre.vassalottisetfiles: + pickle4.diff
2012-08-13 20:48:39alexandre.vassalotticreate