msg209524 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2014-01-28 10:10 |
Attached patched disables references for int and float types.
|
msg209525 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2014-01-28 10:14 |
Use attached bench.py to compare performances.
Without the patch:
---
dumps v0: 389.6 ms
data size v0: 45582.9 kB
loads v0: 573.3 ms
dumps v1: 391.4 ms
data size v1: 45582.9 kB
loads v1: 558.0 ms
dumps v2: 166.9 ms
data size v2: 41395.4 kB
loads v2: 482.2 ms
dumps v3: 431.2 ms
data size v3: 41395.4 kB
loads v3: 543.8 ms
dumps v4: 361.8 ms
data size v4: 37000.9 kB
loads v4: 560.4 ms
---
With the patch:
---
dumps v0: 391.4 ms
data size v0: 45582.9 kB
loads v0: 578.2 ms
dumps v1: 392.3 ms
data size v1: 45582.9 kB
loads v1: 556.8 ms
dumps v2: 167.7 ms
data size v2: 41395.4 kB
loads v2: 484.6 ms
dumps v3: 170.3 ms
data size v3: 41395.4 kB
loads v3: 467.0 ms
dumps v4: 122.8 ms
data size v4: 37000.9 kB
loads v4: 468.9 ms
---
dumps v3 is 60% faster, loads v3 is also 14% *faster*.
dumps v4 is 66% faster, loads v4 is 16% faster.
|
msg209526 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2014-01-28 10:19 |
Performance of Python 3.3.3+:
---
dumps v0: 374.8 ms
data size v0: 45582.9 kB
loads v0: 625.3 ms
dumps v1: 374.6 ms
data size v1: 45582.9 kB
loads v1: 605.1 ms
dumps v2: 152.9 ms
data size v2: 41395.4 kB
loads v2: 556.5 ms
---
So with the patch, the Python 3.4 default version (4) is *faster* (dump 20% faster, load 16% faster) and produces *smaller files* (10% smaller).
|
msg209528 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2014-01-28 10:20 |
Oh by the way, on Python 3.4, the file size (on version 3 and 4) is unchanged with my patch. Writing a reference produces takes exactly the same size than an integer.
|
msg209530 - (view) |
Author: Vajrasky Kok (vajrasky) * |
Date: 2014-01-28 10:25 |
I am doing clinic conversion for marshal module so I am adding myself to nosy list to make sure both tickets are synchronized.
http://bugs.python.org/issue20185
|
msg209531 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2014-01-28 10:36 |
> I am doing clinic conversion for marshal module so I am adding myself to nosy list to make sure both tickets are synchronized.
The Derby is suspended until the release of Python 3.4 final.
I consider this issue as an important performance regression that should be fixed before Python 3.4 final.
|
msg209532 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2014-01-28 10:54 |
Did you tested for numerous shared int and floats? [1000] * 1000000 and [1000.0] * 1000000? AFAIK this was important use cases for adding 3 or 4 versions.
|
msg209533 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2014-01-28 11:34 |
> Did you tested for numerous shared int and floats? [1000] * 1000000 and [1000.0] * 1000000? AFAIK this was important use cases for adding 3 or 4 versions.
Here are new benchmarks on Python 3.4 with:
Integers: [1000] * 1000000
Floats: [1000.0] * 1000000
Integers, without the patch:
dumps v3: 62.8 ms
data size v3: 4882.8 kB
loads v3: 10.7 ms
Integers, with the patch:
dumps v3: 18.6 ms (-70%)
data size v3: 4882.8 kB (same size)
loads v3: 27.7 ms (+158%)
Floats, without the patch:
dumps v3: 62.5 ms
data size v3: 4882.8 kB
loads v3: 11.0 ms
Floats, with the patch:
dumps v3: 29.3 ms (-53%)
data size v3: 8789.1 kB (+80%)
loads v3: 25.5 ms (+132%)
The version 3 was added by:
---
changeset: 82816:01372117a5b4
user: Kristján Valur Jónsson <sweskman@gmail.com>
date: Tue Mar 19 18:02:10 2013 -0700
files: Doc/library/marshal.rst Include/marshal.h Lib/test/test_marshal.py Misc/NEWS Python/marshal.c
description:
Issue #16475: Support object instancing, recursion and interned strings in marshal
---
This issue tells about "sharing string constants, common tuples, even common code objects", not sharing numbers.
For real data, here are interesting numbers:
http://bugs.python.org/issue16475#msg176013
Integers only represent 4.8% of serialized data, and only 8.2% of these integers can be shared. (Floats represent 0.29%.) Whereas strings repsent 58% and 57% can be shared.
|
msg209534 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2014-01-28 11:35 |
For the record, format 3 was added through issue16475, format 4 was added through issue19219.
In msg175962, Kristjan argued that there is no reason _not_ to share int objects, e.g. across multiple code objects. Now it seems that this argument is flawed: there is a reason, namely the performance impact.
OTOH, I consider both use case (marshaling a large number of integers, and desiring to share ints across code objects) equally obscure: you shouldn't worry about marshal performance too much if you have loads of tiny int objects, and you shouldn't worry whether these ints get shared or not.
As a compromise, we could suppress the sharing for small int objects, since they are singletons, anyway. This would allow marshal to preserve/copy the object graph, while not impacting the use case that the original poster on python-dev presented.
|
msg209535 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2014-01-28 11:39 |
> Integers, without the patch:
>
> dumps v3: 62.8 ms
> data size v3: 4882.8 kB
> loads v3: 10.7 ms
>
> Integers, with the patch:
>
> dumps v3: 18.6 ms (-70%)
> data size v3: 4882.8 kB (same size)
> loads v3: 27.7 ms (+158%)
As I wrote on python-dev, dumps performance isn't important for the pyc
use case, but loads performance is. Therefore it appears this patch goes
into the wrong direction.
You are also ignoring the *runtime* benefit of sharing objects: smaller
memory footprint of the actual Python process.
|
msg211164 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2014-02-13 19:29 |
> As a compromise, we could suppress the sharing for small int objects, since they are singletons, anyway.
This is implementation detail.
But we can use more efficient way to memoizating small int objects.
I also suppose than efficient C implementation of IdentityDict will significant decrease an overhead (may be 2 times). As far as this is probably a first case for which IdentityDict give significant benefit, I'll provide an implementation for 3.5.
|
msg211173 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2014-02-13 21:35 |
Here is a patch which special cases small integers. It decreases v3 and v4 dump time of bench.py by 10% without affecting load time. Of course in real cases the benefit will be much less.
|
msg211691 - (view) |
Author: Larry Hastings (larry) * |
Date: 2014-02-20 06:05 |
This is not a release blocker. You guys can play with this for 3.5. FWIW I prefer Serhiy's approach.
|
msg234903 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2015-01-28 16:45 |
Here is more general solution. For simple values (ints, floats, complex numbers, short strings) it is faster to use the value itself as a key than create new integer object (id).
Without the patch:
data ver. dumps(ms) loads(ms) size(KiB)
genData 2 103.9 186.4 4090.7
genData 3 250.3 196.8 4090.7
genData 4 223.5 182.5 3651.3
[1000]*10**6 2 98.6 134.8 4882.8
[1000]*10**6 3 491.1 62.2 4882.8
[1000]*10**6 4 494.9 62.1 4882.8
[1000.0]*10**6 2 173.5 158.4 8789.1
[1000.0]*10**6 3 494.8 62.2 4882.8
[1000.0]*10**6 4 491.4 62.8 4882.8
[1000.0j]*10**6 2 288.8 221.4 16601.6
[1000.0j]*10**6 3 493.6 62.4 4882.8
[1000.0j]*10**6 4 489.2 62.0 4882.8
20 pydecimals 2 85.0 114.7 3936.5
20 pydecimals 3 97.2 44.3 3373.4
20 pydecimals 4 86.2 40.0 3297.5
With marshal3_numbers.patch:
data ver. dumps(ms) loads(ms) size(KiB)
genData 3 108.4 187.5 4090.7
genData 4 83.0 179.3 3651.3
[1000]*10**6 3 104.2 145.8 4882.8
[1000]*10**6 4 103.9 147.0 4882.8
[1000.0]*10**6 3 177.4 154.5 8789.1
[1000.0]*10**6 4 177.1 164.2 8789.1
[1000.0j]*10**6 3 501.6 61.1 4882.8
[1000.0j]*10**6 4 501.6 62.3 4882.8
20 pydecimals 3 95.2 41.9 3373.4
20 pydecimals 4 83.5 38.5 3297.5
With marshal_refs_by_value.patch:
data ver. dumps(ms) loads(ms) size(KiB)
genData 3 150.4 197.0 4090.7
genData 4 122.1 184.8 3651.3
[1000]*10**6 3 169.1 62.3 4882.8
[1000]*10**6 4 169.2 62.2 4882.8
[1000.0]*10**6 3 313.5 62.2 4882.8
[1000.0]*10**6 4 312.8 62.3 4882.8
[1000.0j]*10**6 3 410.6 62.5 4882.8
[1000.0j]*10**6 4 410.5 62.3 4882.8
20 pydecimals 3 68.5 41.1 3373.4
20 pydecimals 4 57.5 39.8 3297.5
The marshal_refs_by_value.patch produces data so compact as unpatched code does, but dumps faster. Usually it dumps slower than marshal3_numbers.patch, but may produce smaller data and loads faster. Surprisingly it dumps the code of the _pydecimal module faster.
As side effect the patch can turn some simple equal values to identical objects.
|
msg235050 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2015-01-30 19:43 |
Here are results of the benchmark which measures dump and load time for all pyc files in the stdlib (including tests).
https://bitbucket.org/storchaka/cpython-stuff/src/default/marshal/marshalbench.py
$ find * -name __pycache__ -exec rm -rf '{}' +
$ ./python -m compileall -qq -r 99 Lib
$ find Lib -name '*.pyc' | xargs ./python marshalbench.py
Without the patch:
ver. dumps loads size
770.3 19941.2
0 695.7 1178.4 19941.2
1 680.4 1180.9 19941.2
2 635.9 1115.9 19941.2
3 896.3 757.3 19941.2
4 748.0 700.6 19941.2
With marshal3_numbers.patch:
ver. dumps loads size
750.5 19946.1
0 690.2 1173.7 19946.1
1 678.2 1166.6 19946.1
2 630.9 1105.2 19946.1
3 873.2 751.5 19946.1
4 724.6 687.6 19946.1
With marshal_refs_by_value.patch:
ver. dumps loads size
780.2 19935.8
0 712.7 1202.6 19935.8
1 713.8 1195.1 19935.8
2 676.5 1126.0 19935.8
3 686.1 762.7 19935.8
4 515.1 697.6 19935.8
|
msg235384 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2015-02-04 11:16 |
Here is a patch which adds separate dict for interned strings (otherwise they can be uninterned) and for bytes. It also slightly simplify the code.
|
msg235385 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2015-02-04 11:30 |
And here is alternative patch which uses a hashtable.
Both patches have about the same performance for *.pyc files, but marshal_hashtable.patch is much faster for duplicated values. Marshalling [1000]*10**6, [1000.0]*10**6 and [1000.0j]*10**6 with version 3 an 4 is so fast as marshalling [1000]*10**6 with version 2 (i.e. 5 times faster than current implementation).
data ver. dumps(ms) loads(ms) size(KiB)
genData 2 99.9 188.9 4090.7
genData 3 148.2 189.1 4090.7
genData 4 121.4 177.4 3651.3
[1000]*10**6 2 97.7 131.6 4882.8
[1000]*10**6 3 95.1 63.1 4882.8
[1000]*10**6 4 95.1 64.4 4882.8
[1000.0]*10**6 2 172.9 153.5 8789.1
[1000.0]*10**6 3 97.4 61.9 4882.8
[1000.0]*10**6 4 95.7 61.6 4882.8
[1000.0j]*10**6 2 288.6 228.2 16601.6
[1000.0j]*10**6 3 94.9 61.6 4882.8
[1000.0j]*10**6 4 95.1 62.2 4882.8
20 pydecimals 2 88.0 111.4 3929.6
20 pydecimals 3 57.0 51.4 3368.5
20 pydecimals 4 46.6 39.9 3292.8
|
msg235469 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2015-02-06 07:11 |
What are your thoughts Victor?
|
msg235479 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2015-02-06 12:39 |
Can you post the results of your hashtable patch with marshalbench.py?
PS: we don't care about versions older than 4.
|
msg235485 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2015-02-06 14:11 |
Here are new results with corrected marshalbench.py (more precise and with recalculated total size):
Without the patch:
ver. dumps loads size
746.5 19971.2
0 669.1 1149.2 26202.9
1 682.0 1130.1 26202.9
2 652.0 1134.8 26202.4
3 920.8 757.9 21316.7
4 771.2 691.4 19971.2
With marshal3_numbers.patch:
3 894.6 778.4 21321.6
4 733.1 704.2 19976.2
With marshal_refs_by_value_3.patch:
3 687.6 777.7 21310.3
4 535.5 701.5 19966.3
With marshal_hashtable.patch:
3 656.0 764.2 21316.7
4 512.9 696.6 19971.2
|
msg235713 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2015-02-10 22:00 |
Update patch addresses Victor's comments.
|
msg235721 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2015-02-11 00:09 |
Except of the minor suggestion that I added on the review, the patch looks good the me. It's quite simple and makes dumps() 34% faster (for protocol 4).
|
msg235722 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2015-02-11 00:09 |
(I didn't reproduce the benchmark, I just used Serhy numbers. I guess that using memcpy() doesn't make the code slower.)
|
msg235746 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2015-02-11 13:55 |
New changeset 012364df2712 by Serhiy Storchaka in branch 'default':
Issue #20416: marshal.dumps() with protocols 3 and 4 is now 40-50% faster on
https://hg.python.org/cpython/rev/012364df2712
|
msg235753 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2015-02-11 15:53 |
Thanks for your review Antoine and Victor.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:57 | admin | set | github: 64615 |
2015-02-11 15:53:15 | serhiy.storchaka | set | status: open -> closed messages:
+ msg235753
assignee: serhiy.storchaka resolution: fixed stage: patch review -> resolved |
2015-02-11 13:55:58 | python-dev | set | nosy:
+ python-dev messages:
+ msg235746
|
2015-02-11 00:09:58 | vstinner | set | messages:
+ msg235722 |
2015-02-11 00:09:12 | vstinner | set | messages:
+ msg235721 |
2015-02-10 22:00:54 | serhiy.storchaka | set | files:
+ marshal_hashtable_2.patch
messages:
+ msg235713 |
2015-02-06 14:11:15 | serhiy.storchaka | set | messages:
+ msg235485 |
2015-02-06 12:39:25 | pitrou | set | messages:
+ msg235479 |
2015-02-06 07:11:58 | serhiy.storchaka | set | messages:
+ msg235469 title: Marshal: special case int and float, don't use references -> Marshal: performance regression with versions 3 and 4 |
2015-02-04 11:30:45 | serhiy.storchaka | set | files:
+ marshal_hashtable.patch
messages:
+ msg235385 |
2015-02-04 11:16:49 | serhiy.storchaka | set | files:
+ marshal_refs_by_value_3.patch
messages:
+ msg235384 |
2015-01-30 19:43:35 | serhiy.storchaka | set | messages:
+ msg235050 |
2015-01-28 16:46:03 | serhiy.storchaka | set | files:
+ bench_issue20416.py |
2015-01-28 16:45:22 | serhiy.storchaka | set | files:
+ marshal_refs_by_value.patch
messages:
+ msg234903 components:
+ Interpreter Core stage: patch review |
2014-02-20 06:05:12 | larry | set | priority: release blocker -> normal
messages:
+ msg211691 versions:
+ Python 3.5, - Python 3.4 |
2014-02-13 21:35:45 | serhiy.storchaka | set | files:
+ marshal_small_ints_refs.patch
messages:
+ msg211173 |
2014-02-13 19:29:29 | serhiy.storchaka | set | messages:
+ msg211164 |
2014-01-28 11:39:26 | pitrou | set | messages:
+ msg209535 |
2014-01-28 11:35:30 | loewis | set | messages:
+ msg209534 |
2014-01-28 11:34:43 | vstinner | set | messages:
+ msg209533 |
2014-01-28 10:54:50 | serhiy.storchaka | set | messages:
+ msg209532 |
2014-01-28 10:50:55 | serhiy.storchaka | set | nosy:
+ pitrou, kristjan.jonsson, serhiy.storchaka
|
2014-01-28 10:36:08 | vstinner | set | priority: normal -> release blocker nosy:
+ larry messages:
+ msg209531
|
2014-01-28 10:25:57 | vajrasky | set | nosy:
+ vajrasky messages:
+ msg209530
|
2014-01-28 10:20:21 | vstinner | set | messages:
+ msg209528 |
2014-01-28 10:19:31 | vstinner | set | messages:
+ msg209526 |
2014-01-28 10:14:45 | vstinner | set | nosy:
+ loewis
|
2014-01-28 10:14:40 | vstinner | set | files:
+ bench.py
messages:
+ msg209525 |
2014-01-28 10:10:48 | vstinner | create | |