Marshal: performance regression with versions 3 and 4 #64615

vstinner · 2014-01-28T10:10:49Z

BPO	20416
Nosy	@loewis, @pitrou, @kristjanvalur, @vstinner, @larryhastings, @serhiy-storchaka, @vajrasky
Files	marshal3_numbers.patch bench.py marshal_small_ints_refs.patch marshal_refs_by_value.patch bench_issue20416.py marshal_refs_by_value_3.patch marshal_hashtable.patch marshal_hashtable_2.patch

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/serhiy-storchaka'
closed_at = <Date 2015-02-11.15:53:15.720>
created_at = <Date 2014-01-28.10:10:48.757>
labels = ['interpreter-core', 'performance']
title = 'Marshal: performance regression with versions 3 and 4'
updated_at = <Date 2015-02-11.15:53:15.719>
user = 'https://github.com/vstinner'

bugs.python.org fields:

activity = <Date 2015-02-11.15:53:15.719>
actor = 'serhiy.storchaka'
assignee = 'serhiy.storchaka'
closed = True
closed_date = <Date 2015-02-11.15:53:15.720>
closer = 'serhiy.storchaka'
components = ['Interpreter Core']
creation = <Date 2014-01-28.10:10:48.757>
creator = 'vstinner'
dependencies = []
files = ['33770', '33771', '34071', '37894', '37895', '38012', '38013', '38088']
hgrepos = []
issue_num = 20416
keywords = ['patch']
message_count = 25.0
messages = ['209524', '209525', '209526', '209528', '209530', '209531', '209532', '209533', '209534', '209535', '211164', '211173', '211691', '234903', '235050', '235384', '235385', '235469', '235479', '235485', '235713', '235721', '235722', '235746', '235753']
nosy_count = 8.0
nosy_names = ['loewis', 'pitrou', 'kristjan.jonsson', 'vstinner', 'larry', 'python-dev', 'serhiy.storchaka', 'vajrasky']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'performance'
url = 'https://bugs.python.org/issue20416'
versions = ['Python 3.5']

vstinner · 2014-01-28T10:10:48Z

Attached patched disables references for int and float types.

vstinner · 2014-01-28T10:14:40Z

Use attached bench.py to compare performances.

Without the patch:
---
dumps v0: 389.6 ms
data size v0: 45582.9 kB
loads v0: 573.3 ms

dumps v1: 391.4 ms
data size v1: 45582.9 kB
loads v1: 558.0 ms

dumps v2: 166.9 ms
data size v2: 41395.4 kB
loads v2: 482.2 ms

dumps v3: 431.2 ms
data size v3: 41395.4 kB
loads v3: 543.8 ms

dumps v4: 361.8 ms
data size v4: 37000.9 kB
loads v4: 560.4 ms
---

With the patch:
---
dumps v0: 391.4 ms
data size v0: 45582.9 kB
loads v0: 578.2 ms

dumps v1: 392.3 ms
data size v1: 45582.9 kB
loads v1: 556.8 ms

dumps v2: 167.7 ms
data size v2: 41395.4 kB
loads v2: 484.6 ms

dumps v3: 170.3 ms
data size v3: 41395.4 kB
loads v3: 467.0 ms

dumps v4: 122.8 ms
data size v4: 37000.9 kB
loads v4: 468.9 ms
---

dumps v3 is 60% faster, loads v3 is also 14% *faster*.

dumps v4 is 66% faster, loads v4 is 16% faster.

vstinner · 2014-01-28T10:19:32Z

Performance of Python 3.3.3+:
---
dumps v0: 374.8 ms
data size v0: 45582.9 kB
loads v0: 625.3 ms

dumps v1: 374.6 ms
data size v1: 45582.9 kB
loads v1: 605.1 ms

dumps v2: 152.9 ms
data size v2: 41395.4 kB
loads v2: 556.5 ms
---

So with the patch, the Python 3.4 default version (4) is *faster* (dump 20% faster, load 16% faster) and produces *smaller files* (10% smaller).

vstinner · 2014-01-28T10:20:22Z

Oh by the way, on Python 3.4, the file size (on version 3 and 4) is unchanged with my patch. Writing a reference produces takes exactly the same size than an integer.

vajrasky · 2014-01-28T10:25:57Z

I am doing clinic conversion for marshal module so I am adding myself to nosy list to make sure both tickets are synchronized.

http://bugs.python.org/issue20185

vstinner · 2014-01-28T10:36:09Z

I am doing clinic conversion for marshal module so I am adding myself to nosy list to make sure both tickets are synchronized.

The Derby is suspended until the release of Python 3.4 final.

I consider this issue as an important performance regression that should be fixed before Python 3.4 final.

serhiy-storchaka · 2014-01-28T10:54:50Z

Did you tested for numerous shared int and floats? [1000] * 1000000 and [1000.0] * 1000000? AFAIK this was important use cases for adding 3 or 4 versions.

vstinner · 2014-01-28T11:34:43Z

Did you tested for numerous shared int and floats? [1000] * 1000000 and [1000.0] * 1000000? AFAIK this was important use cases for adding 3 or 4 versions.

Here are new benchmarks on Python 3.4 with:

Integers: [1000] * 1000000
Floats: [1000.0] * 1000000

Integers, without the patch:

dumps v3: 62.8 ms
data size v3: 4882.8 kB
loads v3: 10.7 ms

Integers, with the patch:

dumps v3: 18.6 ms (-70%)
data size v3: 4882.8 kB (same size)
loads v3: 27.7 ms (+158%)

Floats, without the patch:

dumps v3: 62.5 ms
data size v3: 4882.8 kB
loads v3: 11.0 ms

Floats, with the patch:

dumps v3: 29.3 ms (-53%)
data size v3: 8789.1 kB (+80%)
loads v3: 25.5 ms (+132%)

The version 3 was added by:
---
changeset: 82816:01372117a5b4
user: Kristján Valur Jónsson <sweskman@gmail.com>
date: Tue Mar 19 18:02:10 2013 -0700
files: Doc/library/marshal.rst Include/marshal.h Lib/test/test_marshal.py Misc/NEWS Python/marshal.c
description:
Issue bpo-16475: Support object instancing, recursion and interned strings in marshal
---

This issue tells about "sharing string constants, common tuples, even common code objects", not sharing numbers.

For real data, here are interesting numbers:
http://bugs.python.org/issue16475#msg176013

Integers only represent 4.8% of serialized data, and only 8.2% of these integers can be shared. (Floats represent 0.29%.) Whereas strings repsent 58% and 57% can be shared.

loewis · 2014-01-28T11:35:31Z

For the record, format 3 was added through bpo-16475, format 4 was added through bpo-19219.

In msg175962, Kristjan argued that there is no reason _not_ to share int objects, e.g. across multiple code objects. Now it seems that this argument is flawed: there is a reason, namely the performance impact.

OTOH, I consider both use case (marshaling a large number of integers, and desiring to share ints across code objects) equally obscure: you shouldn't worry about marshal performance too much if you have loads of tiny int objects, and you shouldn't worry whether these ints get shared or not.

As a compromise, we could suppress the sharing for small int objects, since they are singletons, anyway. This would allow marshal to preserve/copy the object graph, while not impacting the use case that the original poster on python-dev presented.

pitrou · 2014-01-28T11:39:27Z

Integers, without the patch:

dumps v3: 62.8 ms
data size v3: 4882.8 kB
loads v3: 10.7 ms

Integers, with the patch:

dumps v3: 18.6 ms (-70%)
data size v3: 4882.8 kB (same size)
loads v3: 27.7 ms (+158%)

As I wrote on python-dev, dumps performance isn't important for the pyc
use case, but loads performance is. Therefore it appears this patch goes
into the wrong direction.

You are also ignoring the *runtime* benefit of sharing objects: smaller
memory footprint of the actual Python process.

serhiy-storchaka · 2014-02-13T19:29:29Z

As a compromise, we could suppress the sharing for small int objects, since they are singletons, anyway.

This is implementation detail.

But we can use more efficient way to memoizating small int objects.

I also suppose than efficient C implementation of IdentityDict will significant decrease an overhead (may be 2 times). As far as this is probably a first case for which IdentityDict give significant benefit, I'll provide an implementation for 3.5.

serhiy-storchaka · 2014-02-13T21:35:45Z

Here is a patch which special cases small integers. It decreases v3 and v4 dump time of bench.py by 10% without affecting load time. Of course in real cases the benefit will be much less.

larryhastings · 2014-02-20T06:05:13Z

This is not a release blocker. You guys can play with this for 3.5. FWIW I prefer Serhiy's approach.

serhiy-storchaka · 2015-01-28T16:45:22Z

Here is more general solution. For simple values (ints, floats, complex numbers, short strings) it is faster to use the value itself as a key than create new integer object (id).

Without the patch:

data ver. dumps(ms) loads(ms) size(KiB)

genData 2 103.9 186.4 4090.7
genData 3 250.3 196.8 4090.7
genData 4 223.5 182.5 3651.3

[1000]*10**6 2 98.6 134.8 4882.8
[1000]*10**6 3 491.1 62.2 4882.8
[1000]*10**6 4 494.9 62.1 4882.8

[1000.0]*10**6 2 173.5 158.4 8789.1
[1000.0]*10**6 3 494.8 62.2 4882.8
[1000.0]*10**6 4 491.4 62.8 4882.8

[1000.0j]*10**6 2 288.8 221.4 16601.6
[1000.0j]*10**6 3 493.6 62.4 4882.8
[1000.0j]*10**6 4 489.2 62.0 4882.8

20 pydecimals 2 85.0 114.7 3936.5
20 pydecimals 3 97.2 44.3 3373.4
20 pydecimals 4 86.2 40.0 3297.5

With marshal3_numbers.patch:

data ver. dumps(ms) loads(ms) size(KiB)

genData 3 108.4 187.5 4090.7
genData 4 83.0 179.3 3651.3

[1000]*10**6 3 104.2 145.8 4882.8
[1000]*10**6 4 103.9 147.0 4882.8

[1000.0]*10**6 3 177.4 154.5 8789.1
[1000.0]*10**6 4 177.1 164.2 8789.1

[1000.0j]*10**6 3 501.6 61.1 4882.8
[1000.0j]*10**6 4 501.6 62.3 4882.8

20 pydecimals 3 95.2 41.9 3373.4
20 pydecimals 4 83.5 38.5 3297.5

With marshal_refs_by_value.patch:

data ver. dumps(ms) loads(ms) size(KiB)

genData 3 150.4 197.0 4090.7
genData 4 122.1 184.8 3651.3

[1000]*10**6 3 169.1 62.3 4882.8
[1000]*10**6 4 169.2 62.2 4882.8

[1000.0]*10**6 3 313.5 62.2 4882.8
[1000.0]*10**6 4 312.8 62.3 4882.8

[1000.0j]*10**6 3 410.6 62.5 4882.8
[1000.0j]*10**6 4 410.5 62.3 4882.8

20 pydecimals 3 68.5 41.1 3373.4
20 pydecimals 4 57.5 39.8 3297.5

The marshal_refs_by_value.patch produces data so compact as unpatched code does, but dumps faster. Usually it dumps slower than marshal3_numbers.patch, but may produce smaller data and loads faster. Surprisingly it dumps the code of the _pydecimal module faster.

As side effect the patch can turn some simple equal values to identical objects.

serhiy-storchaka · 2015-01-30T19:43:34Z

Here are results of the benchmark which measures dump and load time for all pyc files in the stdlib (including tests).

https://bitbucket.org/storchaka/cpython-stuff/src/default/marshal/marshalbench.py

$ find * -name __pycache__ -exec rm -rf '{}' +
$ ./python -m compileall -qq -r 99 Lib
$ find Lib -name '*.pyc' | xargs ./python marshalbench.py

Without the patch:

ver. dumps loads size
770.3 19941.2
0 695.7 1178.4 19941.2
1 680.4 1180.9 19941.2
2 635.9 1115.9 19941.2
3 896.3 757.3 19941.2
4 748.0 700.6 19941.2

With marshal3_numbers.patch:

ver. dumps loads size
750.5 19946.1
0 690.2 1173.7 19946.1
1 678.2 1166.6 19946.1
2 630.9 1105.2 19946.1
3 873.2 751.5 19946.1
4 724.6 687.6 19946.1

With marshal_refs_by_value.patch:

ver. dumps loads size
780.2 19935.8
0 712.7 1202.6 19935.8
1 713.8 1195.1 19935.8
2 676.5 1126.0 19935.8
3 686.1 762.7 19935.8
4 515.1 697.6 19935.8

serhiy-storchaka · 2015-02-04T11:16:48Z

Here is a patch which adds separate dict for interned strings (otherwise they can be uninterned) and for bytes. It also slightly simplify the code.

serhiy-storchaka · 2015-02-04T11:30:45Z

And here is alternative patch which uses a hashtable.

Both patches have about the same performance for *.pyc files, but marshal_hashtable.patch is much faster for duplicated values. Marshalling [1000]*10**6, [1000.0]*10**6 and [1000.0j]*10**6 with version 3 an 4 is so fast as marshalling [1000]*10**6 with version 2 (i.e. 5 times faster than current implementation).

data ver. dumps(ms) loads(ms) size(KiB)

genData 2 99.9 188.9 4090.7
genData 3 148.2 189.1 4090.7
genData 4 121.4 177.4 3651.3

[1000]*10**6 2 97.7 131.6 4882.8
[1000]*10**6 3 95.1 63.1 4882.8
[1000]*10**6 4 95.1 64.4 4882.8

[1000.0]*10**6 2 172.9 153.5 8789.1
[1000.0]*10**6 3 97.4 61.9 4882.8
[1000.0]*10**6 4 95.7 61.6 4882.8

[1000.0j]*10**6 2 288.6 228.2 16601.6
[1000.0j]*10**6 3 94.9 61.6 4882.8
[1000.0j]*10**6 4 95.1 62.2 4882.8

20 pydecimals 2 88.0 111.4 3929.6
20 pydecimals 3 57.0 51.4 3368.5
20 pydecimals 4 46.6 39.9 3292.8

serhiy-storchaka · 2015-02-06T07:11:59Z

What are your thoughts Victor?

pitrou · 2015-02-06T12:39:25Z

Can you post the results of your hashtable patch with marshalbench.py?

PS: we don't care about versions older than 4.

serhiy-storchaka · 2015-02-06T14:11:15Z

Here are new results with corrected marshalbench.py (more precise and with recalculated total size):

Without the patch:
ver. dumps loads size
746.5 19971.2
0 669.1 1149.2 26202.9
1 682.0 1130.1 26202.9
2 652.0 1134.8 26202.4
3 920.8 757.9 21316.7
4 771.2 691.4 19971.2

With marshal3_numbers.patch:
3 894.6 778.4 21321.6
4 733.1 704.2 19976.2

With marshal_refs_by_value_3.patch:
3 687.6 777.7 21310.3
4 535.5 701.5 19966.3

With marshal_hashtable.patch:
3 656.0 764.2 21316.7
4 512.9 696.6 19971.2

serhiy-storchaka · 2015-02-10T22:00:54Z

Update patch addresses Victor's comments.

vstinner · 2015-02-11T00:09:12Z

Except of the minor suggestion that I added on the review, the patch looks good the me. It's quite simple and makes dumps() 34% faster (for protocol 4).

vstinner · 2015-02-11T00:09:58Z

(I didn't reproduce the benchmark, I just used Serhy numbers. I guess that using memcpy() doesn't make the code slower.)

python-dev · 2015-02-11T13:55:58Z

New changeset 012364df2712 by Serhiy Storchaka in branch 'default':
Issue bpo-20416: marshal.dumps() with protocols 3 and 4 is now 40-50% faster on
https://hg.python.org/cpython/rev/012364df2712

serhiy-storchaka · 2015-02-11T15:53:16Z

Thanks for your review Antoine and Victor.

vstinner added the performance Performance or resource usage label Jan 28, 2014

vstinner added the release-blocker label Jan 28, 2014

larryhastings removed the release-blocker label Feb 20, 2014

serhiy-storchaka added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Jan 28, 2015

serhiy-storchaka changed the title ~~Marshal: special case int and float, don't use references~~ Marshal: performance regression with versions 3 and 4 Feb 6, 2015

serhiy-storchaka closed this as completed Feb 11, 2015

serhiy-storchaka self-assigned this Feb 11, 2015

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Marshal: performance regression with versions 3 and 4 #64615

Marshal: performance regression with versions 3 and 4 #64615

vstinner commented Jan 28, 2014

vstinner commented Jan 28, 2014

vstinner commented Jan 28, 2014

vstinner commented Jan 28, 2014

vstinner commented Jan 28, 2014

vajrasky mannequin commented Jan 28, 2014

vstinner commented Jan 28, 2014

serhiy-storchaka commented Jan 28, 2014

vstinner commented Jan 28, 2014

loewis mannequin commented Jan 28, 2014

pitrou commented Jan 28, 2014

serhiy-storchaka commented Feb 13, 2014

serhiy-storchaka commented Feb 13, 2014

larryhastings commented Feb 20, 2014

serhiy-storchaka commented Jan 28, 2015

serhiy-storchaka commented Jan 30, 2015

serhiy-storchaka commented Feb 4, 2015

serhiy-storchaka commented Feb 4, 2015

serhiy-storchaka commented Feb 6, 2015

pitrou commented Feb 6, 2015

serhiy-storchaka commented Feb 6, 2015

serhiy-storchaka commented Feb 10, 2015

vstinner commented Feb 11, 2015

vstinner commented Feb 11, 2015

python-dev mannequin commented Feb 11, 2015

serhiy-storchaka commented Feb 11, 2015

Marshal: performance regression with versions 3 and 4 #64615

Marshal: performance regression with versions 3 and 4 #64615

Comments

vstinner commented Jan 28, 2014

vstinner commented Jan 28, 2014

vstinner commented Jan 28, 2014

vstinner commented Jan 28, 2014

vstinner commented Jan 28, 2014

vajrasky mannequin commented Jan 28, 2014

vstinner commented Jan 28, 2014

serhiy-storchaka commented Jan 28, 2014

vstinner commented Jan 28, 2014

loewis mannequin commented Jan 28, 2014

pitrou commented Jan 28, 2014

serhiy-storchaka commented Feb 13, 2014

serhiy-storchaka commented Feb 13, 2014

larryhastings commented Feb 20, 2014

serhiy-storchaka commented Jan 28, 2015

serhiy-storchaka commented Jan 30, 2015

serhiy-storchaka commented Feb 4, 2015

serhiy-storchaka commented Feb 4, 2015

serhiy-storchaka commented Feb 6, 2015

pitrou commented Feb 6, 2015

serhiy-storchaka commented Feb 6, 2015

serhiy-storchaka commented Feb 10, 2015

vstinner commented Feb 11, 2015

vstinner commented Feb 11, 2015

python-dev mannequin commented Feb 11, 2015

serhiy-storchaka commented Feb 11, 2015