classification
Title: Verify the etree_parse and etree_iterparse benchmarks are working appropriately
Type: performance Stage:
Components: Benchmarks, Extension Modules, Library (Lib), XML Versions: Python 3.6
process
Status: closed Resolution: not a bug
Dependencies: 25814 Superseder:
Assigned To: Nosy List: brett.cannon, eli.bendersky, pitrou, python-dev, scoder, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2015-11-16 19:41 by brett.cannon, last changed 2016-03-12 17:21 by brett.cannon. This issue is now closed.

Files
File name  Uploaded
etree_iterparse.patch  serhiy.storchaka, 2015-11-21 11:06
etree_iterparse_2.patch  serhiy.storchaka, 2015-11-26 00:05
etree_start_handler_no_attrib.patch  serhiy.storchaka, 2015-12-09 18:15
Messages (20)
msg254746 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2015-11-16 19:41
If you look at bit.ly/pycon-ca-keynote you will notice that the etree_parse and etree_iterparse benchmarks were horrible for everyone. Because of how badly everyone seemed to do, I think the benchmarks should be verified to be doing reasonable things on implementations other than CPython 2.7.
msg254748 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-11-16 20:07
Would you have a quick summary for those not willing to watch a whole keynote?
msg254751 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2015-11-16 20:14
That link is to a Jupyter notebook so you don't have to watch anything. Plus the video is not even up yet so you can't skip the keynote even if you wanted to since you can't watch it yet. :)
msg254752 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-11-16 20:32
Ok, so when you say "horrible for everyone", this is really IronPython and Jython, right? :-) Other runtimes seem to do ok (perhaps not stellar, but ok).
msg254753 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2015-11-16 20:38
Well, Jython and IronPython obviously did the worst, but even Python 3 didn't do as well as I would have expected, so I still want to double-check the benchmarks to see if it's obvious why CPython 2.7 beats out everyone.
msg254763 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-16 22:23
I think these histograms would look better with logarithmic scale.
msg254764 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2015-11-16 22:33
Let's not pollute the issue with a critique of my notebook. You can feel free to email me personally to discuss it if you want, including why I purposefully didn't use a logarithmic scale.
msg255021 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-20 21:34
Sorry Brett.

How were the tests run? There are two implementations of ElementTree, accelerated and non-accelerated. By default, xml.etree.ElementTree is accelerated in Python 3 but non-accelerated in Python 2.

$ python2.7 bm_elementtree.py -n 7 --take_geo_mean 
0.463665158795
$ python2.7 bm_elementtree.py -n 7 --take_geo_mean --etree-module=xml.etree.ElementTree
5.46309932568
$ python3.4 bm_elementtree.py -n 7 --take_geo_mean --etree-module=xml.etree.ElementTree
0.813397633467649
$ python3.4 bm_elementtree.py -n 7 --take_geo_mean --etree-module=xml.etree.ElementTree --no-accelerator
5.31174765817514

If you run the test with the same option, --etree-module=xml.etree.ElementTree, it will use the accelerated implementation in Python 3 and the non-accelerated one in Python 2.
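One way to check which implementation an interpreter actually loaded is sketched below. It is not part of bm_elementtree.py; the helper name uses_c_accelerator is made up for illustration, and the check rests on the assumption that the pure-Python Element class defines __init__ as a plain Python function while the C type exposes a slot wrapper instead.

```python
import types
import xml.etree.ElementTree as ET


def uses_c_accelerator(etree_module=ET):
    """Guess whether etree_module.Element comes from the C accelerator.

    Hypothetical helper: a pure-Python class stores __init__ in its
    namespace as a function object; a C type does not.
    """
    init = vars(etree_module.Element).get("__init__")
    return not isinstance(init, types.FunctionType)


if __name__ == "__main__":
    print("accelerated:", uses_c_accelerator())
```

Running this under each interpreter before benchmarking would show whether the two runs are comparing like with like.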
msg255024 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2015-11-20 22:18
The commands I used are in the notebook for each implementation and you can get the same result with `python3 perf.py -b etree python2 python3`.
msg255027 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-20 22:44
The slowdown in Python 3 may be related to the addition of XMLPullParser (issue17741).
msg255050 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-21 11:06
The proposed patch optimizes iterparse(). It is now only 33% slower than in 2.7 (it was 2.6 times slower).
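For anyone who wants to reproduce a comparison of this kind without the full benchmark suite, a rough timing sketch follows. It is not the bm_elementtree.py benchmark: the synthetic document, the function names, and the repeat counts are all assumptions chosen for illustration, so absolute numbers will not match the figures quoted in this issue.

```python
import io
import timeit
import xml.etree.ElementTree as ET


def make_xml(n_items=1000):
    """Build a small synthetic document as bytes (illustrative data only)."""
    root = ET.Element("root")
    for i in range(n_items):
        item = ET.SubElement(root, "item", {"id": str(i)})
        item.text = "value %d" % i
    return ET.tostring(root)


def count_with_parse(data):
    """Parse the whole tree, then count the <item> elements."""
    tree = ET.parse(io.BytesIO(data))
    return sum(1 for _ in tree.iter("item"))


def count_with_iterparse(data):
    """Count <item> elements incrementally as their end tags are seen."""
    count = 0
    for _event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == "item":
            count += 1
    return count


DATA = make_xml()

if __name__ == "__main__":
    for func in (count_with_parse, count_with_iterparse):
        # Best of 5 runs of 20 calls each, to reduce noise.
        best = min(timeit.repeat(lambda: func(DATA), number=20, repeat=5))
        print("%s: %.4f s" % (func.__name__, best))
```

Running the same script under two interpreters gives a quick relative check of parse() versus iterparse().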
msg255394 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-26 00:05
Updated to tip.
msg255936 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2015-12-05 08:33
Serhiy's latest patch LGTM.
msg256013 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-12-06 15:01
Thank you for your review, Brett. Before applying this optimization I want to fix the error propagation issue (issue25814). The patch for it is mainly a simplified part of the patch for this issue.
msg256039 - (view) Author: Roundup Robot (python-dev) Date: 2015-12-07 00:31
New changeset dd67c8c53aea by Serhiy Storchaka in branch 'default':
Issue #25638: Optimized ElementTree.iterparse(); it is now 2x faster.
https://hg.python.org/cpython/rev/dd67c8c53aea
msg256059 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-12-07 12:18
The iterparse benchmark in 3.6 is still 30% slower than in 2.7, and the parse benchmark is 70% slower. Hence there are other causes of the slowdown.

One of the causes is that in 3.x an empty dict, instead of None, is passed to the start handler as the attrib parameter when the start tag has no attributes. This makes parsing about 10% slower.
msg256158 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-12-09 18:15
The following patch speeds up ElementTree parsing (the result of the etree parse benchmark improves by 10%). It actually restores the 2.7 code and avoids creating an empty dict for attributes when it is not needed.
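The idea can be illustrated with a small Python sketch. The real change is in the C accelerator, so this is only an analogy: the function name attrib_or_none and the flat name/value list it takes are assumptions made for the example, not the patch's actual code.

```python
def attrib_or_none(flat_attrs):
    """Return a dict of attributes, or None when the tag has none.

    flat_attrs is assumed to be a flat sequence such as
    ["id", "1", "class", "x"] (name, value, name, value, ...).
    """
    if not flat_attrs:
        # No attributes: skip the dict allocation entirely.
        return None
    return dict(zip(flat_attrs[::2], flat_attrs[1::2]))


if __name__ == "__main__":
    print(attrib_or_none([]))
    print(attrib_or_none(["id", "1", "class", "x"]))
```

For documents where most tags carry no attributes, skipping the allocation saves one dict per element, which is in line with the roughly 10% effect described above.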
msg256167 - (view) Author: Roundup Robot (python-dev) Date: 2015-12-10 07:52
New changeset 1fe904420c20 by Serhiy Storchaka in branch 'default':
Issue #25638: Optimized ElementTree parsing; it is now 10% faster.
https://hg.python.org/cpython/rev/1fe904420c20
msg256172 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-12-10 09:05
Thank you for your review, Brett. The parse benchmark in 3.6 is now only 50% slower than in 2.7. I will continue to look for bottlenecks.
msg261651 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-03-12 13:55
I am not able to find the cause of the slowdown.

I think this issue can be closed now. The etree_parse and etree_iterparse benchmarks are working appropriately and show a real regression in CPython 3.x. The cause of the regression is not known.
History
Date  User  Action  Args
2016-03-12 17:21:48  brett.cannon  set  status: open -> closed
    resolution: not a bug
2016-03-12 13:55:48  serhiy.storchaka  set  assignee: serhiy.storchaka ->
    messages: + msg261651
2015-12-10 09:05:11  serhiy.storchaka  set  messages: + msg256172
    stage: patch review ->
2015-12-10 07:52:33  python-dev  set  messages: + msg256167
2015-12-09 18:15:52  serhiy.storchaka  set  files: + etree_start_handler_no_attrib.patch
    messages: + msg256158
    stage: patch review
2015-12-07 12:18:06  serhiy.storchaka  set  messages: + msg256059
    stage: commit review -> (no value)
2015-12-07 00:31:38  python-dev  set  nosy: + python-dev
    messages: + msg256039
2015-12-06 15:01:51  serhiy.storchaka  set  dependencies: + Propagate all errors from ElementTree.iterparse
    messages: + msg256013
2015-12-05 08:33:04  brett.cannon  set  assignee: brett.cannon -> serhiy.storchaka
    messages: + msg255936
    stage: patch review -> commit review
2015-11-26 09:40:50  serhiy.storchaka  link  issue25707 dependencies
2015-11-26 00:05:54  serhiy.storchaka  set  files: + etree_iterparse_2.patch
    messages: + msg255394
2015-11-21 11:06:46  serhiy.storchaka  set  files: + etree_iterparse.patch
    type: performance
    components: + Extension Modules, Library (Lib), XML
    versions: + Python 3.6
    keywords: + patch
    nosy: + scoder, eli.bendersky
    messages: + msg255050
    stage: patch review
2015-11-20 22:44:24  serhiy.storchaka  set  messages: + msg255027
2015-11-20 22:18:20  brett.cannon  set  messages: + msg255024
2015-11-20 21:34:36  serhiy.storchaka  set  messages: + msg255021
2015-11-16 22:33:12  brett.cannon  set  messages: + msg254764
2015-11-16 22:23:12  serhiy.storchaka  set  nosy: + serhiy.storchaka
    messages: + msg254763
2015-11-16 20:38:33  brett.cannon  set  messages: + msg254753
2015-11-16 20:32:54  pitrou  set  messages: + msg254752
2015-11-16 20:14:14  brett.cannon  set  messages: + msg254751
2015-11-16 20:07:56  pitrou  set  messages: + msg254748
2015-11-16 19:41:33  brett.cannon  set  assignee: brett.cannon
2015-11-16 19:41:23  brett.cannon  create