Issue 25638: Verify the etree_parse and etree_iterparse benchmarks are working appropriately

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/69824

classification

Title:	Verify the etree_parse and etree_iterparse benchmarks are working appropriately
Type:	performance	Stage:
Components:	Benchmarks, Extension Modules, Library (Lib), XML	Versions:	Python 3.6

process

Status:	closed	Resolution:	not a bug
Dependencies:	25814	Superseder:
Assigned To:		Nosy List:	brett.cannon, eli.bendersky, pitrou, python-dev, scoder, serhiy.storchaka
Priority:	normal	Keywords:	patch

Created on 2015-11-16 19:41 by brett.cannon, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
etree_iterparse.patch	serhiy.storchaka, 2015-11-21 11:06		review
etree_iterparse_2.patch	serhiy.storchaka, 2015-11-26 00:05		review
etree_start_handler_no_attrib.patch	serhiy.storchaka, 2015-12-09 18:15		review

Pull Requests
URL	Status	Linked	Edit
PR 11169		vstinner, 2018-12-14 22:10

Messages (20)
msg254746 - (view)	Author: Brett Cannon (brett.cannon) *	Date: 2015-11-16 19:41
If you look at bit.ly/pycon-ca-keynote you will notice that the etree_parse and etree_iterparse benchmarks were horrible for everyone. Because of how badly everyone seemed to do, I think the benchmarks should be verified to be doing reasonable things on implementations other than CPython 2.7.
msg254748 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2015-11-16 20:07
Would you have a quick summary for those not willing to watch a whole keynote?
msg254751 - (view)	Author: Brett Cannon (brett.cannon) *	Date: 2015-11-16 20:14
That link is to a Jupyter notebook so you don't have to watch anything. Plus the video is not even up yet so you can't skip the keynote even if you wanted to since you can't watch it yet. :)
msg254752 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2015-11-16 20:32
Ok, so when you say "horrible for everyone", this is really IronPython and Jython, right? :-) Other runtimes seem to do ok (perhaps not stellar, but ok).
msg254753 - (view)	Author: Brett Cannon (brett.cannon) *	Date: 2015-11-16 20:38
Well, Jython and IronPython obviously did the worst, but even Python 3 didn't do as well as I would have expected, so I still want to double-check the benchmarks to see if it's obvious why CPython 2.7 beats out everyone.
msg254763 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2015-11-16 22:23
I think these histograms would look better with logarithmic scale.
msg254764 - (view)	Author: Brett Cannon (brett.cannon) *	Date: 2015-11-16 22:33
Let's not pollute the issue with a critique of my notebook. You can feel free to email me personally to discuss it if you want, including why I purposefully didn't use a logarithmic scale.
msg255021 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2015-11-20 21:34
Sorry Brett.

How were the tests run? There are two implementations of ElementTree, accelerated and non-accelerated. xml.etree.ElementTree is accelerated by default in Python 3, but non-accelerated in Python 2.

$ python2.7 bm_elementtree.py -n 7 --take_geo_mean
0.463665158795
$ python2.7 bm_elementtree.py -n 7 --take_geo_mean --etree-module=xml.etree.ElementTree
5.46309932568
$ python3.4 bm_elementtree.py -n 7 --take_geo_mean --etree-module=xml.etree.ElementTree
0.813397633467649
$ python3.4 bm_elementtree.py -n 7 --take_geo_mean --etree-module=xml.etree.ElementTree --no-accelerator
5.31174765817514

If you run the test with the same option, --etree-module=xml.etree.ElementTree, it will use the accelerated implementation in Python 3 and the non-accelerated one in Python 2.
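
To make the accelerated versus non-accelerated distinction concrete, here is a minimal Python 3 sketch for checking which implementation a given ElementTree module object uses, and for loading a pure-Python copy by blocking the C module. The helper names uses_c_accelerator and load_pure_python_etree are hypothetical and are not part of the benchmark suite; the sketch assumes a CPython-style layout in which the accelerator lives in the _elementtree module.

```python
import importlib
import sys

_MISSING = object()


def uses_c_accelerator(et_module):
    """Return True if et_module.Element is the C-accelerated type."""
    try:
        import _elementtree  # C accelerator; may be absent on some runtimes
    except ImportError:
        return False
    return et_module.Element is _elementtree.Element


def load_pure_python_etree():
    """Import a fresh xml.etree.ElementTree with the C accelerator blocked.

    Hypothetical helper: it relies on the fact that a None entry in
    sys.modules makes the corresponding import raise ImportError.
    """
    import xml.etree as pkg

    name = "xml.etree.ElementTree"
    saved_mod = sys.modules.pop(name, None)
    saved_acc = sys.modules.get("_elementtree", _MISSING)
    sys.modules["_elementtree"] = None  # block the accelerator
    try:
        return importlib.import_module(name)
    finally:
        # Restore the previous import state so other code is unaffected.
        if saved_acc is _MISSING:
            del sys.modules["_elementtree"]
        else:
            sys.modules["_elementtree"] = saved_acc
        if saved_mod is not None:
            sys.modules[name] = saved_mod
            pkg.ElementTree = saved_mod
        else:
            sys.modules.pop(name, None)
            if hasattr(pkg, "ElementTree"):
                delattr(pkg, "ElementTree")
```

Calling uses_c_accelerator(load_pure_python_etree()) should return False, while passing the normally imported xml.etree.ElementTree should return True on a CPython 3 build that ships the accelerator.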
msg255024 - (view)	Author: Brett Cannon (brett.cannon) *	Date: 2015-11-20 22:18
The commands I used are in the notebook for each implementation and you can get the same result with `python3 perf.py -b etree python2 python3`.
msg255027 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2015-11-20 22:44
The slowdown in Python 3 may be related to the addition of XMLPullParser (issue17741).
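
For readers unfamiliar with XMLPullParser, the following is a small sketch of its non-blocking interface (available since Python 3.4): data is fed in chunks and parse events are pulled out on demand. The pull_events helper is a hypothetical name used only for illustration.

```python
import xml.etree.ElementTree as ET


def pull_events(chunks):
    """Feed XML text chunk by chunk and collect (event, tag) pairs."""
    parser = ET.XMLPullParser(events=("start", "end"))
    seen = []
    for chunk in chunks:
        parser.feed(chunk)
        # read_events() yields whatever events are ready so far.
        seen.extend((event, elem.tag) for event, elem in parser.read_events())
    parser.close()
    # Pick up any events that only became available on close().
    seen.extend((event, elem.tag) for event, elem in parser.read_events())
    return seen
```

For example, pull_events(["<root><a>te", "xt</a>", "</root>"]) reports the start and end events for root and a in document order, regardless of where the chunk boundaries fall.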
msg255050 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2015-11-21 11:06
The proposed patch optimizes iterparse(). It is now only 33% slower than in 2.7 (it was 2.6 times slower).
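
As a reminder of what the iterparse() code path looks like from the caller's side, here is a minimal sketch of incremental parsing. It is not the benchmark code; count_tags is a hypothetical helper.

```python
import io
import xml.etree.ElementTree as ET


def count_tags(xml_bytes):
    """Count elements per tag while parsing incrementally."""
    counts = {}
    # "end" events fire once an element has been fully parsed.
    for event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        counts[elem.tag] = counts.get(elem.tag, 0) + 1
        elem.clear()  # drop children and text to keep memory bounded
    return counts
```

Every "end" event goes through the per-event machinery inside iterparse(), so any overhead there is multiplied by the number of elements in the document.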
msg255394 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2015-11-26 00:05
Updated to tip.
msg255936 - (view)	Author: Brett Cannon (brett.cannon) *	Date: 2015-12-05 08:33
Serhiy's latest patch LGTM.
msg256013 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2015-12-06 15:01
Thank you for your review, Brett. Before applying this optimization, I want to fix the error propagation issue (issue25814). The patch for it is mainly a simplified part of the patch for this issue.
msg256039 - (view)	Author: Roundup Robot (python-dev)	Date: 2015-12-07 00:31
New changeset dd67c8c53aea by Serhiy Storchaka in branch 'default': Issue #25638: Optimized ElementTree.iterparse(); it is now 2x faster. https://hg.python.org/cpython/rev/dd67c8c53aea
msg256059 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2015-12-07 12:18
The iterparse benchmark in 3.6 is still 30% slower than in 2.7, and the parse benchmark is 70% slower. Hence there are other causes of the slowdown. One of them is that in 3.x an empty dict, instead of None, is passed to the start handler as the attrib parameter when the start tag has no attributes. This makes parsing about 10% slower.
msg256158 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2015-12-09 18:15
The following patch speeds up ElementTree parsing (the result of the etree parse benchmark improves by 10%). It essentially restores the 2.7 code and avoids creating an empty dict for attributes when it is not needed.
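
The idea can be illustrated in pure Python. This is only a sketch of the technique, not the attached patch: both handler functions are hypothetical, and the flat [name, value, name, value, ...] attribute list is an assumption modelled on expat's ordered-attributes mode.

```python
def start_always_dict(tag, attr_list):
    """Hypothetical handler: allocates a dict even for attribute-less tags."""
    attrib = {}
    for i in range(0, len(attr_list), 2):
        attrib[attr_list[i]] = attr_list[i + 1]
    return tag, attrib


def start_lazy_dict(tag, attr_list):
    """Hypothetical handler: passes None when the tag has no attributes."""
    attrib = None
    if attr_list:
        attrib = {}
        for i in range(0, len(attr_list), 2):
            attrib[attr_list[i]] = attr_list[i + 1]
    return tag, attrib
```

The two variants agree whenever attributes are present. They differ only for attribute-less tags, where the lazy variant skips one dict allocation per element, and the receiver must then be prepared to treat None as "no attributes".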
msg256167 - (view)	Author: Roundup Robot (python-dev)	Date: 2015-12-10 07:52
New changeset 1fe904420c20 by Serhiy Storchaka in branch 'default': Issue #25638: Optimized ElementTree parsing; it is now 10% faster. https://hg.python.org/cpython/rev/1fe904420c20
msg256172 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2015-12-10 09:05
Thank you for your review, Brett. The parse benchmark in 3.6 is now only 50% slower than in 2.7. I will continue looking for bottlenecks.
msg261651 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2016-03-12 13:55
I am not able to find the cause of the slowdown. I think this issue can be closed now. The etree_parse and etree_iterparse benchmarks are working appropriately and show a real regression in CPython 3.x. The cause of the regression is not known.

History
Date	User	Action	Args
2022-04-11 14:58:23	admin	set	github: 69824
2018-12-14 22:10:01	vstinner	set	pull_requests: + pull_request10409
2016-03-12 17:21:48	brett.cannon	set	status: open -> closed resolution: not a bug
2016-03-12 13:55:48	serhiy.storchaka	set	assignee: serhiy.storchaka -> messages: + msg261651
2015-12-10 09:05:11	serhiy.storchaka	set	messages: + msg256172 stage: patch review ->
2015-12-10 07:52:33	python-dev	set	messages: + msg256167
2015-12-09 18:15:52	serhiy.storchaka	set	files: + etree_start_handler_no_attrib.patch messages: + msg256158 stage: patch review
2015-12-07 12:18:06	serhiy.storchaka	set	messages: + msg256059 stage: commit review -> (no value)
2015-12-07 00:31:38	python-dev	set	nosy: + python-dev messages: + msg256039
2015-12-06 15:01:51	serhiy.storchaka	set	dependencies: + Propagate all errors from ElementTree.iterparse messages: + msg256013
2015-12-05 08:33:04	brett.cannon	set	assignee: brett.cannon -> serhiy.storchaka messages: + msg255936 stage: patch review -> commit review
2015-11-26 09:40:50	serhiy.storchaka	link	issue25707 dependencies
2015-11-26 00:05:54	serhiy.storchaka	set	files: + etree_iterparse_2.patch messages: + msg255394
2015-11-21 11:06:46	serhiy.storchaka	set	files: + etree_iterparse.patch type: performance components: + Extension Modules, Library (Lib), XML versions: + Python 3.6 keywords: + patch nosy: + scoder, eli.bendersky messages: + msg255050 stage: patch review
2015-11-20 22:44:24	serhiy.storchaka	set	messages: + msg255027
2015-11-20 22:18:20	brett.cannon	set	messages: + msg255024
2015-11-20 21:34:36	serhiy.storchaka	set	messages: + msg255021
2015-11-16 22:33:12	brett.cannon	set	messages: + msg254764
2015-11-16 22:23:12	serhiy.storchaka	set	nosy: + serhiy.storchaka messages: + msg254763
2015-11-16 20:38:33	brett.cannon	set	messages: + msg254753
2015-11-16 20:32:54	pitrou	set	messages: + msg254752
2015-11-16 20:14:14	brett.cannon	set	messages: + msg254751
2015-11-16 20:07:56	pitrou	set	messages: + msg254748
2015-11-16 19:41:33	brett.cannon	set	assignee: brett.cannon
2015-11-16 19:41:23	brett.cannon	create