This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: add ElementTree XML processing benchmark to benchmark suite
Type: enhancement
Stage: resolved
Components: Benchmarks, XML
Versions:
process
Status: closed
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: brett.cannon, pitrou, python-dev, scoder
Priority: normal
Keywords: patch

Created on 2013-03-29 11:36 by scoder, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description
add_et_benchmark_v1.patch scoder, 2013-03-29 11:36 Benchmark for ElementTree implementations
add_et_benchmark.patch scoder, 2013-03-30 09:27 updated benchmark implementation
Messages (8)
msg185496 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2013-03-29 11:36
Here is an artificial but pretty broad ElementTree benchmark, testing the modules xml.etree.ElementTree, xml.etree.cElementTree and lxml.etree (if importable). Please add it to the benchmark suite.
msg185505 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-03-29 17:53
Haven't looked at the patch in detail but a couple of things:

- I don't think we need to benchmark the slow pure-Python ET, except when the fast version isn't present (basically, the main benchmark should try cET and then fall back to ET)

- I'm ok with lxml being benchmarked, but only if a well-defined version is included in the source tree (the problem being of course that it's not pure Python, so it will have to be built on the fly, assuming all dependencies are present :-/)
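The fallback import described above can be sketched as follows (an illustrative sketch, not the actual patch; on Python 2.x the accelerated implementation lived in xml.etree.cElementTree, while modern Python 3 exposes it automatically through xml.etree.ElementTree):

```python
# Prefer the C-accelerated ElementTree and fall back to the
# pure-Python module when the accelerator is unavailable.
try:
    import xml.etree.cElementTree as etree  # Python 2.x fast path
except ImportError:
    import xml.etree.ElementTree as etree  # pure-Python fallback

# Either way, the benchmark code below uses the same "etree" API.
root = etree.Element("root")
```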
msg185507 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2013-03-29 18:08
I considered lxml.etree support more of a convenience feature, just for comparison. Given that it's a binary package that doesn't run reliably on Python implementations other than CPython, I don't think it's really interesting to make it part of the benchmark suite. I'd rather add an explicit option to enable it than include it there.

I'm ok with the conditional import for ET, although I don't see a reason to exclude it. Why not be able to compare the performance of both implementations as well? There's a slowpickle benchmark, for example.

So, what about only testing cET by default and adding an explicit option "--etree-module=package.module" to change the imported module, e.g. "--etree-module=lxml.etree" to benchmark lxml or "--etree-module=cElementTree" to benchmark a separately installed 3rd party cET package?
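The proposed "--etree-module" option amounts to importing a dotted module name chosen at run time. A minimal sketch (the function name load_etree is hypothetical, not from the patch):

```python
import importlib

def load_etree(module_name="xml.etree.ElementTree"):
    """Import the ElementTree implementation selected on the command line.

    Sketch of the proposed --etree-module option: the benchmark accepts a
    dotted module name (e.g. "lxml.etree") and imports it dynamically
    instead of hard-coding a single implementation.
    """
    return importlib.import_module(module_name)

# Default: the stdlib implementation.
etree = load_etree()
```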
msg185508 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-03-29 18:12
> I'm ok with the conditional import for ET, although I don't see a
> reason to exclude it. Why not be able to compare the performance of
> both implementations as well? There's a slowpickle benchmark, for
> example.

It made sense in 2.7 where both implementations were visibly selectable
(and the pure Python ones were arguably the "default" choice since their
names were less obtuse). But in 3.3 the C accelerator is automatically
enabled when importing xml.etree. So I don't think distinguishing
between them makes much sense anymore.

> So, what about only testing cET by default and adding an explicit
> option "--etree-module=package.module" to change the imported module,
> e.g. "--etree-module=lxml.etree" to benchmark lxml or
> "--etree-module=cElementTree" to benchmark a separately installed 3rd
> party cET package?

Well, we could still add a "lxml" benchmark but disable it by default (I
mean not make it part of the main sub-suites). That way people can run
it explicitly if they want.

(also, since it's a "lxml" benchmark, it may test other things than
simply the etree API, if you like)
msg185551 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2013-03-30 09:27
Ok, but an lxml benchmark is independent of this patch then. I updated it to only use cElementTree, with the additional "--etree-module" option and also a "--no-accelerator" option for advanced usage.

Another thing I did is to split the actual benchmark code into three: one that does parse-process-serialise, one that only does process-serialise, and one that does only generate-serialise. The intention is to also represent the use case of just getting stuff out, instead of always getting everything in through the parser.
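The three-way split can be illustrated roughly like this (an illustrative sketch under assumed helper names, not the actual patch code):

```python
import io
import time
import xml.etree.ElementTree as etree

def generate(num_children=100):
    """Generate phase: build a small document tree in memory."""
    root = etree.Element("root")
    for i in range(num_children):
        etree.SubElement(root, "child", id=str(i)).text = "text %d" % i
    return etree.ElementTree(root)

def serialise(tree):
    """Serialise phase: write the tree out as bytes."""
    buf = io.BytesIO()
    tree.write(buf, encoding="utf-8")
    return buf.getvalue()

def bench_parse_process_serialise(data):
    """Full pipeline: parse, walk the tree, then serialise it again."""
    tree = etree.ElementTree(etree.fromstring(data))  # parse
    for elem in tree.iter("child"):                   # process
        elem.set("seen", "yes")
    return serialise(tree)

data = serialise(generate())
t0 = time.perf_counter()
result = bench_parse_process_serialise(data)
elapsed = time.perf_counter() - t0
```

The generate-serialise case skips the parser entirely, which is what represents the "just getting stuff out" use case.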
msg186659 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-04-12 18:22
New changeset dbf5693d7013 by Antoine Pitrou in branch 'default':
Issue #17573: add three elementtree benchmarks.  Initial patch by Stefan Behnel.
http://hg.python.org/benchmarks/rev/dbf5693d7013
msg186663 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-04-12 18:46
New changeset 723d7f134cf5 by Antoine Pitrou in branch 'default':
Tweak etree benchmarks and add an etree_iterparse benchmark (followup to issue #17573).
http://hg.python.org/benchmarks/rev/723d7f134cf5
msg186664 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-04-12 18:48
I've committed the benchmarks after some changes:
- smaller document to reduce runtimes
- avoid measuring processing and serializing performance as part of the parsing benchmark
- added an iterparse benchmark (iterparse can be important for e.g. XMPP implementations)
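A minimal iterparse sketch, assuming the stdlib module: incremental parsing matters for streaming protocols such as XMPP because elements can be handled (and discarded) before the full document has arrived.

```python
import io
import xml.etree.ElementTree as etree

data = b"<stream><msg>a</msg><msg>b</msg></stream>"
texts = []
# iterparse yields (event, element) pairs as the parser progresses,
# so each <msg> can be processed as soon as its end tag is seen.
for event, elem in etree.iterparse(io.BytesIO(data), events=("end",)):
    if elem.tag == "msg":
        texts.append(elem.text)
        elem.clear()  # free memory held by already-processed elements
```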
History
Date                 User        Action  Args
2022-04-11 14:57:43  admin       set     github: 61773
2013-04-12 18:48:39  pitrou      set     status: open -> closed
                                         messages: + msg186664
                                         stage: resolved
2013-04-12 18:46:25  python-dev  set     messages: + msg186663
2013-04-12 18:22:40  python-dev  set     nosy: + python-dev
                                         messages: + msg186659
2013-03-30 09:27:50  scoder      set     files: + add_et_benchmark.patch
                                         messages: + msg185551
2013-03-29 18:12:59  pitrou      set     messages: + msg185508
2013-03-29 18:08:45  scoder      set     messages: + msg185507
2013-03-29 17:53:47  pitrou      set     messages: + msg185505
2013-03-29 11:46:30  scoder      set     nosy: + brett.cannon, pitrou
2013-03-29 11:36:44  scoder      create