This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: add ElementTree XML processing benchmark to benchmark suite
Type: enhancement
Stage: resolved
Components: Benchmarks, XML
Versions:
process
Status: closed
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: brett.cannon, pitrou, python-dev, scoder
Priority: normal
Keywords: patch

Created on 2013-03-29 11:36 by scoder, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description
add_et_benchmark_v1.patch scoder, 2013-03-29 11:36 Benchmark for ElementTree implementations
add_et_benchmark.patch scoder, 2013-03-30 09:27 updated benchmark implementation
Messages (8)
msg185496 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2013-03-29 11:36
Here is an artificial but pretty broad ElementTree benchmark, testing the modules xml.etree.ElementTree, xml.etree.cElementTree and lxml.etree (if importable). Please add it to the benchmark suite.
msg185505 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-03-29 17:53
Haven't looked at the patch in detail but a couple of things:

- I don't think we need to benchmark the slow pure-Python ET, except when the fast version isn't present (basically, the main benchmark should try cET and then fall back to ET)

- I'm ok with lxml being benchmarked, but only if a well-defined version is included in the source tree (the problem being of course that it's not pure Python, so it will have to be built on the fly, assuming all dependencies are present :-/)
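The fallback import described above can be sketched as follows (an illustrative sketch, not the actual patch; on Python 2.x the accelerated implementation lived in xml.etree.cElementTree, while modern Python 3 exposes it automatically through xml.etree.ElementTree):

```python
# Prefer the C-accelerated ElementTree and fall back to the
# pure-Python module when the accelerator is unavailable.
try:
    import xml.etree.cElementTree as etree  # Python 2.x fast path
except ImportError:
    import xml.etree.ElementTree as etree  # pure-Python fallback

# Either way, the benchmark code below uses the same "etree" API.
root = etree.Element("root")
```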
msg185507 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2013-03-29 18:08
I considered lxml.etree support more of a convenience feature, just for comparison. Given that it's a binary package that doesn't run reliably on Python implementations other than CPython, I don't think it's really interesting to make it part of the benchmark suite. I'd rather add an explicit option to enable it than include it there.

I'm ok with the conditional import for ET, although I don't see a reason to exclude it. Why not be able to compare the performance of both implementations as well? There's a slowpickle benchmark, for example.

So, what about only testing cET by default and adding an explicit option "--etree-module=package.module" to change the imported module, e.g. "--etree-module=lxml.etree" to benchmark lxml or "--etree-module=cElementTree" to benchmark a separately installed 3rd party cET package?
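The proposed "--etree-module" option amounts to importing a dotted module name chosen at run time. A minimal sketch (the function name load_etree is hypothetical, not from the patch):

```python
import importlib

def load_etree(module_name="xml.etree.ElementTree"):
    """Import the ElementTree implementation selected on the command line.

    Sketch of the proposed --etree-module option: the benchmark accepts a
    dotted module name (e.g. "lxml.etree") and imports it dynamically
    instead of hard-coding a single implementation.
    """
    return importlib.import_module(module_name)

# Default: the stdlib implementation.
etree = load_etree()
```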
msg185508 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-03-29 18:12
> I'm ok with the conditional import for ET, although I don't see a
> reason to exclude it. Why not be able to compare the performance of
> both implementations as well? There's a slowpickle benchmark, for
> example.

It made sense in 2.7 where both implementations were visibly selectable
(and the pure Python ones were arguably the "default" choice since their
names were less obtuse). But in 3.3 the C accelerator is automatically
enabled when importing xml.etree. So I don't think distinguishing
between them makes much sense anymore.

> So, what about only testing cET by default and adding an explicit
> option "--etree-module=package.module" to change the imported module,
> e.g. "--etree-module=lxml.etree" to benchmark lxml or
> "--etree-module=cElementTree" to benchmark a separately installed 3rd
> party cET package?

Well, we could still add a "lxml" benchmark but disable it by default (I
mean not make it part of the main sub-suites). That way people can run
it explicitly if they want.

(also, since it's a "lxml" benchmark, it may test other things than
simply the etree API, if you like)
msg185551 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2013-03-30 09:27
Ok, but an lxml benchmark is independent of this patch then. I updated it to only use cElementTree, with the additional "--etree-module" option and also a "--no-accelerator" option for advanced usage.

Another thing I did is to split the actual benchmark code into three: one that does parse-process-serialise, one that only does process-serialise, and one that does only generate-serialise. The intention is to also represent the use case of just getting stuff out, instead of always getting everything in through the parser.
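The three-way split can be illustrated roughly like this (an illustrative sketch under assumed helper names, not the actual patch code):

```python
import io
import time
import xml.etree.ElementTree as etree

def generate(num_children=100):
    """Generate phase: build a small document tree in memory."""
    root = etree.Element("root")
    for i in range(num_children):
        etree.SubElement(root, "child", id=str(i)).text = "text %d" % i
    return etree.ElementTree(root)

def serialise(tree):
    """Serialise phase: write the tree out as bytes."""
    buf = io.BytesIO()
    tree.write(buf, encoding="utf-8")
    return buf.getvalue()

def bench_parse_process_serialise(data):
    """Full pipeline: parse, walk the tree, then serialise it again."""
    tree = etree.ElementTree(etree.fromstring(data))  # parse
    for elem in tree.iter("child"):                   # process
        elem.set("seen", "yes")
    return serialise(tree)

data = serialise(generate())
t0 = time.perf_counter()
result = bench_parse_process_serialise(data)
elapsed = time.perf_counter() - t0
```

The generate-serialise case skips the parser entirely, which is what represents the "just getting stuff out" use case.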
msg186659 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-04-12 18:22
New changeset dbf5693d7013 by Antoine Pitrou in branch 'default':
Issue #17573: add three elementtree benchmarks.  Initial patch by Stefan Behnel.
http://hg.python.org/benchmarks/rev/dbf5693d7013
msg186663 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-04-12 18:46
New changeset 723d7f134cf5 by Antoine Pitrou in branch 'default':
Tweak etree benchmarks and add an etree_iterparse benchmark (followup to issue #17573).
http://hg.python.org/benchmarks/rev/723d7f134cf5
msg186664 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-04-12 18:48
I've committed the benchmarks after some changes:
- smaller document to reduce runtimes
- avoid measuring processing and serializing performance as part of the parsing benchmark
- added an iterparse benchmark (iterparse can be important for e.g. XMPP implementations)
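A minimal iterparse sketch, assuming the stdlib module: incremental parsing matters for streaming protocols such as XMPP because elements can be handled (and discarded) before the full document has arrived.

```python
import io
import xml.etree.ElementTree as etree

data = b"<stream><msg>a</msg><msg>b</msg></stream>"
texts = []
# iterparse yields (event, element) pairs as the parser progresses,
# so each <msg> can be processed as soon as its end tag is seen.
for event, elem in etree.iterparse(io.BytesIO(data), events=("end",)):
    if elem.tag == "msg":
        texts.append(elem.text)
        elem.clear()  # free memory held by already-processed elements
```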
History
Date                 User        Action  Args
2022-04-11 14:57:43  admin       set     github: 61773
2013-04-12 18:48:39  pitrou      set     status: open -> closed
                                         messages: + msg186664
                                         stage: resolved
2013-04-12 18:46:25  python-dev  set     messages: + msg186663
2013-04-12 18:22:40  python-dev  set     nosy: + python-dev
                                         messages: + msg186659
2013-03-30 09:27:50  scoder      set     files: + add_et_benchmark.patch
                                         messages: + msg185551
2013-03-29 18:12:59  pitrou      set     messages: + msg185508
2013-03-29 18:08:45  scoder      set     messages: + msg185507
2013-03-29 17:53:47  pitrou      set     messages: + msg185505
2013-03-29 11:46:30  scoder      set     nosy: + brett.cannon, pitrou
2013-03-29 11:36:44  scoder      create