Message 90465 - Python tracker


Author	MLModel
Recipients	MLModel, georg.brandl
Date	2009-07-13.00:55:50
SpamBayes Score	2.7474134e-12
Marked as misclassified	No
Message-id	<1247446553.79.0.919218325186.issue6472@psf.upfronthosting.co.za>
In-reply-to

Content
I can't quite sort this out, because it's difficult to see what is
intended. The documentation of xml.etree.ElementTree (19.11 in the
Library doc) uses terms like "iterator", "tree iterator", "iterable",
"list" in vague and perhaps not quite accurate ways. I can't tell from
the documentation which functions/methods return lists, which return a
generator, which return an unspecified kind of iterable, and so on.
Moreover, the results are different using ElementTree than they are
using cElementTree. In particular, getiterator() returns a list in
ElementTree and a generator in cElementTree. This can make a substantial
difference in performance when iterating over a large number of nodes
(in addition to cElementTree's parsing being what appears to be about
10x faster).
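
One way to see what a given method actually hands back is to inspect the result directly. The sketch below is only illustrative and makes some assumptions: it targets a current Python 3, where getiterator() and the separate cElementTree module no longer exist (both were removed in 3.9), so it checks Element.iter(), findall() and iterfind() instead and cannot reproduce the list-versus-generator difference described above. The helper name describe() and the sample XML are made up for the example.

```python
import xml.etree.ElementTree as ET
from collections.abc import Iterator


def describe(result):
    """Classify a return value as a list, a one-shot iterator, or something else."""
    if isinstance(result, list):
        return "list"
    if isinstance(result, Iterator):
        return "iterator"
    return "other iterable"


# Tiny made-up document, just to have something to search.
root = ET.fromstring("<root><item>a</item><item>b</item></root>")

# Call each search method once and record what kind of object comes back.
kinds = {
    "iter": describe(root.iter("item")),
    "findall": describe(root.findall("item")),
    "iterfind": describe(root.iterfind("item")),
}

for name, kind in kinds.items():
    print(f"{name}: {kind}")
```

The distinction matters in practice: a list can be indexed, measured with len() and traversed repeatedly, while an iterator is consumed by a single pass and never holds all matching nodes in memory at once.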

I think someone should go over the page and sort this out and make it
clear what the user can expect. (I don't think it's fair to
overgeneralize to things like "iterables" if the module is really meant
to be making a commitment to a list or a generator.) I also think that
the differences in what the methods return in the Python and C
versions of the module should be highlighted.

I stumbled on this trying to parse and extract individual bits of
information out of large XML files. I realize full well there are better
ways to do this (SAX, e.g.) and better ways to search than just iterate
over all the tags of the type I'm interested in, but I should still know
what to expect from ElementTree, especially because it is so wonderful!

History
Date	User	Action	Args
2009-07-13 00:55:53	MLModel	set	recipients: + MLModel, georg.brandl
2009-07-13 00:55:53	MLModel	set	messageid: <1247446553.79.0.919218325186.issue6472@psf.upfronthosting.co.za>
2009-07-13 00:55:52	MLModel	link	issue6472 messages
2009-07-13 00:55:50	MLModel	create