classification
Title: Update ElementTree with upstream changes
Type: behavior
Stage: resolved
Components: Documentation, Library (Lib)
Versions: Python 3.2, Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: effbot
Nosy List: MLModel, effbot, flox, georg.brandl, milko.krachounov, pitrou
Priority: normal
Keywords: patch

Created on 2009-07-13 00:55 by MLModel, last changed 2010-03-14 01:45 by flox. This issue is now closed.

Files
File name Uploaded Description Edit
issue6472_upstream_docs.diff flox, 2009-12-14 08:44 Patch for documentation.
issue6472_upstream_py3k_v3.diff flox, 2010-03-12 12:03 Patch, apply to 3.x
Messages (20)
msg90465 - (view) Author: Mitchell Model (MLModel) Date: 2009-07-13 00:55
I can't quite sort this out, because it's difficult to see what is
intended. The documentation of xml.etree.ElementTree (19.11 in the
Library doc) uses terms like "iterator", "tree iterator", "iterable",
"list" in vague and perhaps not quite accurate ways. I can't tell from
the documentation which functions/methods return lists, which return a
generator, which return an unspecified kind of iterable, and so on.
Moreover, the results are different using ElementTree than they are
using cElementTree. In particular, getiterator() returns a list in
ElementTree and a generator in cElementTree. This can make a substantial
difference in performance when iterating over a large number of nodes
(in addition to cElementTree's parsing being what appears to be about
10x faster).

I think someone should go over the page and sort this out and make it
clear what the user can expect. (I don't think it's fair to
overgeneralize to things like "iterables" if the module is really meant
to be making a commitment to a list or a generator.) I also think that
the differences in the results of methods returned in the Python and C
versions of the module should be highlighted.

I stumbled on this trying to parse and extract individual bits of
information out of large XML files. I full well realize there are better
ways to do this (SAX, e.g.) and better ways to search than just iterate
over all the tags of the type I'm interested in, but I should still know
what to expect from ElementTree, especially because it is so wonderful!
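
One way to see which of these terms applies to a given return value is to test it at run time. The sketch below assumes a current Python 3, where Element.iter() is the spelling of getiterator(); describe() is a hypothetical helper, not part of ElementTree.

```python
import xml.etree.ElementTree as ET


def describe(obj):
    """Classify obj as 'list', 'iterator', or 'iterable' (hypothetical helper)."""
    if isinstance(obj, list):
        return "list"
    # An iterator returns itself from iter(); a plain iterable does not.
    if iter(obj) is obj:
        return "iterator"
    return "iterable"


root = ET.fromstring("<root><item/><item/></root>")

kinds = {
    "iter": describe(root.iter("item")),
    "findall": describe(root.findall("item")),
}
print(kinds)
```

A list can be indexed and traversed repeatedly, while an iterator can only be consumed once, which is why the distinction matters for both correctness and memory use on large documents.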
msg95990 - (view) Author: Milko Krachounov (milko.krachounov) Date: 2009-12-05 13:19
This isn't just a documentation issue. A function named getiterator(),
for which the docs say that it returns an iterator, should return an
iterator, not just an iterable. They have different semantics and can't
be used interchangeably, so the behaviour of getiterator() in
ElementTree is wrong. I was using this in my program:

iterator = element.getiterator()
next(iterator)
subelement = next(iterator)

Which broke when I tried switching to ElementTree from cElementTree,
even though the docs tell me that I'll get an iterator there.
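
A defensive workaround, sketched here under the assumption of a current Python 3 (Element.iter() rather than getiterator()), is to pass the result through the built-in iter(). It returns an iterator unchanged and builds one from a list, so next() works in either case.

```python
import xml.etree.ElementTree as ET

element = ET.fromstring("<a><b/><c/></a>")

# iter() is harmless on an iterator and converts a list into one.
iterator = iter(element.iter())
first = next(iterator)       # the element itself comes first
subelement = next(iterator)  # then its descendants in document order
print(first.tag, subelement.tag)
```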

Also, for findall() and friends, is there any reason why we can't stick
to either an iterator or a list, and not both? The API would be clearer
if findall() always returned a list, or always an iterator, regardless
of the implementation. It is currently not clear what will happen if I do:

for x in tree.findall(path):
     mutate_tree(tree, x)
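
Whatever findall() returns, taking a snapshot with list() before the loop makes the mutation safe. A minimal sketch, with removal of the matched element standing in for the unspecified mutate_tree():

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<root><x/><y/><x/></root>")

# list() freezes the matches, so removing elements cannot disturb the loop.
for x in list(root.findall("x")):
    root.remove(x)

remaining = [child.tag for child in root]
print(remaining)
```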
msg96000 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2009-12-05 19:32
There are many differences between the two implementations.
I don't know if we can live with them or not.

~ $ ./python 
Python 3.1.1+ (release31-maint:76650, Dec  3 2009, 17:14:50) 
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from xml.etree import ElementTree as ET, cElementTree as cET
>>> from io import StringIO
>>> SAMPLE = '<root/>'
>>> IO_SAMPLE = StringIO(SAMPLE)


With ElementTree

>>> elt = ET.XML(SAMPLE)
>>> elt.getiterator()
[<Element root at 15cb920>]
>>> elt.findall('')  # or '.'
[<Element root at 15cb920>]
>>> elt.findall('./')
[<Element root at 15cb920>]
>>> elt.items()
dict_items([])
>>> elt.keys()
dict_keys([])
>>> elt[:]
[]
>>> IO_SAMPLE.seek(0)
>>> next(ET.iterparse(IO_SAMPLE))
('end', <Element root at 15d60d0>)
>>> IO_SAMPLE.seek(0)
>>> list(ET.iterparse(IO_SAMPLE))
[('end', <Element root at 15583e0>)]


With cElementTree

>>> elt_c = cET.XML(SAMPLE)
>>> elt_c.getiterator()
<generator object getiterator at 0x15baae0>
>>> elt_c.findall('')
[]
>>> elt_c.findall('./')
[<Element 'root' at 0x15cf3a0>]
>>> elt_c.items()
[]
>>> elt_c.keys()
[]
>>> elt_c[:]
Traceback (most recent call last):
TypeError: sequence index must be integer, not 'slice'
>>> IO_SAMPLE.seek(0)
>>> next(cET.iterparse(IO_SAMPLE))
Traceback (most recent call last):
TypeError: iterparse object is not an iterator
>>> IO_SAMPLE.seek(0)
>>> list(cET.iterparse(IO_SAMPLE))
[(b'end', <Element 'root' at 0x15cf940>)]
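
The iterparse() difference shown above can be sidestepped by wrapping the result in iter() before calling next(). This is only a sketch against xml.etree.ElementTree in a current Python 3 (the separate cElementTree module no longer exists there); the sample document is made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

source = io.BytesIO(b"<root><child/></root>")

# iter() guarantees an iterator whether or not iterparse() returns one itself.
events = iter(ET.iterparse(source, events=("start", "end")))
first_event, first_elem = next(events)
rest = [(event, elem.tag) for event, elem in events]
print(first_event, first_elem.tag, rest)
```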
msg96023 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2009-12-06 11:51
Proposed patch fixes most of the discrepancies between both implementations.

It restores some features that were lost with Python 3:
 * cElement slicing and extended slicing
 * iterparse, cET.getiterator and cET.findall return an iterator
   (as documented)

Some tests were added to check these issues.
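
For reference, the slicing behaviour in question looks like the following on a current Python 3. This is an illustrative sketch rather than one of the tests from the patch; the element names are arbitrary.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<root><a/><b/><c/><d/></root>")

first_two = [e.tag for e in root[:2]]     # plain slice
every_other = [e.tag for e in root[::2]]  # extended slice with a step

# Slice assignment accepts any sequence of elements, here a one-item tuple.
root[1:3] = (ET.Element("x"),)
tags = [e.tag for e in root]
print(first_two, every_other, tags)
```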
msg96040 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2009-12-06 21:16
I fixed it differently, using the upstream modules (Thank you Fredrik).
 * ElementTree 1.3a3-20070912
 * cElementTree 1.0.6-20090110

It works.
And it closes issue1143, too.
msg96048 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-12-07 11:10
The patch should have doc updates for new functionality, if any.
msg96049 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2009-12-07 12:38
I see some new features in the changelog.
I will try to update the documentation during the week.

(patch "py3k" fixed: support assignment of arbitrary sequences)
msg96181 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2009-12-09 21:30
Patch for the documentation. (source: upstream documentation)
msg96373 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2009-12-14 08:27
Small update of the patch for 3.2: the __cmp__ method is replaced with
the __eq__ method (on CommentProxy and PIProxy).
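
The reason for this change is that Python 3 never calls __cmp__, so a class that relied on it for equality needs __eq__ instead. The class below is a hypothetical stand-in to show the pattern; it is not the CommentProxy or PIProxy code from the patch.

```python
class MarkerProxy:
    """Hypothetical proxy that compares equal to the object it wraps."""

    def __init__(self, target):
        self._target = target

    def __eq__(self, other):
        # Python 3 ignores __cmp__, so equality is spelled out here.
        return self._target == other

    def __hash__(self):
        # Defining __eq__ disables the inherited __hash__; restore one that
        # agrees with the wrapped object.
        return hash(self._target)


proxy = MarkerProxy("comment")
print(proxy == "comment", proxy != "pi")
```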
msg97607 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2010-01-11 21:40
It would be nice to upgrade ElementTree for 2.7 and 3.2, at least.
msg99137 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2010-02-09 22:28
Patch updated, with upstream packages:
 * ElementTree 1.3a3-20070912
 * cElementTree 1.0.6-20090110

Now all tests are identical for the ElementTree part:
 - ElementTree 2.x
 - cElementTree 2.x
 - ElementTree 3.x
 - cElementTree 3.x

Waiting for some developer kind enough to review and merge in 2.7 and 3.2.
msg99138 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-02-09 23:22
Given the size of the patch, it's very difficult to review properly.
In any case, could you upload it to http://codereview.appspot.com/ ?
msg99139 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2010-02-09 23:31
Ok, will do the upload to rietveld.

In addition to the straight review of the patch itself, you could:
 - diff against the upstream source code (very few changes)
 - diff between 2.x and 3.x
 - review the test_suite (there are only additions, no real change)
 - hunt refleaks

Btw, I've backported the last tests (#2746, #6233) to all 4 test files (ET and cET, 2.x and 3.x).
msg99140 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2010-02-09 23:51
Here it is:
 * http://codereview.appspot.com/207048/show
msg99449 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2010-02-16 23:21
Update the 2.x patch with the last version uploaded to rietveld (patch set 5).

Improved test coverage with upstream tests and test cases provided by Neil on issue #6232.

Note: the patch for 3.x is obsolete.
msg99466 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2010-02-17 11:48
Strip out the experimental C API.
msg100856 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2010-03-11 14:40
Fixed on trunk with r78838.
Some extra work is required to port it to 3.x.

Thank you Fredrik and Antoine for reviewing this patch.
msg100881 - (view) Author: Fredrik Lundh (effbot) * (Python committer) Date: 2010-03-11 19:02
W00t!
msg100928 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2010-03-12 12:03
Patch to merge ElementTree 1.3 in 3.x.
msg101037 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2010-03-14 01:45
Merged in 3.x with r78942 and r78945.

See #8047 for a discussion about the `encoding` argument of the serializer (used by the .write() method and the tostring() and tostringlist() functions).
Currently the output is not encoded by default in 3.1 and 3.x.
It is encoded to ASCII in 2.6 and 2.x.
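
For readers on a current Python 3, the effect of the `encoding` argument can be seen with tostring(). Note that this sketch reflects the behaviour of released Python 3 versions after that discussion was settled, not the state described in the message above: the default output is ASCII-encoded bytes, and encoding="unicode" returns str.

```python
import xml.etree.ElementTree as ET

elem = ET.Element("greeting")
elem.text = "caf\u00e9"

as_bytes = ET.tostring(elem)                     # bytes, non-ASCII as char refs
as_text = ET.tostring(elem, encoding="unicode")  # str, no encoding applied
as_utf8 = ET.tostring(elem, encoding="utf-8")    # bytes in the named encoding
print(as_bytes, as_text, as_utf8)
```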
History
Date User Action Args
2010-03-14 01:45:21 flox set status: open -> closed; messages: + msg101037
2010-03-12 12:04:06 flox set files: - issue6472_etree_upstream_v5a.diff
2010-03-12 12:03:59 flox set files: - issue6472_etree_upstream_py3k_v2.diff
2010-03-12 12:03:40 flox set files: + issue6472_upstream_py3k_v3.diff; messages: + msg100928
2010-03-11 19:02:19 effbot set messages: + msg100881
2010-03-11 15:57:40 flox link issue6266 superseder
2010-03-11 15:57:40 flox unlink issue6266 dependencies
2010-03-11 15:01:45 flox link issue6232 superseder
2010-03-11 15:01:45 flox unlink issue6232 dependencies
2010-03-11 15:00:15 flox link issue6265 superseder
2010-03-11 15:00:15 flox unlink issue6265 dependencies
2010-03-11 14:59:13 flox link issue6230 superseder
2010-03-11 14:59:13 flox unlink issue6230 dependencies
2010-03-11 14:57:27 flox link issue6565 superseder
2010-03-11 14:57:27 flox unlink issue6565 dependencies
2010-03-11 14:53:28 flox link issue3151 superseder
2010-03-11 14:53:28 flox unlink issue3151 dependencies
2010-03-11 14:51:35 flox unlink issue3475 dependencies
2010-03-11 14:51:35 flox link issue3475 superseder
2010-03-11 14:49:26 flox link issue1538691 superseder
2010-03-11 14:49:26 flox unlink issue1538691 dependencies
2010-03-11 14:40:17 flox set resolution: fixed; messages: + msg100856; stage: patch review -> resolved
2010-02-23 15:48:04 flox link issue7990 dependencies
2010-02-17 11:49:01 flox set files: + issue6472_etree_upstream_v5a.diff; messages: + msg99466
2010-02-17 11:47:35 flox set files: - issue6472_etree_upstream_v5.diff
2010-02-16 23:21:46 flox set files: + issue6472_etree_upstream_v5.diff; messages: + msg99449
2010-02-16 23:19:42 flox set files: - issue6472_etree_upstream_v2.diff
2010-02-16 21:58:35 flox link issue6266 dependencies
2010-02-16 13:17:50 flox link issue6232 dependencies
2010-02-16 13:13:41 flox link issue6265 dependencies
2010-02-16 13:11:28 flox link issue6230 dependencies
2010-02-16 12:13:29 flox link issue6565 dependencies
2010-02-16 11:58:48 flox link issue3151 dependencies
2010-02-16 11:46:13 flox link issue1777 superseder
2010-02-16 11:43:34 flox link issue1767933 dependencies
2010-02-13 16:01:18 flox link issue1538691 dependencies
2010-02-13 15:57:38 flox link issue3475 dependencies
2010-02-10 12:16:40 pitrou set title: Inconsistent use of "iterator" in ElementTree doc & diff between Py and C modules -> Update ElementTree with upstream changes
2010-02-09 23:51:19 flox set messages: + msg99140
2010-02-09 23:31:53 flox set messages: + msg99139
2010-02-09 23:22:20 pitrou set messages: + msg99138
2010-02-09 22:29:17 flox set files: + issue6472_etree_upstream_py3k_v2.diff
2010-02-09 22:28:22 flox set files: + issue6472_etree_upstream_v2.diff; messages: + msg99137
2010-02-09 22:22:16 flox set files: - issue6472_upstream_py3k_v2.diff
2010-02-09 22:22:10 flox set files: - issue6472_upstream.diff
2010-01-11 21:40:18 flox set messages: + msg97607; versions: - Python 2.6, Python 3.1
2009-12-14 08:44:14 flox set files: + issue6472_upstream_docs.diff
2009-12-14 08:43:14 flox set files: - issue6472_upstream_docs.diff
2009-12-14 08:28:11 flox set files: - issue6472_upstream_py3k.diff
2009-12-14 08:27:48 flox set files: + issue6472_upstream_py3k_v2.diff; messages: + msg96373
2009-12-09 21:31:02 flox set files: + issue6472_upstream_docs.diff; messages: + msg96181
2009-12-07 12:39:11 flox set files: - issue6472_upstream_py3k.diff
2009-12-07 12:38:59 flox set files: + issue6472_upstream_py3k.diff; messages: + msg96049
2009-12-07 11:11:13 pitrou link issue1143 superseder
2009-12-07 11:10:42 pitrou set priority: normal; nosy: + pitrou; messages: + msg96048; stage: patch review
2009-12-07 08:22:19 flox set files: - issue6472.diff
2009-12-07 08:22:14 flox set files: - issue6472_py3k.diff
2009-12-07 08:21:36 flox set files: + issue6472_upstream_py3k.diff; versions: - Python 3.0
2009-12-06 21:16:33 flox set files: + issue6472_upstream.diff; messages: + msg96040
2009-12-06 11:51:56 flox set files: + issue6472_py3k.diff
2009-12-06 11:51:21 flox set files: + issue6472.diff; keywords: + patch; messages: + msg96023
2009-12-05 19:32:49 flox set messages: + msg96000
2009-12-05 17:11:01 flox set nosy: + flox
2009-12-05 13:19:11 milko.krachounov set versions: + Python 2.6, Python 2.7; nosy: + milko.krachounov; messages: + msg95990; components: + Library (Lib); type: behavior
2009-07-13 02:24:16 benjamin.peterson set assignee: georg.brandl -> effbot
2009-07-13 01:32:33 jcsalterego set nosy: + effbot
2009-07-13 00:55:52 MLModel create