msg90465
Author: Mitchell Model (MLModel)
Date: 2009-07-13 00:55
I can't quite sort this out, because it's difficult to see what is
intended. The documentation of xml.etree.ElementTree (19.11 in the
Library doc) uses terms like "iterator", "tree iterator", "iterable",
"list" in vague and perhaps not quite accurate ways. I can't tell from
the documentation which functions/methods return lists, which return a
generator, which return an unspecified kind of iterable, and so on.
Moreover, the results differ between ElementTree and cElementTree. In
particular, getiterator() returns a list in ElementTree and a generator
in cElementTree. This can make a substantial difference in performance
when iterating over a large number of nodes (on top of cElementTree's
parsing appearing to be about 10x faster).
I think someone should go over the page and sort this out and make it
clear what the user can expect. (I don't think it's fair to
overgeneralize to things like "iterables" if the module is really meant
to be making a commitment to a list or a generator.) I also think that
the differences between the results returned by the Python and C
versions of the module should be highlighted.
I stumbled on this while trying to parse large XML files and extract
individual bits of information from them. I fully realize there are
better ways to do this (SAX, e.g.) and better ways to search than
iterating over all the tags of the type I'm interested in, but I should
still know what to expect from ElementTree, especially because it is so
wonderful!

msg95990
Author: Milko Krachounov (milko.krachounov)
Date: 2009-12-05 13:19
This isn't just a documentation issue. A function named getiterator(),
for which the docs say that it returns an iterator, should return an
iterator, not just an iterable. They have different semantics and can't
be used interchangeably, so the behaviour of getiterator() in
ElementTree is wrong. I was using this in my program:
iterator = element.getiterator()
next(iterator)
subelement = next(iterator)
This broke when I switched from cElementTree to ElementTree, even
though the docs tell me that I'll get an iterator there.
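
A defensive version of the snippet above, sketched with the current ElementTree API (Element.iter() standing in for getiterator(), which was deprecated in 3.2 and later removed): calling iter() on the result first makes next() work whether the API hands back a list or an iterator. The second_item() helper is hypothetical:

```python
import xml.etree.ElementTree as ET

def second_item(nodes):
    """Hypothetical helper: return the second node from a list or an iterator."""
    it = iter(nodes)  # returns an iterator unchanged; wraps a list in a fresh iterator
    next(it)          # skip the first node (for iter(), the element itself)
    return next(it)

root = ET.fromstring("<root><a/><b/></root>")
print(second_item(root.iter()).tag)        # iterator input: a
print(second_item(list(root.iter())).tag)  # list input, like the old pure-Python getiterator(): a
```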
Also, for findall() and friends, is there any reason why we can't stick
to either an iterator or a list, rather than both? The API would be
clearer if findall() always returned a list, or always an iterator,
regardless of the implementation. It is currently not clear what will
happen if I do:
for x in tree.findall(path):
    mutate_tree(tree, x)
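
Whatever findall() returns, one way to make a loop like this safe is to take an explicit snapshot with list() before touching the tree. The sketch below assumes the mutation is a removal (mutate_tree() itself is not shown in the message) and uses iterfind(), which in current Python returns an iterator:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<root><a/><a/><b/><a/></root>")

# list() drains the iterator up front, so removing children below cannot
# disturb the traversal that produced them.
for child in list(root.iterfind("a")):
    root.remove(child)

print([c.tag for c in root])
```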

msg96000
Author: Florent Xicluna (flox) *
Date: 2009-12-05 19:32
There are many differences between the two implementations.
I don't know whether we can live with them.
~ $ ./python
Python 3.1.1+ (release31-maint:76650, Dec 3 2009, 17:14:50)
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from xml.etree import ElementTree as ET, cElementTree as cET
>>> from io import StringIO
>>> SAMPLE = '<root/>'
>>> IO_SAMPLE = StringIO(SAMPLE)
With ElementTree
>>> elt = ET.XML(SAMPLE)
>>> elt.getiterator()
[<Element root at 15cb920>]
>>> elt.findall('') # or '.'
[<Element root at 15cb920>]
>>> elt.findall('./')
[<Element root at 15cb920>]
>>> elt.items()
dict_items([])
>>> elt.keys()
dict_keys([])
>>> elt[:]
[]
>>> IO_SAMPLE.seek(0)
>>> next(ET.iterparse(IO_SAMPLE))
('end', <Element root at 15d60d0>)
>>> IO_SAMPLE.seek(0)
>>> list(ET.iterparse(IO_SAMPLE))
[('end', <Element root at 15583e0>)]
With cElementTree
>>> elt_c = cET.XML(SAMPLE)
>>> elt_c.getiterator()
<generator object getiterator at 0x15baae0>
>>> elt_c.findall('')
[]
>>> elt_c.findall('./')
[<Element 'root' at 0x15cf3a0>]
>>> elt_c.items()
[]
>>> elt_c.keys()
[]
>>> elt_c[:]
Traceback (most recent call last):
TypeError: sequence index must be integer, not 'slice'
>>> IO_SAMPLE.seek(0)
>>> next(cET.iterparse(IO_SAMPLE))
Traceback (most recent call last):
TypeError: iterparse object is not an iterator
>>> IO_SAMPLE.seek(0)
>>> list(cET.iterparse(IO_SAMPLE))
[(b'end', <Element 'root' at 0x15cf940>)]
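
One way to compare two implementations without tripping over list-vs-generator or bytes-vs-str differences is to normalize every result before comparing. The describe() helper below is a hypothetical sketch; xml.etree.cElementTree has been deprecated since Python 3.3, so here it is only run against xml.etree.ElementTree, and a second implementation would simply be passed as the etree argument:

```python
import io
import xml.etree.ElementTree as ET

def describe(etree, sample="<root/>"):
    """Hypothetical helper: summarize one implementation's answers in a type-neutral form.

    `etree` is any module exposing the ElementTree API. Results are turned
    into lists and str so that list-vs-iterator or bytes-vs-str differences
    do not hide real behavioural differences.
    """
    elt = etree.XML(sample)
    # iter() guards against an iterparse object that is iterable but not an iterator.
    event, element = next(iter(etree.iterparse(io.StringIO(sample))))
    if isinstance(event, bytes):
        event = event.decode("ascii")
    return {
        "iter": [e.tag for e in elt.iter()],
        "items": sorted(elt.items()),
        "keys": sorted(elt.keys()),
        "slice": [e.tag for e in elt[:]],
        "iterparse": (event, element.tag),
    }

print(describe(ET))
```

Comparing describe(impl_a) == describe(impl_b) would then flag differences in content only, while an outright failure such as the TypeError raised by elt_c[:] above would surface immediately.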

msg96023
Author: Florent Xicluna (flox) *
Date: 2009-12-06 11:51
Proposed patch fixes most of the discrepancies between the two implementations.
It restores some features that were lost with Python 3:
* cElement slicing and extended slicing
* iterparse, cET.getiterator and cET.findall return an iterator
(as documented)
Some tests were added to check these issues.
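
For reference, this is how slicing and extended slicing behave on an Element in current Python 3 (a quick check, not part of the patch):

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<root><a/><b/><c/><d/></root>")

print([e.tag for e in root[1:3]])   # ordinary slice:  ['b', 'c']
print([e.tag for e in root[::2]])   # extended slice:  ['a', 'c']
print([e.tag for e in root[::-1]])  # negative step:   ['d', 'c', 'b', 'a']

del root[::2]                       # extended-slice deletion removes a and c
print([e.tag for e in root])        # ['b', 'd']
```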

msg96040
Author: Florent Xicluna (flox) *
Date: 2009-12-06 21:16
I fixed it differently, using the upstream modules (Thank you Fredrik).
* ElementTree 1.3a3-20070912
* cElementTree 1.0.6-20090110
It works.
And it closes issue1143, too.

msg96048
Author: Antoine Pitrou (pitrou) *
Date: 2009-12-07 11:10
The patch should have doc updates for new functionality, if any.

msg96049
Author: Florent Xicluna (flox) *
Date: 2009-12-07 12:38
I see some new features in the changelog.
I will try to update the documentation during the week.
(Fixed the "py3k" patch: it now supports assignment of arbitrary sequences.)
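
Assuming "assignment of arbitrary sequences" refers to slice assignment on an Element, here is a sketch of what that looks like in current Python 3, using tuples rather than lists on the right-hand side:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<root><a/><b/><c/><d/></root>")

# Replace the two middle children with three new ones (a step-1 slice may
# change the number of children).
root[1:3] = (ET.Element("x"), ET.Element("y"), ET.Element("z"))
print([e.tag for e in root])  # ['a', 'x', 'y', 'z', 'd']

# An extended slice must receive exactly as many items as it selects (3 here).
root[::2] = (ET.Element("p"), ET.Element("q"), ET.Element("r"))
print([e.tag for e in root])  # ['p', 'x', 'q', 'z', 'r']
```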

msg96181
Author: Florent Xicluna (flox) *
Date: 2009-12-09 21:30
Patch for the documentation. (source: upstream documentation)

msg96373
Author: Florent Xicluna (flox) *
Date: 2009-12-14 08:27
Small update of the patch for 3.2: the __cmp__ method is replaced with
an __eq__ method (on CommentProxy and PIProxy).
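
Python 3 never calls __cmp__, which is why the proxies needed __eq__. The two classes below are hypothetical stand-ins, not the actual CommentProxy/PIProxy code:

```python
class OldStyleProxy:
    """Hypothetical class relying on __cmp__, which Python 3 ignores."""
    def __init__(self, text):
        self.text = text

    def __cmp__(self, other):
        return 0 if self.text == other.text else 1

class NewStyleProxy:
    """Hypothetical class using rich comparison, as Python 3 requires."""
    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        if not isinstance(other, NewStyleProxy):
            return NotImplemented
        return self.text == other.text

print(OldStyleProxy("x") == OldStyleProxy("x"))  # False: falls back to identity
print(NewStyleProxy("x") == NewStyleProxy("x"))  # True
print(NewStyleProxy.__hash__ is None)            # True
```

Defining __eq__ alone also sets __hash__ to None, so a class like this needs an explicit __hash__ if its instances must be usable as dict keys.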

msg97607
Author: Florent Xicluna (flox) *
Date: 2010-01-11 21:40
It would be nice to upgrade ElementTree for 2.7 and 3.2, at least.

msg99137
Author: Florent Xicluna (flox) *
Date: 2010-02-09 22:28
Patch updated, with upstream packages:
* ElementTree 1.3a3-20070912
* cElementTree 1.0.6-20090110
Now all tests are identical for the ElementTree part:
- ElementTree 2.x
- cElementTree 2.x
- ElementTree 3.x
- cElementTree 3.x
Waiting for a developer kind enough to review it and merge it into 2.7 and 3.2.
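
A minimal sketch of running one test body against several implementations, in the spirit of the identical test suites described above. The mixin and class names are hypothetical, and only the standard xml.etree.ElementTree is bound here:

```python
import unittest
import xml.etree.ElementTree as ET

class SharedElementTreeTests:
    """Test body written once; each concrete subclass binds `etree`."""
    etree = None  # set by subclasses to a module exposing the ElementTree API

    def test_iter_is_an_iterator(self):
        root = self.etree.XML("<root><a/></root>")
        it = root.iter()
        self.assertIs(iter(it), it)  # an iterator returns itself from iter()
        self.assertEqual([e.tag for e in it], ["root", "a"])

    def test_extended_slicing(self):
        root = self.etree.XML("<root><a/><b/><c/></root>")
        self.assertEqual([e.tag for e in root[::2]], ["a", "c"])

class DefaultImplementationTests(SharedElementTreeTests, unittest.TestCase):
    etree = ET

# A hypothetical second subclass would bind `etree` to another
# implementation, so both run exactly the same assertions.
```

Because the mixin does not inherit from unittest.TestCase, the test loader only collects the concrete subclasses.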

msg99138
Author: Antoine Pitrou (pitrou) *
Date: 2010-02-09 23:22
Given the size of the patch, it's very difficult to review properly.
In any case, could you upload it to http://codereview.appspot.com/ ?

msg99139
Author: Florent Xicluna (flox) *
Date: 2010-02-09 23:31
OK, I will upload it to Rietveld.
In addition to reviewing the patch itself, you could:
- diff against the upstream source code (very few changes)
- diff between 2.x and 3.x
- review the test suite (there are only additions, no real changes)
- hunt refleaks
Btw, I've backported the latest tests (#2746, #6233) to all 4 test files (ET and cET, 2.x and 3.x).
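
For the first two review suggestions, a unified diff can be produced with the standard difflib module. The file names and the two tiny texts below are placeholders, not the real sources:

```python
import difflib

def unified(old_text, new_text, old_name, new_name):
    """Return a unified diff of two source texts as a single string."""
    return "".join(difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=old_name,
        tofile=new_name,
    ))

# Placeholder contents standing in for an upstream file and its patched copy.
upstream = "a = 1\nb = 2\n"
patched = "a = 1\nb = 3\n"
print(unified(upstream, patched, "upstream/ElementTree.py", "Lib/xml/etree/ElementTree.py"))
```

Identical inputs produce an empty string, so "very few changes" shows up directly as a short diff.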

msg99140
Author: Florent Xicluna (flox) *
Date: 2010-02-09 23:51
Here it is:
* http://codereview.appspot.com/207048/show

msg99449
Author: Florent Xicluna (flox) *
Date: 2010-02-16 23:21
Updated the 2.x patch with the latest version uploaded to Rietveld (patch set 5).
Improved test coverage with upstream tests and test cases provided by Neil on issue #6232.
Note: the patch for 3.x is obsolete.

msg99466
Author: Florent Xicluna (flox) *
Date: 2010-02-17 11:48
Strip out the experimental C API.

msg100856
Author: Florent Xicluna (flox) *
Date: 2010-03-11 14:40
Fixed on trunk with r78838.
Some extra work is required to port it to 3.x.
Thank you Fredrik and Antoine for reviewing this patch.

msg100881
Author: Fredrik Lundh (effbot) *
Date: 2010-03-11 19:02
W00t!

msg100928
Author: Florent Xicluna (flox) *
Date: 2010-03-12 12:03
Patch to merge ElementTree 1.3 in 3.x.

msg101037
Author: Florent Xicluna (flox) *
Date: 2010-03-14 01:45
Merged into 3.x with r78942 and r78945.
See #8047 for a discussion about the `encoding` argument of the serializer (used by the .write() method and the tostring() and tostringlist() functions).
Currently the output is not encoded by default in 3.1 and 3.x.
It is encoded to ASCII in 2.6 and 2.x.
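
For comparison, this is how the serializer's encoding argument behaves in current Python 3, which differs from the 2010 trunk state described above: with no encoding, tostring() returns US-ASCII bytes with non-ASCII characters written as character references, and encoding="unicode" returns a str. A short sketch:

```python
import xml.etree.ElementTree as ET

elem = ET.Element("p")
elem.text = "café"

as_text = ET.tostring(elem, encoding="unicode")  # str, no encoding applied
as_default = ET.tostring(elem)                   # bytes, US-ASCII by default
as_utf8 = ET.tostring(elem, encoding="utf-8")    # bytes, UTF-8

print(repr(as_text))     # '<p>café</p>'
print(repr(as_default))  # b'<p>caf&#233;</p>'
print(repr(as_utf8))     # b'<p>caf\xc3\xa9</p>'
```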

Date | User | Action | Args
2022-04-11 14:56:50 | admin | set | github: 50721 |
2010-03-14 01:45:21 | flox | set | status: open -> closed
messages:
+ msg101037 |
2010-03-12 12:04:06 | flox | set | files:
- issue6472_etree_upstream_v5a.diff |
2010-03-12 12:03:59 | flox | set | files:
- issue6472_etree_upstream_py3k_v2.diff |
2010-03-12 12:03:40 | flox | set | files:
+ issue6472_upstream_py3k_v3.diff
messages:
+ msg100928 |
2010-03-11 19:02:19 | effbot | set | messages:
+ msg100881 |
2010-03-11 15:57:40 | flox | link | issue6266 superseder |
2010-03-11 15:57:40 | flox | unlink | issue6266 dependencies |
2010-03-11 15:01:45 | flox | link | issue6232 superseder |
2010-03-11 15:01:45 | flox | unlink | issue6232 dependencies |
2010-03-11 15:00:15 | flox | link | issue6265 superseder |
2010-03-11 15:00:15 | flox | unlink | issue6265 dependencies |
2010-03-11 14:59:13 | flox | link | issue6230 superseder |
2010-03-11 14:59:13 | flox | unlink | issue6230 dependencies |
2010-03-11 14:57:27 | flox | link | issue6565 superseder |
2010-03-11 14:57:27 | flox | unlink | issue6565 dependencies |
2010-03-11 14:53:28 | flox | link | issue3151 superseder |
2010-03-11 14:53:28 | flox | unlink | issue3151 dependencies |
2010-03-11 14:51:35 | flox | unlink | issue3475 dependencies |
2010-03-11 14:51:35 | flox | link | issue3475 superseder |
2010-03-11 14:49:26 | flox | link | issue1538691 superseder |
2010-03-11 14:49:26 | flox | unlink | issue1538691 dependencies |
2010-03-11 14:40:17 | flox | set | resolution: fixed messages:
+ msg100856 stage: patch review -> resolved |
2010-02-23 15:48:04 | flox | link | issue7990 dependencies |
2010-02-17 11:49:01 | flox | set | files:
+ issue6472_etree_upstream_v5a.diff
messages:
+ msg99466 |
2010-02-17 11:47:35 | flox | set | files:
- issue6472_etree_upstream_v5.diff |
2010-02-16 23:21:46 | flox | set | files:
+ issue6472_etree_upstream_v5.diff
messages:
+ msg99449 |
2010-02-16 23:19:42 | flox | set | files:
- issue6472_etree_upstream_v2.diff |
2010-02-16 21:58:35 | flox | link | issue6266 dependencies |
2010-02-16 13:17:50 | flox | link | issue6232 dependencies |
2010-02-16 13:13:41 | flox | link | issue6265 dependencies |
2010-02-16 13:11:28 | flox | link | issue6230 dependencies |
2010-02-16 12:13:29 | flox | link | issue6565 dependencies |
2010-02-16 11:58:48 | flox | link | issue3151 dependencies |
2010-02-16 11:46:13 | flox | link | issue1777 superseder |
2010-02-16 11:43:34 | flox | link | issue1767933 dependencies |
2010-02-13 16:01:18 | flox | link | issue1538691 dependencies |
2010-02-13 15:57:38 | flox | link | issue3475 dependencies |
2010-02-10 12:16:40 | pitrou | set | title: Inconsistent use of "iterator" in ElementTree doc & diff between Py and C modules -> Update ElementTree with upstream changes |
2010-02-09 23:51:19 | flox | set | messages:
+ msg99140 |
2010-02-09 23:31:53 | flox | set | messages:
+ msg99139 |
2010-02-09 23:22:20 | pitrou | set | messages:
+ msg99138 |
2010-02-09 22:29:17 | flox | set | files:
+ issue6472_etree_upstream_py3k_v2.diff |
2010-02-09 22:28:22 | flox | set | files:
+ issue6472_etree_upstream_v2.diff
messages:
+ msg99137 |
2010-02-09 22:22:16 | flox | set | files:
- issue6472_upstream_py3k_v2.diff |
2010-02-09 22:22:10 | flox | set | files:
- issue6472_upstream.diff |
2010-01-11 21:40:18 | flox | set | messages:
+ msg97607 versions:
- Python 2.6, Python 3.1 |
2009-12-14 08:44:14 | flox | set | files:
+ issue6472_upstream_docs.diff |
2009-12-14 08:43:14 | flox | set | files:
- issue6472_upstream_docs.diff |
2009-12-14 08:28:11 | flox | set | files:
- issue6472_upstream_py3k.diff |
2009-12-14 08:27:48 | flox | set | files:
+ issue6472_upstream_py3k_v2.diff
messages:
+ msg96373 |
2009-12-09 21:31:02 | flox | set | files:
+ issue6472_upstream_docs.diff
messages:
+ msg96181 |
2009-12-07 12:39:11 | flox | set | files:
- issue6472_upstream_py3k.diff |
2009-12-07 12:38:59 | flox | set | files:
+ issue6472_upstream_py3k.diff
messages:
+ msg96049 |
2009-12-07 11:11:13 | pitrou | link | issue1143 superseder |
2009-12-07 11:10:42 | pitrou | set | priority: normal
nosy:
+ pitrou messages:
+ msg96048
stage: patch review |
2009-12-07 08:22:19 | flox | set | files:
- issue6472.diff |
2009-12-07 08:22:14 | flox | set | files:
- issue6472_py3k.diff |
2009-12-07 08:21:36 | flox | set | files:
+ issue6472_upstream_py3k.diff versions:
- Python 3.0 |
2009-12-06 21:16:33 | flox | set | files:
+ issue6472_upstream.diff
messages:
+ msg96040 |
2009-12-06 11:51:56 | flox | set | files:
+ issue6472_py3k.diff |
2009-12-06 11:51:21 | flox | set | files:
+ issue6472.diff keywords:
+ patch messages:
+ msg96023
|
2009-12-05 19:32:49 | flox | set | messages:
+ msg96000 |
2009-12-05 17:11:01 | flox | set | nosy:
+ flox
|
2009-12-05 13:19:11 | milko.krachounov | set | versions:
+ Python 2.6, Python 2.7 nosy:
+ milko.krachounov
messages:
+ msg95990
components:
+ Library (Lib) type: behavior |
2009-07-13 02:24:16 | benjamin.peterson | set | assignee: georg.brandl -> effbot |
2009-07-13 01:32:33 | jcsalterego | set | nosy:
+ effbot
|
2009-07-13 00:55:52 | MLModel | create | |