Update ElementTree with upstream changes #50721

Closed
MLModel mannequin opened this issue Jul 13, 2009 · 20 comments
Labels
docs Documentation in the Doc dir stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@MLModel
Mannequin

MLModel mannequin commented Jul 13, 2009

BPO 6472
Nosy @birkenfeld, @pitrou, @MLModel, @florentx
Files
  • issue6472_upstream_docs.diff: Patch for documentation.
  • issue6472_upstream_py3k_v3.diff: Patch, apply to 3.x
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2010-03-14.01:45:21.541>
    created_at = <Date 2009-07-13.00:55:52.322>
    labels = ['type-bug', 'library', 'docs']
    title = 'Update ElementTree with upstream changes'
    updated_at = <Date 2010-03-14.01:45:21.539>
    user = 'https://github.com/MLModel'

    bugs.python.org fields:

    activity = <Date 2010-03-14.01:45:21.539>
    actor = 'flox'
    assignee = 'effbot'
    closed = True
    closed_date = <Date 2010-03-14.01:45:21.541>
    closer = 'flox'
    components = ['Documentation', 'Library (Lib)']
    creation = <Date 2009-07-13.00:55:52.322>
    creator = 'MLModel'
    dependencies = []
    files = ['15553', '16528']
    hgrepos = []
    issue_num = 6472
    keywords = ['patch']
    message_count = 20.0
    messages = ['90465', '95990', '96000', '96023', '96040', '96048', '96049', '96181', '96373', '97607', '99137', '99138', '99139', '99140', '99449', '99466', '100856', '100881', '100928', '101037']
    nosy_count = 6.0
    nosy_names = ['effbot', 'georg.brandl', 'pitrou', 'MLModel', 'flox', 'milko.krachounov']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue6472'
    versions = ['Python 2.7', 'Python 3.2']


    I can't quite sort this out, because it's difficult to see what is
    intended. The documentation of xml.etree.ElementTree (19.11 in the
    Library doc) uses terms like "iterator", "tree iterator", "iterable",
    "list" in vague and perhaps not quite accurate ways. I can't tell from
    the documentation which functions/methods return lists, which return a
    generator, which return an unspecified kind of iterable, and so on.
    Moreover, the results are different using ElementTree than they are
    using cElementTree. In particular, getiterator() returns a list in
    ElementTree and a generator in cElementTree. This can make a substantial
    difference in performance when iterating over a large number of nodes
    (in addition to cElementTree's parsing being what appears to be about
    10x faster).

    I think someone should go over the page and sort this out and make it
    clear what the user can expect. (I don't think it's fair to
    overgeneralize to things like "iterables" if the module is really meant
    to be making a commitment to a list or a generator.) I also think that
    the differences between what the same methods return in the Python and
    C versions of the module should be highlighted.

    I stumbled on this trying to parse and extract individual bits of
    information out of large XML files. I realize full well there are better
    ways to do this (SAX, e.g.) and better ways to search than just iterate
    over all the tags of the type I'm interested in, but I should still know
    what to expect from ElementTree, especially because it is so wonderful!

    @MLModel MLModel mannequin assigned birkenfeld Jul 13, 2009
    @MLModel MLModel mannequin added the docs Documentation in the Doc dir label Jul 13, 2009
    @benjaminp benjaminp assigned effbot and unassigned birkenfeld Jul 13, 2009
    @milkokrachounov
    Mannequin

    milkokrachounov mannequin commented Dec 5, 2009

    This isn't just a documentation issue. A function named getiterator(),
    for which the docs say that it returns an iterator, should return an
    iterator, not just an iterable. They have different semantics and can't
    be used interchangeably, so the behaviour of getiterator() in
    ElementTree is wrong. I was using this in my program:

    iterator = element.getiterator()
    next(iterator)
    subelement = next(iterator)

    Which broke when I tried switching to ElementTree from cElementTree,
    even though the docs tell me that I'll get an iterator there.
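The distinction above can be sketched as follows. This is only an illustration run against a current Python 3, where `Element.iter()` is the available method; the list is built by hand to stand in for what the pure-Python `getiterator()` returned at the time.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<root><a/><b/></root>")

# A list is an iterable but not an iterator: next() on it raises TypeError.
# (Stand-in for the list that the pure-Python getiterator() returned.)
as_list = list(root.iter())
try:
    next(as_list)
    list_supports_next = True
except TypeError:
    list_supports_next = False

# Wrapping the result in iter() gives an iterator whichever kind came back.
iterator = iter(as_list)
first = next(iterator)       # the element itself comes first
subelement = next(iterator)  # then its descendants in document order
```

Calling `iter()` on the return value is a defensive workaround for code that must run against both implementations; it does not change what the API promises.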

    Also, for findall() and friends, is there any reason why we can't stick
    to either an iterator or a list, and not both? The API would be clearer
    if findall() always returned a list, or always an iterator, regardless
    of the implementation. It is currently not clear what will happen if I do:

    for x in tree.findall(path):
         mutate_tree(tree, x)
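One way to make the loop above well defined regardless of what `findall()` returns is to take a snapshot first. A minimal sketch, assuming a current Python 3; `root.remove(item)` is just a stand-in for the unspecified `mutate_tree()`:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<root><item/><item/><item/></root>")

# list() copies the matches up front, so removing elements inside the loop
# cannot disturb the iteration, whether findall() gave a list or an iterator.
for item in list(root.findall("item")):
    root.remove(item)

remaining = len(root)
```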

    @milkokrachounov milkokrachounov mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Dec 5, 2009
    @florentx
    Mannequin

    florentx mannequin commented Dec 5, 2009

    There are many differences between the two implementations.
    I don't know if we can live with them or not.

    ~ $ ./python 
    Python 3.1.1+ (release31-maint:76650, Dec  3 2009, 17:14:50) 
    [GCC 4.3.2] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from xml.etree import ElementTree as ET, cElementTree as cET
    >>> from io import StringIO
    >>> SAMPLE = '<root/>'
    >>> IO_SAMPLE = StringIO(SAMPLE)

    With ElementTree

    >>> elt = ET.XML(SAMPLE)
    >>> elt.getiterator()
    [<Element root at 15cb920>]
    >>> elt.findall('')  # or '.'
    [<Element root at 15cb920>]
    >>> elt.findall('./')
    [<Element root at 15cb920>]
    >>> elt.items()
    dict_items([])
    >>> elt.keys()
    dict_keys([])
    >>> elt[:]
    []
    >>> IO_SAMPLE.seek(0)
    >>> next(ET.iterparse(IO_SAMPLE))
    ('end', <Element root at 15d60d0>)
    >>> IO_SAMPLE.seek(0)
    >>> list(ET.iterparse(IO_SAMPLE))
    [('end', <Element root at 15583e0>)]

    With cElementTree

    >>> elt_c = cET.XML(SAMPLE)
    >>> elt_c.getiterator()
    <generator object getiterator at 0x15baae0>
    >>> elt_c.findall('')
    []
    >>> elt_c.findall('./')
    [<Element 'root' at 0x15cf3a0>]
    >>> elt_c.items()
    []
    >>> elt_c.keys()
    []
    >>> elt_c[:]
    Traceback (most recent call last):
    TypeError: sequence index must be integer, not 'slice'
    >>> IO_SAMPLE.seek(0)
    >>> next(cET.iterparse(IO_SAMPLE))
    Traceback (most recent call last):
    TypeError: iterparse object is not an iterator
    >>> IO_SAMPLE.seek(0)
    >>> list(cET.iterparse(IO_SAMPLE))
    [(b'end', <Element 'root' at 0x15cf940>)]
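For comparison, code that avoids relying on the divergent return types shown in the two sessions above might look like the sketch below. It is written for a current Python 3 and only uses spellings that do not depend on whether a list, a view or a generator comes back.

```python
import io
import xml.etree.ElementTree as ET

SAMPLE = "<root/>"
elt = ET.XML(SAMPLE)

# list(elt) copies the children without needing slice support (elt[:]).
children = list(elt)

# list() normalises items()/keys() whether they return lists or dict views.
attrib_items = list(elt.items())
attrib_keys = list(elt.keys())

# iter() makes next() safe even if iterparse() only returned an iterable.
event, elem = next(iter(ET.iterparse(io.StringIO(SAMPLE))))
```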

    @florentx
    Mannequin

    florentx mannequin commented Dec 6, 2009

    Proposed patch fixes most of the discrepancies between both implementations.

    It restores some features that were lost with Python 3:

    • cElement slicing and extended slicing
    • iterparse, cET.getiterator and cET.findall return an iterator
      (as documented)

    Some tests were added to check these issues.
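As an illustration of the slicing features listed above, here is a short sketch against a current Python 3. The element names are made up for the example, and this is not taken from the patch's tests.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<root><a/><b/><c/><d/></root>")

# Plain slicing returns a list of child elements.
middle = [child.tag for child in root[1:3]]

# Extended slicing (with a step) works the same way.
every_other = [child.tag for child in root[::2]]
```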

    @florentx
    Mannequin

    florentx mannequin commented Dec 6, 2009

    I fixed it differently, using the upstream modules (Thank you Fredrik).

    • ElementTree 1.3a3-20070912
    • cElementTree 1.0.6-20090110

    It works.
    And it closes bpo-1143, too.

    @pitrou
    Member

    pitrou commented Dec 7, 2009

    The patch should have doc updates for new functionality, if any.

    @florentx
    Mannequin

    florentx mannequin commented Dec 7, 2009

    I see some new features in the changelog.
    I will try to update the documentation during the week.

    (patch "py3k" fixed: support assignment of arbitrary sequences)
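A minimal sketch of what assigning an arbitrary sequence to a slice looks like, assuming a current Python 3; a tuple is used here as the non-list sequence, and the tag names are invented for the example.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<root><a/><b/></root>")

# Replace the first child with two new elements supplied as a tuple
# rather than a list.
root[0:1] = (ET.Element("x"), ET.Element("y"))

tags = [child.tag for child in root]
```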

    @florentx
    Mannequin

    florentx mannequin commented Dec 9, 2009

    Patch for the documentation. (source: upstream documentation)

    @florentx
    Mannequin

    florentx mannequin commented Dec 14, 2009

    Small update of the patch for 3.2: the __cmp__ method is replaced with
    an __eq__ method (on CommentProxy and PIProxy).
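For context, Python 3 ignores `__cmp__`, so equality has to be spelled with `__eq__`. The sketch below shows the general pattern only; `TagProxy` is a hypothetical class invented for this example and is not the actual CommentProxy or PIProxy code from the patch.

```python
class TagProxy:
    """Hypothetical proxy object compared by its tag."""

    def __init__(self, tag):
        self.tag = tag

    def __eq__(self, other):
        # Returning NotImplemented lets Python try the other operand
        # and finally fall back to identity comparison.
        if isinstance(other, TagProxy):
            return self.tag == other.tag
        return NotImplemented

    def __hash__(self):
        # Defining __eq__ disables the inherited __hash__, so restore
        # one that is consistent with equality.
        return hash(self.tag)


same = TagProxy("comment") == TagProxy("comment")
different = TagProxy("comment") == TagProxy("pi")
```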

    @florentx
    Mannequin

    florentx mannequin commented Jan 11, 2010

    It would be nice to upgrade ElementTree for 2.7 and 3.2, at least.

    @florentx
    Mannequin

    florentx mannequin commented Feb 9, 2010

    Patch updated, with upstream packages:

    • ElementTree 1.3a3-20070912
    • cElementTree 1.0.6-20090110

    Now all tests are identical for the ElementTree part:

    • ElementTree 2.x
    • cElementTree 2.x
    • ElementTree 3.x
    • cElementTree 3.x

    Waiting for some developer kind enough to review and merge in 2.7 and 3.2.

    @pitrou
    Member

    pitrou commented Feb 9, 2010

    Given the size of the patch, it's very difficult to review properly.
    In any case, could you upload it to http://codereview.appspot.com/ ?

    @florentx
    Mannequin

    florentx mannequin commented Feb 9, 2010

    Ok, will do the upload to rietveld.

    In addition to the straight review of the patch itself, you could:

    • diff against the upstream source code (very few changes)
    • diff between 2.x and 3.x
    • review the test suite (there are only additions, no real changes)
    • hunt refleaks

    Btw, I've backported the last tests (bpo-2746, bpo-6233) to all 4 test files (ET and cET, 2.x and 3.x).

    @florentx
    Mannequin

    florentx mannequin commented Feb 9, 2010

    Here it is:

    @pitrou pitrou changed the title Inconsistent use of "iterator" in ElementTree doc & diff between Py and C modules Update ElementTree with upstream changes Feb 10, 2010
    @florentx
    Mannequin

    florentx mannequin commented Feb 16, 2010

    Update the 2.x patch with the last version uploaded to rietveld (patch set 5).

    Improved test coverage with upstream tests and test cases provided by Neil on issue bpo-6232.

    Note: the patch for 3.x is obsolete.

    @florentx
    Mannequin

    florentx mannequin commented Feb 17, 2010

    Strip out the experimental C API.

    @florentx
    Mannequin

    florentx mannequin commented Mar 11, 2010

    Fixed on trunk with r78838.
    Some extra work is required to port it to 3.x.

    Thank you Fredrik and Antoine for reviewing this patch.

    @effbot
    Mannequin

    effbot mannequin commented Mar 11, 2010

    W00t!

    @florentx
    Mannequin

    florentx mannequin commented Mar 12, 2010

    Patch to merge ElementTree 1.3 in 3.x.

    @florentx
    Mannequin

    florentx mannequin commented Mar 14, 2010

    Merged in 3.x with r78942 and r78945.

    See bpo-8047 for a discussion about the encoding argument of the serializer (used by the .write() method and the tostring() and tostringlist() functions).
    Currently the output is not encoded by default in 3.1 and 3.x.
    It is encoded to ASCII in 2.6 and 2.x.
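For readers on a current Python 3, the serializer's encoding choice can be made explicit as sketched below. This reflects how `tostring()` behaves today, not necessarily the 3.1 state described above or the outcome of the bpo-8047 discussion at the time.

```python
import xml.etree.ElementTree as ET

root = ET.Element("root")

# With no encoding argument, tostring() returns bytes in current Python 3.
as_bytes = ET.tostring(root)

# Passing encoding="unicode" asks for an unencoded str instead.
as_text = ET.tostring(root, encoding="unicode")
```

Stating the encoding explicitly avoids depending on the default, which is what differed between the versions mentioned above.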

    @florentx florentx mannequin closed this as completed Mar 14, 2010
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022