
Author Martin Hosken
Recipients Martin Hosken, eli.bendersky, scoder, serhiy.storchaka
Date 2018-09-10.03:46:23
Message-id <1536551184.54.0.56676864532.issue34600@psf.upfronthosting.co.za>
Content
Sorry, this test is rather long because it is really 3 tests in one:

from __future__ import print_function
import sys
import xml.etree.ElementTree as et
import xml.etree.cElementTree as cet
from io import StringIO

teststr = u"""<?xml version="1"?>
<root>
    <child>
        Hello <!-- Greeting --> World
    </child>
</root>"""
testf = StringIO(teststr)

if len(sys.argv) >= 2 and 'a' in sys.argv[1]:
    testf.seek(0)
    for event, elem in et.iterparse(testf, events=["end", "comment"]):
        if event == 'end':
            print(elem.tag + ": " + str(elem.text))
        elif event == 'comment':
            print("comment: " + elem.text)

if len(sys.argv) < 2 or 'b' in sys.argv[1]:
    testf.seek(0)
    def doComment(data):
        parser.parser.StartElementHandler("!--", ('text', data))
        parser.parser.EndElementHandler("!--")
    parser = et.XMLParser()
    parser.parser.CommentHandler = doComment
    for event, elem in et.iterparse(testf, parser=parser):
        if hasattr(elem, 'text'):
            print(elem.tag + ": " + str(elem.text))
        else:
            print(elem.tag + ": " + elem.get('text', ""))

if len(sys.argv) < 2 or 'c' in sys.argv[1] or 'd' in sys.argv[1]:
    testf.seek(0)
    useet = et if len(sys.argv) < 2 or 'c' in sys.argv[1] else cet
    class CommentingTb(useet.TreeBuilder):
        def __init__(self):
            self.parser = None
        def comment(self, data):
            self.parser.parser.StartElementHandler("!--", ('text', data))
            self.parser.parser.EndElementHandler("!--")
    tb = CommentingTb()
    parser = useet.XMLParser(target=tb)
    tb.parser = parser
    kw = {'parser': parser} if len(sys.argv) < 2 or 'c' in sys.argv[1] else {}
    for event, elem in useet.iterparse(testf, **kw):
        if hasattr(elem, 'text'):
            print(elem.tag + ": " + str(elem.text))
        else:
            print(elem.tag + ": " + elem.get('text', ""))

Test 'a' is how I would like to write the solution to my problem. I'm not sure why 'comment' isn't supported as an iterparse event directly, but hey.

Test 'b' is how I solved it in Python 2.

Test 'c' is how I would have to solve it in Python 3, if it worked.

Test 'd' is the same as 'c' but uses cElementTree rather than ElementTree.
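
For comparison, the pattern that test 'a' asks for can be sketched as below. This is only a sketch under one assumption: that it runs on Python 3.8 or later, where the xml.etree.ElementTree documentation lists 'comment' among the events accepted by iterparse. On the versions tested in this report the same call fails. The names XML and collect_events are illustrative and not part of the script above.

```python
import xml.etree.ElementTree as ET
from io import StringIO

# Same document shape as the test string above, without the XML declaration.
XML = """<root>
    <child>
        Hello <!-- Greeting --> World
    </child>
</root>"""


def collect_events(source):
    """Collect (event, value) pairs from iterparse.

    Assumes Python 3.8+, where "comment" is an accepted event name.
    For comments the value is the comment text, for "end" it is the tag.
    """
    seen = []
    for event, elem in ET.iterparse(source, events=("end", "comment")):
        if event == "comment":
            seen.append((event, elem.text))
        else:
            seen.append((event, elem.tag))
    return seen


events = collect_events(StringIO(XML))
for event, value in events:
    print(event, repr(value))
```

The comment is reported at the point where the parser meets it, so it arrives before the 'end' event of the enclosing child element.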

Results:

Success output for a test is:
```
!--: None
child: 
        Hello 
root: 
    
```

Python2:
a    Fails (obviously)
b    Succeeds
c    Succeeds
d    Fails: can't inherit from cElementTree.TreeBuilder

Python3:
a    Fails (obviously)
b    Fails: XMLParser has no attribute 'parser'
c    Fails: event handling only supported for ElementTree.TreeBuilder targets
d    Fails: Gives output but no initial comment component (line 1)

The key failure here is Python3 'c'. This is what stops any hope of comment handling using the et.XMLParser. The only way I could get around it was to use my own copy from the source code.
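
As a point of comparison for test 'c', here is a hedged sketch of keeping comments without touching the parser internals. It assumes Python 3.8 or later, where TreeBuilder is documented to accept an insert_comments argument; it does not help on the Python versions tested above. The names XML, parse_keeping_comments and comments are illustrative only.

```python
import xml.etree.ElementTree as ET

XML = """<root>
    <child>
        Hello <!-- Greeting --> World
    </child>
</root>"""


def parse_keeping_comments(text):
    """Parse text into a tree that keeps comment nodes.

    Assumes Python 3.8+, where TreeBuilder accepts insert_comments.
    """
    builder = ET.TreeBuilder(insert_comments=True)
    parser = ET.XMLParser(target=builder)
    return ET.fromstring(text, parser=parser)


root = parse_keeping_comments(XML)
child = root.find("child")

# Comment nodes are elements whose tag is the ET.Comment factory itself.
comments = [node for node in child if node.tag is ET.Comment]
for node in comments:
    print("comment:", repr(node.text), "tail:", repr(node.tail))
```

With this approach the text after the comment ends up in the comment node's tail rather than in the child's text.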