Title: ElementTree won't parse comments
Type: enhancement Stage: resolved
Components: XML Versions: Python 3.2
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: amaury.forgeotdarc, effbot, flox, poke, scoder
Priority: normal Keywords:

Created on 2010-04-01 01:37 by poke, last changed 2011-10-29 02:35 by flox. This issue is now closed.

Messages (5)
msg102051 - Author: Patrick Westerhoff (poke) Date: 2010-04-01 01:37
When using xml.etree.ElementTree to parse external XML files, all XML comments within that file are being stripped out. I guess that happens because there is no comment handler in the expat parser.


  <nodeA />
  <!-- some comment -->
  <nodeB />

from xml.etree import ElementTree
with open( 'test.xml', 'r' ) as f:
    xml = ElementTree.parse( f )
ElementTree.dump( xml )

  <nodeA />

  <nodeB />
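
The behaviour can be reproduced without an external file. A minimal sketch, assuming a hypothetical <root> wrapper element around the two nodes (the wrapper name and the in-memory string are assumptions for illustration, not part of the report):

```python
from xml.etree import ElementTree

# Hypothetical document; the <root> wrapper is an assumption for illustration.
SOURCE = "<root><nodeA /><!-- some comment --><nodeB /></root>"

# Parse with the default parser and tree builder.
root = ElementTree.fromstring(SOURCE)

# The default tree builder does not keep comments, so only elements remain.
child_tags = [child.tag for child in root]
serialized = ElementTree.tostring(root, encoding="unicode")

print(child_tags)
print(serialized)
```

The comment is seen by the parser but never reaches the tree, so it is also missing from the serialized output.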
msg102078 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-04-01 09:01
ElementTree does parse comments, it just omits them from the tree.
A quick search led me to this page:
which can be further simplified:

from xml.etree import ElementTree

class MyTreeBuilder(ElementTree.TreeBuilder):
    # Called by XMLParser for each comment; store it as a Comment node.
    def comment(self, data):
        self.start(ElementTree.Comment, {})
        self.data(data)
        self.end(ElementTree.Comment)

with open('c:/temp/t.xml', 'r') as f:
    xml = ElementTree.parse(
        f, parser=ElementTree.XMLParser(target=MyTreeBuilder()))

Now, should ElementTree do this by default? It's not certain: see how effbot's sample needs to wrap the entire file in another 'document' element.
msg102110 - Author: Patrick Westerhoff (poke) Date: 2010-04-01 17:24
Thanks for your reply, Amaury. That page really might mean that it was not intended for ElementTree to parse such things by default. Although it might be nice if there was some easy way to simply enable it, instead of having to hack it into there and depending on details of some internal code (which might change in the future).

Your code btw. didn't work for me, but based on it and on that effbot page, I came up with the following solution, which works fine.
from xml.etree import ElementTree

class CommentedTreeBuilder ( ElementTree.XMLTreeBuilder ):
    def __init__ ( self, html = 0, target = None ):
        ElementTree.XMLTreeBuilder.__init__( self, html, target )
        self._parser.CommentHandler = self.handle_comment
    def handle_comment ( self, data ):
        self._target.start( ElementTree.Comment, {} )
        self._target.data( data )
        self._target.end( ElementTree.Comment )

with open( 'test.xml', 'r' ) as f:
    xml = ElementTree.parse( f, parser = CommentedTreeBuilder() )
ElementTree.dump( xml )
msg102112 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-04-01 17:29
Yes, my code uses the newer version of ElementTree, which will be included with 2.7 and 3.2.
msg113322 - Author: Florent Xicluna (flox) * (Python committer) Date: 2010-08-08 21:06
IIUC it works like that by design.
ElementTree 1.3 (which is part of Python 2.7 and 3.2) lets you define your own parser target that keeps comments (see previous messages).

Close as "won't fix"?
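
For readers on later releases: Python 3.8 and newer accept an insert_comments flag on TreeBuilder, which makes the subclass unnecessary. A minimal sketch, assuming Python 3.8+ and a hypothetical in-memory document (the <root> wrapper is an assumption for illustration):

```python
from xml.etree import ElementTree

# Hypothetical document; the <root> wrapper is an assumption for illustration.
SOURCE = "<root><nodeA /><!-- some comment --><nodeB /></root>"

# Assumes Python 3.8+: insert_comments=True asks the builder to keep comments.
builder = ElementTree.TreeBuilder(insert_comments=True)
parser = ElementTree.XMLParser(target=builder)
root = ElementTree.fromstring(SOURCE, parser=parser)

# Comment nodes are elements whose tag is the ElementTree.Comment factory.
comments = [el.text for el in root.iter() if el.tag is ElementTree.Comment]
serialized = ElementTree.tostring(root, encoding="unicode")

print(comments)
print(serialized)
```

The default stays unchanged: without the flag, comments are still dropped from the tree, so existing code that iterates over children sees only elements.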
Date                 User                Action  Args
2011-10-29 02:35:58  flox                set     status: open -> closed
2010-08-08 21:06:58  flox                set     type: behavior -> enhancement
                                                 versions: + Python 3.2, - Python 3.1
                                                 nosy: + scoder
                                                 messages: + msg113322
                                                 resolution: wont fix
                                                 stage: resolved
2010-04-01 17:29:22  amaury.forgeotdarc  set     messages: + msg102112
2010-04-01 17:24:05  poke                set     messages: + msg102110
2010-04-01 13:35:26  brian.curtin        set     nosy: + flox
2010-04-01 09:01:50  amaury.forgeotdarc  set     nosy: + amaury.forgeotdarc, effbot
                                                 messages: + msg102078
2010-04-01 01:37:44  poke                create