Title: minidom parses comments wrongly
Type:
Stage: resolved
Components: XML
Versions: Python 2.7
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: Wanat, ned.deily
Priority: normal
Keywords:

Created on 2015-05-14 22:09 by Wanat, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (2)
msg243222 - Author: Paweł (Wanat) Date: 2015-05-14 22:09
from xml.dom import minidom

html = """<html>
    <!-- <img src="/images/obraz--super.jpg"/> -->
</html>"""

minidom.parseString(html)

Traceback (most recent call last):
  File "", line 10, in <module>
  File "/usr/lib/python2.7/xml/dom/", line 1928, in parseString
    return expatbuilder.parseString(string)
  File "/usr/lib/python2.7/xml/dom/", line 940, in parseString
    return builder.parseString(string)
  File "/usr/lib/python2.7/xml/dom/", line 223, in parseString
    parser.Parse(string, True)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 3, column 34

Tested versions:
2.7.6, 2.7.3

The error is triggered by the "--" between "obraz" and "super".
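
To isolate the cause, the following minimal sketch parses two documents that differ only in the hyphens inside the comment. The helper name `is_well_formed` and the single-line test strings are illustrative assumptions and are not part of the original report.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def is_well_formed(text):
    """Return True if minidom can parse text, False if expat rejects it."""
    try:
        minidom.parseString(text)
        return True
    except ExpatError:
        return False


# Same comment as in the report: double hyphen between "obraz" and "super".
bad = '<html><!-- <img src="/images/obraz--super.jpg"/> --></html>'
# Identical except that the comment contains a single hyphen.
good = '<html><!-- <img src="/images/obraz-super.jpg"/> --></html>'

print(is_well_formed(bad))
print(is_well_formed(good))
```

The first call should report False and the second True, which points at the double hyphen rather than at the commented-out img tag itself.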
msg243241 - Author: Ned Deily (ned.deily) (Python committer) Date: 2015-05-15 02:31
Thanks for your report. Alas, according to the W3C XML 1.0 specification:

"For compatibility, the string "--" (double-hyphen) MUST NOT occur within comments."

So, it appears minidom (and other XML parsers) are correct in rejecting your example as not well-formed XML.
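
If the input cannot be fixed at its source, one possible workaround is to rewrite the comments before handing the text to minidom. The sketch below is only an assumption about how that might be done: `sanitize_comments` is a hypothetical helper, not part of minidom, and its regular expression is naive (it would also match comment-like text inside CDATA sections). It changes the comment text, so it only suits cases where the exact comment content does not matter.

```python
import re
from xml.dom import minidom

# Naive pattern for an XML comment; hypothetical helper, not a minidom API.
_COMMENT = re.compile(r'<!--(.*?)-->', re.DOTALL)


def sanitize_comments(text):
    """Break up every '--' inside comments so the result is well-formed."""
    def fix(match):
        body = match.group(1)
        # Repeat until no double hyphen is left (handles runs like '---').
        while '--' in body:
            body = body.replace('--', '- -')
        # A body ending in '-' would produce '--->', which is also rejected.
        if body.endswith('-'):
            body += ' '
        return '<!--' + body + '-->'
    return _COMMENT.sub(fix, text)


source = '<html><!-- <img src="/images/obraz--super.jpg"/> --></html>'
document = minidom.parseString(sanitize_comments(source))
print(document.documentElement.firstChild.data)
```

For markup that is really HTML rather than XML, a lenient HTML parser may be a better fit than rewriting the input.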
