This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: minidom parses comments wrongly
Type:
Stage: resolved
Components: XML
Versions: Python 2.7
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: Wanat, ned.deily
Priority: normal
Keywords:

Created on 2015-05-14 22:09 by Wanat, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (2)
msg243222 - Author: Paweł (Wanat) Date: 2015-05-14 22:09
from xml.dom import minidom

html = """<html>
  <body>
    <!-- <img src="/images/obraz--super.jpg"/> -->
  </body>
</html>"""


minidom.parseString(html)


Result:
Traceback (most recent call last):
  File "minidom.py", line 10, in <module>
    minidom.parseString(html)
  File "/usr/lib/python2.7/xml/dom/minidom.py", line 1928, in parseString
    return expatbuilder.parseString(string)
  File "/usr/lib/python2.7/xml/dom/expatbuilder.py", line 940, in parseString
    return builder.parseString(string)
  File "/usr/lib/python2.7/xml/dom/expatbuilder.py", line 223, in parseString
    parser.Parse(string, True)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 3, column 34


Tested versions:
2.7.6, 2.7.3

Reason:
the "--" between "obraz" and "super" in the commented-out file name.
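The position in the traceback can be tied back to this cause. A minimal sketch, assuming the same `html` string as above and written to run on both Python 2.7 and 3.x: it catches `ExpatError` and uses its `lineno` (1-based) and `offset` (0-based column) attributes to show the text just before the failure point. The helper name `locate_parse_error` is hypothetical. On the reporter's setup expat reported line 3, column 34, which is the character right after `obraz--`.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

html = """<html>
  <body>
    <!-- <img src="/images/obraz--super.jpg"/> -->
  </body>
</html>"""


def locate_parse_error(text):
    """Hypothetical helper: return (lineno, offset, text before offset) or None."""
    try:
        minidom.parseString(text)
    except ExpatError as err:
        # err.lineno is 1-based; err.offset is a 0-based column on that line.
        line = text.splitlines()[err.lineno - 1]
        return err.lineno, err.offset, line[:err.offset]
    return None  # the document parsed without error


result = locate_parse_error(html)
print(result)
```

The slice before the reported column ends in the double hyphen, which is where expat stops: after `--` inside a comment, the only character it accepts is `>`.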
msg243241 - Author: Ned Deily (ned.deily) (Python committer) Date: 2015-05-15 02:31
Thanks for your report. Alas, according to the W3C XML 1.0 specification:

"For compatibility, the string "--" (double-hyphen) MUST NOT occur within comments."

So, it appears minidom (and other XML parsers) are correct in rejecting your example as not well-formed XML.

http://www.w3.org/TR/xml/#sec-comments
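The comment production in that section allows single hyphens but, in effect, forbids "--" anywhere in the comment text and also forbids the text from ending in "-" (which would produce "--->"). A minimal sketch of that check in Python; `is_valid_comment_text` is a hypothetical helper, not a standard-library API. It also shows that the reporter's comment body fails the check, and that minidom accepts the comment once the double hyphen is collapsed to a single one for this one example (a plain `replace` is not a general sanitizer, since "---" would still leave "--").

```python
from xml.dom import minidom


def is_valid_comment_text(text):
    """Hypothetical helper: check text placed between '<!--' and '-->'.

    XML 1.0 allows single hyphens, but not '--' anywhere and not a
    trailing '-', which would turn the closing delimiter into '--->'.
    """
    return "--" not in text and not text.endswith("-")


body = ' <img src="/images/obraz--super.jpg"/> '
fixed_body = body.replace("--", "-")  # only safe for this specific string

print(is_valid_comment_text(body))        # False
print(is_valid_comment_text(fixed_body))  # True

# With the double hyphen gone, minidom parses the comment as a Comment node.
doc = minidom.parseString("<html><body><!--%s--></body></html>" % fixed_body)
comment = doc.getElementsByTagName("body")[0].firstChild
print(repr(comment.data))
```

The check mirrors the grammar rather than any parser's internals, so it can be used to validate comment text before writing it out.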
History
Date                 User       Action  Args
2022-04-11 14:58:16  admin      set     github: 68385
2015-05-15 02:31:39  ned.deily  set     status: open -> closed
                                        type: crash ->
                                        nosy: + ned.deily
                                        messages: + msg243241
                                        resolution: not a bug
                                        stage: resolved
2015-05-14 22:09:14  Wanat      create