classification
Title: Document Object Model API - validation
Type: behavior
Stage: test needed
Components: Library (Lib)
Versions: Python 2.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Kyle.Keating, terry.reedy
Priority: normal
Keywords:

Created on 2011-05-20 22:02 by Kyle.Keating, last changed 2011-06-02 17:13 by Kyle.Keating.

Messages (4)
msg136402 - Author: Kyle Keating (Kyle.Keating) Date: 2011-05-20 22:02
While testing this library, I noticed that XML element and attribute names can be created containing characters that make the resulting XML malformed, because special characters that can break validation are not escaped or converted from their literal forms. Only attribute values are escaped, not the names.

For example

import xml.dom

...
doc.createElement("p></p>") 
...

will just embed a pair of p tags in the XML result. I thought the XML spec did not permit <, >, &, \n, etc. in element or attribute names. Could I get some clarification on this? Thanks!
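To make the question concrete, here is a rough sketch of the kind of name check I would have expected. It is only an ASCII-only approximation of the XML 1.0 "Name" production (the real production also allows many non-ASCII characters), and the helper name is_plausible_xml_name is made up for illustration; it is not part of xml.dom.

```python
import re

# Simplified, ASCII-only approximation of the XML 1.0 "Name" production.
# Assumption: the full production also admits many non-ASCII characters,
# which this sketch deliberately ignores.
_ASCII_XML_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*\Z")


def is_plausible_xml_name(name):
    """Return True if name looks like a legal (ASCII) XML name."""
    return _ASCII_XML_NAME.match(name) is not None


# Try a few of the names discussed in this report.
for candidate in ("p", "attribute-name", "p></p>", "new line \n"):
    print(repr(candidate), is_plausible_xml_name(candidate))
```

A check along these lines rejects names containing <, >, &, spaces, or newlines, which covers the cases above.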
msg137142 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-05-28 18:35
I suspect you are right, but do not know the rules, and have never used the module. There is no particular person maintaining xml.dom.X at present.

Could you please fill in the ... after the import to give a complete minimal example that fails? Someone could then test it on 3.2.
msg137487 - Author: Kyle Keating (Kyle.Keating) Date: 2011-06-02 17:10
This looks to break pretty good... I did confirm this on 3.0, I'm guessing 3.2 is the same.

import sys
import xml.dom

doc = xml.dom.getDOMImplementation().createDocument(None, 'xml', None)
doc.firstChild.appendChild(doc.createElement('element00'))

element01 = doc.createElement('element01')
element01.setAttribute('attribute', "script><![CDATA[alert('script!');]]></script>")
doc.firstChild.appendChild(element01)

element02 = doc.createElement("script><![CDATA[alert('script!');]]></script>")
doc.firstChild.appendChild(element02)

element03 = doc.createElement("new line \n")

element03.setAttribute('attribute-name','new line \n')
doc.firstChild.appendChild(element03)

print(doc.toprettyxml(indent="  "))

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
output:

<?xml version="1.0" ?>
<xml>
  <element/>
  <element01 attribute="script&gt;&lt;![CDATA[alert('script!');]]&gt;&lt;/script&gt;"/>
  <script><![CDATA[alert('script!');]]></script>/>
  <new line
 attribute-name="new line
"/>
</xml>
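
One way to turn this into an automated test might be a round-trip check: serialize a document containing the suspicious element name, parse it back, and see whether the same single element comes out. This is only a sketch. The helper name element_name_round_trips is made up for illustration, and it assumes createElement accepts arbitrary strings as it does in the example above. It also catches xml.dom.InvalidCharacterErr in case an implementation does validate names.

```python
import xml.dom
import xml.dom.minidom
from xml.parsers.expat import ExpatError


def element_name_round_trips(element_name):
    """Return True if an element with this name survives serialize + reparse."""
    doc = xml.dom.getDOMImplementation().createDocument(None, 'xml', None)
    try:
        doc.firstChild.appendChild(doc.createElement(element_name))
        reparsed = xml.dom.minidom.parseString(doc.toxml())
    except (ExpatError, xml.dom.InvalidCharacterErr):
        # Either the serialized text is not well-formed, or the
        # implementation rejected the name up front.
        return False
    children = reparsed.documentElement.childNodes
    # The root should contain exactly the one element that was created.
    return len(children) == 1 and children[0].nodeName == element_name


for name in ("element00", "p></p>", "new line \n"):
    print(repr(name), element_name_round_trips(name))
```

Note that a name like "p></p>" can still serialize to well-formed XML; it just no longer describes the element that was created, which is why the sketch compares the reparsed structure instead of only checking for a parse error.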
msg137488 - Author: Kyle Keating (Kyle.Keating) Date: 2011-06-02 17:13
oops, the first XML element in the output should read "<element00/>", not "<element/>"

just a typo! don't get confused!
History
Date User Action Args
2011-06-02 17:13:17 Kyle.Keating set messages: + msg137488
2011-06-02 17:10:39 Kyle.Keating set messages: + msg137487
2011-05-28 18:35:29 terry.reedy set nosy: + terry.reedy; messages: + msg137142; stage: test needed
2011-05-20 22:02:10 Kyle.Keating create