classification
Title: Document Object Model API - validation
Type: behavior
Stage:
Components: XML
Versions: Python 3.6, Python 3.5, Python 2.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Kyle.Keating, jocassid, martin.panter, pdeep5693, terry.reedy
Priority: normal
Keywords:

Created on 2011-05-20 22:02 by Kyle.Keating, last changed 2016-12-23 08:39 by pdeep5693.

Files
File name: xmlNameVerification.py
Uploaded: jocassid, 2013-07-28 02:49
Description: code to validate xml element/attribute names
Messages (7)
msg136402 - (view) Author: Kyle Keating (Kyle.Keating) Date: 2011-05-20 22:02
I was testing this library and noticed that XML element and attribute names can be created with malformed XML, because special characters that can break validation are not cleaned or converted from their literal forms. Only attribute values are cleaned, not the names.

For example:

import xml.dom

...
doc.createElement("p></p>") 
...

will just embed a pair of p tags in the XML result. I thought the XML spec did not permit <, >, &, \n, etc. in an element name or attribute name. Could I get some clarification on this? Thanks!
msg137142 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-05-28 18:35
I suspect you are right, but do not know the rules, and have never used the module. There is no particular person maintaining xml.dom.X at present.

Could you please fill in the ... after the import to give a complete minimal example that fails? Someone could then test it on 3.2.
msg137487 - (view) Author: Kyle Keating (Kyle.Keating) Date: 2011-06-02 17:10
This looks to break pretty good... I did confirm this on 3.0, I'm guessing 3.2 is the same.

import xml.dom

# Build a document whose root element is <xml>.
doc = xml.dom.getDOMImplementation().createDocument(None, 'xml', None)
doc.firstChild.appendChild(doc.createElement('element00'))

# Markup in an attribute value: escaped on output.
element01 = doc.createElement('element01')
element01.setAttribute('attribute', "script><![CDATA[alert('script!');]]></script>")
doc.firstChild.appendChild(element01)

# The same markup as an element name: written out verbatim.
element02 = doc.createElement("script><![CDATA[alert('script!');]]></script>")
doc.firstChild.appendChild(element02)

# Spaces and a newline in an element name and in an attribute value.
element03 = doc.createElement("new line \n")
element03.setAttribute('attribute-name', 'new line \n')
doc.firstChild.appendChild(element03)

# print() as a function works on both Python 2.7 and 3.x.
print(doc.toprettyxml(indent="  "))

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
output:

<?xml version="1.0" ?>
<xml>
  <element/>
  <element01 attribute="script&gt;&lt;![CDATA[alert('script!');]]&gt;&lt;/script&gt;"/>
  <script><![CDATA[alert('script!');]]></script>/>
  <new line
 attribute-name="new line
"/>
</xml>
msg137488 - (view) Author: Kyle Keating (Kyle.Keating) Date: 2011-06-02 17:13
oops, the first xml element in the output should read  "<element00/>" not "<element/>"

just a typo! don't get confused!
msg193804 - (view) Author: John Cassidy (jocassid) Date: 2013-07-28 02:49
I added the line print(str(doc)) after the call to getDOMImplementation and verified that the errors I'm seeing come from the xml.dom.minidom implementation of xml.dom. Checking minidom.py, I did not see any validation of the tagName that gets passed to createElement. http://www.w3.org/TR/xml11/#NT-NameStartChar lists the format of allowed names. Attached is a file containing the functions I was working on. My thinking is that if the tagName is not valid, a ValueError should be raised.
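
A minimal sketch of the kind of check described above, assuming Python 3 and the Name production from the XML 1.1 recommendation linked in this message. This is not the attached xmlNameVerification.py; check_xml_name is a hypothetical helper name, and whether minidom should raise ValueError or something else is still an open question in this issue.

```python
import re

# Character ranges from the XML 1.1 "NameStartChar" production.
_NAME_START_CHARS = (
    ":A-Z_a-z"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    "\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    "\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
# "NameChar" adds "-", ".", digits, and a few more ranges.
_NAME_CHARS = _NAME_START_CHARS + "\\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"

_NAME_RE = re.compile("[%s][%s]*" % (_NAME_START_CHARS, _NAME_CHARS))


def check_xml_name(name):
    """Return name unchanged if it matches the XML Name production.

    Raises ValueError otherwise (hypothetical helper, not part of minidom).
    """
    if not _NAME_RE.fullmatch(name):
        raise ValueError("not a valid XML name: %r" % (name,))
    return name
```

With this, names such as 'element01' and 'attribute-name' pass, while 'p></p>' and 'new line \n' are rejected before they reach createElement or setAttribute.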
msg258344 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-01-16 00:57
My limited understanding is that xml.dom and minidom are supposed to implement particular interfaces. So do these DOM interfaces specify if this validation should be done? If so, this would be a bug. Or is it just a question of whether Python should do extra validation not specified by the underlying DOM API?
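
On the question of what the DOM interfaces specify: as far as I can tell, the W3C DOM Core specification lists INVALID_CHARACTER_ERR among the exceptions for createElement when the name contains an illegal character, and xml.dom already defines a matching InvalidCharacterErr class. Below is a rough sketch of what raising it could look like from calling code, assuming Python 3. create_element_checked is a hypothetical wrapper, not part of minidom, and the ASCII-only pattern is a deliberate simplification of the full XML Name rule.

```python
import re
import xml.dom
from xml.dom import minidom

# Simplified, ASCII-only approximation of an XML name (assumption:
# the real rule also allows many non-ASCII characters).
_ASCII_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")


def create_element_checked(doc, tag_name):
    """Create an element, rejecting names with illegal characters.

    Hypothetical wrapper around Document.createElement; minidom itself
    performs no such check.
    """
    if not _ASCII_NAME.fullmatch(tag_name):
        raise xml.dom.InvalidCharacterErr(
            "invalid element name: %r" % (tag_name,))
    return doc.createElement(tag_name)


doc = minidom.getDOMImplementation().createDocument(None, "xml", None)
create_element_checked(doc, "element00")
try:
    create_element_checked(doc, "p></p>")
except xml.dom.InvalidCharacterErr as exc:
    print("rejected:", exc)
```

Using the existing DOM exception class rather than ValueError would keep the behavior in line with the DOM interface, if that is the direction chosen.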
msg283873 - (view) Author: Pradeep (pdeep5693) Date: 2016-12-23 08:39
minidom.py needs extra validation in setAttribute for certain special characters, depending on the attribute name. Attribute values cannot contain special characters such as < and >, and cannot be nested as in the example below:

element01 = doc.createElement('element01')
element01.setAttribute('attribute', "script><![CDATA[alert('script!');]]></script>")
doc.firstChild.appendChild(element01)

script shouldn't be allowed as a value for an attribute, and I feel it should raise an exception (ValueError). As described above, < and > shouldn't be allowed, since attributes are more like key-value pairs. Could someone tell me if this is right? If it is, then minidom.py needs this extra level of validation.
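
One way to check this is a quick round trip. The output in msg137487 shows minidom writing &lt; and &gt; in the attribute value, and, as far as I understand the XML rules, < and & are permitted in attribute values when escaped that way. The sketch below (assuming Python 3) serializes the same attribute, parses the result again, and compares the value that comes back with the original.

```python
from xml.dom import minidom

doc = minidom.getDOMImplementation().createDocument(None, "xml", None)

value = "script><![CDATA[alert('script!');]]></script>"
element01 = doc.createElement("element01")
element01.setAttribute("attribute", value)
doc.documentElement.appendChild(element01)

# Serialize without pretty-printing so no whitespace nodes are added,
# then parse the result back into a new document.
serialized = doc.toxml()
reparsed = minidom.parseString(serialized)

round_tripped = reparsed.documentElement.firstChild.getAttribute("attribute")
print(serialized)
print("value survived round trip:", round_tripped == value)
```

If the reparsed value matches, the attribute value case is already handled by escaping, and the remaining problem is limited to element and attribute names.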
History
Date                 User           Action  Args
2016-12-23 08:39:36  pdeep5693      set     nosy: + pdeep5693
                                            messages: + msg283873
2016-01-16 00:57:27  martin.panter  set     versions: + Python 3.5, Python 3.6
                                            nosy: + martin.panter
                                            messages: + msg258344
                                            components: + XML, - Library (Lib)
                                            stage: test needed ->
2016-01-16 00:44:53  martin.panter  link    issue5166 dependencies
2013-07-28 02:49:49  jocassid       set     files: + xmlNameVerification.py
                                            nosy: + jocassid
                                            messages: + msg193804
2011-06-02 17:13:17  Kyle.Keating   set     messages: + msg137488
2011-06-02 17:10:39  Kyle.Keating   set     messages: + msg137487
2011-05-28 18:35:29  terry.reedy    set     nosy: + terry.reedy
                                            messages: + msg137142
                                            stage: test needed
2011-05-20 22:02:10  Kyle.Keating   create