Message 338681 - Python tracker



Author	vsurjaninov
Recipients	vsurjaninov
Date	2019-03-23.15:38:48
Message-id	<1553355529.11.0.873645774038.issue36407@roundup.psfhosted.org>
In-reply-to

Content
When a document containing a CDATA section is written with non-empty indentation and newline parameters, the section's parent node ends up containing stray indentation, which is parsed back as text.

Example:
>>> from xml.dom import minidom
>>> doc = minidom.Document()
>>> root = doc.createElement('root')
>>> doc.appendChild(root)
>>> node = doc.createElement('node')
>>> root.appendChild(node)
>>> data = doc.createCDATASection('</data>')
>>> node.appendChild(data)
>>> print(doc.toprettyxml(indent=' ' * 4))
<?xml version="1.0" ?>
<root>
    <node>
<![CDATA[</data>]]>    </node>
</root>

If we parse this output, we do not get the CDATA value back correctly.

The following code returns a string that contains only whitespace characters:
>>> xml_text = doc.toprettyxml(indent=' ' * 4)
>>> doc = minidom.parseString(xml_text)
>>> doc.getElementsByTagName('node')[0].firstChild.nodeValue

This one returns a string with both the CDATA value and the indentation characters:
>>> doc.getElementsByTagName('node')[0].firstChild.wholeText
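
To see why firstChild is not the CDATA section, it can help to list every child of <node> after parsing. The following is a minimal sketch that hard-codes the output shown above as xml_text, so it does not depend on which Python version wrote it. The names parsed, children, first_value and whole_text are illustrative only and do not come from the report.

```python
from xml.dom import minidom

# Output copied from the report above, including the stray whitespace
# around the CDATA section.
xml_text = (
    '<?xml version="1.0" ?>\n'
    '<root>\n'
    '    <node>\n'
    '<![CDATA[</data>]]>    </node>\n'
    '</root>\n'
)

parsed = minidom.parseString(xml_text)
node = parsed.getElementsByTagName('node')[0]

# Collect each child of <node> as (nodeType, data) and print it.
children = [(child.nodeType, child.data) for child in node.childNodes]
for node_type, data in children:
    print(node_type, repr(data))

# The first child is a whitespace-only text node, not the CDATA section.
first_value = node.firstChild.nodeValue
# wholeText joins adjacent text and CDATA siblings, whitespace included.
whole_text = node.firstChild.wholeText
```

The CDATA section is still present in the parsed tree. It is just no longer the first child, so code that reads firstChild.nodeValue sees the surrounding whitespace instead.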


There is a workaround:
>>> data.nodeType = data.TEXT_NODE
…
>>> print(doc.toprettyxml(indent=' ' * 4))
<?xml version="1.0" ?>
<root>
    <node><![CDATA[</data>]]></node>
</root>

This output is parsed correctly:
>>> doc.getElementsByTagName('node')[0].firstChild.nodeValue
'</data>'

However, I think it would be better to fix the writing function so that this becomes the default behavior.
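
One way to picture such a writer-side change is a wrapper around minidom.Element.writexml that keeps a lone CDATA child on the same line as its parent. This is only a sketch under assumptions, not an actual patch: writexml_inline_cdata and _stock_writexml are hypothetical names, and monkeypatching the standard library is done here purely for demonstration.

```python
import io
from xml.dom import Node, minidom

# Keep a reference to the stock implementation so the wrapper can delegate.
_stock_writexml = minidom.Element.writexml


def writexml_inline_cdata(self, writer, indent="", addindent="", newl=""):
    """Hypothetical wrapper: write a lone CDATA child inline with its parent."""
    children = self.childNodes
    lone_cdata = (
        len(children) == 1
        and children[0].nodeType == Node.CDATA_SECTION_NODE
    )
    if not lone_cdata:
        return _stock_writexml(self, writer, indent, addindent, newl)
    # Serialize the element with no indentation or newlines inside it,
    # then put the indentation and newline around the element as a whole.
    compact = io.StringIO()
    _stock_writexml(self, compact, "", "", "")
    writer.write(indent + compact.getvalue() + newl)


# Demonstration only: replace the method on the standard-library class.
minidom.Element.writexml = writexml_inline_cdata

doc = minidom.Document()
root = doc.createElement('root')
doc.appendChild(root)
node = doc.createElement('node')
root.appendChild(node)
node.appendChild(doc.createCDATASection('</data>'))

pretty = doc.toprettyxml(indent=' ' * 4)
print(pretty)

# Round trip: the CDATA section is now the first child of <node>.
reparsed = minidom.parseString(pretty)
value = reparsed.getElementsByTagName('node')[0].firstChild.nodeValue
```

With the wrapper in place, no nodeType override is needed, and the round trip yields the CDATA value directly from firstChild.nodeValue.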

History
Date	User	Action	Args
2019-03-23 15:38:49	vsurjaninov	set	recipients: + vsurjaninov
2019-03-23 15:38:49	vsurjaninov	set	messageid: <1553355529.11.0.873645774038.issue36407@roundup.psfhosted.org>
2019-03-23 15:38:49	vsurjaninov	link	issue36407 messages
2019-03-23 15:38:48	vsurjaninov	create