Issue 36407: xml.dom.minidom wrong indentation writing for CDATA section

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/80588

classification

Title:	xml.dom.minidom wrong indentation writing for CDATA section
Type:	enhancement	Stage:	resolved
Components:	XML	Versions:	Python 3.8

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	eli.bendersky, scoder, serhiy.storchaka, vsurjaninov
Priority:	normal	Keywords:	patch

Created on 2019-03-23 15:38 by vsurjaninov, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL	Status	Linked
PR 12514	merged	vsurjaninov, 2019-03-23 16:00
PR 12578	closed	miss-islington, 2019-03-27 06:19

Messages (5)
msg338681	Author: Vladimir Surjaninov (vsurjaninov) *	Date: 2019-03-23 15:38
If we write XML that contains a CDATA section and pass non-empty indentation and newline parameters, the parent node of the section gets extra indentation, which is parsed back as text. Example:

>>> from xml.dom import minidom
>>> doc = minidom.Document()
>>> root = doc.createElement('root')
>>> doc.appendChild(root)
>>> node = doc.createElement('node')
>>> root.appendChild(node)
>>> data = doc.createCDATASection('</data>')
>>> node.appendChild(data)
>>> print(doc.toprettyxml(indent=' ' * 4))
<?xml version="1.0" ?>
<root>
    <node>
        <![CDATA[</data>]]>
    </node>
</root>

If we parse this output, we do not get the CDATA value back correctly. The following code returns a string that contains only indentation characters:

>>> doc = minidom.parseString(xml_text)
>>> doc.getElementsByTagName('node')[0].firstChild.nodeValue

This returns a string with the CDATA value plus the indentation characters:

>>> doc.getElementsByTagName('node')[0].firstChild.wholeText

There is a workaround:

>>> data.nodeType = data.TEXT_NODE
…
>>> print(doc.toprettyxml(indent=' ' * 4))
<?xml version="1.0" ?>
<root>
    <node><![CDATA[</data>]]></node>
</root>

This output is parsed correctly:

>>> doc.getElementsByTagName('node')[0].firstChild.nodeValue
</data>

But I think it would be better to fix the writing function so that this becomes the default behavior.
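For readers who need to read such output on both affected and fixed Python versions, here is a minimal sketch of a more defensive lookup. It searches the children by node type instead of relying on firstChild. The helper names build_doc and cdata_value are made up for this illustration and are not part of minidom.

```python
from xml.dom import Node, minidom


def build_doc(payload):
    """Build <root><node><![CDATA[payload]]></node></root> (hypothetical helper)."""
    doc = minidom.Document()
    root = doc.createElement('root')
    doc.appendChild(root)
    node = doc.createElement('node')
    root.appendChild(node)
    node.appendChild(doc.createCDATASection(payload))
    return doc


def cdata_value(element):
    """Return the data of the first CDATA child of element, or None.

    Scanning by node type skips any whitespace-only text nodes that
    pretty-printing may have placed around the CDATA section.
    """
    for child in element.childNodes:
        if child.nodeType == Node.CDATA_SECTION_NODE:
            return child.data
    return None


# Round trip: pretty-print, parse again, then read the CDATA value back.
pretty = build_doc('</data>').toprettyxml(indent=' ' * 4)
parsed = minidom.parseString(pretty)
node = parsed.getElementsByTagName('node')[0]
recovered = cdata_value(node)
print(repr(recovered))
```

Stripping firstChild.wholeText would also recover the value here, but it would alter payloads that legitimately start or end with whitespace, which is why the sketch selects by node type.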
msg338701	Author: Stefan Behnel (scoder) *	Date: 2019-03-23 21:33
Yes, this case is incorrect. Pretty printing should not change character content inside of a simple tag. The PR looks good to me.
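The rule stated above can be sketched as a toy pretty-printer. This is a simplified illustration, not the code from the PR: the functions is_simple_tag and pretty are hypothetical, attributes are ignored, and "simple tag" is assumed to mean an element whose only child is a text or CDATA node.

```python
from xml.dom import Node, minidom

TEXT_LIKE = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)


def is_simple_tag(element):
    """True when the element's only child is a text or CDATA node."""
    children = element.childNodes
    return len(children) == 1 and children[0].nodeType in TEXT_LIKE


def pretty(node, indent='    ', level=0):
    """Toy pretty-printer (assumption: attributes are ignored)."""
    pad = indent * level
    if node.nodeType != Node.ELEMENT_NODE:
        return pad + node.toxml() + '\n'
    tag = node.tagName
    if not node.childNodes:
        return f'{pad}<{tag}/>\n'
    if is_simple_tag(node):
        # Write the single text-like child inline so that no whitespace
        # is added to the character content of the tag.
        return f'{pad}<{tag}>{node.firstChild.toxml()}</{tag}>\n'
    inner = ''.join(pretty(child, indent, level + 1)
                    for child in node.childNodes)
    return f'{pad}<{tag}>\n{inner}{pad}</{tag}>\n'


# Same document as in the report.
doc = minidom.Document()
root = doc.createElement('root')
doc.appendChild(root)
node = doc.createElement('node')
root.appendChild(node)
node.appendChild(doc.createCDATASection('</data>'))

out = pretty(doc.documentElement)
print(out)
```

With this rule the CDATA section stays on the same line as its parent tag, so parsing the output yields the CDATA section as the first child of node.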
msg338936	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2019-03-27 05:59
New changeset 384b81d923addd52125e94470b11d2574ca266a9 by Serhiy Storchaka (Vladimir Surjaninov) in branch 'master': bpo-36407: Fix writing indentations of CDATA section (xml.dom.minidom). (GH-12514) https://github.com/python/cpython/commit/384b81d923addd52125e94470b11d2574ca266a9
msg338939	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2019-03-27 06:19
Should we backport this change? I am not sure.
msg338943	Author: Stefan Behnel (scoder) *	Date: 2019-03-27 07:04
I don't think this should be backported. Pretty-printing is not a production-relevant feature; it is more of a "debugging, diffing and help users see what they get" kind of feature. It's good to have it fixed for the future, but we shouldn't bother users with it in a point release.

History
Date	User	Action	Args
2022-04-11 14:59:12	admin	set	github: 80588
2019-03-27 12:08:27	serhiy.storchaka	set	status: open -> closed resolution: fixed stage: patch review -> resolved
2019-03-27 07:04:43	scoder	set	messages: + msg338943
2019-03-27 06:19:42	serhiy.storchaka	set	messages: + msg338939
2019-03-27 06:19:22	miss-islington	set	pull_requests: + pull_request12522
2019-03-27 05:59:02	serhiy.storchaka	set	messages: + msg338936
2019-03-23 21:33:28	scoder	set	messages: + msg338701 versions: + Python 3.8
2019-03-23 16:00:14	vsurjaninov	set	keywords: + patch stage: patch review pull_requests: + pull_request12465
2019-03-23 15:40:39	xtreak	set	nosy: + scoder, eli.bendersky, serhiy.storchaka
2019-03-23 15:38:49	vsurjaninov	create