classification
Title: Allow for ]]> in CDATA in minidom
Type: behavior Stage: test needed
Components: XML Versions: Python 3.5
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: arturcz, eric.araujo, loewis, peter.otten, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2014-02-21 02:04 by arturcz, last changed 2014-02-24 09:27 by peter.otten. This issue is now closed.

Repositories containing patches
https://blabluga.hell.pl/hg/pub/cpython#issue20714
Messages (10)
msg211791 - (view) Author: Artur R. Czechowski (arturcz) * Date: 2014-02-21 02:04
Current support for ]]> inside CDATA is to raise an Exception. However, it could be solved by dividing the ]]> to two strings:
- ]]
- >
and each one is a separate CDATA inside elemement. So, to put ]]> inside CDATA one can write:
<element>
<![CDATA[]]]]><![CDATA[>]]>
</element>
msg211821 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2014-02-21 08:15
Does the XML spec allow that?
msg211842 - (view) Author: Artur R. Czechowski (arturcz) * Date: 2014-02-21 10:14
Eric, I'm not sure what exactly your concern is, but I'll try to address two issues I can see.

First: both strings <![CDATA[]]]]> and <![CDATA[>]]> are a correct and valid examples of CDATA usage as per specification[1].

Second: is it allowed to have two occurences of CDATA inside one element? The same specification says only that ‟CDATA sections may occur anywhere character data may occur”. There is nothing said if multiple occurrences are allowed or disallowed.
Wikipedia suggests in [2] that it is OK, giving the same example of embedding ]]> inside CDATA. There is no hints in Talk page that this solution doesn't work for someone.
In other example [3] there is explicitly stated that: ‟the [...] application shouldn't care about the difference between abc and <![CDATA[abc]]> and <![CDATA[a]]><![CDATA[bc]]>”.

Last but not least: using following schema:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="foo"/>
</xs:schema>

following XML file:

<?xml version="1.0" ?>
<foo>
<![CDATA[]]]]><![CDATA[>]]>
</foo>

validates correctly with xmllint:
$ xmllint -noout --schema schema.xsd t.xml
t.xml validates

I hope it dissolves your concerns.

PS. I noticed I missed one ] in provided patch. There should be four of them in second parameter of replace.

[1] http://www.w3.org/TR/REC-xml/#sec-cdata-sect
[2] http://en.wikipedia.org/wiki/CDATA#Nesting
[3] http://oxygenxml.com/archives/xsl-list/200502/msg00787.html
msg212000 - (view) Author: Artur R. Czechowski (arturcz) * Date: 2014-02-23 15:32
Proper patch with tests available in remote hg repo attached to this comment.
msg212015 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2014-02-23 18:28
Artur: Please provide the follwing information (in this bug report, or in any other bug report you create in the future)
1. this is what I did
2. this is what happened
3. this is what should have happened instead
In this case, a Python script that ought to work but currently fails would be appreciated.

I fail to see how "two CDATA sections in one element" are related to "support for ]]> inside CDATA". In your example, there is no ]]> inside CDATA, just two subsequent CDATA sections.
msg212025 - (view) Author: Artur R. Czechowski (arturcz) * Date: 2014-02-23 20:08
Martin, the exact information you need are:

1. this is what I did:

#!/usr/bin/env python
import unittest
import xmlrunner

class Foo(unittest.TestCase):
    def testFoo(self):
        self.assertTrue(False, ']]>')

unittest.main(testRunner=xmlrunner.XMLTestRunner(output='test-reports'))

2. this is what happened:
arturcz@szczaw:/tmp$ ./cdata.py 

Running tests...
----------------------------------------------------------------------
F
======================================================================
FAIL [0.000s]: testFoo (__main__.Foo)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./cdata.py", line 7, in testFoo
    self.assertTrue(False, ']]>')
AssertionError: ]]>

----------------------------------------------------------------------
Ran 1 test in 0.001s

FAILED (failures=1)

Generating XML reports...
Traceback (most recent call last):
  File "./cdata.py", line 9, in <module>
    unittest.main(testRunner=xmlrunner.XMLTestRunner(output='test-reports'))
  File "/usr/lib/python2.7/unittest/main.py", line 95, in __init__
    self.runTests()
  File "/usr/lib/python2.7/unittest/main.py", line 232, in runTests
    self.result = testRunner.run(self.test)
  File "/usr/lib/python2.7/dist-packages/xmlrunner/__init__.py", line 415, in run
    result.generate_reports(self)
  File "/usr/lib/python2.7/dist-packages/xmlrunner/__init__.py", line 312, in generate_reports
    xml_content = doc.toprettyxml(indent='\t')
  File "/usr/lib/python2.7/xml/dom/minidom.py", line 58, in toprettyxml
    self.writexml(writer, "", indent, newl, encoding)
  File "/usr/lib/python2.7/xml/dom/minidom.py", line 1749, in writexml
    node.writexml(writer, indent, addindent, newl)
  File "/usr/lib/python2.7/xml/dom/minidom.py", line 814, in writexml
    node.writexml(writer, indent+addindent, addindent, newl)
  File "/usr/lib/python2.7/xml/dom/minidom.py", line 814, in writexml
    node.writexml(writer, indent+addindent, addindent, newl)
  File "/usr/lib/python2.7/xml/dom/minidom.py", line 814, in writexml
    node.writexml(writer, indent+addindent, addindent, newl)
  File "/usr/lib/python2.7/xml/dom/minidom.py", line 1150, in writexml
    raise ValueError("']]>' not allowed in a CDATA section")
ValueError: ']]>' not allowed in a CDATA section

and empty directory test-reports has been created.

3. this is what should have happened instead:
arturcz@szczaw:/tmp$ ./cdata.py 

Running tests...
----------------------------------------------------------------------
F
======================================================================
FAIL [0.000s]: testFoo (__main__.Foo)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./cdata.py", line 7, in testFoo
    self.assertTrue(False, ']]>')
AssertionError: ]]>

----------------------------------------------------------------------
Ran 1 test in 0.001s

FAILED (failures=1)

Generating XML reports...

and file test-reports/TEST-Foo-${timestamp}.xml is created with following content:
<?xml version="1.0" ?>
<testsuite errors="0" failures="1" name="Foo-20140223203423" tests="1" time="0.000">
        <testcase classname="Foo" name="testFoo" time="0.000">
                <failure message="]]&gt;" type="AssertionError">
<![CDATA[Traceback (most recent call last):
  File "./cdata.py", line 7, in testFoo
    self.assertTrue(False, ']]]]><![CDATA[>')
AssertionError: ]]]]><![CDATA[>
]]>             </failure>
        </testcase>
        <system-out>
<![CDATA[]]>    </system-out>
        <system-err>
<![CDATA[]]>    </system-err>
</testsuite>

however, on the level of minidom.py module, there is an exact test provided in attached repository.

PS. I removed the patch by purpose - it's wrong and someone could be misleaded by it. The correct solution I propose is in the attached repository.
msg212027 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2014-02-23 21:11
I fail to see the bug in Python. minidom is behaving correctly. The error is in xmlrunner, which does

  error_info = str(test_result.get_error_info())
  failureText = xml_document.createCDATASection(error_info)

This is incorrect - it would have to check that error_info does not contain ]]> (since CDATA sections must not contain ]]>). If it finds ]]> in the error_info, it would have to create two CDATA sections, e.g. one up to and including ]], and the second one starting at > (repeated if there is more than one occurrence of ]]> in error_info.

Alternatively, it should just create a text node, since writing a text node will properly escape all special characters in error_info.
msg212032 - (view) Author: Artur R. Czechowski (arturcz) * Date: 2014-02-23 21:49
Martin,
I partially agree with you.
After rethinking the issue I agree that changing the behavior of createCDATASection maybe is not a good idea.
On the other hand anyone in need of adding ]]> into CDATA must write a few lines of code, which will be essentially the same for each application. This part of code could be embedded into minidom module and it could be provided to each needing party. Please consider extending the interface of minidom module with method creating CDATA sections regardless of the provided data, splitting it into two CDATA if necessary.

If you disagree, feel free to close this issue.
msg212033 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2014-02-23 21:59
I don't think your proposal can be implemented. createXYZ, in DOM, only creates a single node, which is not yet attached at all into a tree. So if you would want a convenience function to create multiple CDATA section, the application would then still have to loop over them to insert them individually into the tree.

Personally, I would not have my application create multiple CDATA sections. Instead, I would use

if ']]>' in error_info:
    failureText = xml_document.createTextNode(error_info)
else:
    failureText = xml_document.createCDATASection(error_info)

instead. So I doubt that all application authors would really agree on a single solution to this problem.

Closing this as "won't fix".
msg212078 - (view) Author: Peter Otten (peter.otten) * Date: 2014-02-24 09:27
Perhaps a look at the competition is still in order: Java silently breaks such an invalid CDATA in two, as suggested.

http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.htm says
"""
No lexical check is done on the content of a CDATA section and it is therefore possible to have the character sequence "]]>" in the content, which is illegal in a CDATA section per section 2.7 of [XML 1.0]. The presence of this character sequence must generate a fatal error during serialization or the cdata section must be splitted before the serialization (see also the parameter "split-cdata-sections" in the DOMConfiguration interface).
"""

The change Artur suggested would be covered by this.
History
Date User Action Args
2014-02-24 09:27:37peter.ottensetmessages: + msg212078
2014-02-23 21:59:56loewissetstatus: open -> closed
resolution: wont fix
messages: + msg212033
2014-02-23 21:49:04arturczsetmessages: + msg212032
2014-02-23 21:11:50loewissetmessages: + msg212027
2014-02-23 20:56:50loewissetfiles: - minidom.patch
2014-02-23 20:08:49arturczsetmessages: + msg212025
2014-02-23 19:24:00peter.ottensetnosy: + peter.otten
2014-02-23 18:28:31loewissetnosy: + loewis
messages: + msg212015
2014-02-23 18:23:24loewissetfiles: + minidom.patch
2014-02-23 15:33:04arturczsetfiles: - minidom.patch
2014-02-23 15:32:34arturczsethgrepos: + hgrepo217
messages: + msg212000
2014-02-21 12:28:28serhiy.storchakasetnosy: + serhiy.storchaka

stage: test needed
2014-02-21 10:14:53arturczsetmessages: + msg211842
2014-02-21 08:15:41eric.araujosetnosy: + eric.araujo
title: Please allow for ]]> in CDATA in minidom.py -> Allow for ]]> in CDATA in minidom
messages: + msg211821

versions: + Python 3.5, - Python 2.7, Python 3.3
2014-02-21 02:04:09arturczcreate