Message 273937 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Author	fruch
Recipients	fruch
Date	2016-08-30.17:18:25
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1472577505.85.0.700751415342.issue27899@psf.upfronthosting.co.za>
In-reply-to

Content

On both Python 2.7 and Python 3.4:
>>> from xml.etree import cElementTree as ET
>>> text = '<end>its &gt; &lt; &amp; &apos;</end>'
>>> root = ET.fromstring(text.encode('utf-8'))
>>> ET.tostring(root, method="xml")
<end>its &gt; &lt; &amp; '</end>

I would expect it to return the same text as the input, to be compliant XML 1.0.

I would understand it returning something different for HTML, see:
http://stackoverflow.com/questions/2083754/why-shouldnt-apos-be-used-to-escape-single-quotes
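A quick check of why the serializer cannot simply echo the input: the parser decodes entity references, so after parsing nothing records whether the apostrophe was written as `&apos;` or as a literal `'`. The sketch below assumes Python 3 and uses the plain `xml.etree.ElementTree` module (`cElementTree` was deprecated in 3.3 and removed in 3.9); it parses both spellings and compares the results.

```python
import xml.etree.ElementTree as ET

# Two spellings of the same character data: one uses the &apos; entity,
# the other a literal apostrophe.
with_entity = "<end>its &gt; &lt; &amp; &apos;</end>"
literal = "<end>its &gt; &lt; &amp; '</end>"

a = ET.fromstring(with_entity)
b = ET.fromstring(literal)

# The parser decodes entity references, so both trees hold identical text.
print(repr(a.text))      # "its > < & '"
print(a.text == b.text)  # True

# Serialization therefore cannot distinguish the two inputs either.
print(ET.tostring(a) == ET.tostring(b))  # True
```

Because the element only stores decoded characters, getting `&apos;` back requires changing how text is escaped on output, which is what the workaround below does.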

As a workaround, I had to patch ElementTree:

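try:
    from unittest.mock import patch  # Python 3.3+
except ImportError:
    from mock import patch  # Python 2.7: third-party "mock" package

from xml.etree.ElementTree import _raise_serialization_error

def _escape_cdata(text):
    # escape character data
    try:
        # it's worth avoiding do-nothing calls for strings that are
        # shorter than 500 characters, or so.  assume that's, by far,
        # the most common case in most applications.
        if "&" in text:
            text = text.replace("&", "&amp;")
        if "<" in text:
            text = text.replace("<", "&lt;")
        if ">" in text:
            text = text.replace(">", "&gt;")
        # the addition to the stock helper: also escape apostrophes
        if "'" in text:
            text = text.replace("'", "&apos;")
        return text
    except (TypeError, AttributeError):
        _raise_serialization_error(text)

try:
    from xml.etree import cElementTree as ET  # removed in Python 3.9
except ImportError:
    from xml.etree import ElementTree as ET

text = '<end>its &gt; &lt; &amp; &apos;</end>'
root = ET.fromstring(text.encode('utf-8'))

# tostring() looks up _escape_cdata in xml.etree.ElementTree at call time,
# so patching that module attribute changes how text is escaped.
with patch('xml.etree.ElementTree._escape_cdata', new=_escape_cdata):
    # encoding='unicode' (return str instead of bytes) requires Python 3
    s = ET.tostring(root, encoding='unicode', method="xml")
print(s)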
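The same patch can be applied without the `mock` dependency. Below is a minimal sketch, assuming Python 3; `escape_apostrophes` is a hypothetical helper name, and like the workaround above it relies on the private `ElementTree._escape_cdata`, which is not public API and may change between Python versions. It wraps the stock escaping rather than copying it, and restores the original function on exit.

```python
import contextlib
import xml.etree.ElementTree as ET

@contextlib.contextmanager
def escape_apostrophes():
    """Temporarily make ElementTree write ' as &apos; in text content.

    Hypothetical helper; swaps the private ET._escape_cdata.
    """
    original = ET._escape_cdata

    def escape_cdata(text):
        # Run the stock escaping for &, < and > first, then add &apos;.
        # The stock step never produces an apostrophe, and the & it
        # escapes is handled before "&apos;" is inserted, so nothing
        # is escaped twice.
        return original(text).replace("'", "&apos;")

    ET._escape_cdata = escape_cdata
    try:
        yield
    finally:
        # Always restore the original helper, even if serialization fails.
        ET._escape_cdata = original

root = ET.fromstring("<end>its &gt; &lt; &amp; &apos;</end>")
with escape_apostrophes():
    s = ET.tostring(root, encoding="unicode")
print(s)
```

Only element text and tails go through `_escape_cdata`; attribute values use a separate helper, so this sketch leaves them unchanged.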

History
Date	User	Action	Args
2016-08-30 17:18:25	fruch	set	recipients: + fruch
2016-08-30 17:18:25	fruch	set	messageid: <1472577505.85.0.700751415342.issue27899@psf.upfronthosting.co.za>
2016-08-30 17:18:25	fruch	link	issue27899 messages
2016-08-30 17:18:25	fruch	create