
classification
Title: Apostrophe is not replaced with &apos; in ElementTree.tostring (also in Element.write)
Type: behavior Stage: resolved
Components: XML Versions: Python 3.4, Python 2.7
process
Status: closed Resolution: duplicate
Dependencies: Superseder: XML munges apos entity in tag content
View: 2647
Assigned To: Nosy List: fruch, scoder, serhiy.storchaka
Priority: normal Keywords:

Created on 2016-08-30 17:18 by fruch, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (3)
msg273937 - Author: Israel Fruchter (fruch) Date: 2016-08-30 17:18
On both Python 2.7 and Python 3.4:
>>> from xml.etree import cElementTree as ET
>>> text = '<end>its &gt; &lt; &amp; &apos;</end>'
>>> root = ET.fromstring(text.encode('utf-8'))
>>> ET.tostring(root, method="xml")
<end>its &gt; &lt; &amp; '</end>

I would have expected it to return the same as the input, to be compliant with XML 1.0.

I would understand why it would return something different for HTML, see:
http://stackoverflow.com/questions/2083754/why-shouldnt-apos-be-used-to-escape-single-quotes

As a workaround I had to patch ElementTree:

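# _escape_cdata is redefined below, so only the error helper is imported.
from xml.etree.ElementTree import _raise_serialization_error
from mock import patch  # on Python 3: from unittest.mock import patch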

def _escape_cdata(text):
    # escape character data
    try:
        # it's worth avoiding do-nothing calls for strings that are
        # shorter than 500 character, or so.  assume that's, by far,
        # the most common case in most applications.
        if "&" in text:
            text = text.replace("&", "&amp;")
        if "<" in text:
            text = text.replace("<", "&lt;")
        if ">" in text:
            text = text.replace(">", "&gt;")
        if "'" in text:
            text = text.replace("'", "&apos;")
        return text
    except (TypeError, AttributeError):
        _raise_serialization_error(text)

from xml.etree import cElementTree as ET

text = '<end>its &gt; &lt; &amp; &apos;</end>'
root = ET.fromstring(text.encode('utf-8'))

with patch('xml.etree.ElementTree._escape_cdata', new=_escape_cdata):

    s = ET.tostring(root, encoding='unicode', method="xml")
print(s)
msg273941 - Author: Israel Fruchter (fruch) Date: 2016-08-30 17:33
I've now found http://bugs.python.org/issue2647, and it seems this was classified as not a bug.

Maybe the documentation should say so? Or there could be another way to actually decide how these characters are output.
msg275569 - Author: Stefan Behnel (scoder) * (Python committer) Date: 2016-09-10 05:13
Definitely not a bug since this isn't required by the XML spec. As said in issue 2647, you shouldn't rely on exact lexical characteristics of an XML byte stream, unless you request canonical serialisation (C14N).
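
To illustrate the point about not relying on the exact byte stream, here is a minimal sketch that compares the two spellings after parsing rather than as raw strings. It assumes Python 3.8 or later, where `xml.etree.ElementTree.canonicalize()` is available; the variable names are only for illustration.

```python
import xml.etree.ElementTree as ET

escaped = "<end>its &gt; &lt; &amp; &apos;</end>"
literal = "<end>its &gt; &lt; &amp; '</end>"

# The two inputs differ lexically but carry the same character data.
text_escaped = ET.fromstring(escaped).text
text_literal = ET.fromstring(literal).text
same_text = text_escaped == text_literal

# Canonical serialization (C14N) maps both inputs to one lexical form,
# so canonical output is the thing to compare when exact bytes matter.
c14n_escaped = ET.canonicalize(escaped)
c14n_literal = ET.canonicalize(literal)
same_c14n = c14n_escaped == c14n_literal

print(same_text, same_c14n)
```

Comparing parsed content or canonical output, rather than the default `tostring()` result, avoids depending on which optional escapes a serializer happens to emit.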
History
Date                 User              Action  Args
2022-04-11 14:58:35  admin             set     github: 72086
2016-09-10 05:13:24  scoder            set     messages: + msg275569
2016-09-07 17:58:27  ned.deily         set     nosy: + scoder, - skrah
2016-08-30 18:23:29  rhettinger        set     nosy: + skrah
2016-08-30 18:21:18  serhiy.storchaka  set     status: open -> closed
                                               nosy: + serhiy.storchaka
                                               resolution: duplicate
                                               superseder: XML munges apos entity in tag content
                                               stage: resolved
2016-08-30 17:33:57  fruch             set     messages: + msg273941
2016-08-30 17:18:25  fruch             create