Author mefistotelis
Recipients mefistotelis
Date 2019-12-09.23:40:49
Message-id <1575934850.24.0.848331688216.issue39011@roundup.psfhosted.org>
Content
TLDR:
If I place "\r" in an Element attribute, it is escaped and written to the XML file as the character reference "&#10;". But wait - \r is code 13, not code 10, right?

Real description:

If I create an ElementTree and read the attribute back right after creation, I get what I put there - "\r". But if I save the tree and re-load it, the value becomes "\n". The character is incorrectly converted to "\n" before being escaped, so the saved XML file stores the wrong value ("&#10;" instead of "&#13;").

Quick repro:

# python3 -i
Python 3.8.0 (default, Oct 25 2019, 06:23:40)  [GCC 9.2.0 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import xml.etree.ElementTree as ET
>>> elem = ET.Element('TEST')
>>> elem.set("Attt", "a\x0db")
>>> tree = ET.ElementTree(elem)
>>> with open("_test1.xml", "wb") as xml_fh:
...     tree.write(xml_fh, encoding='utf-8', xml_declaration=True)
...
>>> tree.getroot().get("Attt")
'a\rb'
>>> tree = ET.parse("_test1.xml")
>>> tree.getroot().get("Attt")
'a\nb'
>>>
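
The same check can be done in memory, which helps separate the parser from the writer. The sketch below is only an illustration: the helper name attr_after_parse is made up for this example, and what the serialized string contains depends on the Python version in use. It shows that the parser maps hand-written "&#13;" and "&#10;" references to their own code points, so the loss happens on the writing side.

```python
import xml.etree.ElementTree as ET


def attr_after_parse(xml_text):
    """Parse a one-element document and return its 'Attt' attribute."""
    return ET.fromstring(xml_text).get("Attt")


# Hand-written character references: each one comes back as its own code point.
print(repr(attr_after_parse('<TEST Attt="a&#13;b" />')))
print(repr(attr_after_parse('<TEST Attt="a&#10;b" />')))

# What the writer produces for a carriage return in an attribute value.
elem = ET.Element("TEST")
elem.set("Attt", "a\rb")
serialized = ET.tostring(elem, encoding="unicode")
print(serialized)

# True only if the writer emitted a reference that parses back to "\r".
round_trip_ok = attr_after_parse(serialized) == "a\rb"
print("round trip preserved \\r:", round_trip_ok)
```

On the versions listed in this report, the last line is expected to print False, matching the session above.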

Related issue: https://bugs.python.org/issue5752
(keeping this one separate as it seems to be a simple bug, easy to fix outside of the discussion there)

If there's a good workaround - please let me know.
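
One possible stopgap, offered as a sketch rather than a recommended fix: wrap the module's private helper ET._escape_attrib so that a carriage return is written as "&#13;". This assumes the pure-Python serializer looks the helper up by that name, which is an implementation detail and may change between versions. The wrapper name _escape_attrib_keep_cr is made up for this example, and the patch affects the whole process.

```python
import xml.etree.ElementTree as ET

# Private helper of the pure-Python serializer (assumption: the name is
# unchanged in the Python version being used).
_stock_escape_attrib = ET._escape_attrib


def _escape_attrib_keep_cr(text):
    """Escape an attribute value, writing each carriage return as &#13;."""
    if not isinstance(text, str):
        # Let the stock helper raise its usual serialization error.
        return _stock_escape_attrib(text)
    # Escape the pieces between carriage returns, then join them with &#13;
    # so the stock helper never sees a "\r" that it could rewrite.
    return "&#13;".join(_stock_escape_attrib(part) for part in text.split("\r"))


ET._escape_attrib = _escape_attrib_keep_cr  # process-wide monkeypatch

elem = ET.Element("TEST")
elem.set("Attt", "a\rb")
data = ET.tostring(elem, encoding="utf-8")
restored = ET.fromstring(data).get("Attt")
print(repr(restored))
```

Because the stock helper still handles every other character, the usual escaping of "&", "<", quotes and "\n" is unchanged; only "\r" is treated differently.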

Tested on Windows with Python 3.8 and 3.6.
History
Date                 User          Action  Args
2019-12-09 23:40:50  mefistotelis  set     recipients: + mefistotelis
2019-12-09 23:40:50  mefistotelis  set     messageid: <1575934850.24.0.848331688216.issue39011@roundup.psfhosted.org>
2019-12-09 23:40:50  mefistotelis  link    issue39011 messages
2019-12-09 23:40:49  mefistotelis  create