classification
Title: xml.etree.ElementTree.ElementTree.write(): encoding handling problems
Type: behavior Stage: test needed
Components: XML Versions: Python 3.3, Python 3.2, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: dabrahams, effbot, eric.araujo, flox, haypo, kune
Priority: normal Keywords: patch

Created on 2010-08-02 21:02 by kune, last changed 2012-04-21 01:46 by eric.araujo.

Files
File name           Uploaded                 Description
bugs.tar.gz         kune, 2010-08-03 19:13   Tar container with bug examples
ElementTree.patch   kune, 2010-08-03 19:48   Patch for xml.etree.ElementTree.py
Messages (7)
msg112550 - Author: Uli Kunitz (kune) Date: 2010-08-02 21:02
If one wants to use the encoding parameter of ElementTree.write(), the file must be opened in binary mode ("wb"). Without the encoding parameter, ordinary text files can be used, but they should be opened with the encoding "UTF-8"; otherwise characters that the file's encoding cannot represent may raise an error.

Comparable problems probably exist on the parser side as well.
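A minimal sketch of the two modes described above, assuming a Python 3 release whose write() accepts encoding="unicode" (as the current documentation describes). io.BytesIO and io.StringIO stand in for real files; the names binary_out and text_out are illustrative only.

```python
import io
from xml.etree import ElementTree

element = ElementTree.fromstring("<foo><bar>f\u00f6\u00f6bar</bar></foo>")
tree = ElementTree.ElementTree(element)

# A concrete encoding makes write() emit bytes, so the target must be a
# binary stream (io.BytesIO here; open(path, "wb") behaves the same way).
binary_out = io.BytesIO()
tree.write(binary_out, encoding="utf-8")

# encoding="unicode" makes write() emit str, so the target must be a
# text stream (io.StringIO here; open(path, "w", encoding=...) likewise).
text_out = io.StringIO()
tree.write(text_out, encoding="unicode")
```

The caller has to match the stream type to the encoding argument, which is exactly the coupling this report complains about.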
msg112568 - Author: Éric Araujo (eric.araujo) (Python committer) Date: 2010-08-03 09:41
Is this a behavior bug or a doc bug?
msg112653 - Author: Uli Kunitz (kune) Date: 2010-08-03 19:13
I believe handling of TextIOWrapper streams is broken in xml.etree.ElementTree.ElementTree.write().

Example 1:

import sys
from xml.etree import ElementTree

element = ElementTree.fromstring("""<foo><bar>foobar</bar></foo>""")
element_tree = ElementTree.ElementTree(element)

assert sys.stdout.encoding == "UTF-8"
element_tree.write(sys.stdout, encoding="UTF-8")
print()

I don't think that writing a tree to a stream that already uses the requested encoding should cause any problem at all.

The output looks like this:

Traceback (most recent call last):
  File "/home/kunitz/test/lib/python3.2/xml/etree/ElementTree.py", line 825, in write
    "xmlcharrefreplace"))
TypeError: must be str, not bytes

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "bug1.py", line 9, in <module>
    element_tree.write(sys.stdout, encoding="UTF-8")
  File "/home/kunitz/test/lib/python3.2/xml/etree/ElementTree.py", line 843, in write
    write("<?xml version='1.0' encoding='%s'?>\n" % encoding_)
  File "/home/kunitz/test/lib/python3.2/xml/etree/ElementTree.py", line 827, in write
    _raise_serialization_error(text)
  File "/home/kunitz/test/lib/python3.2/xml/etree/ElementTree.py", line 1077, in _raise_serialization_error
    "cannot serialize %r (type %s)" % (text, type(text).__name__)
TypeError: cannot serialize "<?xml version='1.0' encoding='UTF-8'?>\n" (type str)
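Until write() accepts text streams here, one workaround is to hand it the text stream's underlying binary buffer. Below is a minimal sketch; write_via_buffer is a hypothetical helper, not part of ElementTree, and it assumes the stream exposes a .buffer attribute, as the standard io.TextIOWrapper-based sys.stdout normally does.

```python
import io
from xml.etree import ElementTree


def write_via_buffer(tree, stream):
    """Hypothetical helper (not part of ElementTree): serialize `tree` in the
    text stream's own encoding by writing bytes to its underlying buffer."""
    stream.flush()  # push out any text already pending in the text layer
    tree.write(stream.buffer, encoding=stream.encoding)
    stream.buffer.flush()


element = ElementTree.fromstring("<foo><bar>foobar</bar></foo>")
tree = ElementTree.ElementTree(element)

# Stand-in for sys.stdout: a UTF-8 text layer over an in-memory byte buffer.
raw = io.BytesIO()
fake_stdout = io.TextIOWrapper(raw, encoding="utf-8")
write_via_buffer(tree, fake_stdout)
```

With a real console, this would be write_via_buffer(tree, sys.stdout) followed by print(). This fails if sys.stdout has been replaced by an object without .buffer (io.StringIO, for instance). For encodings other than UTF-8 or US-ASCII, write() may also emit an XML declaration.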

Example 2:
import sys
from xml.etree import ElementTree

element = ElementTree.fromstring("""<foo><bar>fööbar</bar></foo>""")
element_tree = ElementTree.ElementTree(element)

with open("bug2.xml", "w", encoding="US-ASCII") as f:
    element_tree.write(f)

The first ö umlaut raises a UnicodeEncodeError here, although the method could emit XML character references instead. This point is arguable, but the method could take care of the problem.
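If a binary target is acceptable, a sketch of a workaround is to let write() do the US-ASCII encoding itself. Characters outside ASCII then come out as numeric character references instead of raising. io.BytesIO stands in for open("bug2.xml", "wb").

```python
import io
from xml.etree import ElementTree

element = ElementTree.fromstring("<foo><bar>f\u00f6\u00f6bar</bar></foo>")
tree = ElementTree.ElementTree(element)

# write() encodes to US-ASCII on its own and replaces each "ö" (U+00F6)
# with the character reference &#246;.
out = io.BytesIO()
tree.write(out, encoding="us-ascii")
data = out.getvalue()
```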

Example 3:
import sys
from xml.etree import ElementTree

element = ElementTree.fromstring("""<foo><bar>fööbar</bar></foo>""")
element_tree = ElementTree.ElementTree(element)

with open("bug3.xml", "w", encoding="ISO-8859-1",
          errors="xmlcharrefreplace") as f:
    element_tree.write(f, xml_declaration=True)

This finally creates an ISO-8859-1 encoded XML file, but without an XML declaration. Didn't we request one?
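To keep the text file from this example but still get a declaration, one option is to write the declaration yourself, naming the stream's own encoding. Then let write() emit plain text with encoding="unicode" and xml_declaration=False, so the file's encoding and errors="xmlcharrefreplace" do the encoding. This is a sketch under that assumption. A TextIOWrapper over io.BytesIO stands in for the opened file.

```python
import io
from xml.etree import ElementTree

element = ElementTree.fromstring("<foo><bar>f\u00f6\u00f6bar</bar></foo>")
tree = ElementTree.ElementTree(element)

# Stand-in for open("bug3.xml", "w", encoding="ISO-8859-1",
#                   errors="xmlcharrefreplace").
raw = io.BytesIO()
f = io.TextIOWrapper(raw, encoding="ISO-8859-1", errors="xmlcharrefreplace")

# Write the declaration by hand, using the encoding the stream reports,
# then serialize the tree as text without a second declaration.
f.write("<?xml version='1.0' encoding='%s'?>\n" % f.encoding)
tree.write(f, encoding="unicode", xml_declaration=False)
f.flush()
data = raw.getvalue()
```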

Example 4: Try to do the right thing.
import sys
from xml.etree import ElementTree

element = ElementTree.fromstring("""<foo><bar>fööbar</bar></foo>""")
element_tree = ElementTree.ElementTree(element)

with open("bug4.xml", "w", encoding="ISO-8859-1",
          errors="xmlcharrefreplace") as f:
    element_tree.write(f, encoding="ISO-8859-1", xml_declaration=True)

Here we get, of course, the same exception as in example 1.
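For comparison, here is a sketch of the variant that the current API does accept: the same call, but on a binary target (io.BytesIO standing in for open("bug4.xml", "wb")) and without the errors argument, which only applies to text files. write() then encodes to ISO-8859-1 itself and writes the requested declaration.

```python
import io
from xml.etree import ElementTree

element = ElementTree.fromstring("<foo><bar>f\u00f6\u00f6bar</bar></foo>")
tree = ElementTree.ElementTree(element)

# Binary target: encoding and declaration are both handled by write().
out = io.BytesIO()
tree.write(out, encoding="ISO-8859-1", xml_declaration=True)
data = out.getvalue()
```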

All the files can be found in the attached tar archive (bugs.tar.gz).
msg112658 - Author: Uli Kunitz (kune) Date: 2010-08-03 19:48
Here is a patch that handles all 4 examples in the last comment correctly and survives the Python test suite on Linux (Ubuntu 9.04 x86-64).
msg148280 - Author: Éric Araujo (eric.araujo) (Python committer) Date: 2011-11-24 16:43
Thanks for the patch.  The examples in your message need to be converted to a patch that applies to 3.2 or 2.7, so that we can reproduce the bug before fixing it.
msg158903 - Author: Dave Abrahams (dabrahams) Date: 2012-04-21 01:42
These bugs are annoying.  How does one convert a set of examples into a patch?  Do you mean you want these to become test cases?
msg158905 - Author: Éric Araujo (eric.araujo) (Python committer) Date: 2012-04-21 01:46
Yes.  See the devguide if you need more info.
History
Date                 User         Action  Args
2012-04-21 01:46:04  eric.araujo  set     messages: + msg158905
2012-04-21 01:42:29  dabrahams    set     nosy: + dabrahams
                                          messages: + msg158903
2011-11-24 16:44:04  haypo        set     nosy: + haypo
2011-11-24 16:43:24  eric.araujo  set     stage: test needed
                                          messages: + msg148280
                                          versions: + Python 2.7, Python 3.3
2010-08-03 19:48:18  kune         set     files: + ElementTree.patch
                                          keywords: + patch
                                          messages: + msg112658
2010-08-03 19:14:00  kune         set     files: + bugs.tar.gz
                                          type: behavior
                                          messages: + msg112653
                                          title: xml.etree.ElementTree.write(): encoding handling problems -> xml.etree.ElementTree.ElementTree.write(): encoding handling problems
2010-08-03 09:49:11  flox         set     nosy: + effbot, flox
2010-08-03 09:41:23  eric.araujo  set     nosy: + eric.araujo
                                          messages: + msg112568
2010-08-02 21:02:04  kune         create