classification
Title: xml.etree.ElementTree.ElementTree.write(): encoding handling problems
Type: behavior Stage: test needed
Components: Documentation, XML Versions: Python 3.2, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: docs@python Nosy List: dabrahams, docs@python, effbot, eli.bendersky, eric.araujo, flox, kune, python-dev, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2010-08-02 21:02 by kune, last changed 2012-07-15 03:20 by eli.bendersky. This issue is now closed.

Files
File name Uploaded Description
bugs.tar.gz kune, 2010-08-03 19:13 Tar container with bug examples
ElementTree.patch kune, 2010-08-03 19:48 Patch for xml.etree.ElementTree.py
Messages (12)
msg112550 - (view) Author: Uli Kunitz (kune) Date: 2010-08-02 21:02
If one wants to use the encoding parameter of ElementTree.write(), the file must be opened with "wb". Without the encoding parameter, normal text files can be used, but they should be opened with the encoding "UTF-8", because otherwise this may raise an error.
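
A minimal sketch of the binary-mode case described above. The scratch directory and the file name out.xml are assumptions made for illustration, not part of the report:

```python
import os
import tempfile
from xml.etree import ElementTree

tree = ElementTree.ElementTree(
    ElementTree.fromstring("<foo><bar>foobar</bar></foo>"))

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "out.xml")  # hypothetical file name
    # With an explicit byte encoding, write() produces bytes,
    # so the file is opened in binary mode ("wb").
    with open(path, "wb") as f:
        tree.write(f, encoding="UTF-8")
    # Read the result back as raw bytes to inspect it.
    with open(path, "rb") as f:
        data = f.read()
```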

Probably comparable problems exist with the parser side of things.
msg112568 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-08-03 09:41
Is this a behavior bug or a doc bug?
msg112653 - (view) Author: Uli Kunitz (kune) Date: 2010-08-03 19:13
I believe handling of TextIOWrapper streams is broken in xml.etree.ElementTree.ElementTree.write().

First example:

import sys
from xml.etree import ElementTree

element = ElementTree.fromstring("""<foo><bar>foobar</bar></foo>""")
element_tree = ElementTree.ElementTree(element)

assert sys.stdout.encoding == "UTF-8"
element_tree.write(sys.stdout, encoding="UTF-8")
print()

I don't think that writing a tree into a stream with the correct encoding should cause any problem at all.

The output looks like this:

Traceback (most recent call last):
  File "/home/kunitz/test/lib/python3.2/xml/etree/ElementTree.py", line 825, in write
    "xmlcharrefreplace"))
TypeError: must be str, not bytes

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "bug1.py", line 9, in <module>
    element_tree.write(sys.stdout, encoding="UTF-8")
  File "/home/kunitz/test/lib/python3.2/xml/etree/ElementTree.py", line 843, in write
    write("<?xml version='1.0' encoding='%s'?>\n" % encoding_)
  File "/home/kunitz/test/lib/python3.2/xml/etree/ElementTree.py", line 827, in write
    _raise_serialization_error(text)
  File "/home/kunitz/test/lib/python3.2/xml/etree/ElementTree.py", line 1077, in _raise_serialization_error
    "cannot serialize %r (type %s)" % (text, type(text).__name__)
TypeError: cannot serialize "<?xml version='1.0' encoding='UTF-8'?>\n" (type str)

Example 2:
import sys
from xml.etree import ElementTree

element = ElementTree.fromstring("""<foo><bar>fööbar</bar></foo>""")
element_tree = ElementTree.ElementTree(element)

with open("bug2.xml", "w", encoding="US-ASCII") as f:
    element_tree.write(f)

The first ö umlaut raises a UnicodeEncodeError here, although the method could use XML character references instead. One could argue about this, but the method could take care of the problem.
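
For comparison, a small sketch of the binary path, which does fall back to character references (the traceback of the first example shows the "xmlcharrefreplace" error handler in use). The io.BytesIO buffer is an assumption standing in for a file opened with "wb":

```python
import io
from xml.etree import ElementTree

# "f\u00f6\u00f6bar" is the same text as in the example above;
# the escapes only keep this snippet pure ASCII.
root = ElementTree.fromstring("<foo><bar>f\u00f6\u00f6bar</bar></foo>")
tree = ElementTree.ElementTree(root)

buffer = io.BytesIO()  # stands in for a file opened with "wb"
# ElementTree does the encoding itself here, so characters outside
# US-ASCII are written as numeric character references.
tree.write(buffer, encoding="US-ASCII")
ascii_bytes = buffer.getvalue()
```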

Third example:
import sys
from xml.etree import ElementTree

element = ElementTree.fromstring("""<foo><bar>fööbar</bar></foo>""")
element_tree = ElementTree.ElementTree(element)

with open("bug3.xml", "w", encoding="ISO-8859-1",
          errors="xmlcharrefreplace") as f:
    element_tree.write(f, xml_declaration=True)

This finally creates an ISO-8859-1 encoded XML file, but without an XML declaration. Didn't we request one?

Example 4: Try to do the right thing.
import sys
from xml.etree import ElementTree

element = ElementTree.fromstring("""<foo><bar>fööbar</bar></foo>""")
element_tree = ElementTree.ElementTree(element)

with open("bug4.xml", "w", encoding="ISO-8859-1",
          errors="xmlcharrefreplace") as f:
    element_tree.write(f, encoding="ISO-8859-1", xml_declaration=True)

Here we get the same exception as in example 1, of course.
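
A hedged sketch of a variant of example 4 that hands ElementTree a binary stream, so that the encoding and the XML declaration are both produced by write(). The in-memory buffer is an assumption used in place of the file bug4.xml opened with "wb":

```python
import io
from xml.etree import ElementTree

root = ElementTree.fromstring("<foo><bar>f\u00f6\u00f6bar</bar></foo>")
tree = ElementTree.ElementTree(root)

buffer = io.BytesIO()  # stands in for open("bug4.xml", "wb")
# The stream is binary, so write() encodes to ISO-8859-1 itself
# and puts the same encoding name into the declaration.
tree.write(buffer, encoding="ISO-8859-1", xml_declaration=True)
latin1_bytes = buffer.getvalue()
```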

All the files can be found in the tar container below.
msg112658 - (view) Author: Uli Kunitz (kune) Date: 2010-08-03 19:48
Here is a patch that handles all 4 examples in the last comment correctly and survives the Python test suite on Linux (Ubuntu 9.04 x86-64).
msg148280 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-11-24 16:43
Thanks for the patch.  The examples in your message need to be converted to a patch that applies to 3.2 or 2.7, so that we can reproduce the bug before fixing it.
msg158903 - (view) Author: Dave Abrahams (dabrahams) Date: 2012-04-21 01:42
These bugs are annoying.  How does one convert a set of examples into a patch?  Do you mean you want these to become test cases?
msg158905 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-04-21 01:46
Yes.  See the devguide if you need more info.
msg164714 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012-07-06 03:12
Please make sure that the patch(es) apply cleanly to 3.3, since this is the version I'll be focusing on.
msg164745 - (view) Author: Dave Abrahams (dabrahams) Date: 2012-07-06 19:30
I won't get to this, FYI.
msg165378 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-07-13 12:40
ElementTree write works with two kinds of output -- binary and text. The difference between them is determined only by the encoding argument. If encoding is "unicode", then the output is text; otherwise it is binary. There is no other way to determine the kind of output for a filename or a general file-like object. If this is not explained in the documentation, then the documentation should be improved.
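
The distinction can be sketched with in-memory streams. The variable names are illustrative assumptions:

```python
import io
from xml.etree import ElementTree

root = ElementTree.fromstring("<foo><bar>foobar</bar></foo>")
tree = ElementTree.ElementTree(root)

# encoding="unicode" selects text output: the stream receives str.
text_stream = io.StringIO()
tree.write(text_stream, encoding="unicode")

# Any other encoding selects binary output: the stream receives bytes.
binary_stream = io.BytesIO()
tree.write(binary_stream, encoding="utf-8")

text_result = text_stream.getvalue()
binary_result = binary_stream.getvalue()
```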

The patch can cause data corruption because writing directly to the underlying file via fileno conflicts with the internal buffering of TextIOBase/BufferedIOBase. Also, not every file-like object has a fileno. With the patch the behavior becomes less obvious and will lead to confusion.
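
A rough illustration of the buffering conflict. This is not the patch itself, only a sketch that mixes a buffered text write with a direct write to the file descriptor; the file name mixed.txt is made up, and the resulting order is what one would expect from CPython's default buffering for regular files:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "mixed.txt")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        f.write("first\n")                 # held in the text layer's buffer
        os.write(f.fileno(), b"second\n")  # bypasses that buffer entirely
        # The buffered "first\n" is flushed only when the file is closed.
    with open(path, "rb") as f:
        mixed = f.read()

# Positions of the two chunks in the file as it ended up on disk.
first_at = mixed.find(b"first")
second_at = mixed.find(b"second")
```

If second_at is smaller than first_at, the data written later through the descriptor ended up ahead of the data written earlier through the file object.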

I don't see a behavior bug which should be fixed.

Only one thing can be enhanced -- error diagnostics in some corner cases. When we can determine that the file object is an instance of RawIOBase or TextIOBase and that this conflicts with the encoding argument value, it would be helpful for novices to raise a descriptive exception. This of course does not eliminate all causes of confusion.
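
One possible shape for such a check, as a sketch only. The helper name check_stream_matches_encoding is hypothetical and not part of xml.etree; treating BufferedIOBase like RawIOBase is also an assumption:

```python
import io


def check_stream_matches_encoding(stream, encoding):
    """Hypothetical helper: raise a descriptive error on an obvious mismatch."""
    wants_text = encoding.lower() == "unicode"
    if isinstance(stream, io.TextIOBase) and not wants_text:
        raise TypeError(
            "text stream given, but encoding=%r produces bytes; "
            "use encoding='unicode' or open the file in binary mode"
            % encoding)
    if isinstance(stream, (io.RawIOBase, io.BufferedIOBase)) and wants_text:
        raise TypeError(
            "binary stream given, but encoding='unicode' produces text; "
            "pass a byte encoding such as 'utf-8' or open in text mode")
    # Objects outside the io hierarchy cannot be classified, so they pass.
```

Calling it with io.StringIO() and "utf-8" would raise TypeError, while io.BytesIO() with "utf-8" passes silently.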
msg165493 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-07-15 03:20
New changeset 51b5ee7cfa3b by Eli Bendersky in branch 'default':
Issue #9458: clarify the documentation of ElementTree.write with regards to the type of the stream expected for a given encoding
http://hg.python.org/cpython/rev/51b5ee7cfa3b
msg165494 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012-07-15 03:20
I agree with Serhiy that this is more of a documentation/understanding issue than a real bug. I've clarified the doc of ElementTree.write a bit to make it explicit what stream is expected for 'write'.
History
Date User Action Args
2012-07-15 03:20:58 eli.bendersky set status: open -> closed
assignee: docs@python
components: + Documentation
nosy: + docs@python
messages: + msg165494
resolution: fixed
2012-07-15 03:20:03 python-dev set nosy: + python-dev
messages: + msg165493
2012-07-13 12:40:04 serhiy.storchaka set nosy: + serhiy.storchaka
messages: + msg165378
2012-07-06 19:30:57 dabrahams set nosy: effbot, vstinner, eric.araujo, kune, eli.bendersky, flox, dabrahams
messages: + msg164745
2012-07-06 03:12:36 eli.bendersky set messages: + msg164714
2012-06-17 03:19:24 eli.bendersky set nosy: + eli.bendersky
2012-04-21 01:46:04 eric.araujo set messages: + msg158905
2012-04-21 01:42:29 dabrahams set nosy: + dabrahams
messages: + msg158903
2011-11-24 16:44:04 vstinner set nosy: + vstinner
2011-11-24 16:43:24 eric.araujo set stage: test needed
messages: + msg148280
versions: + Python 2.7, Python 3.3
2010-08-03 19:48:18 kune set files: + ElementTree.patch
keywords: + patch
messages: + msg112658
2010-08-03 19:14:00 kune set files: + bugs.tar.gz
type: behavior
messages: + msg112653
title: xml.etree.ElementTree.write(): encoding handling problems -> xml.etree.ElementTree.ElementTree.write(): encoding handling problems
2010-08-03 09:49:11 flox set nosy: + effbot, flox
2010-08-03 09:41:23 eric.araujo set nosy: + eric.araujo
messages: + msg112568
2010-08-02 21:02:04 kune create