classification
Title: xml.etree.ElementTree.tostring returns type bytes, expected type str
Type: behavior
Stage: needs patch
Components: Documentation, XML
Versions: Python 3.1
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: docs@python
Nosy List: Gunnar.Eikman, JTMoon79, docs@python, r.david.murray
Priority: normal
Keywords:

Created on 2011-01-19 00:12 by JTMoon79, last changed 2012-04-23 14:42 by Gunnar.Eikman.

Messages (4)
msg126506 - Author: JamesThomasMoon1979 (JTMoon79) Date: 2011-01-19 00:12
The function xml.etree.ElementTree.tostring returns type bytes, not str.
The documentation reads:
"""Returns an encoded string containing the XML data."""
(from http://docs.python.org/py3k/library/xml.etree.elementtree.html#xml.etree.ElementTree.tostring as of 2011-01-18)

=======================================================
Here is a test program:
-------------------------------------------------------
#!/usr/bin/python
# created for python 3.1

import sys
print(sys.version) # for help verifying version tested
from xml.etree import ElementTree

sampleinput = """<?xml version="1.0"?><Hello></Hello>"""
xmlobj = ElementTree.fromstring(sampleinput)
type(xmlobj)
xmlstr = ElementTree.tostring(xmlobj,'utf-8')
print("xmlstr value is '", xmlstr, "'", sep="")
print("xmlstr type is '", type(xmlstr), "'", sep="")
-------------------------------------------------------
test program output:
-------------------------------------------------------
3.1.3 (r313:86834, Nov 27 2010, 18:30:53) [MSC v.1500 32 bit (Intel)]
xmlstr value is 'b'<Hello />''
xmlstr type is '<class 'bytes'>'
=======================================================

The cheap "fix" for this bug may simply be a change in documentation.
However, a method called "tostring" really should return something nearer to the built-in str.
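
For readers who need a str today, the following is a minimal sketch of two ways to get one from the same element used in the test program above. It assumes Python 3.2 or later for the encoding="unicode" form; the variable names are illustrative only.

```python
from xml.etree import ElementTree

elem = ElementTree.fromstring("<Hello></Hello>")

# With a real encoding name, tostring returns bytes (the behavior reported here).
as_bytes = ElementTree.tostring(elem, "utf-8")

# Option 1: decode the bytes with the same encoding that was requested.
as_text_decoded = as_bytes.decode("utf-8")

# Option 2 (assumes Python 3.2+): ask for a str directly.
as_text_unicode = ElementTree.tostring(elem, encoding="unicode")

print(type(as_bytes), type(as_text_decoded), type(as_text_unicode))
```

Option 1 works on any version where tostring returns bytes; option 2 avoids the intermediate bytes object.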
msg126507 - Author: JamesThomasMoon1979 (JTMoon79) Date: 2011-01-19 00:14
Some other bugs affecting the tostring method (for consideration by the reviewer):
http://bugs.python.org/issue6233#msg89718
http://bugs.python.org/msg101037
http://bugs.python.org/issue9692
msg126517 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2011-01-19 02:10
This is indeed a doc problem, although there was some discussion of working toward a method rename.  See issue 8047 (but be prepared to read a novel to understand why tostring returns bytes...)  The doc for 3.2 is slightly clearer, but both 3.1 and 3.2 could be made clearer by referring to an 'encoded byte string' rather than just an 'encoded string'.  (An encoded string has to be a byte string, but that isn't obvious unless you've dealt with encode/decode a bunch.)

Technically this could be closed as a duplicate of issue 8047, since that issue proposes that the API fix (which would include the doc change) be backported to 3.1.  But no one has proposed a patch there, so at a minimum the 3.1 docs should be clarified.
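
The point that an encoded string has to be a byte string can be illustrated with plain str.encode and bytes.decode. This is only a sketch with a made-up sample value, not part of any proposed patch.

```python
# A str holds Unicode code points; encoding it produces bytes.
text = "<Hello>\u00e9</Hello>"    # contains one non-ASCII character (e with acute)
encoded = text.encode("utf-8")    # bytes: the "encoded byte string"
decoded = encoded.decode("utf-8") # back to str

print(type(text), type(encoded), type(decoded))
# The non-ASCII character takes two bytes in UTF-8, so the lengths differ by one.
print(len(text), len(encoded))
```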
msg159021 - Author: Gunnar Eikman (Gunnar.Eikman) Date: 2012-04-23 14:42
I moved a working script from Ubuntu (Python 3.1.2) to Windows (Python 3.2.3) today.

I had to revise the script. The tostring method returns a string on Linux (which contradicts this issue), but bytes on Windows (as described in this issue).

I used tostring with a single argument: "tostring(theXml)"

Is there an explanation for this? I am not an advanced Python hacker...

Be careful when moving from one environment to another!
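
One way to keep a script working in both situations described above is to normalise whatever tostring returns. The following is a minimal sketch: the helper name to_text is hypothetical, and it assumes the single-argument call uses the us-ascii default encoding documented for Python 3.2 and later. It does not explain why the two environments differed.

```python
from xml.etree import ElementTree


def to_text(elem):
    """Return the serialized element as str, whatever tostring gives back.

    Hypothetical helper for illustration; assumes the default encoding
    of the single-argument call is us-ascii when bytes are returned.
    """
    data = ElementTree.tostring(elem)
    if isinstance(data, bytes):
        # Non-ASCII characters are written as character references,
        # so decoding the default output as ASCII is safe.
        return data.decode("us-ascii")
    return data


the_xml = ElementTree.fromstring("<Hello></Hello>")
print(to_text(the_xml))
```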
History
Date                 User            Action  Args
2012-04-23 14:42:06  Gunnar.Eikman   set     nosy: + Gunnar.Eikman; messages: + msg159021
2011-01-19 02:10:56  r.david.murray  set     nosy: + r.david.murray; messages: + msg126517; stage: needs patch
2011-01-19 00:14:28  JTMoon79        set     messages: + msg126507
2011-01-19 00:12:25  JTMoon79        create