classification
Title: ElementTree tostring error when method='text'
Type: behavior
Stage: resolved
Components: XML
Versions: Python 3.4, Python 3.3
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: Frank, eli.bendersky, ezio.melotti, python-dev, serhiy.storchaka
Priority: normal
Keywords: 3.3regression, patch

Created on 2013-01-10 05:20 by Frank, last changed 2013-01-10 14:31 by eli.bendersky. This issue is now closed.

Files
File name Uploaded Description Edit
etree_itertext.patch serhiy.storchaka, 2013-01-10 09:10 review
Messages (8)
msg179523 - Author: Frank (Frank) Date: 2013-01-10 05:20
Since upgrading to Python 3.3, ET.tostring() fails when the output method is 'text'. Code like this:

import xml.etree.ElementTree as ET

# fp is the path to the input XML file
with open(fp, mode='rt') as f:
    data = f.read()
tree, idmap = ET.XMLID(data)
print(ET.tostring(tree, method='text', encoding='unicode'))

Generates the following error:

Traceback (most recent call last):
  File "/home/john/Desktop/docs/Pear/pear.py", line 64, in pass_four
    print(ET.tostring(tree, method='text', encoding='unicode'))
  File "/usr/lib/python3.3/xml/etree/ElementTree.py", line 1171, in tostring
    ElementTree(element).write(stream, encoding, method=method)
  File "/usr/lib/python3.3/xml/etree/ElementTree.py", line 824, in write
    _serialize_text(write, self._root)
  File "/usr/lib/python3.3/xml/etree/ElementTree.py", line 1057, in _serialize_text
    write(part)
TypeError: string argument expected, got 'list'

On earlier versions of Python, it returned the plain text of the root element with the formatting tags stripped.
msg179531 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-10 08:10
Can you please provide an example of data for which the tostring method fails? I can't reproduce this on simple data.

>>> import xml.etree.ElementTree as ET
>>> ET.tostring(ET.XML('<root><b>q</b>werty</root>'), method='text', encoding='unicode')
'qwerty'
msg179532 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-10 08:21
I found such an example. It happens when the data contains an XML entity.

>>> ET.tostring(ET.XML('<root>a&amp;</root>'), method='text', encoding='unicode')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/serhiy/py/cpython/Lib/xml/etree/ElementTree.py", line 1171, in tostring
    ElementTree(element).write(stream, encoding, method=method)
  File "/home/serhiy/py/cpython/Lib/xml/etree/ElementTree.py", line 824, in write
    _serialize_text(write, self._root)
  File "/home/serhiy/py/cpython/Lib/xml/etree/ElementTree.py", line 1057, in _serialize_text
    write(part)
TypeError: string argument expected, got 'list'


Indeed, itertext() produces a list of lists instead of a list of strings.

>>> list(ET.XML('<root>a&amp;</root>').itertext())
[['a', '&']]

The bug is in the C implementation of itertext().
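
The behaviour described above can be checked with a short script. This is only a minimal sketch, not the test from the attached patch, and the helper name itertext_parts_are_strings is made up for illustration. On an affected 3.3 build the check is expected to fail for input containing an entity; on a fixed build it should pass for both inputs.

```python
import xml.etree.ElementTree as ET


def itertext_parts_are_strings(xml_source):
    """Return True if every item yielded by itertext() is a str.

    Hypothetical helper for illustration; not part of ElementTree.
    """
    root = ET.XML(xml_source)
    return all(isinstance(part, str) for part in root.itertext())


if __name__ == "__main__":
    # The first input has no entities; the second is the failing case
    # reported in this issue.
    for source in ('<root><b>q</b>werty</root>', '<root>a&amp;</root>'):
        print(source, itertext_parts_are_strings(source))
```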
msg179534 - Author: Frank (Frank) Date: 2013-01-10 08:34
It happens whenever the method is called, regardless of input. I'm using HTML that has been tidied first, with HTML entities (if any) converted to Unicode characters.
msg179535 - Author: Frank (Frank) Date: 2013-01-10 08:57
Scratch that, it happens whenever XML entities (&lt;, &quot; and friends) appear in the text, as you pointed out.
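
Until a fixed build is available, one possible workaround is to join the text pieces by hand and tolerate the nested lists. This is a minimal sketch under that assumption; text_content is a hypothetical helper, not part of ElementTree, and it is meant to behave the same on unaffected versions.

```python
import xml.etree.ElementTree as ET


def text_content(element):
    """Join all text inside element, flattening any list that itertext() yields.

    Hypothetical workaround helper; on unaffected versions itertext()
    yields only strings and the list branch is never taken.
    """
    parts = []
    for part in element.itertext():
        if isinstance(part, list):
            # Affected 3.3 builds yield a list of fragments for text
            # that contains XML entities.
            parts.extend(part)
        else:
            parts.append(part)
    return "".join(parts)


if __name__ == "__main__":
    print(text_content(ET.XML('<root>a&lt;b &quot;c&quot;</root>')))
```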
msg179536 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-10 09:10
Here is a patch for 3.3+. 2.7 and 3.2 are not affected.
msg179550 - Author: Roundup Robot (python-dev) Date: 2013-01-10 14:31
New changeset d965ff47cf94 by Eli Bendersky in branch '3.3':
Issue #16913: Fix Element.itertext()'s handling of text with XML entities.
http://hg.python.org/cpython/rev/d965ff47cf94

New changeset 9ab8632e7213 by Eli Bendersky in branch 'default':
Issue #16913: Fix Element.itertext()'s handling of text with XML entities.
http://hg.python.org/cpython/rev/9ab8632e7213
msg179551 - Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-01-10 14:31
Fixed. Thanks.
History
Date User Action Args
2013-01-10 14:31:55  eli.bendersky  set  status: open -> closed; resolution: fixed; messages: + msg179551; stage: patch review -> resolved
2013-01-10 14:31:27  python-dev  set  nosy: + python-dev; messages: + msg179550
2013-01-10 09:10:56  serhiy.storchaka  set  files: + etree_itertext.patch; versions: + Python 3.4; messages: + msg179536; keywords: + patch; stage: needs patch -> patch review
2013-01-10 08:57:58  Frank  set  messages: + msg179535
2013-01-10 08:34:44  Frank  set  messages: + msg179534
2013-01-10 08:21:07  serhiy.storchaka  set  messages: + msg179532
2013-01-10 08:10:28  serhiy.storchaka  set  nosy: + serhiy.storchaka; messages: + msg179531
2013-01-10 05:21:22  ezio.melotti  set  keywords: + 3.3regression; nosy: + ezio.melotti, eli.bendersky; type: behavior; stage: needs patch
2013-01-10 05:20:09  Frank  create