classification
Title: ElementTree incorrectly parses strings with declared encoding not UTF-8
Type: behavior
Stage: resolved
Components: XML
Versions: Python 3.4, Python 3.3
process
Status: closed
Resolution: fixed
Dependencies: 17089
Superseder:
Assigned To: serhiy.storchaka
Nosy List: eli.bendersky, python-dev, serhiy.storchaka
Priority: normal
Keywords: patch

Created on 2013-01-17 16:54 by serhiy.storchaka, last changed 2013-05-22 18:29 by eli.bendersky. This issue is now closed.

Files
File name Uploaded Description
etree_parse_str.patch serhiy.storchaka, 2013-02-25 15:41
etree_parse_str_2.patch serhiy.storchaka, 2013-05-22 13:42
Messages (10)
msg180143 Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-01-17 16:54
>>> import xml.etree.ElementTree
>>> data = '<?xml version="1.0" encoding="iso-8859-1"?>\n<money value="$\xa3\u20ac\U0001017b">$\xa3\u20ac\U0001017b</money>'
>>> xml.etree.ElementTree.tostring(xml.etree.ElementTree.fromstring(data), 'unicode')
'<money value="$£â\x82¬ð\x90\x85»">$£â\x82¬ð\x90\x85»</money>'
msg180144 Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-01-17 16:54
The patch for issue10590 fixes this for the Python implementation of ElementTree, but not for the C implementation.
msg182950 Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-02-25 15:41
Here is a patch for the C implementation. The Python implementation was fixed in issue17089.
msg183456 Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-03-04 14:13
Eli, this issue no longer has open prerequisites. Issue10590 was superseded by issue17089, which is now closed. Issue17089 fixed the Python interface to the expat parser, but cElementTree uses the C interface of expat directly, and the proposed patches fix that.
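As an illustration of the idea discussed above, here is a minimal sketch (not the actual patch) of parsing a str through the Python-level expat interface. It assumes the approach of encoding the text as UTF-8 and telling expat to use UTF-8 regardless of the encoding named in the XML declaration; the helper name parse_text_from_str is hypothetical and exists only for this example.

```python
import xml.parsers.expat


def parse_text_from_str(document: str) -> str:
    """Return the character data of an XML document given as a str.

    Hypothetical helper for illustration only. The str is already decoded,
    so the encoding named in the XML declaration no longer describes the
    data; we encode to UTF-8 and make expat use UTF-8 explicitly.
    """
    # An explicit encoding passed to ParserCreate overrides the declaration.
    parser = xml.parsers.expat.ParserCreate(encoding="utf-8")
    chunks = []
    # Character data may arrive in several pieces, so collect and join them.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(document.encode("utf-8"), True)
    return "".join(chunks)


data = (
    '<?xml version="1.0" encoding="iso-8859-1"?>\n'
    '<money value="$\xa3\u20ac\U0001017b">$\xa3\u20ac\U0001017b</money>'
)
text = parse_text_from_str(data)
```

The declaration says iso-8859-1, but the bytes handed to expat are UTF-8, so the override is what keeps the non-Latin-1 characters intact.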
msg189816 Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-05-22 13:42
Here is an updated patch.
msg189817 Author: Eli Bendersky (eli.bendersky) (Python committer) Date: 2013-05-22 13:48
LGTM
msg189819 Author: Roundup Robot (python-dev) Date: 2013-05-22 14:21
New changeset 7781ccae7b9a by Serhiy Storchaka in branch '3.3':
Issue #16986: ElementTree now correctly parses a string input not only when
http://hg.python.org/cpython/rev/7781ccae7b9a

New changeset 659c1ce8ed2f by Serhiy Storchaka in branch 'default':
Issue #16986: ElementTree now correctly parses a string input not only when
http://hg.python.org/cpython/rev/659c1ce8ed2f
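
A quick way to check the behavior on an interpreter that includes these changesets might look like the sketch below. It reuses the input from the original report and assumes that, for str input, the declared encoding is ignored, so the characters should round-trip unchanged.

```python
import xml.etree.ElementTree as ET

# Same document as in the original report: a str whose XML declaration
# names iso-8859-1 but which contains characters outside Latin-1.
data = (
    '<?xml version="1.0" encoding="iso-8859-1"?>\n'
    '<money value="$\xa3\u20ac\U0001017b">$\xa3\u20ac\U0001017b</money>'
)

root = ET.fromstring(data)
# Serialize back to a str (no XML declaration is emitted for 'unicode').
round_trip = ET.tostring(root, "unicode")
print(round_trip)
```

On an unfixed interpreter, the same call produces the garbled output shown in the first message.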
msg189820 Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-05-22 14:44
Oh, 2.7 still uses old doctests. It's a challenge to backport tests for this issue.
msg189832 Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-05-22 18:17
Since ElementTree's documentation doesn't promise to parse Unicode strings, perhaps this shouldn't be backported to 2.7. At least, I haven't backported the corresponding pyexpat changes (which affect the pure Python ElementTree) to 2.7.
msg189833 Author: Eli Bendersky (eli.bendersky) (Python committer) Date: 2013-05-22 18:29
Agreed re 2.7; the problem is not important enough to warrant such a backport, due to the state of maintenance of 2.7 at this point.
History
Date User Action Args
2013-05-22 18:29:51 eli.bendersky set messages: + msg189833
2013-05-22 18:17:43 serhiy.storchaka set status: open -> closed
versions: - Python 2.7
messages: + msg189832
assignee: serhiy.storchaka
resolution: fixed
stage: needs patch -> resolved
2013-05-22 14:44:30 serhiy.storchaka set messages: + msg189820
versions: - Python 3.2
2013-05-22 14:21:35 python-dev set nosy: + python-dev
messages: + msg189819
2013-05-22 13:57:33 serhiy.storchaka unlink issue13612 dependencies
2013-05-22 13:48:07 eli.bendersky set messages: + msg189817
2013-05-22 13:42:12 serhiy.storchaka set files: + etree_parse_str_2.patch
messages: + msg189816
2013-05-22 07:59:11 serhiy.storchaka link issue13612 dependencies
2013-03-04 14:13:08 serhiy.storchaka set messages: + msg183456
2013-02-25 15:41:05 serhiy.storchaka set files: + etree_parse_str.patch
keywords: + patch
dependencies: + Expat parser parses strings only when XML encoding is UTF-8, - Parameter type error for xml.sax.parseString(string, ...)
messages: + msg182950
2013-01-17 16:54:46 serhiy.storchaka set dependencies: + Parameter type error for xml.sax.parseString(string, ...)
messages: + msg180144
2013-01-17 16:54:19 serhiy.storchaka create