ElementTree incorrectly parses strings with declared encoding not UTF-8 #61190

serhiy-storchaka · 2013-01-17T16:54:19Z

BPO	16986
Nosy	@serhiy-storchaka
Dependencies	bpo-17089: Expat parser parses strings only when XML encoding is UTF-8
Files	etree_parse_str.patch etree_parse_str_2.patch

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}


GitHub fields:

assignee = 'https://github.com/serhiy-storchaka'
closed_at = <Date 2013-05-22.18:17:43.288>
created_at = <Date 2013-01-17.16:54:18.910>
labels = ['expert-XML', 'type-bug']
title = 'ElementTree incorrectly parses strings with declared encoding not UTF-8'
updated_at = <Date 2013-05-22.18:29:51.731>
user = 'https://github.com/serhiy-storchaka'

bugs.python.org fields:

activity = <Date 2013-05-22.18:29:51.731>
actor = 'eli.bendersky'
assignee = 'serhiy.storchaka'
closed = True
closed_date = <Date 2013-05-22.18:17:43.288>
closer = 'serhiy.storchaka'
components = ['XML']
creation = <Date 2013-01-17.16:54:18.910>
creator = 'serhiy.storchaka'
dependencies = ['17089']
files = ['29233', '30341']
hgrepos = []
issue_num = 16986
keywords = ['patch']
message_count = 10.0
messages = ['180143', '180144', '182950', '183456', '189816', '189817', '189819', '189820', '189832', '189833']
nosy_count = 3.0
nosy_names = ['eli.bendersky', 'python-dev', 'serhiy.storchaka']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue16986'
versions = ['Python 3.3', 'Python 3.4']

serhiy-storchaka · 2013-01-17T16:54:18Z

>>> import xml.etree.ElementTree
>>> data = '<?xml version="1.0" encoding="iso-8859-1"?>\n<money value="$\xa3\u20ac\U0001017b">$\xa3\u20ac\U0001017b</money>'
>>> xml.etree.ElementTree.tostring(xml.etree.ElementTree.fromstring(data), 'unicode')
'<money value="$Â£â\x82¬ð\x90\x85»">$Â£â\x82¬ð\x90\x85»</money>'
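
The garbled output above is consistent with the string having been encoded to UTF-8 and the resulting bytes then decoded as the declared ISO-8859-1. A minimal sketch that reproduces only that mechanism, without ElementTree (the variable names are illustrative and not taken from the patch):

```python
# The text used in the report: '$', POUND SIGN, EURO SIGN, and a non-BMP character.
text = '$\xa3\u20ac\U0001017b'

# Encode to UTF-8, then misread the bytes as ISO-8859-1. This mirrors what
# appears to happen when the parser honours the encoding declaration for a
# str that was internally converted to UTF-8.
mojibake = text.encode('utf-8').decode('iso-8859-1')
print(ascii(mojibake))

# Reversing the two steps recovers the original text, which supports the
# double-decoding explanation.
recovered = mojibake.encode('iso-8859-1').decode('utf-8')
print(recovered == text)
```

The value of `mojibake` matches the attribute and text content shown in the reported output.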

serhiy-storchaka · 2013-01-17T16:54:46Z

The patch for bpo-10590 fixes this for the Python implementation of ElementTree, but not for the C implementation.

serhiy-storchaka · 2013-02-25T15:41:05Z

Here is a patch for the C implementation. The Python implementation was fixed in bpo-17089.

serhiy-storchaka · 2013-03-04T14:13:09Z

Eli, this issue no longer has open prerequisites. bpo-10590 was replaced by bpo-17089, which is now closed. bpo-17089 fixed the Python interface to the expat parser, but cElementTree uses the C interface of expat directly, and the proposed patches fix that.
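
As a rough check of both code paths, the sketch below parses the same str through ElementTree (which uses the C accelerator when it is available) and through the `xml.parsers.expat` Python interface. It assumes an interpreter that already contains both fixes; on an affected version the comparisons would be expected to fail. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat

text = '$\xa3\u20ac\U0001017b'
data = ('<?xml version="1.0" encoding="iso-8859-1"?>\n'
        '<money value="%s">%s</money>' % (text, text))

# ElementTree path: the declared encoding should be ignored for str input.
root = ET.fromstring(data)
print(root.get('value') == text, root.text == text)

# pyexpat path: character data may arrive in several chunks, so collect
# the pieces and join them before comparing.
chunks = []
parser = expat.ParserCreate()
parser.CharacterDataHandler = chunks.append
parser.Parse(data, True)
pyexpat_text = ''.join(chunks)
print(pyexpat_text == text)
```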

serhiy-storchaka · 2013-05-22T13:42:11Z

Here is an updated patch.

elibendersky · 2013-05-22T13:48:08Z

LGTM

python-dev · 2013-05-22T14:21:36Z

New changeset 7781ccae7b9a by Serhiy Storchaka in branch '3.3':
Issue bpo-16986: ElementTree now correctly parses a string input not only when
http://hg.python.org/cpython/rev/7781ccae7b9a

New changeset 659c1ce8ed2f by Serhiy Storchaka in branch 'default':
Issue bpo-16986: ElementTree now correctly parses a string input not only when
http://hg.python.org/cpython/rev/659c1ce8ed2f

serhiy-storchaka · 2013-05-22T14:44:31Z

Oh, 2.7 still uses old doctests. It's a challenge to backport tests for this issue.

serhiy-storchaka · 2013-05-22T18:17:43Z

Since ElementTree's documentation doesn't promise to parse Unicode strings, perhaps this shouldn't be backported to 2.7. At least, I haven't backported the corresponding pyexpat changes (which affect the pure Python ElementTree) to 2.7.

elibendersky · 2013-05-22T18:29:52Z

Agreed re 2.7; the problem is not important enough to warrant such a backport, due to the state of maintenance of 2.7 at this point.

serhiy-storchaka added the topic-XML and type-bug labels Jan 17, 2013

serhiy-storchaka closed this as completed May 22, 2013

serhiy-storchaka self-assigned this May 22, 2013

ezio-melotti transferred this issue from another repository Apr 10, 2022
