ElementTree incorrectly parses strings with declared encoding not UTF-8 #61190
>>> import xml.etree.ElementTree
>>> data = '<?xml version="1.0" encoding="iso-8859-1"?>\n<money value="$\xa3\u20ac\U0001017b">$\xa3\u20ac\U0001017b</money>'
>>> xml.etree.ElementTree.tostring(xml.etree.ElementTree.fromstring(data), 'unicode')
'<money value="$£â\x82¬ð\x90\x85»">$£â\x82¬ð\x90\x85»</money>'
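For reference, here is a minimal sketch of the behaviour the report seems to expect, assuming that a `str` input is already decoded and that the declared encoding should therefore not be applied again. The variable names (`text`, `roundtrip_ok`, `mangled_euro`, `from_bytes`) are illustrative only. The claim that the mangled sequences come from UTF-8 bytes being read as ISO-8859-1 is an assumption based on the output above, not something the report states.

```python
import xml.etree.ElementTree as ET

# Same document as in the report: a str whose XML declaration names ISO-8859-1.
text = '$\xa3\u20ac\U0001017b'
data = ('<?xml version="1.0" encoding="iso-8859-1"?>\n'
        '<money value="{0}">{0}</money>'.format(text))

# Expected behaviour for str input: the characters survive parsing unchanged,
# both in the attribute and in the element text.
root = ET.fromstring(data)
roundtrip_ok = (root.text == text and root.get('value') == text)
print(roundtrip_ok)

# Assumption: the mangled sequences in the report arise when the UTF-8
# encoding of a character is decoded again as ISO-8859-1.
mangled_euro = '\u20ac'.encode('utf-8').decode('iso-8859-1')
print(ascii(mangled_euro))

# Possible workaround sketch for affected versions: pass bytes whose actual
# encoding matches the declaration (UTF-8 here, since ISO-8859-1 cannot
# represent every character in this sample).
from_bytes = ET.fromstring(data.replace('iso-8859-1', 'utf-8').encode('utf-8'))
print(from_bytes.text == text)
```

On an interpreter that includes the fix, the first check should print `True`; on an affected one it would print `False`.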
The patch for bpo-10590 fixes this for the Python implementation of ElementTree, but not for the C implementation.
Here is a patch for the C implementation. The Python implementation was fixed in bpo-17089.
Here is an updated patch.
LGTM
New changeset 7781ccae7b9a by Serhiy Storchaka in branch '3.3':
New changeset 659c1ce8ed2f by Serhiy Storchaka in branch 'default':
Oh, 2.7 still uses the old doctests. Backporting the tests for this issue is a challenge.
Because ElementTree's documentation doesn't promise to parse Unicode strings, perhaps this shouldn't be backported to 2.7. At least, I haven't backported the corresponding pyexpat changes (which affect the pure Python ElementTree) to 2.7.
Agreed re 2.7; the problem is not important enough to warrant such a backport, given the state of maintenance of 2.7 at this point.