ElementTree incorrectly parses strings with declared encoding not UTF-8 #61190

Closed
serhiy-storchaka opened this issue Jan 17, 2013 · 10 comments
Labels
topic-XML type-bug An unexpected behavior, bug, or error

@serhiy-storchaka
Member

BPO 16986
Nosy @serhiy-storchaka
Dependencies
  • bpo-17089: Expat parser parses strings only when XML encoding is UTF-8
Files
  • etree_parse_str.patch
  • etree_parse_str_2.patch
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2013-05-22.18:17:43.288>
    created_at = <Date 2013-01-17.16:54:18.910>
    labels = ['expert-XML', 'type-bug']
    title = 'ElementTree incorrectly parses strings with declared encoding not UTF-8'
    updated_at = <Date 2013-05-22.18:29:51.731>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2013-05-22.18:29:51.731>
    actor = 'eli.bendersky'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2013-05-22.18:17:43.288>
    closer = 'serhiy.storchaka'
    components = ['XML']
    creation = <Date 2013-01-17.16:54:18.910>
    creator = 'serhiy.storchaka'
    dependencies = ['17089']
    files = ['29233', '30341']
    hgrepos = []
    issue_num = 16986
    keywords = ['patch']
    message_count = 10.0
    messages = ['180143', '180144', '182950', '183456', '189816', '189817', '189819', '189820', '189832', '189833']
    nosy_count = 3.0
    nosy_names = ['eli.bendersky', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue16986'
    versions = ['Python 3.3', 'Python 3.4']

    @serhiy-storchaka
    Member Author

    >>> import xml.etree.ElementTree
    >>> data = '<?xml version="1.0" encoding="iso-8859-1"?>\n<money value="$\xa3\u20ac\U0001017b">$\xa3\u20ac\U0001017b</money>'
    >>> xml.etree.ElementTree.tostring(xml.etree.ElementTree.fromstring(data), 'unicode')
    '<money value="$£â\x82¬ð\x90\x85»">$£â\x82¬ð\x90\x85»</money>'
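
The report above can be turned into a small self-check. This is a minimal sketch, assuming an interpreter that already includes the fix from this issue (3.3 maintenance releases after the changeset, or 3.4 and later); the variable names `expected` and `round_tripped` are illustrative only. On a fixed interpreter, a `str` input keeps its characters regardless of the encoding named in the XML declaration.

```python
import xml.etree.ElementTree as ET

# Same input as in the report: a str (not bytes) whose XML declaration
# names iso-8859-1, although the str already holds decoded characters.
data = (
    '<?xml version="1.0" encoding="iso-8859-1"?>\n'
    '<money value="$\xa3\u20ac\U0001017b">$\xa3\u20ac\U0001017b</money>'
)
expected = '$\xa3\u20ac\U0001017b'

root = ET.fromstring(data)
round_tripped = ET.tostring(root, 'unicode')

# Print booleans only, so the check does not depend on the console encoding.
print(root.text == expected)
print(root.get('value') == expected)
```

On an affected interpreter both comparisons would be expected to print `False`, because the text and attribute come back garbled as shown in the report.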

    @serhiy-storchaka serhiy-storchaka added topic-XML type-bug An unexpected behavior, bug, or error labels Jan 17, 2013
    @serhiy-storchaka
    Member Author

    The patch for bpo-10590 fixes this for the Python implementation of ElementTree, but not for the C implementation.

    @serhiy-storchaka
    Member Author

    Here is a patch for the C implementation. The Python implementation was fixed in bpo-17089.

    @serhiy-storchaka
    Member Author

    Eli, this issue no longer has open prerequisites. bpo-10590 was replaced by bpo-17089, which is now closed. bpo-17089 fixed the Python interface to the expat parser, but cElementTree uses the C interface of expat directly, and the proposed patches fix that.
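
The Python interface to expat mentioned here is `xml.parsers.expat`. The following is a minimal sketch of the same kind of check made directly against that interface, assuming an interpreter that includes the bpo-17089 fix; the names `chunks` and `text` are illustrative only. Character data may be delivered in several pieces, so the pieces are joined before comparing.

```python
import xml.parsers.expat

# A str input whose declaration names iso-8859-1.
data = (
    '<?xml version="1.0" encoding="iso-8859-1"?>\n'
    '<money>$\xa3\u20ac\U0001017b</money>'
)

chunks = []
parser = xml.parsers.expat.ParserCreate()
# expat may call the handler more than once for one text node.
parser.CharacterDataHandler = chunks.append
parser.Parse(data, True)  # True marks this as the final chunk of input

text = ''.join(chunks)
print(text == '$\xa3\u20ac\U0001017b')
```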

    @serhiy-storchaka
    Member Author

    Here is an updated patch.

    @elibendersky
    Mannequin

    elibendersky mannequin commented May 22, 2013

    LGTM

    @python-dev
    Mannequin

    python-dev mannequin commented May 22, 2013

    New changeset 7781ccae7b9a by Serhiy Storchaka in branch '3.3':
    Issue bpo-16986: ElementTree now correctly parses a string input not only when
    http://hg.python.org/cpython/rev/7781ccae7b9a

    New changeset 659c1ce8ed2f by Serhiy Storchaka in branch 'default':
    Issue bpo-16986: ElementTree now correctly parses a string input not only when
    http://hg.python.org/cpython/rev/659c1ce8ed2f

    @serhiy-storchaka
    Member Author

    Oh, 2.7 still uses old doctests. It's a challenge to backport tests for this issue.

    @serhiy-storchaka
    Member Author

    Because ElementTree's documentation doesn't promise parsing of Unicode strings, perhaps this shouldn't be backported to 2.7. At least, I haven't backported the corresponding pyexpat changes (which affect the pure Python ElementTree) to 2.7.
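
For code that cannot rely on Unicode string input being handled, one possible approach (an assumption of this sketch, not something proposed in the thread) is to pass bytes that really are encoded in the declared encoding, so the parser decodes them itself. The example is written for Python 3 and limits the text to characters that iso-8859-1 can represent; `text_data` and `byte_data` are illustrative names.

```python
import xml.etree.ElementTree as ET

# Only characters representable in iso-8859-1 are used here, because the
# document is encoded to match its own declaration before parsing.
text_data = '<?xml version="1.0" encoding="iso-8859-1"?>\n<money>$\xa3</money>'
byte_data = text_data.encode('iso-8859-1')

# With bytes input, the parser honours the declared encoding.
root = ET.fromstring(byte_data)
print(root.text == '$\xa3')
```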

    @elibendersky
    Mannequin

    elibendersky mannequin commented May 22, 2013

    Agreed re 2.7; the problem is not important enough to warrant such a backport, due to the state of maintenance of 2.7 at this point.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022