xml.etree.ElementTree encoding declaration should be capital ('UTF-8') rather than lowercase ('utf-8') #69235

Closed
zimeon mannequin opened this issue Sep 9, 2015 · 7 comments
Assignees
Labels
topic-XML type-bug An unexpected behavior, bug, or error

Comments

@zimeon
Mannequin

zimeon mannequin commented Sep 9, 2015

BPO 25047
Nosy @scoder, @berkerpeksag, @vadmium
Files
  • etree-encoding.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/vadmium'
    closed_at = <Date 2015-09-23.02:14:46.849>
    created_at = <Date 2015-09-09.19:43:38.872>
    labels = ['expert-XML', 'type-bug']
    title = "xml.etree.ElementTree encoding declaration should be capital ('UTF-8') rather than lowercase ('utf-8')"
    updated_at = <Date 2015-09-23.02:14:46.848>
    user = 'https://bugs.python.org/zimeon'

    bugs.python.org fields:

    activity = <Date 2015-09-23.02:14:46.848>
    actor = 'martin.panter'
    assignee = 'martin.panter'
    closed = True
    closed_date = <Date 2015-09-23.02:14:46.849>
    closer = 'martin.panter'
    components = ['XML']
    creation = <Date 2015-09-09.19:43:38.872>
    creator = 'zimeon'
    dependencies = []
    files = ['40496']
    hgrepos = []
    issue_num = 25047
    keywords = ['patch']
    message_count = 7.0
    messages = ['250328', '250345', '250930', '251211', '251223', '251224', '251392']
    nosy_count = 6.0
    nosy_names = ['scoder', 'Arfrever', 'python-dev', 'berker.peksag', 'martin.panter', 'zimeon']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue25047'
    versions = ['Python 3.4', 'Python 3.5', 'Python 3.6']

    @zimeon
    Mannequin Author

    zimeon mannequin commented Sep 9, 2015

    It seems that in Python 3 the XML encoding declaration written by xml.etree.ElementTree has changed from 2.x: it is now lowercased, e.g. 'utf-8'. While the XML spec [1] says that decoders _SHOULD_ understand this, it also says the encoding string _SHOULD_ be 'UTF-8'. Keeping to the standard, in the vein of being strictly conformant when encoding and lax when decoding, should give maximum compatibility.

    It also seems like an unhelpful change for 2.x to 3.x migration, though that is perhaps a minor issue (it is how I noticed it).

    Can show with:

    cat a.py
    from xml.etree.ElementTree import ElementTree, Element
    import os, sys
    print(sys.version_info)
    if sys.version_info > (3, 0):
        fp = os.fdopen(sys.stdout.fileno(), 'wb')
    else:
        fp = sys.stdout
    root = Element('hello',{'beer':'good'})
    ElementTree(root).write(fp, encoding='UTF-8', xml_declaration=True)
    fp.write(b"\n")

    python a.py
    sys.version_info(major=2, minor=7, micro=5, releaselevel='final', serial=0)
    <?xml version='1.0' encoding='UTF-8'?>
    <hello beer="good" />

    python3 a.py
    sys.version_info(major=3, minor=4, micro=2, releaselevel='final', serial=0)
    <?xml version='1.0' encoding='utf-8'?>
    <hello beer="good" />

    Cheers,
    Simeon

    [1] <http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncName> 'In an encoding declaration, the values "UTF-8", "UTF-16", ... should be used for the various encodings and transformations of Unicode' and then later 'XML processors should match character encoding names in a case-insensitive way'.
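
The report above can also be checked without touching stdout. The following is a minimal sketch, not part of the original report: it writes to an in-memory buffer, extracts the encoding name from the declaration, and compares names the lax way a consumer might, via `codecs.lookup()`. The helper names `declared_encoding` and `same_encoding` are hypothetical, and the letter case of the printed name depends on the Python version in use.

```python
import codecs
import io
import re
from xml.etree.ElementTree import Element, ElementTree


def declared_encoding(requested):
    """Serialize a tiny tree and return the encoding name found in its XML declaration."""
    buf = io.BytesIO()
    root = Element('hello', {'beer': 'good'})
    ElementTree(root).write(buf, encoding=requested, xml_declaration=True)
    # The declaration is the first line of the output and is plain ASCII here.
    first_line = buf.getvalue().split(b"\n", 1)[0].decode('ascii')
    match = re.search(r"encoding=['\"]([^'\"]+)['\"]", first_line)
    return match.group(1) if match else None


def same_encoding(a, b):
    """Compare two encoding names, ignoring letter case and aliases."""
    return codecs.lookup(a).name == codecs.lookup(b).name


name = declared_encoding('UTF-8')
print(name)                          # letter case depends on the Python version
print(same_encoding(name, 'UTF-8'))  # a lax consumer treats both spellings alike
```

A strict consumer that compares the declaration byte for byte is the case where the lowercase spelling matters.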

    @zimeon zimeon mannequin added topic-XML type-bug An unexpected behavior, bug, or error labels Sep 9, 2015
    @vadmium
    Member

    vadmium commented Sep 10, 2015

    I agree that Python should not be converting the supplied encoding name to lowercase, although I guess reverting this has the potential to upset people’s output (e.g. if they depend on the checksum or something).

    @vadmium
    Member

    vadmium commented Sep 18, 2015

    Here is a patch which changes the code to respect the letter case specified by the user, although it still compares the special strings "unicode", "us-ascii", and "utf-8" case-insensitively, and the default encoding is still lowercase. Let me know what you think.

    >>> tree = ElementTree(Element('hello', {'beer': 'good'}))
    >>> tree.write(stdout.buffer, encoding="UTF-8", xml_declaration=True); print()
    <?xml version='1.0' encoding='UTF-8'?>
    <hello beer="good" />
    >>> tree.write(stdout.buffer, encoding="UTF-8"); print()
    <hello beer="good" />
    >>> tree.write(stdout.buffer, xml_declaration=True); print()
    <?xml version='1.0' encoding='us-ascii'?>
    <hello beer="good" />
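
The behaviour described in this comment can be sketched as follows, assuming a Python version that includes the patch. `serialize` is a hypothetical helper, not part of the patch, and the ISO-8859-1 case is an added illustration. It shows that when `xml_declaration` is omitted, the decision to write a declaration compares the encoding name case-insensitively, while a declaration that is written keeps the caller's spelling.

```python
import io
from xml.etree.ElementTree import Element, ElementTree


def serialize(encoding=None, xml_declaration=None):
    """Serialize a small fixed tree to bytes with the given write() options."""
    buf = io.BytesIO()
    tree = ElementTree(Element('hello', {'beer': 'good'}))
    tree.write(buf, encoding=encoding, xml_declaration=xml_declaration)
    return buf.getvalue()


# No declaration is written for UTF-8, whatever the letter case.
print(serialize('UTF-8'))
print(serialize('utf-8'))

# Other encodings get a declaration, with the caller's spelling preserved.
print(serialize('ISO-8859-1'))

# With no encoding given, the default name is still lowercase 'us-ascii'.
print(serialize(xml_declaration=True))
```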

    @scoder
    Contributor

    scoder commented Sep 21, 2015

    LGTM

    @zimeon
    Mannequin Author

    zimeon mannequin commented Sep 21, 2015

    Path looks fine and seems to work as expected -- Simeon

    @zimeon
    Mannequin Author

    zimeon mannequin commented Sep 21, 2015

    s/Path/Patch/

    @vadmium vadmium self-assigned this Sep 23, 2015
    @python-dev
    Mannequin

    python-dev mannequin commented Sep 23, 2015

    New changeset ff7aba08ada6 by Martin Panter in branch '3.4':
    Issue bpo-25047: Respect case writing XML encoding declarations
    https://hg.python.org/cpython/rev/ff7aba08ada6

    New changeset 9c248233754c by Martin Panter in branch '3.5':
    Issue bpo-25047: Merge Element Tree encoding from 3.4 into 3.5
    https://hg.python.org/cpython/rev/9c248233754c

    New changeset 409bab2181d3 by Martin Panter in branch 'default':
    Issue bpo-25047: Merge Element Tree encoding from 3.5
    https://hg.python.org/cpython/rev/409bab2181d3

    @vadmium vadmium closed this as completed Sep 23, 2015
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022