classification
Title: xml.etree.ElementTree encoding declaration should be capital ('UTF-8') rather than lowercase ('utf-8')
Type: behavior Stage: resolved
Components: XML Versions: Python 3.6, Python 3.4, Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: martin.panter Nosy List: Arfrever, berker.peksag, martin.panter, python-dev, scoder, zimeon
Priority: normal Keywords: patch

Created on 2015-09-09 19:43 by zimeon, last changed 2015-09-23 02:14 by martin.panter. This issue is now closed.

Files
File name             Uploaded
etree-encoding.patch  martin.panter, 2015-09-18 02:39
Messages (7)
msg250328 - Author: Simeon Warner (zimeon) Date: 2015-09-09 19:43
In Python 3, the XML encoding declaration written by xml.etree.ElementTree has changed from 2.x: it is now lowercased, e.g. 'utf-8'. While the XML spec [1] says that decoders _SHOULD_ understand this, the encoding string _SHOULD_ be 'UTF-8'. Keeping to the standard, in the vein of being strictly conformant in encoding and lax in decoding, should give maximum compatibility.

It also seems like an unhelpful change for 2.x to 3.x migration, though that is perhaps a minor issue (it is how I noticed the change).

This can be shown with:

>cat a.py
from xml.etree.ElementTree import ElementTree, Element
import os, sys
print(sys.version_info)
if sys.version_info > (3, 0):
    fp = os.fdopen(sys.stdout.fileno(), 'wb')
else:
    fp = sys.stdout
root = Element('hello',{'beer':'good'})
ElementTree(root).write(fp, encoding='UTF-8', xml_declaration=True)
fp.write(b"\n")

>python a.py
sys.version_info(major=2, minor=7, micro=5, releaselevel='final', serial=0)
<?xml version='1.0' encoding='UTF-8'?>
<hello beer="good" />

>python3 a.py
sys.version_info(major=3, minor=4, micro=2, releaselevel='final', serial=0)
<?xml version='1.0' encoding='utf-8'?>
<hello beer="good" />

Cheers,
Simeon

[1] <http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncName> "In an encoding declaration, the values "UTF-8", "UTF-16", ... should be used for the various encodings and transformations of Unicode" and then later "XML processors should match character encoding names in a case-insensitive way".
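The "lax in decoding" half of the argument can be checked with a minimal sketch like the one below. It assumes the stock parser behind xml.etree.ElementTree; the helper name parse_with_declared_encoding is made up for illustration and is not part of the report.

```python
import xml.etree.ElementTree as ET


def parse_with_declared_encoding(name):
    """Parse a tiny document whose XML declaration names the given encoding.

    Hypothetical helper for illustration; the payload is always UTF-8 bytes,
    only the spelling of the encoding name in the declaration varies.
    """
    doc = "<?xml version='1.0' encoding='%s'?><hello beer='good' />" % name
    return ET.fromstring(doc.encode("utf-8"))


# The same document should parse regardless of the letter case used
# for the encoding name in the declaration.
for name in ("UTF-8", "utf-8", "Utf-8"):
    root = parse_with_declared_encoding(name)
    print(name, root.tag, root.attrib)
```

If the parser matches encoding names case-insensitively, as the spec text quoted in [1] recommends, every spelling yields the same element.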
msg250345 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-09-10 02:14
I agree that Python should not be converting the supplied encoding name to lowercase, although I guess reverting this has the potential to upset people’s output (e.g. if they depend on the checksum or something).
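To make the checksum concern concrete, here is a small sketch. The choice of SHA-256 and the helper name digest are assumptions for illustration; the message only says "checksum". The two byte strings differ only in the case of the encoding name.

```python
import hashlib

# Two serializations of the same document, differing only in the
# letter case of the encoding name in the XML declaration.
upper = b"<?xml version='1.0' encoding='UTF-8'?>\n<hello beer=\"good\" />"
lower = b"<?xml version='1.0' encoding='utf-8'?>\n<hello beer=\"good\" />"


def digest(data):
    """Return a hex digest of the serialized bytes (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


# Any byte-level difference changes the digest, so a stored checksum of
# the old lowercase output would no longer match the new output.
print(digest(upper) == digest(lower))
```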
msg250930 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-09-18 02:39
Here is a patch which changes the code to respect the letter case specified by the user, although it still compares the special strings "unicode", "us-ascii", and "utf-8" case-insensitively, and the default encoding is still lowercase. Let me know what you think.

>>> from sys import stdout
>>> from xml.etree.ElementTree import Element, ElementTree
>>> tree = ElementTree(Element('hello', {'beer': 'good'}))
>>> tree.write(stdout.buffer, encoding="UTF-8", xml_declaration=True); print()
<?xml version='1.0' encoding='UTF-8'?>
<hello beer="good" />
>>> tree.write(stdout.buffer, encoding="UTF-8"); print()
<hello beer="good" />
>>> tree.write(stdout.buffer, xml_declaration=True); print()
<?xml version='1.0' encoding='us-ascii'?>
<hello beer="good" />
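The same three cases can be captured in a script instead of an interactive session. This is a minimal sketch that assumes an interpreter that already includes the patch; the serialize helper is hypothetical and writes to an in-memory buffer rather than to stdout.

```python
import io
from xml.etree.ElementTree import Element, ElementTree


def serialize(tree, **kwargs):
    """Write the tree to an in-memory binary buffer and return the bytes.

    Hypothetical helper; keyword arguments are passed straight through
    to ElementTree.write().
    """
    buf = io.BytesIO()
    tree.write(buf, **kwargs)
    return buf.getvalue()


tree = ElementTree(Element("hello", {"beer": "good"}))

# Explicit declaration: the user's letter case should be preserved.
explicit = serialize(tree, encoding="UTF-8", xml_declaration=True)

# No xml_declaration argument: "UTF-8" is compared case-insensitively
# with the special name "utf-8", so no declaration is expected.
implicit = serialize(tree, encoding="UTF-8")

# No encoding argument: the default encoding name remains lowercase.
default = serialize(tree, xml_declaration=True)

for data in (explicit, implicit, default):
    print(data.decode("ascii"))
```

On a patched interpreter the three printed results should match the session output shown in this message.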
msg251211 - Author: Stefan Behnel (scoder) * (Python committer) Date: 2015-09-21 07:14
LGTM
msg251223 - Author: Simeon Warner (zimeon) Date: 2015-09-21 13:00
Path looks fine and seems to work as expected -- Simeon
msg251224 - Author: Simeon Warner (zimeon) Date: 2015-09-21 13:00
s/Path/Patch/
msg251392 - Author: Roundup Robot (python-dev) (Python triager) Date: 2015-09-23 02:08
New changeset ff7aba08ada6 by Martin Panter in branch '3.4':
Issue #25047: Respect case writing XML encoding declarations
https://hg.python.org/cpython/rev/ff7aba08ada6

New changeset 9c248233754c by Martin Panter in branch '3.5':
Issue #25047: Merge Element Tree encoding from 3.4 into 3.5
https://hg.python.org/cpython/rev/9c248233754c

New changeset 409bab2181d3 by Martin Panter in branch 'default':
Issue #25047: Merge Element Tree encoding from 3.5
https://hg.python.org/cpython/rev/409bab2181d3
History
Date                 User           Action  Args
2015-09-23 02:14:46  martin.panter  set     status: open -> closed; resolution: fixed; stage: commit review -> resolved
2015-09-23 02:08:32  python-dev     set     nosy: + python-dev; messages: + msg251392
2015-09-23 02:07:47  martin.panter  set     assignee: martin.panter; nosy: + berker.peksag; stage: patch review -> commit review
2015-09-21 13:00:35  zimeon         set     messages: + msg251224
2015-09-21 13:00:15  zimeon         set     messages: + msg251223
2015-09-21 12:16:41  Arfrever       set     nosy: + Arfrever
2015-09-21 07:14:49  scoder         set     nosy: + scoder; messages: + msg251211
2015-09-18 02:39:41  martin.panter  set     files: + etree-encoding.patch; keywords: + patch; messages: + msg250930; stage: needs patch -> patch review
2015-09-11 02:26:28  martin.panter  set     stage: needs patch; versions: + Python 3.5, Python 3.6
2015-09-10 02:14:02  martin.panter  set     nosy: + martin.panter; messages: + msg250345
2015-09-09 19:43:38  zimeon         create