Message 250328 - Python tracker


Author	zimeon
Recipients	zimeon
Date	2015-09-09.19:43:38
Message-id	<1441827818.89.0.928682692456.issue25047@psf.upfronthosting.co.za>

Content
In Python 3, the XML encoding declaration written by xml.etree.ElementTree differs from 2.x: the encoding name is now lowercased, e.g. 'utf-8', even when 'UTF-8' is passed in. The XML spec [1] says that decoders _SHOULD_ match encoding names case-insensitively, but it also says that the encoding string in the declaration _SHOULD_ be 'UTF-8'. Keeping to the spec's preferred form, in the vein of being strictly conformant in encoding and lax in decoding, would give maximum compatibility.

It also seems like an unhelpful change for 2.x to 3.x migration, though that is perhaps a minor issue (it is how I noticed it).

This can be reproduced with:

>cat a.py
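from xml.etree.ElementTree import ElementTree, Element
import os, sys
print(sys.version_info)
sys.stdout.flush()  # keep the version line ahead of the raw bytes written below
if sys.version_info > (3, 0):
    # Python 3: write() emits bytes here, so reopen stdout's fd in binary mode
    fp = os.fdopen(sys.stdout.fileno(), 'wb')
else:
    fp = sys.stdout
root = Element('hello',{'beer':'good'})
# Ask for an uppercase 'UTF-8' in the declaration
ElementTree(root).write(fp, encoding='UTF-8', xml_declaration=True)
fp.write(b"\n")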

>python a.py
sys.version_info(major=2, minor=7, micro=5, releaselevel='final', serial=0)
<?xml version='1.0' encoding='UTF-8'?>
<hello beer="good" />

>python3 a.py
sys.version_info(major=3, minor=4, micro=2, releaselevel='final', serial=0)
<?xml version='1.0' encoding='utf-8'?>
<hello beer="good" />
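Until the library preserves the requested case, one possible workaround is to write the declaration by hand and let ElementTree serialize only the element tree. The following is a minimal sketch for Python 3, not the library's own approach; the helper name `serialize_with_declaration` is hypothetical, and it assumes the document is small enough to build in memory.

```python
import xml.etree.ElementTree as ET


def serialize_with_declaration(root, encoding="UTF-8"):
    """Return the document as bytes, with the encoding name written as given.

    Hypothetical helper: the declaration is built by hand so the caller's
    spelling of the encoding name (e.g. 'UTF-8') is kept verbatim.
    """
    # encoding="unicode" makes tostring() return a str without any declaration
    body = ET.tostring(root, encoding="unicode")
    declaration = "<?xml version='1.0' encoding='%s'?>\n" % encoding
    # Characters the target encoding cannot represent become character references
    return (declaration + body).encode(encoding, errors="xmlcharrefreplace")


root = ET.Element("hello", {"beer": "good"})
data = serialize_with_declaration(root)
print(data.decode("utf-8"))
```

The trade-off is that the declaration and the actual byte encoding are now kept in sync by this helper rather than by ElementTree, so the `encoding` argument must be a name that both Python's codecs and XML consumers recognise.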

Cheers,
Simeon

[1] <http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncName> "In an encoding declaration, the values "UTF-8", "UTF-16", ... should be used for the various encodings and transformations of Unicode" and then later "XML processors should match character encoding names in a case-insensitive way".
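The decoding half of that recommendation can be checked from Python itself. The sketch below, with a hypothetical helper `parse_with_declared_encoding`, feeds ElementTree the same document with the encoding name spelled in upper and lower case; with the expat-based parser that CPython uses, both spellings are expected to parse to the same element.

```python
import xml.etree.ElementTree as ET


def parse_with_declared_encoding(encoding_name):
    """Parse a tiny document whose declaration uses the given encoding name."""
    text = "<?xml version='1.0' encoding='%s'?>\n<hello beer=\"good\" />" % encoding_name
    # The bytes are UTF-8 in both cases; only the declared name's case differs
    return ET.fromstring(text.encode("utf-8"))


for name in ("UTF-8", "utf-8"):
    element = parse_with_declared_encoding(name)
    print(name, element.tag, element.attrib)
```

This only shows that ElementTree's own parser is lax about case; it says nothing about other XML consumers, which is why the case used when writing still matters.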

History
Date	User	Action	Args
2015-09-09 19:43:38	zimeon	set	recipients: + zimeon
2015-09-09 19:43:38	zimeon	set	messageid: <1441827818.89.0.928682692456.issue25047@psf.upfronthosting.co.za>
2015-09-09 19:43:38	zimeon	link	issue25047 messages
2015-09-09 19:43:38	zimeon	create