classification
Title: xml.etree.ElementTree.tostring violates W3 standards allowing encoding='unicode' without error
Type: behavior
Stage:
Components: XML
Versions: Python 3.8, Python 3.7, Python 3.6, Python 3.5, Python 3.4
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Zim, scoder
Priority: normal
Keywords:

Created on 2018-12-06 18:08 by Zim, last changed 2018-12-07 11:19 by scoder.

Messages (2)
msg331242 - Author: EZ (Zim) Date: 2018-12-06 18:08
The 3.x documentation[0] for xml.etree.ElementTree.tostring is quite clear:

> Use encoding="unicode" to generate a Unicode string.

See also the issue where the problem originated:
https://bugs.python.org/issue10942

This violates the W3C standards referenced by the ElementTree documentation[1], which says the encoding string should conform to them. The standards state:

> ...it is a fatal error for an entity including an encoding declaration to be presented to the XML processor in an encoding other than that named in the declaration....

The encoding name 'unicode' does not appear in the IANA character set registry (https://www.iana.org/assignments/character-sets/character-sets.xhtml) referenced by the same documentation[1].

Handling of a fatal error must, in part, follow this rule:

> Once a fatal error is detected, however, the processor MUST NOT continue normal processing (i.e., it MUST NOT continue to pass character data and information about the document's logical structure to the application in the normal way)

[0] https://docs.python.org/3.2/library/xml.etree.elementtree.html
[1] The encoding string included in XML output should conform to the appropriate standards. For example, “UTF-8” is valid, but “UTF8” is not. See http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl and http://www.iana.org/assignments/character-sets.
msg331290 - Author: Stefan Behnel (scoder) * Date: 2018-12-07 11:19
What exactly is the problem here? encoding='unicode' will never appear in the XML declaration, and thus will never be "presented to XML processors". It is up to the user to deal with encodings in this case, which I think is fine. They are the ones who asked for the unencoded result, after all.

The XML spec does not forbid XML tools from growing convenience features, and that's what I think this is. Is there any problem with this feature, besides not being covered by the XML spec?
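
For readers following the thread, here is a minimal sketch of the behavior described in the reply above. It is an illustration rather than part of either message: the element name, the sample text, and the variable names are assumptions chosen for the example, and "iso-8859-1" stands in for any registered encoding other than UTF-8 or US-ASCII.

```python
import xml.etree.ElementTree as ET

# Build a small tree with one non-ASCII character (hypothetical sample data).
root = ET.Element("greeting")
root.text = "caf\u00e9"

# encoding="unicode" returns a str and writes no XML declaration,
# so the name "unicode" never appears in the output.
as_text = ET.tostring(root, encoding="unicode")

# A real encoding name returns bytes. For an encoding other than
# UTF-8 or US-ASCII, ElementTree also writes a declaration naming it.
as_latin1 = ET.tostring(root, encoding="iso-8859-1")

# The caller who asked for a str chooses the encoding when writing it out.
as_utf8 = as_text.encode("utf-8")

print(repr(as_text))
print(as_latin1)
print(as_utf8)
```

Under these assumptions the str result carries no encoding declaration at all, so any declaration seen by a downstream XML processor comes from a call that named a real encoding.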
History
Date                 User    Action  Args
2018-12-07 11:19:24  scoder  set     nosy: + scoder; messages: + msg331290
2018-12-06 18:08:42  Zim     create