classification
Title: xml.etree: Avoid XML declaration for the "ascii" encoding
Type: enhancement
Stage: resolved
Components: XML
Versions: Python 3.6
process
Status: closed
Resolution: out of date
Dependencies:
Superseder:
Assigned To:
Nosy List: scoder, serhiy.storchaka, vstinner
Priority: normal
Keywords: patch

Created on 2016-09-02 10:57 by vstinner, last changed 2017-06-28 01:04 by vstinner. This issue is now closed.

Files
File name Uploaded
etree_xml_declaration.patch vstinner, 2016-09-02 10:57
etree_xml_declaration-2.patch vstinner, 2016-09-06 00:30
Messages (6)
msg274227 Author: STINNER Victor (vstinner) (Python committer) Date: 2016-09-02 10:57
The ElementTree module (xml.etree) avoids the XML declaration for "utf-8" and "us-ascii" codecs, but not for the "ascii" encoding.
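
The behaviour described above can be reproduced with a short script. This is only a sketch: the helper name `has_declaration` is made up for illustration, and the results reflect the check in `ElementTree.write()` as reported in this issue, so they may differ if that check is ever changed.

```python
import xml.etree.ElementTree as ET

def has_declaration(encoding):
    """Return True if tostring() prepends an XML declaration for this encoding."""
    data = ET.tostring(ET.Element("a"), encoding=encoding)
    return data.startswith(b"<?xml")

# Compare the two special-cased names with the plain "ascii" spelling.
for name in ("utf-8", "us-ascii", "ascii"):
    print(name, has_declaration(name))
```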

The attached patch avoids the XML declaration for the "ascii" codec, since ASCII is a subset of UTF-8 and UTF-8 is the default encoding of XML.

The patch also normalizes the encoding name to handle aliases like "utf8" (UTF-8) or "us_ascii" (ASCII).

The patch adds unit tests.
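
The alias handling could look roughly like the following. This is not the code from the attached patch; it is a minimal sketch assuming that normalization goes through `codecs.lookup()`, and the helper `needs_declaration` and the set `_NO_DECLARATION` are hypothetical names.

```python
import codecs

# Canonical codec names for which the declaration would be skipped
# (an assumption based on the description above).
_NO_DECLARATION = {"utf-8", "ascii"}

def needs_declaration(encoding):
    """Hypothetical helper: decide whether an XML declaration is needed."""
    if encoding.lower() == "unicode":
        # Special ElementTree value, not a real codec; no declaration by default.
        return False
    # codecs.lookup() resolves aliases such as "utf8" or "us_ascii"
    # to a canonical name; it raises LookupError for unknown codecs.
    canonical = codecs.lookup(encoding).name
    return canonical not in _NO_DECLARATION

for name in ("utf8", "UTF-8", "us_ascii", "ascii", "latin-1"):
    print(name, needs_declaration(name))
```

Comparing canonical names rather than the raw string is what would let every spelling of the same codec be treated alike.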

--

By the way, I'm surprised that the special encoding "unicode" relies on the *current* locale encoding when the XML declaration is requested. Why not always use UTF-8 for *unicode* instead of the locale encoding?

My unit test tests different locale encodings.
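
To see which encoding ends up in the declaration for "unicode", something like the following can be run. It is a sketch rather than the unit test from the patch, and the declared encoding it prints depends on the Python version and on the environment, so no particular value is claimed here.

```python
import io
import locale
import xml.etree.ElementTree as ET

tree = ET.ElementTree(ET.Element("a"))
buf = io.StringIO()
# encoding="unicode" writes text; the declaration is only added on request.
tree.write(buf, encoding="unicode", xml_declaration=True)
declaration = buf.getvalue().splitlines()[0]

print("locale encoding:", locale.getpreferredencoding(False))
print("declaration:    ", declaration)
```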
msg274228 Author: STINNER Victor (vstinner) (Python committer) Date: 2016-09-02 11:00
Note: I found the "us-ascii" special case when reviewing issue #27915, which proposed replacing "us-ascii" with "ascii" in the xml.etree module to use the Python fast path for performance.
msg274231 Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2016-09-02 11:26
> By the way, I'm surprised that the special encoding "unicode" relies on the *current* locale encoding when the XML declaration is requested. Why not always use UTF-8 for *unicode* instead of the locale encoding?

Because the output usually goes to sys.stdout or to a file opened with the default encoding. Agreed, the current locale encoding is not the best choice. It would be better to look at the encoding attribute of the file and fall back to utf-8 or ascii.
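
The fallback suggested here might be sketched as follows. The helper `declared_encoding_for` is a hypothetical name, and the choice of "utf-8" as the default is an assumption taken from the sentence above, not from any existing patch.

```python
import io
import sys

def declared_encoding_for(stream, default="utf-8"):
    """Hypothetical helper: prefer the stream's own encoding, else a default."""
    # Text streams such as sys.stdout usually expose an `encoding` attribute;
    # it may be missing (binary streams) or None (io.StringIO).
    return getattr(stream, "encoding", None) or default

print(declared_encoding_for(sys.stdout))
print(declared_encoding_for(io.StringIO()))
```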
msg274250 Author: Stefan Behnel (scoder) (Python committer) Date: 2016-09-02 15:40
> By the way, I'm surprised that the special encoding "unicode" relies on the *current* locale encoding when the XML declaration is requested.

That seems a weird choice. Since it serialises to a Unicode string, it shouldn't have any XML declaration at all, if only to make it easy for users to add one themselves if they feel like it.

I guess it's too late to change that now, though...
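
For illustration, adding a declaration by hand to a Unicode serialisation could look like this. It is a sketch only; the UTF-8 encoding named in the declaration is an assumption about how the caller will later encode the text.

```python
import xml.etree.ElementTree as ET

# By default, serialising to str ("unicode") emits no declaration.
body = ET.tostring(ET.Element("a"), encoding="unicode")

# A caller who wants one can prepend it, naming whatever encoding
# the text will eventually be encoded with (UTF-8 is assumed here).
document = '<?xml version="1.0" encoding="UTF-8"?>\n' + body
data = document.encode("utf-8")

print(document)
```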
msg274480 Author: STINNER Victor (vstinner) (Python committer) Date: 2016-09-06 00:30
New patch:

* Avoid codecs.lookup() for method != "xml"
* More unit tests
msg297098 Author: STINNER Victor (vstinner) (Python committer) Date: 2017-06-28 01:04
I lost track of this change. I'm not sure why I proposed it, and I don't feel comfortable changing the ElementTree code :-/ I fear a regression, so I prefer to abandon my change.
History
Date User Action Args
2017-06-28 01:04:05 vstinner set status: open -> closed; resolution: out of date; messages: + msg297098; stage: resolved
2016-09-06 00:30:11 vstinner set files: + etree_xml_declaration-2.patch; messages: + msg274480
2016-09-02 15:40:20 scoder set nosy: + scoder; messages: + msg274250
2016-09-02 11:26:18 serhiy.storchaka set nosy: + serhiy.storchaka; messages: + msg274231
2016-09-02 11:00:26 vstinner set messages: + msg274228
2016-09-02 10:57:59 vstinner create