Issue 27940: xml.etree: Avoid XML declaration for the "ascii" encoding

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/72127

classification

Title:	xml.etree: Avoid XML declaration for the "ascii" encoding
Type:	enhancement	Stage:	resolved
Components:	XML	Versions:	Python 3.6

process

Status:	closed	Resolution:	out of date
Dependencies:		Superseder:
Assigned To:		Nosy List:	scoder, serhiy.storchaka, vstinner
Priority:	normal	Keywords:	patch

Created on 2016-09-02 10:57 by vstinner, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name	Uploaded
etree_xml_declaration.patch	vstinner, 2016-09-02 10:57
etree_xml_declaration-2.patch	vstinner, 2016-09-06 00:30

Messages (6)
msg274227 - (view)	Author: STINNER Victor (vstinner) *	Date: 2016-09-02 10:57
The ElementTree module (xml.etree) omits the XML declaration for the "utf-8" and "us-ascii" encodings, but not for the "ascii" encoding.

The attached patch also omits the XML declaration for the "ascii" codec, since ASCII is a subset of UTF-8 and UTF-8 is the default encoding of XML. The patch also normalizes the encoding name to handle aliases like "utf8" (UTF-8) or "us_ascii" (ASCII), and it adds unit tests.

--

By the way, I'm surprised that the special encoding "unicode" relies on the current locale encoding when the XML declaration is requested. Why not always use UTF-8 for "unicode" instead of the locale encoding? My unit tests cover different locale encodings.
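The behaviour reported here can be checked with a short script. This is a minimal sketch, assuming an unpatched xml.etree.ElementTree (the patch in this issue was never merged); `has_declaration` is a hypothetical helper introduced only for this illustration.

```python
import xml.etree.ElementTree as ET


def has_declaration(encoding):
    """Return True if tostring() emits an XML declaration for `encoding`."""
    root = ET.Element("root")
    # With a byte encoding, tostring() returns bytes; a declaration, if
    # present, is always the first thing in the output.
    return ET.tostring(root, encoding=encoding).startswith(b"<?xml")


for name in ("utf-8", "us-ascii", "ascii", "utf8"):
    print(name, has_declaration(name))
```

On an unpatched interpreter this is expected to report no declaration for "utf-8" and "us-ascii" but a declaration for "ascii" and "utf8", because the encoding name is compared against a fixed list of spellings rather than being resolved to a codec first.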
msg274228 - (view)	Author: STINNER Victor (vstinner) *	Date: 2016-09-02 11:00
Note: I found the "us-ascii" special case while reviewing issue #27915, which proposed replacing "us-ascii" with "ascii" in the xml.etree module to use the Python fast path for performance.
msg274231 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2016-09-02 11:26
> By the way, I'm surprised that the special encoding "unicode" relies on the current locale encoding when the XML declaration is requested. Why not always use UTF-8 for "unicode" instead of the locale encoding?

Because the output usually goes to sys.stdout or to a file opened with the default encoding.

Agreed, the current locale encoding is not the best choice. It would be better to look at the encoding attribute of the file and fall back to utf-8 or ascii.
msg274250 - (view)	Author: Stefan Behnel (scoder) *	Date: 2016-09-02 15:40
> By the way, I'm surprised that the special encoding "unicode" relies on the current locale encoding when the XML declaration is requested.

That seems a weird choice. Since it serialises to a Unicode string, it shouldn't have any XML declaration at all, if only to make it easy for users to add one themselves if they feel like it. I guess it's too late to change that now, though...
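The point about letting users add the declaration themselves can be illustrated as follows. This is only a sketch of the idea, assuming the caller intends to encode the result as UTF-8; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

root = ET.Element("root")

# encoding="unicode" returns a str rather than bytes, and by default
# no XML declaration is written.
text = ET.tostring(root, encoding="unicode")

# A caller who wants a declaration can prepend one that matches the
# encoding they will actually use when writing the text out.
document = '<?xml version="1.0" encoding="UTF-8"?>\n' + text
data = document.encode("utf-8")
```

Keeping the declaration out of the string form means it cannot disagree with whatever encoding is chosen later, which is the mismatch the locale-based default risks.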
msg274480 - (view)	Author: STINNER Victor (vstinner) *	Date: 2016-09-06 00:30
New patch:

* Avoid codecs.lookup() for method != "xml"
* More unit tests
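The alias handling discussed in this issue can be sketched with codecs.lookup(), which resolves aliases to a canonical codec name. This is not the attached patch, only an illustration of the approach; `should_declare` and its parameters are hypothetical names, and treating "unicode" as never needing a declaration is an assumption made for the sketch.

```python
import codecs


def should_declare(encoding, method="xml"):
    """Hypothetical helper: decide whether an XML declaration is needed."""
    # Only the "xml" method can emit a declaration, so skip the codec
    # lookup entirely for other methods such as "html" or "text".
    if method != "xml":
        return False
    # "unicode" is an ElementTree pseudo-encoding, not a real codec, so
    # it must not be passed to codecs.lookup() (assumption: no
    # declaration by default in that case).
    if encoding.lower() == "unicode":
        return False
    # codecs.lookup() resolves aliases to one canonical name,
    # e.g. "utf8" -> "utf-8" and "us_ascii" -> "ascii".
    canonical = codecs.lookup(encoding).name
    return canonical not in ("utf-8", "ascii")


for name in ("utf8", "UTF-8", "us_ascii", "ascii", "latin-1"):
    print(name, should_declare(name))
```

Comparing canonical names rather than raw strings is what lets every spelling of UTF-8 and ASCII share the same rule, at the cost of one codec lookup per write.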
msg297098 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-06-28 01:04
I lost track of this change. I'm not sure why I proposed it, and I don't feel comfortable changing the ElementTree code :-/ I fear a regression, so I prefer to abandon my change.

History
Date	User	Action	Args
2022-04-11 14:58:35	admin	set	github: 72127
2017-06-28 01:04:05	vstinner	set	status: open -> closed resolution: out of date messages: + msg297098 stage: resolved
2016-09-06 00:30:11	vstinner	set	files: + etree_xml_declaration-2.patch messages: + msg274480
2016-09-02 15:40:20	scoder	set	nosy: + scoder messages: + msg274250
2016-09-02 11:26:18	serhiy.storchaka	set	nosy: + serhiy.storchaka messages: + msg274231
2016-09-02 11:00:26	vstinner	set	messages: + msg274228
2016-09-02 10:57:59	vstinner	create