Issue 9522: xml.etree.ElementTree forgets the encoding


This issue has been migrated to GitHub: https://github.com/python/cpython/issues/53731

classification

Title:	xml.etree.ElementTree forgets the encoding
Type:	enhancement	Stage:
Components:	Library (Lib), XML	Versions:	Python 3.4

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	effbot, flox, mark, scoder, serhiy.storchaka
Priority:	normal	Keywords:

Created on 2010-08-05 09:32 by mark, last changed 2022-04-11 14:57 by admin.

Messages (7)
msg112962 - (view)	Author: Mark Summerfield (mark) *	Date: 2010-08-05 09:32
If you read in an XML file that specifies its encoding and then later on use xml.etree.ElementTree.write(), it is always written using US-ASCII. I think the behaviour should be different:
(1) If the XML that was read included an encoding, that encoding should be remembered and used when writing.
(2) If there is no encoding the default for writing should be UTF-8 (which is the standard for XML files).
(3) For non-XML files use US-ASCII.
Naturally, any of these could be overridden using an encoding argument to the write() method.
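A minimal sketch of the reported behaviour, assuming a current Python 3 standard library. The sample document and the variable names are illustrative and not part of the original report. It parses an ISO-8859-1 document and writes it back once with no encoding argument and once with an explicit one.

```python
import io
import xml.etree.ElementTree as ET

# Illustrative input: an ISO-8859-1 document containing one non-ASCII character.
source = b'<?xml version="1.0" encoding="ISO-8859-1"?><root>caf\xe9</root>'
tree = ET.parse(io.BytesIO(source))

# No encoding argument: write() falls back to US-ASCII, so the non-ASCII
# character is emitted as a numeric character reference.
default_out = io.BytesIO()
tree.write(default_out)
print(default_out.getvalue())

# Explicit encoding: the character is written as UTF-8 bytes instead.
utf8_out = io.BytesIO()
tree.write(utf8_out, encoding="utf-8")
print(utf8_out.getvalue())
```

The declared input encoding plays no part in either call. The output depends only on the encoding argument passed to write().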
msg113118 - (view)	Author: Florent Xicluna (flox) *	Date: 2010-08-06 17:29
It behaves as documented. Moved to "feature request". http://docs.python.org/library/xml.etree.elementtree.html
msg113238 - (view)	Author: Stefan Behnel (scoder) *	Date: 2010-08-08 07:44
I think it makes sense to keep input and output separate. After all, the part of the software that outputs a document doesn't necessarily know how it came in, so having the default output encoding depend on the input sounds error prone. Encoding should always be explicit. My advice is to reject this feature request.
msg113663 - (view)	Author: Mark Summerfield (mark) *	Date: 2010-08-12 07:20
Perhaps a useful compromise would be to add an "encoding" attribute that is set to the encoding of the XML file that's read in (and with a default of "ascii"). That way it would be possible to preserve the encoding, e.g.:

    import xml.etree.ElementTree as etree
    xml_tree = etree.ElementTree(file=in_filehandle)
    # process the tree
    xml_tree.write(out_filehandle, encoding=xml_tree.encoding)
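The "encoding" attribute proposed above does not exist on ElementTree. One way to approximate it with the standard library, sketched here under that assumption, is to read the XML declaration through expat's XmlDeclHandler and pass the result to write() explicitly. The helper name declared_encoding, its "utf-8" fallback, and the sample document are hypothetical choices for illustration only.

```python
import io
import xml.etree.ElementTree as ET
from xml.parsers import expat


def declared_encoding(data, default="utf-8"):
    """Return the encoding named in the XML declaration of `data` (bytes).

    Hypothetical helper: falls back to `default` when the document has no
    declaration or the declaration omits the encoding.
    """
    found = []
    parser = expat.ParserCreate()
    # expat calls this handler once if the document has an XML declaration;
    # `encoding` is None when the declaration does not name one.
    parser.XmlDeclHandler = (
        lambda version, encoding, standalone: found.append(encoding)
    )
    parser.Parse(data, True)
    return found[0] if found and found[0] else default


source = b'<?xml version="1.0" encoding="ISO-8859-1"?><root>caf\xe9</root>'
encoding = declared_encoding(source)

tree = ET.parse(io.BytesIO(source))
# ... process the tree ...
out = io.BytesIO()
tree.write(out, encoding=encoding)  # the encoding is passed explicitly
```

This parses the input twice, which keeps the sketch short but may be wasteful for large files. It also keeps input and output decoupled: the caller decides whether to reuse the detected encoding.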
msg113666 - (view)	Author: Stefan Behnel (scoder) *	Date: 2010-08-12 08:05
lxml.etree has encapsulated this in a 'docinfo' property which also holds the XML 'version', the 'standalone' state and the DOCTYPE (if available). Note that this information is readily available in lxml.etree for any parsed Element (by wrapping it in a new ElementTree), but not in ET, where it can only be associated with the ElementTree instance that did the parsing, not one that just wraps a parsed tree of Element objects. I would expect that this is still enough to handle this use case, though.
Stefan
msg113667 - (view)	Author: Mark Summerfield (mark) *	Date: 2010-08-12 08:21
I don't see how lxml is relevant here? lxml is a third party library, whereas etree is part of the standard library. And according to the 3.1.2 docs etree doesn't have a docinfo (or any other) property.
msg113670 - (view)	Author: Stefan Behnel (scoder) *	Date: 2010-08-12 09:27
That's why I mention it here: to prevent future incompatibilities between the two libraries.

History
Date	User	Action	Args
2022-04-11 14:57:04	admin	set	github: 53731
2013-01-07 16:19:34	serhiy.storchaka	set	versions: + Python 3.4, - Python 3.2, Python 3.3
2012-07-14 18:48:49	serhiy.storchaka	set	nosy: + serhiy.storchaka
2010-08-12 09:27:39	scoder	set	messages: + msg113670
2010-08-12 08:21:29	mark	set	messages: + msg113667
2010-08-12 08:05:17	scoder	set	messages: + msg113666
2010-08-12 07:20:19	mark	set	messages: + msg113663
2010-08-08 07:44:26	scoder	set	messages: + msg113238
2010-08-06 17:32:39	flox	set	nosy: + scoder
2010-08-06 17:29:48	flox	set	type: behavior -> enhancement messages: + msg113118 components: + XML versions: + Python 3.2, Python 3.3, - Python 3.1
2010-08-06 03:23:51	r.david.murray	set	nosy: + effbot, flox
2010-08-05 09:32:08	mark	create