Classification
Title: xml.etree.ElementTree: add feature to prettify XML output
Type: enhancement | Stage: patch review
Components: Library (Lib), XML | Versions: Python 3.8, Python 3.7
Process
Status: open | Resolution:
Dependencies: | Superseder:
Assigned To: serhiy.storchaka | Nosy List: Clayton Olney, alex.dzyoba, alex.henderson, eli.bendersky, eric.araujo, eric.snow, loewis, martin.panter, mcepl, rhettinger, santoso.wijaya, scoder, serhiy.storchaka, tshepang, vstinner, wolma
Priority: normal | Keywords: patch

Created on 2012-04-01 15:28 by tshepang, last changed 2019-02-12 14:36 by Clayton Olney.

Files
File name | Uploaded | Description
issue14465.patch | alex.henderson, 2013-08-05 21:14 | pretty printer patch, as implemented for issue 17372
Pull Requests
URL | Status | Linked
PR 4016 | open | alex.dzyoba, 2017-10-17 07:47
PR 8933 | open | mcepl, 2018-08-25 19:22
Messages (13)
msg157299 - Author: Tshepang Lekhonkhobe (tshepang) * Date: 2012-04-01 15:28
I often miss lxml's "pretty_print=True" functionality. Can you implement something similar?
msg157317 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-04-01 16:49
Would you like to provide a patch?
msg157320 - Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012-04-01 17:59
Tshepang,

Frankly, there are a lot of issues to solve in ElementTree (it hasn't been given love in a long time...) and such features would be low priority, as I'm not getting much help and am swamped already.

As Martin said, patches can go a long way here...
msg157325 - Author: Tshepang Lekhonkhobe (tshepang) * Date: 2012-04-01 19:08
Okay, I will try, even though C scares me.
msg157647 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-04-06 06:24
You may be able to code it entirely in the Python part of the module (adding a new parameter to ElementTree.write and tostring).
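A minimal pure-Python sketch of that idea, assuming a hypothetical tostring() wrapper that adds an lxml-style pretty_print keyword on top of the stdlib's ET.tostring(). The helper _indent() is also hypothetical, modeled on the well-known recursive indent recipe. The wrapper indents a deep copy so the caller's tree keeps its original whitespace, and it assumes an Element (not an ElementTree) is passed in.

```python
import copy
import xml.etree.ElementTree as ET


def _indent(elem, level=0, space="  "):
    """Hypothetical helper: add newline/indent whitespace in place.

    Only missing or whitespace-only text and tails are overwritten;
    non-whitespace text is left alone.
    """
    pad = "\n" + level * space
    if len(elem):
        if not elem.text or not elem.text.strip():
            elem.text = pad + space  # first child starts one level deeper
        for child in elem:
            _indent(child, level + 1, space)
        if not child.tail or not child.tail.strip():
            child.tail = pad  # closing tag goes back to this element's level
    if level and (not elem.tail or not elem.tail.strip()):
        elem.tail = pad


def tostring(element, *, encoding="unicode", pretty_print=False, space="  "):
    """Hypothetical wrapper around ET.tostring() with an lxml-style flag."""
    if pretty_print:
        # Indent a copy so the caller's tree keeps its original whitespace.
        element = copy.deepcopy(element)
        _indent(element, space=space)
    return ET.tostring(element, encoding=encoding)


root = ET.fromstring("<root><a>1</a><b><c/></b></root>")
print(tostring(root, pretty_print=True))
```

Copying first costs memory on large trees; indenting in place would avoid that, at the price of permanently changing the caller's whitespace.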
msg194313 - Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-08-03 22:26
A patch exists in the duplicate #17372.
msg194508 - Author: Alex Henderson (alex.henderson) * Date: 2013-08-05 21:14
Proposed patch copied over from duplicate issue 17372.
msg194902 - Author: Stefan Behnel (scoder) * Date: 2013-08-11 16:30
Just to reiterate this point, lxml.etree supports a "pretty_print" flag in its tostring() function and ElementTree.write(). It would thus make sense to support the same thing in ET.

http://lxml.de/api.html#serialisation

For completeness, the current signature looks like this:

def tostring(element_or_tree, *, encoding=None, method="xml",
             xml_declaration=None, pretty_print=False,
             with_tail=True, standalone=None, doctype=None,
             exclusive=False, with_comments=True,
             inclusive_ns_prefixes=None):

(The last three options are for C14N serialisation.)
msg304362 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-10-13 21:16
For the record, on 2015-04-02, bpo-23847 was marked as a duplicate of this issue.
msg304872 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-10-24 08:46
My thoughts:

1. Whitespace is significant in XML. To an XML parser, pretty-printed XML is different from the original XML. For some applications, whitespace around tags is not significant, but this depends on the application, and whitespace can have different meanings in different parts of a document. For example, a document can contain metadata in which whitespace is insignificant and marked-up text in which it is significant. The special attribute xml:space can signal how whitespace should be treated in part of a document.

https://www.w3.org/TR/xml/#sec-white-space

2. In HTML, whitespace around <P> is insignificant, but whitespace around <I> is significant. Whitespace inside <PRE>...</PRE> is significant.

3. If we strip whitespace around tags and insert newlines and indentation, shouldn't we also strip whitespace inside the text content? Or preserve newlines but update the indentation?

4. If we modify whitespace on output, it may be worth adding an option to ignore insignificant whitespace on input.

5. Serialization of ElementTree in the stdlib is much slower than in lxml (see issue25881). Perhaps it should be implemented in C. But it should be kept simple for this. Pretty-printing can be implemented as an outer preprocessing operation (for example, the original effbot code indents the tree in place: http://effbot.org/zone/element-lib.htm#prettyprint) or as a proxy that indents elements on the fly.
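One hypothetical way to indent on the fly instead of rewriting the tree is sketched below as a standalone recursive serializer rather than a proxy object; write_indented() and to_indented_string() are illustrative names, not stdlib API. It also reflects points 1 and 3 under simple assumptions: it stops indenting inside elements marked xml:space="preserve" (inherited by descendants) and inside mixed content, meaning elements whose text or child tails contain non-whitespace. It assumes the tree uses no namespaces other than the predefined xml: prefix and contains no comments or processing instructions.

```python
import io
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# ElementTree stores xml:space as "{http://www.w3.org/XML/1998/namespace}space".
XML_NS = "{http://www.w3.org/XML/1998/namespace}"


def _qname(name):
    # Assumption: the only namespace in use is the predefined xml: prefix.
    if name.startswith(XML_NS):
        return "xml:" + name[len(XML_NS):]
    return name


def _has_text(s):
    # True if s contains any non-whitespace character.
    return bool(s and s.strip())


def write_indented(elem, out, level=0, space="  ", preserve=False):
    """Serialize elem to out, adding indentation without modifying the tree."""
    mode = elem.get(XML_NS + "space")
    if mode is not None:
        # An explicit xml:space ("preserve" or "default") overrides the inherited one.
        preserve = mode == "preserve"
    tag = _qname(elem.tag)
    attrs = "".join(f" {_qname(k)}={quoteattr(v)}" for k, v in elem.attrib.items())
    children = list(elem)
    if not children:
        # Leaf element: its text is always written verbatim (escaped).
        if elem.text:
            out.write(f"<{tag}{attrs}>{escape(elem.text)}</{tag}>")
        else:
            out.write(f"<{tag}{attrs} />")
        return
    out.write(f"<{tag}{attrs}>")
    mixed = _has_text(elem.text) or any(_has_text(c.tail) for c in children)
    if preserve or mixed:
        # Whitespace may be significant: copy text and tails unchanged and treat
        # descendants as preserved (unless one resets xml:space="default").
        out.write(escape(elem.text or ""))
        for child in children:
            write_indented(child, out, level + 1, space, preserve=True)
            out.write(escape(child.tail or ""))
    else:
        # Element-only content: drop whitespace-only text/tails and re-indent.
        for child in children:
            out.write("\n" + space * (level + 1))
            write_indented(child, out, level + 1, space)
        out.write("\n" + space * level)
    out.write(f"</{tag}>")


def to_indented_string(elem, space="  "):
    buf = io.StringIO()
    write_indented(elem, buf, space=space)
    return buf.getvalue()


doc = ET.fromstring(
    '<doc><meta><k>v</k></meta><p>Hello <i>big</i> world</p>'
    '<pre xml:space="preserve"><line>a</line>  <line>b</line></pre></doc>'
)
print(to_indented_string(doc))
```

The <meta> subtree gets indented, while <p> (mixed content) and <pre> (xml:space="preserve") are written exactly as parsed, and the tree itself is left unchanged.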
msg323690 - Author: Stefan Behnel (scoder) * Date: 2018-08-18 05:06
> Serialization of ElementTree in the stdlib is much slower than in lxml (see issue25881). Perhaps it should be implemented in C. But it should be kept simple for this.

Should I say it? That's a first-class use case for Cython.


> Pretty-printing can be implemented as an outer preprocessing operation

Agreed. And that would actually be much simpler to implement in C.
msg324098 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-08-25 20:00
A few more thoughts for consideration:

* We already have a toprettyxml() tool in the minidom package.

* Since whitespace is significant in XML, prettifying changes the content and meaning, so it doesn't round-trip and should only be used for debugging purposes.

* Usually, I recommend using XML viewers such as the one built into the Chrome browser. That provides indentation without changing meaning. It also lets you run searches and conveniently supports folding and unfolding elements. I would rather someone use a viewer than something like toprettyxml().
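To illustrate the first two points, here is a minimal sketch that pretty-prints an ElementTree element through minidom's toprettyxml() and then parses the result again. The sample document is made up. The reparsed root carries whitespace-only text that the original tree did not have, which is the round-trip difference described above.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

root = ET.fromstring("<root><a>1</a><b>2</b></root>")

# Pretty-print via minidom; ET.tostring() returns bytes, which parseString accepts.
pretty = minidom.parseString(ET.tostring(root)).toprettyxml(indent="  ")
print(pretty)

# Parsing the pretty output again does not give back the same tree:
# the inserted newlines and indentation now appear as text/tail content.
reparsed = ET.fromstring(pretty)
print(repr(root.text), repr(reparsed.text))
```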
msg335306 - Author: Clayton Olney (Clayton Olney) Date: 2019-02-12 14:36
I have a use case where the receiving application is expecting the indentation, and I need to run my code in Lambda. So, lxml is out of the question.
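For readers on Python 3.9 or later, the stdlib gained xml.etree.ElementTree.indent(tree, space="  ", level=0), which indents a tree in place without needing lxml. A minimal usage sketch, assuming Python 3.9+ (the sample document is made up):

```python
import xml.etree.ElementTree as ET  # ET.indent() needs Python 3.9 or later

root = ET.fromstring('<order><item sku="1"/><item sku="2"/></order>')

# Adds newline + indentation whitespace to the tree in place (stdlib only).
ET.indent(root, space="  ")
print(ET.tostring(root, encoding="unicode"))
```

Like the effbot recipe, this rewrites whitespace in the tree itself, so the caveats about significant whitespace raised earlier in the thread still apply.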
History
Date | User | Action | Args
2019-02-12 14:36:01 | Clayton Olney | set | nosy: + Clayton Olney; messages: + msg335306
2018-08-25 20:00:03 | rhettinger | set | nosy: + rhettinger; messages: + msg324098
2018-08-25 19:22:26 | mcepl | set | pull_requests: + pull_request8405
2018-08-18 05:06:24 | scoder | set | messages: + msg323690
2018-08-17 15:23:10 | mcepl | set | versions: + Python 3.8
2017-10-24 08:46:06 | serhiy.storchaka | set | messages: + msg304872
2017-10-22 20:32:54 | wolma | set | nosy: + wolma
2017-10-18 12:19:04 | serhiy.storchaka | set | assignee: serhiy.storchaka; nosy: + serhiy.storchaka; components: + XML; versions: + Python 3.7, - Python 3.4
2017-10-17 07:47:12 | alex.dzyoba | set | stage: patch review; pull_requests: + pull_request3990
2017-10-14 17:31:29 | alex.dzyoba | set | nosy: + alex.dzyoba
2017-10-13 21:16:29 | vstinner | set | nosy: + vstinner; messages: + msg304362
2015-04-02 08:20:47 | serhiy.storchaka | link | issue23847 superseder
2015-01-27 10:23:02 | martin.panter | set | nosy: + martin.panter
2015-01-27 09:26:05 | mcepl | set | nosy: + mcepl
2013-08-11 17:56:38 | eric.snow | set | nosy: + eric.snow
2013-08-11 16:30:21 | scoder | set | nosy: + scoder; messages: + msg194902; versions: + Python 3.4, - Python 3.3
2013-08-05 21:14:48 | alex.henderson | set | files: + issue14465.patch; nosy: + alex.henderson; messages: + msg194508; keywords: + patch
2013-08-03 22:26:04 | eli.bendersky | set | messages: + msg194313
2013-08-03 22:25:43 | eli.bendersky | link | issue17372 superseder
2012-04-06 06:24:10 | eric.araujo | set | nosy: + eric.araujo; messages: + msg157647
2012-04-05 22:20:48 | santoso.wijaya | set | nosy: + santoso.wijaya
2012-04-01 19:08:10 | tshepang | set | messages: + msg157325
2012-04-01 17:59:20 | eli.bendersky | set | messages: + msg157320
2012-04-01 16:49:32 | loewis | set | nosy: + loewis; messages: + msg157317
2012-04-01 16:43:46 | r.david.murray | set | type: enhancement
2012-04-01 15:29:13 | tshepang | set | title: add feature to prettify XML output -> xml.etree.ElementTree: add feature to prettify XML output
2012-04-01 15:28:03 | tshepang | create |