Author serhiy.storchaka
Recipients alex.dzyoba, alex.henderson, eli.bendersky, eric.araujo, eric.snow, loewis, martin.panter, mcepl, santoso.wijaya, scoder, serhiy.storchaka, tshepang, vstinner, wolma
Date 2017-10-24.08:46:06
Message-id <1508834766.56.0.213398074469.issue14465@psf.upfronthosting.co.za>
In-reply-to
Content
My thoughts:

1. Whitespace is significant in XML. To an XML parser, pretty-printed XML is a different document from the original. For some applications, some whitespace around tags is not significant, but this depends on the application, and whitespace can have different meanings in different parts of a document. For example, a document can contain metadata with insignificant whitespace alongside marked-up text with significant whitespace. There is a special attribute, xml:space, that can signal how whitespace should be treated within a part of a document.

https://www.w3.org/TR/xml/#sec-white-space

2. In HTML, whitespace around <P> is insignificant, but whitespace around <I> is significant. Whitespace inside <PRE>...</PRE> is significant.

3. If we strip whitespace around tags and insert newlines and indentation, shouldn't we also strip whitespace inside the text content? Or preserve newlines but update the indentation?

4. If we modify whitespace on output, it may be worth adding an option to ignore insignificant whitespace on input.

5. Serialization of ElementTree in the stdlib is much slower than in lxml (see issue25881). Perhaps it should be implemented in C, but to make that practical it should be kept simple. Pretty-printing can be implemented as an outer preprocessing operation (for example, Eli's original code indents the tree in place: http://effbot.org/zone/element-lib.htm#prettyprint) or as a proxy that indents elements on the fly.
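
To illustrate points 1 and 5 together, here is a minimal sketch of the "outer preprocessing" approach, loosely modeled on the recipe linked above. The name indent_in_place, its parameters and the sample document are hypothetical, chosen only for this example; this is not the code proposed in the patch.

```python
import xml.etree.ElementTree as ET


def indent_in_place(elem, level=0, step="  "):
    """Rewrite whitespace-only .text/.tail so the tree serializes indented.

    Hypothetical helper for illustration. Text that contains any
    non-whitespace character is left untouched.
    """
    pad = "\n" + level * step
    children = list(elem)
    if not children:
        return
    # Indent before the first child, unless real text is present.
    if not elem.text or not elem.text.strip():
        elem.text = pad + step
    for child in children:
        indent_in_place(child, level + 1, step)
        # Indent before the next sibling, unless real text is present.
        if not child.tail or not child.tail.strip():
            child.tail = pad + step
    # The last child closes back to the parent's indentation level.
    last = children[-1]
    if not last.tail or not last.tail.strip():
        last.tail = pad


# A metadata part plus a part with mixed (marked-up) content.
source = "<doc><meta><k>v</k></meta><p>some <i>marked</i> <b>up</b> text</p></doc>"
root = ET.fromstring(source)

meta_text_before = root.find("meta").text
i_tail_before = root.find("p/i").tail

indent_in_place(root)
pretty = ET.tostring(root, encoding="unicode")
reparsed = ET.fromstring(pretty)

print(pretty)
print(repr(meta_text_before), "->", repr(reparsed.find("meta").text))
print(repr(i_tail_before), "->", repr(reparsed.find("p/i").tail))
```

In the metadata part the change is harmless: whitespace-only text appears where there was none. In the mixed-content part, however, the single space between </i> and <b> is replaced by a newline plus indentation, which is exactly the kind of change that may matter to an application. A whitespace-only test cannot tell these two cases apart; only the application (or xml:space) can.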