xml.etree.ElementTree: add feature to prettify XML output #58670

Closed
tshepang mannequin opened this issue Apr 1, 2012 · 16 comments
Assignees
Labels
3.9 only security fixes stdlib Python modules in the Lib dir topic-XML type-feature A feature request or enhancement

Comments

@tshepang
Mannequin

tshepang mannequin commented Apr 1, 2012

BPO 14465
Nosy @loewis, @rhettinger, @scoder, @mcepl, @merwok, @mitar, @ericsnowcurrently, @vadmium, @serhiy-storchaka, @wm75, @dzeban, @agrant3d
PRs
  • bpo-14465: xml.etree.ElementTree pretty printing #4016
  • bpo-14465: Provide simple prett printing for XML and ETree API #8933
  • bpo-14465: Add an indent() function to xml.etree.ElementTree to pretty-print XML trees #15200
Files
  • issue14465.patch: pretty printer patch, as implemented for issue 17372.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/scoder'
    closed_at = <Date 2019-08-23.14:45:28.917>
    created_at = <Date 2012-04-01.15:28:03.443>
    labels = ['expert-XML', 'type-feature', 'library', '3.9']
    title = 'xml.etree.ElementTree: add feature to prettify XML output'
    updated_at = <Date 2019-08-23.14:45:28.916>
    user = 'https://bugs.python.org/tshepang'

    bugs.python.org fields:

    activity = <Date 2019-08-23.14:45:28.916>
    actor = 'scoder'
    assignee = 'scoder'
    closed = True
    closed_date = <Date 2019-08-23.14:45:28.917>
    closer = 'scoder'
    components = ['Library (Lib)', 'XML']
    creation = <Date 2012-04-01.15:28:03.443>
    creator = 'tshepang'
    dependencies = []
    files = ['31168']
    hgrepos = []
    issue_num = 14465
    keywords = ['patch']
    message_count = 16.0
    messages = ['157299', '157317', '157320', '157325', '157647', '194313', '194508', '194902', '304362', '304872', '323690', '324098', '335306', '349326', '349346', '350301']
    nosy_count = 17.0
    nosy_names = ['loewis', 'rhettinger', 'scoder', 'mcepl', 'eric.araujo', 'eli.bendersky', 'mitar', 'santoso.wijaya', 'tshepang', 'eric.snow', 'martin.panter', 'serhiy.storchaka', 'alex.henderson', 'wolma', 'alex.dzyoba', 'Clayton Olney', 'Andrew Grant']
    pr_nums = ['4016', '8933', '15200']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue14465'
    versions = ['Python 3.9']

    @tshepang
    Mannequin Author

    tshepang mannequin commented Apr 1, 2012

    I often miss lxml's "pretty_print=True" functionality. Could you implement something similar?

    @tshepang tshepang mannequin added the stdlib Python modules in the Lib dir label Apr 1, 2012
    @tshepang tshepang mannequin changed the title add feature to prettify XML output xml.etree.ElementTree: add feature to prettify XML output Apr 1, 2012
    @bitdancer bitdancer added the type-feature A feature request or enhancement label Apr 1, 2012
    @loewis
    Mannequin

    loewis mannequin commented Apr 1, 2012

    Would you like to provide a patch?

    @elibendersky
    Mannequin

    elibendersky mannequin commented Apr 1, 2012

    Tshepang,

    Frankly, there are a lot of issues to solve in ElementTree (it hasn't been given love in a long time...) and such features would be low priority, as I'm not getting much help and am swamped already.

    As Martin said, patches can go a long way here...

    @tshepang
    Mannequin Author

    tshepang mannequin commented Apr 1, 2012

    Okay, I will try, even though C scares me.

    @merwok
    Member

    merwok commented Apr 6, 2012

    You may be able to code it entirely in the Python part of the module (adding a new parameter to Element.write and tostring).
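
For illustration, here is a minimal sketch of what a Python-only approach could look like. The names `tostring_pretty` and `_indent_in_place` are hypothetical and are not part of xml.etree.ElementTree; the wrapper only mimics an lxml-style `pretty_print` flag by indenting a copy of the tree before serializing it.

```python
import copy
import xml.etree.ElementTree as ET


def _indent_in_place(elem, level=0, space="  "):
    """Hypothetical helper: add newline+indent whitespace to .text/.tail."""
    pad = "\n" + level * space
    children = list(elem)
    if not children:
        return
    # Only overwrite text that is missing or whitespace-only.
    if not (elem.text and elem.text.strip()):
        elem.text = pad + space
    for child in children:
        _indent_in_place(child, level + 1, space)
        if not (child.tail and child.tail.strip()):
            child.tail = pad + space
    # The last child's tail precedes the parent's end tag, so dedent it.
    last = children[-1]
    if not (last.tail and last.tail.strip()):
        last.tail = pad


def tostring_pretty(element, pretty_print=False, encoding="unicode"):
    """Hypothetical wrapper around ET.tostring() with a pretty_print flag."""
    if pretty_print:
        element = copy.deepcopy(element)  # leave the caller's tree untouched
        _indent_in_place(element)
    return ET.tostring(element, encoding=encoding)


root = ET.fromstring("<a><b>1</b><c><d/></c></a>")
pretty = tostring_pretty(root, pretty_print=True)
plain = tostring_pretty(root)
print(pretty)
```

Working on a deep copy keeps the original tree unchanged, at the cost of extra memory for large documents.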

    @elibendersky
    Mannequin

    elibendersky mannequin commented Aug 3, 2013

    A patch exists in the duplicate bpo-17372

    @alexhenderson
    Mannequin

    alexhenderson mannequin commented Aug 5, 2013

    Proposed patch copied over from duplicate bpo-17372.

    @scoder
    Contributor

    scoder commented Aug 11, 2013

    Just to reiterate this point, lxml.etree supports a "pretty_print" flag in its tostring() function and ElementTree.write(). It would thus make sense to support the same thing in ET.

    http://lxml.de/api.html#serialisation

    For completeness, the current signature looks like this:

    def tostring(element_or_tree, *, encoding=None, method="xml",
                 xml_declaration=None, pretty_print=False,
                 with_tail=True, standalone=None, doctype=None,
                 exclusive=False, with_comments=True,
                 inclusive_ns_prefixes=None):

    (The last three options are for C14N serialisation.)

    @vstinner
    Member

    For the record, on 2015-04-02, bpo-23847 was marked as a duplicate of this issue.

    @serhiy-storchaka
    Member

    My thoughts:

    1. Whitespace is significant in XML. To an XML parser, pretty-printed XML differs from the original XML. For some applications, some whitespace around tags is not significant, but this depends on the application, and whitespace can have different meanings in different parts of a document. For example, a document can contain metadata with insignificant whitespace and marked-up text with significant whitespace. There is a special attribute named xml:space that can signal the meaning of whitespace for part of a document.

    https://www.w3.org/TR/xml/#sec-white-space

    2. In HTML, whitespace around <P> is insignificant, but whitespace around <I> is significant. Whitespace inside <PRE>...</PRE> is significant.

    3. If we strip whitespace around tags and insert newlines and indentation, shouldn't we also strip whitespace inside the text content? Or preserve newlines but update the indentation?

    4. If we modify whitespace on output, it may be worth adding an option to ignore insignificant whitespace on input.

    5. Serialization of ElementTree in the stdlib is much slower than in lxml (see bpo-25881). Perhaps it should be implemented in C, but it should be kept simple for that. Pretty-printing can be implemented as an outer preprocessing operation (for example, Eli's original code indents the tree in place: http://effbot.org/zone/element-lib.htm#prettyprint) or as a proxy that indents elements on the fly.
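
To make the whitespace point concrete, the following small sketch parses the same marked-up sentence in compact and indented form and compares what ElementTree reports. The sample documents and the `normalized_text` helper are made up for illustration; collapsing whitespace as shown is an application-level decision, not something the XML specification prescribes.

```python
import xml.etree.ElementTree as ET

compact = ET.fromstring("<p>Hello <i>big</i> world</p>")
indented = ET.fromstring("<p>\n  Hello\n  <i>big</i>\n  world\n</p>")

# The parser passes all character data through unchanged, so the two
# trees hold different .text and .tail values.
print(repr(compact.text), repr(compact[0].tail))
print(repr(indented.text), repr(indented[0].tail))


def normalized_text(elem):
    """Join all text in elem and collapse every whitespace run to one space."""
    return " ".join("".join(elem.itertext()).split())


# Only after an application decides that whitespace runs are
# insignificant do the two documents compare equal.
same_after_normalizing = normalized_text(compact) == normalized_text(indented)
print(same_after_normalizing)
```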

    @mcepl mcepl mannequin added the 3.8 only security fixes label Aug 17, 2018
    @scoder
    Contributor

    scoder commented Aug 18, 2018

    > Serialization of ElementTree in the stdlib is much slower than in lxml (see bpo-25881). Perhaps it should be implemented in C. But it should be kept simple for this.

    Should I say it? That's a first class use case for Cython.

    > Pretty-printing can be implemented as an outer preprocessing operation

    Agreed. And that would actually be much simpler to implement in C.

    @rhettinger
    Contributor

    A few more thoughts for consideration:

    • We already have a toprettyxml() tool in the minidom package.

    • Since whitespace is significant in XML, prettifying changes the content and meaning, so it doesn't round-trip and should only be used for debugging purposes.

    • Usually, I recommend using XML viewers such as the one built into the Chrome browser. That provides indentation without changing meaning. It also lets you run searches and conveniently supports folding and unfolding elements. I would rather someone use a viewer than something like toprettyxml().
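
As a rough illustration of the first two points, this sketch pretty-prints a small made-up document with minidom's toprettyxml() and then re-parses the result. The sample XML is an assumption for demonstration only; the node counts show that the indented output does not round-trip to the same tree.

```python
from xml.dom import minidom

doc = minidom.parseString("<a><b>1</b><c><d/></c></a>")
pretty = doc.toprettyxml(indent="  ")
print(pretty)

# Re-parsing the indented text yields extra whitespace-only text nodes
# between the child elements, so the tree is no longer the same.
reparsed = minidom.parseString(pretty)
before = len(doc.documentElement.childNodes)
after = len(reparsed.documentElement.childNodes)
print(before, after)
```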

    @ClaytonOlney
    Mannequin

    ClaytonOlney mannequin commented Feb 12, 2019

    I have a use case where the receiving application is expecting the indentation, and I need to run my code in Lambda. So, lxml is out of the question.

    @rhettinger
    Contributor

    FWIW, here is the relevant section of the XML specification, https://www.w3.org/TR/2008/REC-xml-20081126/#sec-white-space :

    """In editing XML documents, it is often convenient to use "white space" (spaces, tabs, and blank lines) to set apart the markup for greater readability. Such white space is typically not intended for inclusion in the delivered version of the document. On the other hand, "significant" white space that should be preserved in the delivered version is common, for example in poetry and source code.

    An XML processor must always pass all characters in a document that are not markup through to the application. A validating XML processor must also inform the application which of these characters constitute white space appearing in element content.
    """

    OTOH, Java's Transformer API (obtained through TransformerFactory) does support an output property, OutputKeys.INDENT, so there is a precedent for this feature request.

    Stefan, would you please make a final determination or pronouncement on whether this makes sense for ElementTree or whether it is outside the scope of what the module is trying to accomplish?

    @scoder
    Contributor

    scoder commented Aug 10, 2019

    The spec section that Raymond quoted makes it clear that pretty printing is not for everyone. But there are many use cases where it 1) is helpful, 2) leads to correct results, and 3) does not grow the file size excessively. Whoever wants to make use of it is probably in such a situation. I think adding some kind of support in the standard library would be nice, but it should not hurt "normal" uses, especially when a lot of data is involved.

    I'll send a PR that adds an indent() function to pre-process trees. Comments welcome.

    @scoder scoder added 3.9 only security fixes and removed 3.7 (EOL) end of life 3.8 only security fixes labels Aug 10, 2019
    @scoder scoder assigned scoder and unassigned serhiy-storchaka Aug 10, 2019
    @scoder
    Contributor

    scoder commented Aug 23, 2019

    New changeset b5d3cee by Stefan Behnel in branch 'master':
    bpo-14465: Add an indent() function to xml.etree.ElementTree to pretty-print XML trees (GH-15200)
    b5d3cee
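
For readers landing here later, a short usage sketch of the resulting function, assuming Python 3.9 or later. The sample document is made up; indent() rewrites whitespace in the tree in place, so it should only be applied where that whitespace is not significant.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<a><b>1</b><c><d/></c></a>")

# indent() modifies the tree in place and returns None.
ET.indent(root, space="  ")

result = ET.tostring(root, encoding="unicode")
print(result)
```

If the original tree must stay untouched, indenting a copy.deepcopy() of it is one option.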

    @scoder scoder closed this as completed Aug 23, 2019
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022