classification
Title: Add xml.tool to pretty print XML like json.tool
Type: enhancement
Stage: needs patch
Components: Library (Lib)
Versions: Python 3.9
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: eli.bendersky, rhettinger, scoder, serhiy.storchaka, xtreak
Priority: normal
Keywords:

Created on 2019-08-24 14:55 by xtreak, last changed 2019-08-25 13:24 by xtreak.

Messages (6)
msg350372 - Author: Karthikeyan Singaravelan (xtreak) * (Python triager) Date: 2019-08-24 14:55
Now that XML has a pretty-print option (issue14465), would it be handy to add a command-line pretty printer similar to json.tool? This can be written as a one-liner, much like JSON pretty printing, but I think a command-line tool is a good option because it also helps in piping the output to other commands, for example to filter particular tags. I searched the mailing lists and couldn't find any discussion along these lines. Some concerns around using external tools were raised in https://bugs.python.org/issue14465#msg324098. I thought I would open this to gather feedback.

Branch: https://github.com/tirkarthi/cpython/tree/bpo14465-xml-tool


$ python -m xml.tool /tmp/person.xml
<root>
  <person name="Kate">
    <breakfast>Idly</breakfast>
  </person>
  <person name="John">
    <breakfast>Dosa</breakfast>
  </person>
</root>

# Get all breakfast tags

$ python -m xml.tool /tmp/person.xml | grep breakfast
    <breakfast>Idly</breakfast>
    <breakfast>Dosa</breakfast>
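
For reference, the pretty-printing step described above can be sketched in a few lines using the indent function added in issue14465. This is only an illustration, not the code on the linked branch; the helper name pretty_print_xml is hypothetical, and the sketch assumes Python 3.9 or later, where xml.etree.ElementTree.indent is available.

```python
import io
import xml.etree.ElementTree as ET


def pretty_print_xml(source, out):
    """Parse XML from a path or file object and write an indented copy to out.

    Hypothetical helper for illustration; not taken from the linked branch.
    """
    tree = ET.parse(source)
    ET.indent(tree)  # in-place; default indentation is two spaces per level
    tree.write(out, encoding="unicode")
    out.write("\n")


sample = '<root><person name="Kate"><breakfast>Idly</breakfast></person></root>'
buffer = io.StringIO()
pretty_print_xml(io.StringIO(sample), buffer)
print(buffer.getvalue(), end="")
```

A command-line entry point would presumably pass a file name and sys.stdout instead of the in-memory buffers used here.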
msg350377 - Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-08-24 16:53
Sounds like a good idea to add something like this.

Have a look here for some more ideas:
https://github.com/lxml/lxml/blob/master/tools/xpathgrep.py

ElementTree should be able to provide most of these features as well these days.
msg350440 - Author: Karthikeyan Singaravelan (xtreak) * (Python triager) Date: 2019-08-25 08:06
Thanks, Stefan, for the link. XPath support sounds cool to me, given that the stdlib already supports it. It would allow filtering with xml.tool itself instead of piping the output to another command. My initial approach is to take the query from a --xpath command-line argument and apply it to the root node, pretty printing the elements that match. I have also pushed the XPath changes to https://github.com/tirkarthi/cpython/tree/bpo14465-xml-tool. I will try to add docstrings with XPath examples and tests, and then raise a PR for this.

# Sample XML

$ python -m xml.tool /tmp/person.xml
<root>
  <person name="Kate">
    <breakfast available="true">Idly</breakfast>
  </person>
  <person name="John">
    <breakfast available="false">Dosa</breakfast>
  </person>
</root>

# Select person with name as Kate

$ python -m xml.tool --xpath './person[@name="Kate"]' /tmp/person.xml
<person name="Kate">
  <breakfast available="true">Idly</breakfast>
</person>

# Get all unavailable breakfast items

$ python -m xml.tool --xpath './/breakfast[@available="false"]' /tmp/person.xml
<breakfast available="false">Dosa</breakfast>
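
The selection step above could be sketched roughly as follows: run the query against the root with findall, then indent and serialize each match. The function name select_and_indent and the inline sample are hypothetical and only mirror the examples in this message; the code on the branch may differ.

```python
import xml.etree.ElementTree as ET

# Inline stand-in for /tmp/person.xml (assumed content, matching the sample above).
SAMPLE = """<root>
<person name="Kate"><breakfast available="true">Idly</breakfast></person>
<person name="John"><breakfast available="false">Dosa</breakfast></person>
</root>"""


def select_and_indent(xml_text, xpath):
    """Return each element matching xpath as an indented string."""
    root = ET.fromstring(xml_text)
    results = []
    for element in root.findall(xpath):
        ET.indent(element)   # re-indent relative to the matched element
        element.tail = None  # drop whitespace that followed it in the source
        results.append(ET.tostring(element, encoding="unicode"))
    return results


for text in select_and_indent(SAMPLE, './person[@name="Kate"]'):
    print(text)
```

Clearing the tail matters because tostring includes it, and indent does not touch the tail of the element it is called on.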


It could also suppress the traceback and print only the error message when the XPath is invalid and raises an exception.


# Error messages

$ python -m xml.tool --xpath './person/[breakfast='Dosa']' /tmp/person.xml
invalid predicate
$ python -m xml.tool --xpath './/[breakfast=Dosa]' /tmp/person.xml
invalid descendant
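
One way to get messages like the ones above is to catch the SyntaxError that ElementTree raises for a malformed path and report only its text. This is a sketch under that assumption; the helper name find_or_report is hypothetical, and the queries are written the way the shell would pass them after its own quote removal.

```python
import xml.etree.ElementTree as ET


def find_or_report(root, xpath):
    """Return (matches, error_message); error_message is None on success."""
    try:
        return root.findall(xpath), None
    except SyntaxError as exc:
        # ElementTree signals an invalid path with SyntaxError;
        # keep only the message and drop the traceback.
        return [], str(exc)


root = ET.fromstring("<root><person><breakfast>Dosa</breakfast></person></root>")
for query in ("./person/[breakfast=Dosa]", ".//[breakfast=Dosa]", "./person"):
    matches, error = find_or_report(root, query)
    print(query, "->", error if error else f"{len(matches)} match(es)")
```

A command-line wrapper would likely print the message to stderr and exit with a non-zero status.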
msg350441 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-08-25 08:42
I don't think this should be done:

* Guido didn't want Python to grow into a collection
  of command-line tools
* Browsers like Chrome already provide XML viewers
* If you pretty print JSON, you don't change its meaning,
  but for XML, it adds "text" and "tail" at non-leaf nodes.
  And if leaf text is indented and/or line-wrapped, that
  also changes the stored values.
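
The third point can be observed directly. The short sketch below (assuming Python 3.9+ for ElementTree's indent) compares the text and tail values of non-leaf nodes before and after indenting a document that originally contained no whitespace.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<root><person><breakfast>Idly</breakfast></person></root>")
person = root[0]

before = (root.text, person.text, person.tail)
ET.indent(root)  # modifies the tree in place
after = (root.text, person.text, person.tail)

print(before)  # (None, None, None)
print(after)   # ('\n  ', '\n    ', '\n')
print(repr(person[0].text))  # 'Idly'
```

With ElementTree's indent, only whitespace-only text and tail values are rewritten, so the leaf text in this example is left as it was; the stored values of the non-leaf nodes do change, as described above.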
msg350445 - Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-08-25 09:52
I agree that formatting is not a use case by itself. I like the idea of XPath grepping, though, especially *without* pretty printing, i.e. one result per line.
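
A rough sketch of XPath grepping without pretty printing: each match is serialized as-is, one string per result. The name xpath_grep is hypothetical, and the output stays on one line per result only when the matched elements contain no newlines themselves.

```python
import xml.etree.ElementTree as ET


def xpath_grep(xml_text, xpath):
    """Return every element matching xpath, serialized without re-indenting."""
    root = ET.fromstring(xml_text)
    lines = []
    for element in root.iterfind(xpath):
        element.tail = None  # do not carry over whitespace after the element
        lines.append(ET.tostring(element, encoding="unicode"))
    return lines


sample = (
    '<root><person name="Kate"><breakfast available="true">Idly</breakfast></person>'
    '<person name="John"><breakfast available="false">Dosa</breakfast></person></root>'
)
for line in xpath_grep(sample, ".//breakfast"):
    print(line)
```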

I admit that there is no strong reason for adding such a command line tool to the stdlib, though.
msg350454 - Author: Karthikeyan Singaravelan (xtreak) * (Python triager) Date: 2019-08-25 13:24
Several modules expose some of their functionality through the command line, such as json.tool, zipfile, tarfile, gzip, and webbrowser. The initial proposal was to expose the newly added indent function over the command line with the same guarantees and semantics. The lxml link led me to XPath search, which looks more useful to me. I understand there has always been discussion about writing a few lines of Python code to do a task versus achieving it via the command line. Recent additions include:

* --json-lines added to json.tool in issue31553 
* Add --fast, --best to gzip CLI in issue34969

There were similar discussions in which improvements were merged on a case-by-case basis when they were seen as a good use case. Some leaned more toward rejection, such as --indent to specify the indentation length for json.tool in issue29636. There was no xml.tool in the past, so this needs more consideration. I think it is good for xml to expose some of its tasks via the command line too, rather than being left out just because it never had a command-line interface from the start. The command line would expose only functions that already exist, so I see the maintenance cost as minimal with indent and XPath search in this case. I will leave the decision to you based on the examples and use cases mentioned. If this needs a wider discussion on python-ideas or Discourse, I would be okay with starting a thread.
History
Date                 User        Action  Args
2019-08-25 13:24:16  xtreak      set     messages: + msg350454
2019-08-25 09:52:41  scoder      set     messages: + msg350445
2019-08-25 08:42:20  rhettinger  set     messages: + msg350441
2019-08-25 08:06:22  xtreak      set     messages: + msg350440
2019-08-24 16:53:51  scoder      set     messages: + msg350377
                                         stage: needs patch
2019-08-24 14:55:33  xtreak      create