classification
Title: Improve 19.5. xml.dom.minidom doc
Type: Stage:
Components: Documentation Versions: Python 3.1, Python 3.2
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: docs@python Nosy List: BreamoreBoy, akuchling, docs@python, georg.brandl, terry.reedy
Priority: normal Keywords:

Created on 2010-01-05 02:43 by terry.reedy, last changed 2010-07-26 12:54 by akuchling. This issue is now closed.

Messages (7)
msg97244 - Author: Terry J. Reedy (terry.reedy) (Python committer) Date: 2010-01-05 02:43
1. "When you are finished with a DOM, you should clean it up. This is necessary because some versions of Python do not support garbage collection of objects that refer to each other in a cycle. Until this restriction is removed from all versions of Python, it is safest to write your code as if cycles would not be cleaned up."

This appears to refer to early 2.x CPython versions without the gc module. Such (cryptic) back references are not appropriate for 3.x docs. Even in 3.x, immediate unlink might be a good idea, especially for CPython (which would then clean up immediately). But none of these issues are specific to DOM objects. Suggested replacement for the above and the current next sentence ("The way to clean up a DOM is to call its unlink() method:"):

"When you are finished with a DOM, you can call the unlink method to encourage early cleanup of unneeded objects:"

Anything more is redundant with the doc for the method.
'''
dom1.unlink()
dom2.unlink()
dom3.unlink()
'''
One example at most is quite sufficient.
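For illustration, a minimal sketch of what a single-example version could look like. The sample XML and the variable names are made up here, not taken from the current docs:

```python
from xml.dom.minidom import parseString

# Hypothetical sample document, used only for this sketch.
dom = parseString("<root><child>text</child></root>")

# Read whatever is needed from the tree first.
root_tag = dom.documentElement.tagName

# Break the internal references between nodes to encourage early cleanup.
# The DOM should not be used after this call.
dom.unlink()
```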

2. '''Node.toxml([encoding]) 
Return the XML that the DOM represents as a string.

With no argument, the XML header does not specify an encoding, and the result is Unicode string if the default encoding cannot represent all characters in the document. Encoding this string in an encoding other than UTF-8 is likely incorrect, since UTF-8 is the default encoding of XML.

With an explicit encoding [1] argument, the result is a byte string in the specified encoding. It is recommended that this argument is always specified. To avoid UnicodeError exceptions in case of unrepresentable text data, the encoding argument should be specified as “utf-8”.
'''
I find this API a bit confusing.

In 3.x, "Return ... a string." means str (unicode), but the rest implies that 'string' should be 'string or bytes'.

"default encoding": what is it? ASCII, UTF-8 (as almost implied), or something from the sys module? If the latter, please specify.

A cleaner API would have been one of: 1. always return str (unicode); 2. always return bytes, with encoding='utf-8' as the default; or 3. return str if no encoding is given and bytes if one is given, with no default.
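To make the str-versus-bytes question concrete, here is a small sketch of what I would expect on a recent Python 3, where the behavior matches option 3. The sample document and variable names are assumptions for illustration only:

```python
from xml.dom.minidom import parseString

# Hypothetical sample; the character reference stands for a non-ASCII "e acute".
doc = parseString("<greeting>caf&#233;</greeting>")

# No encoding argument: the result is a str and the XML declaration
# names no encoding.
as_text = doc.toxml()

# Explicit encoding: the result is bytes in that encoding and the
# XML declaration names it.
as_bytes = doc.toxml(encoding="utf-8")

print(type(as_text).__name__, type(as_bytes).__name__)
```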

3. The following antipattern example should be revised, for 2.x as well:
'''
def getText(nodelist):
    rc = ""
    for node in nodelist:
        if node.nodeType == node.TEXT_NODE:
            rc = rc + node.data
    return rc
'''
should be (not tested, but pretty straightforward):

def getText(nodelist):
    rc = []
    for node in nodelist:
        if node.nodeType == node.TEXT_NODE:
            rc.append(node.data)
    return ''.join(rc)
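
A quick way to exercise the revised function, as a sketch only. The sample markup is invented for this check and is not the example used in the docs:

```python
from xml.dom.minidom import parseString

def getText(nodelist):
    # Collect the text pieces in a list and join once at the end,
    # instead of repeatedly concatenating strings.
    rc = []
    for node in nodelist:
        if node.nodeType == node.TEXT_NODE:
            rc.append(node.data)
    return ''.join(rc)

# Hypothetical input: two direct text children separated by an element.
doc = parseString("<title>Hello<b>, bold</b> world</title>")
text = getText(doc.documentElement.childNodes)
print(text)
```

Only the direct text children are collected; the text inside the nested b element is skipped, same as in the original version.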
msg100262 - Author: A.M. Kuchling (akuchling) (Python committer) Date: 2010-03-01 19:45
1) changes made to 2.7trunk in rev78559.
msg100267 - Author: A.M. Kuchling (akuchling) (Python committer) Date: 2010-03-01 20:12
3) change made to 2.7trunk in rev.78562.
msg111546 - Author: Mark Lawrence (BreamoreBoy) Date: 2010-07-25 13:17
Items 1) and 3) have been committed, only 2) needs to be addressed.
msg111573 - Author: A.M. Kuchling (akuchling) (Python committer) Date: 2010-07-25 23:24
2) changed in rev83151.  I extensively rearranged the description of toxml(), hopefully making its meaning clearer.
msg111585 - Author: Terry J. Reedy (terry.reedy) (Python committer) Date: 2010-07-26 00:12
Thank you for the patches, but I do not think this is quite done.

1. "It is recommended that you always specify an encoding; you may use any encoding you like, but an argument of "utf-8" is the most common, avoid :exc:`UnicodeError` exceptions in case of unrepresentable text data."
The phrase after the comma is garbled. I think it means something like "It avoids :exc:`UnicodeError` exceptions for unrepresentable text data."

2. For Node.toprettyxml(indent="", newl="", encoding="")
I think 
"There's also an *encoding* argument, that behaves like the corresponding argument of :meth:`toxml`."
should simply say 
"The ``encoding`` argument behaves like the corresponding argument of :meth:`toxml`."

We already know there is one because it is there in the signature. I suspect saying so might date back to when there either was no signature or encoding was left out of it.
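
As a rough check of the "behaves like toxml" wording, a sketch assuming a recent Python 3. The sample document and names are made up for illustration:

```python
from xml.dom.minidom import parseString

# Hypothetical sample document.
doc = parseString("<a><b>x</b></a>")

# Without an encoding the pretty-printed result is a str.
pretty_text = doc.toprettyxml(indent="  ", newl="\n")

# With an encoding the result is bytes, as with toxml().
pretty_bytes = doc.toprettyxml(indent="  ", newl="\n", encoding="utf-8")

print(type(pretty_text).__name__, type(pretty_bytes).__name__)
```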
msg111606 - Author: A.M. Kuchling (akuchling) (Python committer) Date: 2010-07-26 12:54
Thanks for the catch; it was intended to be ', avoiding ...'.  Fixed in rev83162, along with the sentence simplification you suggest.
History
Date                 User         Action  Args
2010-07-26 12:54:51  akuchling    set     messages: + msg111606
2010-07-26 00:12:27  terry.reedy  set     messages: + msg111585
2010-07-25 23:24:32  akuchling    set     status: open -> closed; resolution: fixed; messages: + msg111573
2010-07-25 13:17:06  BreamoreBoy  set     assignee: georg.brandl -> docs@python; messages: + msg111546; nosy: + docs@python, BreamoreBoy
2010-03-01 20:12:10  akuchling    set     messages: + msg100267
2010-03-01 19:45:42  akuchling    set     nosy: + akuchling; messages: + msg100262
2010-01-05 02:43:52  terry.reedy  create