classification
Title: minidom does not encode correctly when calling Document.writexml
Type: behavior
Stage: needs patch
Components: Documentation, XML
Versions: Python 3.6, Python 3.5, Python 2.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: docs@python
Nosy List: brianvanderburg2, docs@python, ezio.melotti, scoder, serhiy.storchaka, upendra-k14
Priority: normal
Keywords: easy

Created on 2013-09-03 05:36 by brianvanderburg2, last changed 2016-01-09 20:19 by r.david.murray.

Messages (5)
msg196824 - Author: Brian Vanderburg (brianvanderburg2) Date: 2013-09-03 05:36
When I save a document that contains Unicode data, it does not save correctly and gives an encode error. I know this happens on 2.7, and from checking the code in xml/dom/minidom.py it looks like it happens on 3.2 as well.

The method call that seems to be problematic is doc.writexml(open(filename, "wb"), "", "  ", encoding="utf-8")

Currently I found this to work: doc.writexml(codecs.open(filename, "w", "utf-8"), "", "  ", encoding="utf-8")

It seems like this should be handled by the writexml method since it already has the specified encoding.
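
The behavior described above can be sketched as follows. This is a minimal illustration, assuming Python 3, with an in-memory `io.BytesIO` standing in for a file opened with "wb" and the built-in `open()` with an `encoding` argument standing in for `codecs.open`; the sample document and the temporary file path are made up. In the `Document.writexml(writer, indent, addindent, newl, encoding)` signature the fourth positional parameter is `newl`, so the sketch passes `encoding` by keyword. On Python 3 the binary writer is rejected with a `TypeError` rather than the encode error reported for 2.7.

```python
import io
import os
import tempfile
from xml.dom import minidom

# Illustrative document containing non-ASCII text.
doc = minidom.parseString("<root>caf\u00e9</root>")

# writexml() writes str objects, so a binary writer is rejected on Python 3.
binary_writer = io.BytesIO()
try:
    doc.writexml(binary_writer, "", "  ", encoding="utf-8")
    binary_write_failed = False
except TypeError:
    binary_write_failed = True

# Workaround in the spirit of the report: let the writer do the encoding.
path = os.path.join(tempfile.mkdtemp(), "out.xml")
with open(path, "w", encoding="utf-8") as writer:
    doc.writexml(writer, "", "  ", encoding="utf-8")

# Read the bytes back to confirm the file really is UTF-8.
with open(path, "rb") as f:
    data = f.read()
```

The `encoding` argument of `writexml` only fills in the XML declaration; the actual byte encoding is done by the writer object.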
msg196836 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-09-03 10:24
On Python 3 you should not only open the file in text mode with the specified encoding, but also specify the "xmlcharrefreplace" error handler.

    doc.writexml(open(filename, "w", encoding="utf-8", errors="xmlcharrefreplace"), "", "  ", encoding="utf-8")

I can suggest only one solution -- explicitly document this behavior.

Perhaps we should also add a special module-level function for writing a DOM tree to a binary file. The low-level writexml() should not be used directly.
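
One way such a convenience function might look is sketched below. The name `write_dom_binary` and its parameters are hypothetical and not part of `xml.dom.minidom`; the sketch simply wraps the binary stream in an `io.TextIOWrapper` that uses the "xmlcharrefreplace" error handler, then detaches the wrapper so the caller's stream stays open.

```python
import io
from xml.dom import minidom


def write_dom_binary(doc, binary_stream, encoding="utf-8",
                     indent="", addindent="", newl=""):
    """Hypothetical helper: write a minidom Document to a binary stream."""
    # The text wrapper performs the encoding; characters the target
    # encoding cannot represent become numeric character references.
    text_stream = io.TextIOWrapper(
        binary_stream,
        encoding=encoding,
        errors="xmlcharrefreplace",
        newline="",  # do not translate newlines on output
    )
    try:
        doc.writexml(text_stream, indent, addindent, newl, encoding)
        text_stream.flush()
    finally:
        # Detach so that the wrapper does not close the caller's stream.
        text_stream.detach()


# Usage with an in-memory binary stream and a restrictive encoding.
buf = io.BytesIO()
write_dom_binary(minidom.parseString("<a>\u20ac</a>"), buf, encoding="ascii")
result = buf.getvalue()
```

Whether the real function should take a stream or a filename, and what it should be called, is left open in the discussion.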
msg257253 - Author: Upendra Kumar (upendra-k14) * Date: 2015-12-31 10:29
I am trying to resolve an issue for the first time. Can anybody please explain what "module level function" means specifically in this context?
msg257255 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-12-31 14:58
It means a function defined in the module namespace, as opposed to as a method on a class, so that 'from xml.dom.minidom import <somefunction>' will get you that function.
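
As a small illustration of the distinction, using names that already exist in `xml.dom.minidom`: `parseString` is a module-level function, while `writexml` is a method reached through a `Document` instance. The variable names are chosen only for this sketch.

```python
import inspect
from xml.dom import minidom
from xml.dom.minidom import parseString  # importable directly: module level

doc = parseString("<root/>")

# parseString lives in the module namespace and is a plain function.
parse_is_function = inspect.isfunction(minidom.parseString)

# writexml is only reachable through an object; it is a bound method.
writexml_is_method = inspect.ismethod(doc.writexml)
```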

This issue should be for documentation of the problem, since we won't add the function to 2.7.  A new issue should be opened for the enhancement request of adding a module level convenience function for writing a dom out to a binary file.
msg257827 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2016-01-09 13:29
> On Python 3 you should not only open file in text mode with specified
> encoding, but also specify the "xmlcharrefreplace" error handler.

Isn't this only required in case there are non encodable characters?
If the encoding is utf-8, this shouldn't be necessary (unless there are lone surrogates).  Specifying xmlcharrefreplace might be useful while using ascii or latin1 though.
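
The difference between the encodings can be seen with plain `str.encode`, which uses the same error handlers as a text-mode file opened with `errors=...`. The sample strings below are made up for illustration.

```python
text = "caf\u00e9 \u20ac"  # e-acute and the euro sign

# UTF-8 can represent both characters, so no error handler is needed.
utf8_bytes = text.encode("utf-8")

# ASCII cannot represent either character: strict mode raises ...
try:
    text.encode("ascii")
    ascii_strict_failed = False
except UnicodeEncodeError:
    ascii_strict_failed = True

# ... while xmlcharrefreplace falls back to numeric character references.
ascii_refs = text.encode("ascii", errors="xmlcharrefreplace")

# Latin-1 can represent e-acute but not the euro sign.
latin1_refs = text.encode("latin-1", errors="xmlcharrefreplace")

# A lone surrogate is the case where even UTF-8 fails in strict mode.
try:
    "\ud800".encode("utf-8")
    surrogate_failed = False
except UnicodeEncodeError:
    surrogate_failed = True
```

Here `ascii_refs` is `b"caf&#233; &#8364;"` and `latin1_refs` is `b"caf\xe9 &#8364;"`.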

The docs of writexml don't seem to specify whether the file should be opened in text or binary mode, but it seems to me that only text mode is supported. The advice to use xmlcharrefreplace could be added in a note.
History
Date                 User              Action  Args
2016-01-09 20:19:22  r.david.murray    set     nosy: - r.david.murray
2016-01-09 13:29:56  ezio.melotti      set     nosy: + ezio.melotti; messages: + msg257827; versions: - Python 3.4
2015-12-31 14:58:02  r.david.murray    set     nosy: + r.david.murray; messages: + msg257255
2015-12-31 10:29:19  upendra-k14       set     nosy: + upendra-k14; messages: + msg257253
2015-11-26 17:53:58  serhiy.storchaka  set     keywords: + easy; stage: needs patch; versions: + Python 3.5, Python 3.6, - Python 3.3
2013-09-13 12:33:24  eli.bendersky     set     nosy: - eli.bendersky
2013-09-11 19:56:42  serhiy.storchaka  set     assignee: docs@python; components: + Documentation; nosy: + docs@python
2013-09-03 10:24:14  serhiy.storchaka  set     nosy: + serhiy.storchaka, scoder, eli.bendersky; messages: + msg196836; versions: + Python 3.3, Python 3.4
2013-09-03 05:36:59  brianvanderburg2  create