classification
Title: minidom does not encode correctly when calling Document.writexml
Type: behavior Stage: resolved
Components: Documentation, XML Versions: Python 3.8, Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: docs@python Nosy List: Windson Yang, brianvanderburg2, docs@python, ezio.melotti, scoder, serhiy.storchaka, upendra-k14
Priority: normal Keywords: easy, patch

Created on 2013-09-03 05:36 by brianvanderburg2, last changed 2019-06-01 07:00 by scoder. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 13352 merged Windson Yang, 2019-05-16 03:11
PR 13718 merged miss-islington, 2019-06-01 06:40
Messages (9)
msg196824 - (view) Author: Brian Vanderburg (brianvanderburg2) Date: 2013-09-03 05:36
When I have unicode data to save, it seems that it does not save correctly, giving an encode error. I know this exists on 2.7 and from checking the code in xml/dom/minidom.py it looks like it does in 3.2 as well.

The method call that seems to be problematic is doc.writexml(open(filename, "wb"), "", "  ", "utf-8")

Currently I found this to work: doc.writexml(codecs.open(filename, "w", "utf-8"), "", "  ", "utf-8")

It seems like this should be handled by the writexml method since it already has the specified encoding.
msg196836 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-09-03 10:24
On Python 3 you should not only open the file in text mode with the specified encoding, but also specify the "xmlcharrefreplace" error handler.

    doc.writexml(open(filename, "w", encoding="utf-8", errors="xmlcharrefreplace"), "", "  ", encoding="utf-8")

I can suggest only one solution -- explicitly document this behavior.

Perhaps we also should add a special module level function for writing DOM tree to binary file. Low-level writexml() should not be used directly.
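A minimal sketch of the text-mode approach described in this message. The sample document, the scratch file name, and the use of a temporary directory are assumptions made for illustration, not part of the report. Latin-1 is used instead of UTF-8 so that the error handler has something to do, and the encoding is passed by keyword because the fourth positional parameter of writexml() is newl.

```python
import os
import tempfile
from xml.dom import minidom

# Sample document (an assumption for this sketch); the euro sign
# cannot be represented in Latin-1.
doc = minidom.parseString("<price>10 \u20ac</price>")

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "out.xml")
    # Text mode: the file object does the encoding, and unencodable
    # characters become numeric character references.
    with open(path, "w", encoding="latin-1", errors="xmlcharrefreplace") as f:
        doc.writexml(f, encoding="latin-1")
    # Read the raw bytes back to see what actually landed on disk.
    with open(path, "rb") as f:
        data = f.read()

print(data)
```

The encoding argument to writexml() only affects the XML declaration; the bytes on disk are determined by how the file was opened.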
msg257253 - (view) Author: Upendra Kumar (upendra-k14) * Date: 2015-12-31 10:29
I am trying to resolve an issue for the first time. Can anybody please explain what "module level function" means in this context?
msg257255 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-12-31 14:58
It means a function defined in the module namespace, as opposed to as a method on a class, so that 'from xml.dom.minidom import <somefunction>' will get you that function.

This issue should be for documentation of the problem, since we won't add the function to 2.7.  A new issue should be opened for the enhancement request of adding a module level convenience function for writing a dom out to a binary file.
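For illustration only, such a convenience function might look roughly like the sketch below. The name write_dom_binary and its signature are hypothetical; nothing like it exists in xml.dom.minidom, and this is not a proposed implementation. It wraps the caller's binary file in an io.TextIOWrapper so that writexml() still receives text.

```python
import io
from xml.dom import minidom


def write_dom_binary(doc, binary_file, encoding="utf-8",
                     indent="", addindent="", newl=""):
    """Hypothetical helper: write a DOM document to a binary file."""
    wrapper = io.TextIOWrapper(
        binary_file, encoding=encoding,
        errors="xmlcharrefreplace", newline="\n")
    try:
        doc.writexml(wrapper, indent, addindent, newl, encoding)
        wrapper.flush()
    finally:
        # Detach so the caller's file object is not closed with the wrapper.
        wrapper.detach()


# Usage with an in-memory binary buffer standing in for open(path, "wb").
doc = minidom.parseString("<root>caf\u00e9</root>")
buf = io.BytesIO()
write_dom_binary(doc, buf, encoding="ascii")
data = buf.getvalue()
print(data)
```

Being defined at module level, it would be imported by name rather than called as a method on a Document.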
msg257827 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2016-01-09 13:29
> On Python 3 you should not only open file in text mode with specified
> encoding, but also specify the "xmlcharrefreplace" error handler.

Isn't this only required in case there are non encodable characters?
If the encoding is utf-8, this shouldn't be necessary (unless there are lone surrogates).  Specifying xmlcharrefreplace might be useful while using ascii or latin1 though.
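The difference can be seen with str.encode() alone, without minidom. This is a small sketch with made-up sample text: UTF-8 encodes ordinary characters without help, ASCII needs the error handler, and a lone surrogate fails even in UTF-8 under the default strict handler.

```python
text = "caf\u00e9 \u20ac"  # sample text (an assumption for this sketch)

# UTF-8 can represent every non-surrogate code point directly.
utf8 = text.encode("utf-8")

# ASCII cannot, so the handler substitutes numeric character references.
ascii_refs = text.encode("ascii", errors="xmlcharrefreplace")
print(ascii_refs)  # b'caf&#233; &#8364;'

# Without the handler, ASCII encoding fails.
try:
    text.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# A lone surrogate is not encodable in UTF-8 with the default handler.
try:
    "\ud800".encode("utf-8")
    surrogate_failed = False
except UnicodeEncodeError:
    surrogate_failed = True
```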

The docs of writexml don't seem to specify if the file should be opened in text or binary mode but istm that only text mode is supported.  The advice of using xmlcharrefreplace could be added in a note.
msg342621 - (view) Author: Windson Yang (Windson Yang) * Date: 2019-05-16 03:12
I added a PR with a note like this:

   .. note::

      You should specify the "xmlcharrefreplace" error handler when opening a file with
      a specified encoding::

         writer = open(
                filename, "w", encoding="utf-8",
                errors="xmlcharrefreplace")
         doc.writexml(writer, "", "  ", encoding="utf-8")
msg344061 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-05-31 10:58
Asking users unconditionally to use the "xmlcharrefreplace" replacement method seems wrong for UTF-8. It should not be necessary.

We should, however, document explicitly that the file will receive text and not bytes, i.e. that users are themselves responsible for opening the output file with the desired encoding. We should also make it clearer that the "encoding" argument to writexml() does not change that.
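A short sketch of that behavior, using in-memory buffers in place of real files (the sample document is an assumption): the writer receives str regardless of the encoding argument, which only ends up in the XML declaration, and a binary writer is rejected.

```python
import io
from xml.dom import minidom

doc = minidom.parseString("<root>caf\u00e9</root>")

# A text writer works; what it receives is str, not bytes.
text_buffer = io.StringIO()
doc.writexml(text_buffer, encoding="utf-8")
output = text_buffer.getvalue()
print(repr(output))

# A binary writer fails, even though an encoding was given.
binary_buffer = io.BytesIO()
try:
    doc.writexml(binary_buffer, encoding="utf-8")
    binary_error = None
except TypeError as exc:
    binary_error = exc
print(type(binary_error).__name__)
```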
msg344152 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-06-01 06:33
New changeset 5ac0b988fd5f1428efe35329c531c7b5c74d37f6 by Stefan Behnel (Windson yang) in branch 'master':
bpo-18911: clarify that the minidom XML writer receives texts but not bytes (GH-13352)
https://github.com/python/cpython/commit/5ac0b988fd5f1428efe35329c531c7b5c74d37f6
msg344153 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-06-01 06:58
New changeset 18e23f227be59241cbb1eeb6d6669771dd7275fb by Stefan Behnel (Miss Islington (bot)) in branch '3.7':
bpo-18911: clarify that the minidom XML writer receives texts but not bytes (GH-13718)
https://github.com/python/cpython/commit/18e23f227be59241cbb1eeb6d6669771dd7275fb
History
Date User Action Args
2019-06-01 07:00:02 scoder set status: open -> closed
resolution: fixed
stage: patch review -> resolved
2019-06-01 06:58:57 scoder set messages: + msg344153
2019-06-01 06:40:04 miss-islington set stage: backport needed -> patch review
pull_requests: + pull_request13605
2019-06-01 06:36:41 scoder set status: closed -> open
stage: resolved -> backport needed
resolution: fixed -> (no value)
versions: + Python 3.7
2019-06-01 06:34:15 scoder set status: open -> closed
stage: patch review -> resolved
resolution: fixed
versions: + Python 3.8, - Python 2.7, Python 3.5, Python 3.6
2019-06-01 06:33:25 scoder set messages: + msg344152
2019-05-31 10:58:06 scoder set messages: + msg344061
2019-05-16 03:12:19 Windson Yang set nosy: + Windson Yang
messages: + msg342621
2019-05-16 03:11:47 Windson Yang set keywords: + patch
stage: needs patch -> patch review
pull_requests: + pull_request13263
2016-01-09 20:19:22 r.david.murray set nosy: - r.david.murray
2016-01-09 13:29:56 ezio.melotti set nosy: + ezio.melotti
messages: + msg257827
versions: - Python 3.4
2015-12-31 14:58:02 r.david.murray set nosy: + r.david.murray
messages: + msg257255
2015-12-31 10:29:19 upendra-k14 set nosy: + upendra-k14
messages: + msg257253
2015-11-26 17:53:58 serhiy.storchaka set keywords: + easy
stage: needs patch
versions: + Python 3.5, Python 3.6, - Python 3.3
2013-09-13 12:33:24 eli.bendersky set nosy: - eli.bendersky
2013-09-11 19:56:42 serhiy.storchaka set assignee: docs@python
components: + Documentation
nosy: + docs@python
2013-09-03 10:24:14 serhiy.storchaka set nosy: + serhiy.storchaka, scoder, eli.bendersky
messages: + msg196836
versions: + Python 3.3, Python 3.4
2013-09-03 05:36:59 brianvanderburg2 create