minidom does not encode correctly when calling Document.writexml #63111

brianvanderburg2 · 2013-09-03T05:37:00Z

BPO	18911
Nosy	@scoder, @ezio-melotti, @serhiy-storchaka, @Windsooon
PRs	bpo-18911: using xmlcharrefreplace when open a file #13352; [3.7] bpo-18911: clarify that the minidom XML writer receives texts but not bytes (GH-13352) #13718

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = <Date 2019-06-01.07:00:02.383>
created_at = <Date 2013-09-03.05:36:59.560>
labels = ['easy', 'type-bug', '3.8', 'expert-XML', '3.7', 'docs']
title = 'minidom does not encode correctly when calling Document.writexml'
updated_at = <Date 2019-06-01.07:00:02.382>
user = 'https://bugs.python.org/brianvanderburg2'

bugs.python.org fields:

activity = <Date 2019-06-01.07:00:02.382>
actor = 'scoder'
assignee = 'docs@python'
closed = True
closed_date = <Date 2019-06-01.07:00:02.383>
closer = 'scoder'
components = ['Documentation', 'XML']
creation = <Date 2013-09-03.05:36:59.560>
creator = 'brianvanderburg2'
dependencies = []
files = []
hgrepos = []
issue_num = 18911
keywords = ['patch', 'easy']
message_count = 9.0
messages = ['196824', '196836', '257253', '257255', '257827', '342621', '344061', '344152', '344153']
nosy_count = 7.0
nosy_names = ['scoder', 'ezio.melotti', 'docs@python', 'serhiy.storchaka', 'brianvanderburg2', 'upendra-k14', 'Windson Yang']
pr_nums = ['13352', '13718']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue18911'
versions = ['Python 3.7', 'Python 3.8']

brianvanderburg2 · 2013-09-03T05:36:59Z

When I try to save Unicode data, it is not written correctly and an encode error is raised. I know this happens on 2.7, and from checking the code in xml/dom/minidom.py it looks like it happens in 3.2 as well.

The method call that seems to be problematic is doc.writexml(open(filename, "wb"), "", " ", "utf-8")

Currently I found this to work: doc.writexml(codecs.open(filename, "w", "utf-8"), "", " ", "utf-8")

It seems like this should be handled by the writexml method since it already has the specified encoding.
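A minimal sketch of the report on Python 3, assuming `io.BytesIO` as a stand-in for a file opened with "wb" and `io.TextIOWrapper` as a stand-in for a file opened in text mode. The variable names are illustrative. Keyword arguments are used because, as far as I can tell, the fourth positional parameter of writexml() is `newl`, not `encoding`.

```python
import io
from xml.dom.minidom import Document

# Build a tiny document that contains a non-ASCII character.
doc = Document()
root = doc.createElement("root")
root.appendChild(doc.createTextNode("caf\u00e9"))
doc.appendChild(root)

# A binary target rejects the str chunks that writexml() produces.
binary_target = io.BytesIO()
try:
    doc.writexml(binary_target, encoding="utf-8")
    binary_error = None
except TypeError as exc:
    binary_error = exc

# A text-mode target with an explicit encoding accepts them.
text_target = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")
doc.writexml(text_target, encoding="utf-8")
text_target.flush()
encoded = text_target.buffer.getvalue()
```

On Python 3 the binary case fails with a TypeError rather than the encode error seen on 2.7, but the cause is the same: writexml() writes text, not bytes.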

serhiy-storchaka · 2013-09-03T10:24:15Z

On Python 3 you should not only open the file in text mode with a specified encoding, but also specify the "xmlcharrefreplace" error handler.

doc.writexml(open(filename, "w", encoding="utf-8", errors="xmlcharrefreplace"), "", "  ", "utf-8")

I can suggest only one solution -- explicitly document this behavior.

Perhaps we should also add a special module-level function for writing a DOM tree to a binary file. The low-level writexml() should not be used directly.
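A short sketch of the error-handler advice, assuming an ASCII output file so that the handler has something to do. The temporary path and variable names are illustrative. Keyword arguments are used so that the encoding is not passed in the `newl` position by accident.

```python
import os
import tempfile
from xml.dom.minidom import Document

doc = Document()
root = doc.createElement("root")
root.appendChild(doc.createTextNode("caf\u00e9 \u20ac"))  # "café €"
doc.appendChild(root)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.xml")
    # Text mode, explicit encoding, and character references for
    # anything the encoding cannot represent.
    with open(path, "w", encoding="ascii", errors="xmlcharrefreplace") as f:
        doc.writexml(f, addindent="  ", newl="\n", encoding="ascii")
    with open(path, "rb") as f:
        data = f.read()

print(data)
```

The non-ASCII characters end up in the file as `&#233;` and `&#8364;`, which an XML parser reads back as the original characters.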

upendra-k14 · 2015-12-31T10:29:19Z

I am trying to resolve an issue for the first time. Can anybody please explain what "module level function" means in this context?

bitdancer · 2015-12-31T14:58:02Z

It means a function defined in the module namespace, as opposed to as a method on a class, so that 'from xml.dom.minidom import <somefunction>' will get you that function.

This issue should be for documentation of the problem, since we won't add the function to 2.7. A new issue should be opened for the enhancement request of adding a module level convenience function for writing a dom out to a binary file.
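For illustration only, a sketch of what such a module-level convenience function could look like. The name `write_dom_binary` and its signature are hypothetical; no such function exists in xml.dom.minidom. It wraps the binary file in an `io.TextIOWrapper`, which is an assumption about one reasonable design, not a description of any merged change.

```python
import io
from xml.dom.minidom import Document


def write_dom_binary(doc, binary_file, encoding="utf-8",
                     indent="", addindent="", newl=""):
    """Hypothetical helper: write a minidom document to a binary file."""
    wrapper = io.TextIOWrapper(binary_file, encoding=encoding,
                               errors="xmlcharrefreplace", newline="\n")
    try:
        doc.writexml(wrapper, indent, addindent, newl, encoding)
        wrapper.flush()
    finally:
        # Detach so the caller's file object is not closed with the wrapper.
        wrapper.detach()


doc = Document()
root = doc.createElement("root")
root.appendChild(doc.createTextNode("caf\u00e9 \u20ac"))  # "café €"
doc.appendChild(root)

buf = io.BytesIO()  # stand-in for open(filename, "wb")
write_dom_binary(doc, buf, encoding="latin-1")
result = buf.getvalue()
```

Because it is defined at module level rather than as a method, callers would import it by name and pass the document as the first argument.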

ezio-melotti · 2016-01-09T13:29:56Z

> On Python 3 you should not only open the file in text mode with a specified
> encoding, but also specify the "xmlcharrefreplace" error handler.

Isn't this only required in case there are non-encodable characters?
If the encoding is utf-8, this shouldn't be necessary (unless there are lone surrogates). Specifying xmlcharrefreplace might be useful while using ascii or latin1 though.

The docs of writexml don't seem to specify whether the file should be opened in text or binary mode, but it seems to me that only text mode is supported. The advice of using xmlcharrefreplace could be added in a note.
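The distinction can be checked with plain `str.encode`, independent of minidom. This is a small sketch with an illustrative sample string.

```python
text = "caf\u00e9 \u20ac"  # "café €"

# UTF-8 can encode every non-surrogate code point, so no handler is needed.
utf8_bytes = text.encode("utf-8")

# ASCII cannot represent these characters under the default strict handler.
try:
    text.encode("ascii")
    ascii_strict_failed = False
except UnicodeEncodeError:
    ascii_strict_failed = True

# With xmlcharrefreplace they become numeric character references.
ascii_bytes = text.encode("ascii", errors="xmlcharrefreplace")
print(ascii_bytes)  # b'caf&#233; &#8364;'

# A lone surrogate is the case where even strict UTF-8 encoding fails.
try:
    "\ud800".encode("utf-8")
    surrogate_failed = False
except UnicodeEncodeError:
    surrogate_failed = True
```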

Windsooon · 2019-05-16T03:12:19Z

I added a PR with a note like this:

.. note::

  You should specify the "xmlcharrefreplace" error handler when opening a file with
  a specified encoding::

         writer = open(
                filename, "w", encoding="utf-8",
                errors="xmlcharrefreplace")
         doc.writexml(writer, "", "  ", "utf-8")

scoder · 2019-05-31T10:58:06Z

Asking users unconditionally to use the "xmlcharrefreplace" replacement method seems wrong for UTF-8. It should not be necessary.

We should, however, document explicitly that the file will receive text and not bytes, i.e. that users are themselves responsible for opening the output file with the desired encoding. We should also make it clearer that the "encoding" argument to writexml() does not change that.
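A small sketch of this point, using `io.StringIO` as the writer. The variable names are illustrative. The `encoding` argument only affects the XML declaration; the writer still receives `str`, and the bytes on disk depend entirely on how the caller opened the file.

```python
import io
from xml.dom.minidom import Document

doc = Document()
root = doc.createElement("root")
root.appendChild(doc.createTextNode("caf\u00e9"))  # "café"
doc.appendChild(root)

out = io.StringIO()
doc.writexml(out, encoding="latin-1")
text = out.getvalue()

# The declaration names latin-1, but nothing has been encoded yet.
print(type(text).__name__)

# What a file opened with each encoding would actually store:
as_utf8 = text.encode("utf-8")
as_latin1 = text.encode("latin-1")
```

If the file's encoding and the declared encoding differ, as with `as_utf8` here, the declaration no longer describes the stored bytes, so the two should be kept in sync by the caller.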

scoder · 2019-06-01T06:33:25Z

New changeset 5ac0b98 by Stefan Behnel (Windson yang) in branch 'master':
bpo-18911: clarify that the minidom XML writer receives texts but not bytes (GH-13352)

scoder · 2019-06-01T06:58:57Z

New changeset 18e23f2 by Stefan Behnel (Miss Islington (bot)) in branch '3.7':
bpo-18911: clarify that the minidom XML writer receives texts but not bytes (GH-13718)

brianvanderburg2 mannequin added the topic-XML and type-bug (An unexpected behavior, bug, or error) labels Sep 3, 2013

serhiy-storchaka added the docs (Documentation in the Doc dir) label Sep 11, 2013

serhiy-storchaka assigned docspython Sep 11, 2013

serhiy-storchaka added the easy label Nov 26, 2015

scoder added the 3.8 (only security fixes) label Jun 1, 2019

scoder closed this as completed Jun 1, 2019

scoder added the 3.7 (EOL) end of life label Jun 1, 2019

scoder reopened this Jun 1, 2019

scoder closed this as completed Jun 1, 2019

ezio-melotti transferred this issue from another repository Apr 10, 2022
