classification
Title: Minidom can't create ASCII representation
Type: behavior
Stage: patch review
Components: Library (Lib), Unicode, XML
Versions: Python 3.2, Python 3.3, Python 2.7

process
Status: closed
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: eli.bendersky, ezio.melotti, python-dev, serhiy.storchaka
Priority: normal
Keywords: patch

Created on 2012-07-08 15:31 by serhiy.storchaka, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name: minidom_toxml_encoding.patch
Uploaded: serhiy.storchaka, 2012-07-08 15:31
Messages (5)
msg165020 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2012-07-08 15:31
Minidom can parse ASCII-encoded XML data, but can't create it.

>>> from xml.dom.minidom import parseString
>>> doc = parseString(b'<?xml version="1.0" encoding="us-ascii"?><foo>&#x20ac;</foo>')
>>> doc.toxml('us-ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/serhiy/py/cpython/Lib/xml/dom/minidom.py", line 47, in toxml
    return self.toprettyxml("", "", encoding)
  File "/home/serhiy/py/cpython/Lib/xml/dom/minidom.py", line 56, in toprettyxml
    self.writexml(writer, "", indent, newl, encoding)
  File "/home/serhiy/py/cpython/Lib/xml/dom/minidom.py", line 1798, in writexml
    node.writexml(writer, indent, addindent, newl)
  File "/home/serhiy/py/cpython/Lib/xml/dom/minidom.py", line 868, in writexml
    self.childNodes[0].writexml(writer, '', '', '')
  File "/home/serhiy/py/cpython/Lib/xml/dom/minidom.py", line 1090, in writexml
    _write_data(writer, "%s%s%s" % (indent, self.data, newl))
  File "/home/serhiy/py/cpython/Lib/xml/dom/minidom.py", line 304, in _write_data
    writer.write(data)
  File "/home/serhiy/py/cpython/Lib/codecs.py", line 355, in write
    data, consumed = self.encode(object, self.errors)
UnicodeEncodeError: 'ascii' codec can't encode character '\u20ac' in position 0: ordinal not in range(128)

The same happens with other non-Unicode encodings.

The attached simple patch solves this issue.
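
The patch itself is not quoted in this thread. As a rough sketch of one way to avoid the error (an assumption for illustration, not necessarily what the patch does), the writer can be opened with the standard `xmlcharrefreplace` codec error handler, so that characters the target encoding cannot represent are emitted as numeric character references. The helper name `encode_xml_text` is hypothetical.

```python
import io


def encode_xml_text(text, encoding):
    """Encode already-serialized XML text, replacing unencodable
    characters with numeric character references (hypothetical helper)."""
    buffer = io.BytesIO()
    # errors="xmlcharrefreplace" is a standard codec error handler:
    # it writes &#NNNN; instead of raising UnicodeEncodeError.
    writer = io.TextIOWrapper(buffer, encoding=encoding,
                              errors="xmlcharrefreplace", newline="\n")
    writer.write(text)
    writer.flush()  # push buffered text down into the BytesIO
    return buffer.getvalue()


result = encode_xml_text("<foo>\u20ac</foo>", "us-ascii")
print(result)  # b'<foo>&#8364;</foo>'
```

Because the replacement is a character reference, an XML parser reading the ASCII output recovers the original character.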
msg165353 - Author: Eli Bendersky (eli.bendersky) (Python committer) Date: 2012-07-13 04:32
Serhiy - why did you remove that documentation bit?
msg165358 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2012-07-13 06:27
> Serhiy - why did you remove that documentation bit?

Because it's not relevant anymore. With the patch, you will never get
UnicodeError exceptions when text data is unrepresentable in the target encoding.
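
A short check of that behavior, assuming a Python version that includes the fix (3.3 or later): the document from the original report should now serialize to ASCII bytes, and parsing those bytes again should give back the same text. The variable names are illustrative only.

```python
from xml.dom.minidom import parseString

source = b'<?xml version="1.0" encoding="us-ascii"?><foo>&#x20ac;</foo>'
doc = parseString(source)

# With the fix, this returns bytes instead of raising UnicodeEncodeError.
encoded = doc.toxml('us-ascii')

# Round trip: the ASCII output parses back to the original character.
reparsed = parseString(encoded)
text = reparsed.documentElement.firstChild.data
print(encoded)
print(text == '\u20ac')
```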
msg165361 - Author: Roundup Robot (python-dev) (Python triager) Date: 2012-07-13 06:52
New changeset 7b97cea795d8 by Eli Bendersky in branch 'default':
Issue #15296: Fix minidom.toxml/toprettyxml for non-unicode encodings.  Patch by Serhiy Storchaka, with some minor style adjustments by me.
http://hg.python.org/cpython/rev/7b97cea795d8
msg165362 - Author: Eli Bendersky (eli.bendersky) (Python committer) Date: 2012-07-13 06:53
Fixed in 3.3

Thanks for the patch
History
Date User Action Args
2022-04-11 14:57:32 admin set github: 59501
2012-07-13 06:53:33 eli.bendersky set status: open -> closed; messages: + msg165362
2012-07-13 06:52:55 python-dev set nosy: + python-dev; messages: + msg165361
2012-07-13 06:27:23 serhiy.storchaka set messages: + msg165358
2012-07-13 04:32:44 eli.bendersky set messages: + msg165353
2012-07-08 20:36:21 pitrou set nosy: + eli.bendersky; stage: patch review
2012-07-08 15:31:03 serhiy.storchaka create