This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: xml.sax.saxutils.XMLGenerator cannot output UTF-16
Type: behavior Stage: resolved
Components: XML Versions: Python 3.2, Python 3.3, Python 3.4, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: Arfrever, BreamoreBoy, benjamin.peterson, doerwalter, georg.brandl, larry, loewis, neoecos, ngrig, pitrou, python-dev, serhiy.storchaka
Priority: release blocker Keywords: needs review, patch

Created on 2006-04-14 20:21 by ngrig, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
saxutils.diff ngrig, 2006-04-14 20:21 Patch for bug #1470540
XMLGenerator.patch serhiy.storchaka, 2012-05-30 07:57
XMLGenerator-2.patch serhiy.storchaka, 2012-06-15 07:20
XMLGenerator-3.patch serhiy.storchaka, 2012-07-15 07:08
XMLGenerator-4.patch serhiy.storchaka, 2013-01-14 13:35
XMLGenerator-5.patch serhiy.storchaka, 2013-01-20 15:32
XMLGenerator_fragment-2.7.patch serhiy.storchaka, 2013-02-24 09:08
saxutils.py neoecos, 2013-03-31 19:33 The patched file
Messages (23)
msg50009 - (view) Author: Nikolai Grigoriev (ngrig) Date: 2006-04-14 20:21
This is a patch to bug #1470540. It enables
xml.sax.saxutils.XMLGenerator to work correctly with
UTF-16 (and other encodings not derived from US-ASCII).
The proposed changes are as follows:

- in XMLGenerator.__init__(), create a StreamWriter
instead of a plain stream;

- in XMLGenerator._write(), convert everything to
Unicode before writing;

- in XMLGenerator.endDocument(), flush the StreamWriter.
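
The three changes listed above can be sketched in modern Python roughly as follows. This is only an illustration, not the attached saxutils.diff: the class name TinyGenerator is hypothetical, it shows only the stream handling, and it assumes the output object is a binary stream.

```python
import codecs
import io


class TinyGenerator:
    """Hypothetical stand-in for XMLGenerator; only the stream handling is shown."""

    def __init__(self, out, encoding="utf-16"):
        # Change 1: wrap the byte stream in a codecs.StreamWriter.
        self._out = codecs.getwriter(encoding)(out)
        self._encoding = encoding

    def _write(self, text):
        # Change 2: only text reaches the writer, which encodes it on the way out.
        self._out.write(text)

    def startDocument(self):
        self._write('<?xml version="1.0" encoding="%s"?>\n' % self._encoding)

    def endDocument(self):
        # Change 3: flush the StreamWriter.
        self._out.flush()


raw = io.BytesIO()
gen = TinyGenerator(raw)
gen.startDocument()
gen._write("<doc>h\u00e9llo</doc>")
gen.endDocument()
decoded = raw.getvalue().decode("utf-16")
```

Because the writer keeps state between calls, the byte order mark is emitted once at the start rather than once per write.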

The patch is applicable to xml/sax/saxutils.py in the
stable release (2.4.3), as well as to
xmlcore/sax/saxutils.py in the current release (2.5).

The smoke test is attached to the bug description in
the Bug Manager.

Regards,
Nikolai Grigoriev
msg66684 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2008-05-11 22:03
Won't this present backwards-compatibility problems if non-ASCII str
content is written?
msg114654 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2010-08-22 09:30
There are no unit tests or doc changes with the patch.  Can anyone answer Georg's question in msg66684?
msg161764 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-05-28 10:43
See also issue1767933.

Instead of codecs.StreamWriter it would be better to use io.TextIOWrapper, because the former is slower and has numerous flaws.
msg161767 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2012-05-28 11:07
An alternative would be to use an incremental encoder instead of a StreamWriter (which is what TextIOWrapper does internally).
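
A minimal sketch of the incremental encoder approach, for illustration only. The helper name write_chunks is hypothetical, and the "xmlcharrefreplace" error handler is an assumption chosen because it suits XML output.

```python
import codecs
import io


def write_chunks(chunks, encoding="utf-16"):
    """Encode text chunks one by one with a single stateful encoder."""
    out = io.BytesIO()
    encoder = codecs.getincrementalencoder(encoding)(errors="xmlcharrefreplace")
    for chunk in chunks:
        # The encoder remembers that the BOM was already emitted.
        out.write(encoder.encode(chunk))
    # Signal the end of input so any pending state is flushed.
    out.write(encoder.encode("", final=True))
    return out.getvalue()


result = write_chunks(["<a>", "x", "</a>"])
```

Encoding each chunk separately with str.encode("utf-16") would instead prepend a BOM to every chunk, which is the kind of breakage a stateful encoder avoids.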
msg161933 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-05-30 07:57
Oh, I see that XMLGenerator is completely outdated. It has not even been ported to Python 3. See the function _write:

    def _write(self, text):
        if isinstance(text, str):
            self._out.write(text)
        else:
            self._out.write(text.encode(self._encoding, _error_handling))

In Python 2 there was a choice between bytes and unicode strings. But in Python 3 all text is already str, so the first branch is always taken and encoding never happens.

XMLGenerator does not distinguish between binary and text streams.
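
The effect can be reproduced outside the module with a small sketch. Here legacy_write is a hypothetical free-function copy of the quoted _write logic, and the value given to _error_handling is an assumption made for the illustration.

```python
import io

_error_handling = "xmlcharrefreplace"  # assumed value, for illustration only


def legacy_write(out, text, encoding):
    """Free-function mirror of the quoted _write logic."""
    if isinstance(text, str):
        out.write(text)  # always taken for text in Python 3
    else:
        out.write(text.encode(encoding, _error_handling))


# Text stream: the requested encoding is silently ignored.
text_stream = io.StringIO()
legacy_write(text_stream, "h\u00e9llo", "utf-16")

# Binary stream: str is written unencoded, so the stream rejects it.
binary_stream = io.BytesIO()
try:
    legacy_write(binary_stream, "h\u00e9llo", "utf-16")
    binary_error = None
except TypeError as exc:
    binary_error = exc
```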

Here is a patch that fixes XMLGenerator in Python 3. Unfortunately, it is impossible to avoid some loss of backward compatibility. I tried to keep the code working for the most common cases, but some code which "worked" before may break (I even had to correct some tests).
msg162851 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-06-15 07:20
The patch is updated to reflect Martin's comments. I hope the old behavior is now preserved in the cases most used in practice. The tests are converted to work with bytes instead of strings.
msg163740 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-06-24 07:20
It would be nice to fix this bug before the 3.3.0b1 release clone is forked.
msg165509 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-07-15 07:08
Here is an updated patch with more careful handling of closing (as for issue1767933) and added comments.
msg172205 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-06 15:10
Ping.
msg175472 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-12 20:44
If nobody has any objections, why not apply this patch?
msg178326 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-12-27 20:45
If no one objects I will commit this next year.
msg178369 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2012-12-28 07:26
I'd like Antoine to have a look at all that io stuff. It looks quite bloated.

In your except clause, you're not calling self._close.
msg179942 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-14 13:35
Patch updated. Fixed the error which Georg found. Restored testing of XMLGenerator with StringIO, as Antoine pointed out. XMLGenerator is now tested with StringIO, BytesIO and a user-defined writer. Added tests for encoding.
msg180297 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-20 15:32
Patch updated. I have now got rid of __del__ to prevent hanging on reference cycles, as Antoine suggested on IRC. Added a test to check that XMLGenerator doesn't close the file passed as an argument.
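
The "doesn't close the file" property can be checked with a sketch like the one below. It is written against a current Python 3 and is not the test from the patch; the variable names are illustrative.

```python
import gc
import io
from xml.sax.saxutils import XMLGenerator

out = io.BytesIO()
gen = XMLGenerator(out, encoding="utf-8")
gen.startDocument()
gen.startElement("doc", {})
gen.endElement("doc")
gen.endDocument()

# Drop the generator and force a collection; the caller's stream
# should survive, because it is owned by the caller.
del gen
gc.collect()

still_open = not out.closed
payload = out.getvalue()
```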
msg181797 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-02-10 12:38
New changeset 010b455de0e0 by Serhiy Storchaka in branch '2.7':
Issue #1470548: XMLGenerator now works with UTF-16 and UTF-32 encodings.
http://hg.python.org/cpython/rev/010b455de0e0

New changeset 66f92f76b2ce by Serhiy Storchaka in branch '3.2':
Issue #1470548: XMLGenerator now works with binary output streams.
http://hg.python.org/cpython/rev/66f92f76b2ce

New changeset 03b878d636cf by Serhiy Storchaka in branch '3.3':
Issue #1470548: XMLGenerator now works with binary output streams.
http://hg.python.org/cpython/rev/03b878d636cf

New changeset 12d75ca12ae7 by Serhiy Storchaka in branch 'default':
Issue #1470548: XMLGenerator now works with binary output streams.
http://hg.python.org/cpython/rev/12d75ca12ae7
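
For illustration, this is roughly what "works with binary output streams" looks like from user code on a current Python 3. It is a sketch of the behavior after the fix, not code taken from the changesets.

```python
import io
from xml.sax.saxutils import XMLGenerator

out = io.BytesIO()
gen = XMLGenerator(out, encoding="utf-16")
gen.startDocument()
gen.startElement("doc", {})
gen.characters("h\u00e9llo")
gen.endElement("doc")
gen.endDocument()

data = out.getvalue()           # bytes, starting with a UTF-16 BOM
text = data.decode("utf-16")    # decoding strips the BOM again
```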
msg182819 - (view) Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) * (Python triager) Date: 2013-02-23 20:50
The change in 2.7 branch breaks some software, including a test of Django (produce_xml_fragment from https://github.com/django/django/blob/1.4.5/tests/regressiontests/test_utils/tests.py).
The problem seems to not occur with Python 3.2, 3.3 and 3.4.

Before 010b455de0e0:
>>> from StringIO import StringIO
>>> from xml.sax.saxutils import XMLGenerator
>>> stream = StringIO()
>>> xml = XMLGenerator(stream, encoding='utf-8')
>>> xml.startElement("foo", {"aaa": "1.0", "bbb": "2.0"})
>>> xml.characters("Hello")
>>> xml.endElement("foo")
>>> xml.startElement("bar", {"ccc": "3.0", "ddd": "4.0"})
>>> xml.endElement("bar")
>>> stream.getvalue()
'<foo aaa="1.0" bbb="2.0">Hello</foo><bar ccc="3.0" ddd="4.0"></bar>'
>>>

After 010b455de0e0:
>>> from StringIO import StringIO
>>> from xml.sax.saxutils import XMLGenerator
>>> stream = StringIO()
>>> xml = XMLGenerator(stream, encoding='utf-8')
>>> xml.startElement("foo", {"aaa": "1.0", "bbb": "2.0"})
>>> xml.characters("Hello")
>>> xml.endElement("foo")
>>> xml.startElement("bar", {"ccc": "3.0", "ddd": "4.0"})
>>> xml.endElement("bar")
>>> stream.getvalue()
''
>>>
msg182861 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-02-24 09:08
Thank you for the report. Here is a patch which fixes this bug.
msg182892 - (view) Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) * (Python triager) Date: 2013-02-24 20:52
This patch works for me.
msg182930 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-02-25 11:32
New changeset d707e3345a74 by Serhiy Storchaka in branch '2.7':
Issue #1470548: Do not buffer XMLGenerator output.
http://hg.python.org/cpython/rev/d707e3345a74
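
The fragment case from msg182819 can be sketched on Python 3 as follows. The changeset above is for 2.7, so this only illustrates the intended behavior: output must reach the stream without startDocument() or endDocument() being called.

```python
import io
from xml.sax.saxutils import XMLGenerator

stream = io.BytesIO()
xml = XMLGenerator(stream, encoding="utf-8")

# No startDocument() / endDocument(): only a fragment is produced.
xml.startElement("foo", {"aaa": "1.0", "bbb": "2.0"})
xml.characters("Hello")
xml.endElement("foo")

# Nothing is left sitting in an internal buffer.
fragment = stream.getvalue()
```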
msg182931 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-02-25 11:49
New changeset 1c03e499cdc2 by Serhiy Storchaka in branch '3.2':
Issue #1470548: Add test for fragment producing with XMLGenerator.
http://hg.python.org/cpython/rev/1c03e499cdc2

New changeset 5a4b3094903f by Serhiy Storchaka in branch '3.3':
Issue #1470548: Add test for fragment producing with XMLGenerator.
http://hg.python.org/cpython/rev/5a4b3094903f

New changeset 810d70fb17a2 by Serhiy Storchaka in branch 'default':
Issue #1470548: Add test for fragment producing with XMLGenerator.
http://hg.python.org/cpython/rev/810d70fb17a2
msg185644 - (view) Author: Sebastian Ortiz Vasquez (neoecos) Date: 2013-03-31 19:33
I have been working with this in order to generate an RSS feed using web2py.

I found that the XMLGenerator characters method does not check whether its argument is a unicode or str object, and it does not convert it according to the encoding parameter of the XMLGenerator.

I changed the method to check whether the content is a unicode object and, if it is not, to convert it to one using the desired encoding.

Recall that the _write of UnbufferedTextIOWrapper receives a unicode object as its parameter.

    def characters(self, content):
        if isinstance(content, unicode):
            self._write(escape(content))
        else:
            self._write(escape(unicode(content, self._encoding)))
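
A Python 3 analogue of the same idea, as a sketch only: escaped_text is a hypothetical helper, not part of xml.sax.saxutils, and it assumes that byte input is encoded with the generator's encoding.

```python
from xml.sax.saxutils import escape


def escaped_text(content, encoding="utf-8"):
    """Accept str or bytes and return XML-escaped text."""
    if isinstance(content, str):
        return escape(content)
    # Decode bytes with the assumed encoding before escaping.
    return escape(str(content, encoding))


from_text = escaped_text("a < b")
from_bytes = escaped_text("caf\u00e9 & co".encode("utf-8"))
```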
msg185682 - (view) Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) * (Python triager) Date: 2013-03-31 21:51
Sebastian Ortiz Vasquez: Please file a new issue and attach a patch (in unified format) instead of a whole Python module.
History
Date User Action Args
2022-04-11 14:56:16 admin set github: 43215
2013-03-31 22:03:43 Arfrever set versions: + Python 3.2, Python 3.3, Python 3.4
2013-03-31 21:51:15 Arfrever set messages: + msg185682
title: Bugfix for #1470540 (XMLGenerator cannot output UTF-16 or UTF-8) -> xml.sax.saxutils.XMLGenerator cannot output UTF-16
2013-03-31 19:33:14 neoecos set files: + saxutils.py
nosy: + neoecos
versions: - Python 3.2, Python 3.3, Python 3.4
messages: + msg185644
title: Bugfix for #1470540 (XMLGenerator cannot output UTF-16) -> Bugfix for #1470540 (XMLGenerator cannot output UTF-16 or UTF-8)
2013-02-25 11:50:36 serhiy.storchaka set status: open -> closed
resolution: fixed
stage: resolved
2013-02-25 11:49:19 python-dev set messages: + msg182931
2013-02-25 11:32:14 python-dev set messages: + msg182930
2013-02-24 20:52:51 Arfrever set messages: + msg182892
2013-02-24 09:08:15 serhiy.storchaka set files: + XMLGenerator_fragment-2.7.patch
messages: + msg182861
2013-02-23 20:50:30 Arfrever set status: closed -> open
priority: normal -> release blocker
nosy: + Arfrever, benjamin.peterson, larry
messages: + msg182819
resolution: fixed -> (no value)
stage: resolved -> (no value)
2013-02-10 15:23:06 serhiy.storchaka set status: open -> closed
resolution: fixed
stage: patch review -> resolved
2013-02-10 12:38:00 python-dev set nosy: + python-dev
messages: + msg181797
2013-01-20 15:32:51 serhiy.storchaka set files: + XMLGenerator-5.patch
messages: + msg180297
2013-01-14 13:36:14 serhiy.storchaka set stage: needs patch -> patch review
2013-01-14 13:35:33 serhiy.storchaka set keywords: - easy
files: + XMLGenerator-4.patch
messages: + msg179942
2012-12-30 18:40:40 serhiy.storchaka set stage: patch review -> needs patch
2012-12-28 07:26:14 georg.brandl set nosy: + pitrou
messages: + msg178369
2012-12-27 20:47:56 serhiy.storchaka set assignee: serhiy.storchaka
2012-12-27 20:45:56 serhiy.storchaka set messages: + msg178326
2012-11-12 20:44:06 serhiy.storchaka set messages: + msg175472
2012-10-24 09:02:24 serhiy.storchaka set stage: patch review
2012-10-20 20:09:40 serhiy.storchaka set keywords: + needs review
stage: test needed -> (no value)
versions: + Python 3.4, - Python 3.1
2012-10-06 15:10:51 serhiy.storchaka set messages: + msg172205
2012-08-05 11:14:07 serhiy.storchaka link issue4997 superseder
2012-07-20 06:58:46 eli.bendersky set nosy: - eli.bendersky
2012-07-15 07:08:12 serhiy.storchaka set files: + XMLGenerator-3.patch
nosy: + eli.bendersky
messages: + msg165509
2012-06-24 07:20:37 serhiy.storchaka set messages: + msg163740
2012-06-15 07:20:50 serhiy.storchaka set files: + XMLGenerator-2.patch
messages: + msg162851
2012-05-30 07:58:37 serhiy.storchaka set nosy: + loewis
2012-05-30 07:57:37 serhiy.storchaka set files: + XMLGenerator.patch
messages: + msg161933
2012-05-28 11:07:58 doerwalter set nosy: + doerwalter
messages: + msg161767
2012-05-28 10:43:25 serhiy.storchaka set nosy: + serhiy.storchaka
messages: + msg161764
versions: + Python 3.3
2010-08-22 09:30:57 BreamoreBoy set nosy: + BreamoreBoy
messages: + msg114654
versions: + Python 3.1, Python 2.7, Python 3.2, - Python 2.6
2009-04-05 13:45:12 georg.brandl link issue1470540 superseder
2009-04-05 13:45:12 georg.brandl unlink issue1470540 dependencies
2009-03-21 02:02:41 ajaksu2 set stage: test needed
type: behavior
versions: + Python 2.6, - Python 2.5
2009-03-21 02:02:11 ajaksu2 link issue1470540 dependencies
2008-05-11 22:03:08 georg.brandl set nosy: + georg.brandl
messages: + msg66684
2008-01-21 13:57:10 akuchling set keywords: + easy
2006-04-14 20:21:23 ngrig create