xml.sax.saxutils.XMLGenerator cannot output UTF-16 #43215
Comments
This is a patch to bug bpo-1470540. It enables
The patch is applicable to xml/sax/saxutils.py in the The smoke test is attached to the bug description in Regards, |
Won't this present backwards-compatibility problems if non-ASCII str |
There are no unit tests or doc changes with the patch. Can anyone answer Georg's question in msg66684? |
See also bpo-1767933. Instead of codecs.StreamWriter it is better to use io.TextIOWrapper, because the former is slower and has numerous flaws. |
An alternative would be to use an incremental encoder instead of a StreamWriter. (Which is what TextIOWrapper does internally). |
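To make the incremental-encoder suggestion concrete, here is a minimal sketch that compares encoding every write separately with sharing one stateful encoder. The helper names `encode_per_call` and `encode_incrementally` are hypothetical and exist only for this illustration; the only library call used is `codecs.getincrementalencoder`.

```python
import codecs

def encode_per_call(chunks, encoding):
    """Encode every chunk on its own, as a per-write str.encode() would."""
    return b"".join(chunk.encode(encoding) for chunk in chunks)

def encode_incrementally(chunks, encoding, errors="strict"):
    """Encode chunks with one stateful encoder shared across all writes."""
    encoder = codecs.getincrementalencoder(encoding)(errors)
    parts = [encoder.encode(chunk) for chunk in chunks]
    parts.append(encoder.encode("", final=True))  # flush any pending state
    return b"".join(parts)

chunks = ["<doc>", "Hello", "</doc>"]
per_call = encode_per_call(chunks, "utf-16")        # one BOM per chunk
incremental = encode_incrementally(chunks, "utf-16")  # a single leading BOM
```

With UTF-16 the per-call variant repeats the byte order mark in front of every chunk, while the shared encoder emits it once, so only the incremental result matches encoding the whole text in one go.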
Oh, I see that XMLGenerator is completely outdated. It has not even been ported to Python 3. See the function _write:

def _write(self, text):
    if isinstance(text, str):
        self._out.write(text)
    else:
        self._out.write(text.encode(self._encoding, _error_handling))

In Python 2 there was a choice between bytes and unicode strings. But in Python 3 the text is always str, so the encoding branch is never taken. XMLGenerator does not distinguish between binary and text streams. Here is a patch that fixes XMLGenerator in Python 3. Unfortunately, it is impossible to avoid some loss of backward compatibility. I tried to keep the code working for the most common cases, but some code which "worked" before may break (I even had to correct some tests). |
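As an illustration of the binary versus text stream distinction, the following sketch writes the same small document to an io.BytesIO and to an io.StringIO. It assumes a Python 3 release that already contains the fix discussed in this issue; the helper name `render` is hypothetical.

```python
import io
from xml.sax.saxutils import XMLGenerator

def render(out, encoding):
    """Write a tiny document to `out` and return whatever it collected."""
    gen = XMLGenerator(out, encoding=encoding)
    gen.startDocument()
    gen.startElement("a", {"b": "\u20ac"})
    gen.characters("\u20ac")
    gen.endElement("a")
    gen.endDocument()
    return out.getvalue()

# Binary stream: the generator is expected to encode the output itself.
as_bytes = render(io.BytesIO(), "utf-16")

# Text stream: the generator writes str and leaves any encoding to the stream;
# the encoding argument only shows up in the XML declaration.
as_text = render(io.StringIO(), "utf-16")
```

Decoding `as_bytes` with the same codec should give back the text that the StringIO variant collected.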
The patch is updated to reflect Martin's comments. I hope the old behavior is now preserved in the cases most used in practice. Tests are converted to work with bytes instead of strings. |
It would be nice to fix this bug before the 3.3.0b1 release clone is forked. |
Here is an updated patch with more careful handling of closing (as for bpo-1767933) and added comments. |
Ping. |
If nobody has any objections, why not apply this patch? |
If no one objects I will commit this next year. |
I'd like Antoine to have a look at all that io stuff. It looks quite bloated. In your except clause, you're not calling self._close. |
Patch updated. Fixed the error that Georg found. Restored testing of XMLGenerator with StringIO, as Antoine pointed out. XMLGenerator is now tested with StringIO, BytesIO and a user-defined writer. Added tests for encoding. |
Patch updated. I got rid of __del__ to prevent hanging on reference cycles, as Antoine suggested on IRC. Added a test to check that XMLGenerator doesn't close the file passed as an argument. |
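The "doesn't close the file" property can be checked with a short sketch like the one below. It is not the test from the patch, only an illustration under the assumption of a Python 3 release that includes this fix; `write_partial_document` is a hypothetical helper name.

```python
import gc
import io
from xml.sax.saxutils import XMLGenerator

def write_partial_document(out):
    """Start a document and drop the generator without finishing it."""
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    gen.startElement("doc", {})
    # gen goes out of scope here; endDocument() is deliberately not called.

binary_out = io.BytesIO()
write_partial_document(binary_out)

text_out = io.StringIO()
write_partial_document(text_out)

gc.collect()  # make sure any internal wrapper has been finalized

# The caller still owns both streams and can keep using them.
still_open = (not binary_out.closed) and (not text_out.closed)
```

If the generator closed its argument on destruction, `getvalue()` on either stream would raise ValueError after the helper returns.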
New changeset 010b455de0e0 by Serhiy Storchaka in branch '2.7':
New changeset 66f92f76b2ce by Serhiy Storchaka in branch '3.2':
New changeset 03b878d636cf by Serhiy Storchaka in branch '3.3':
New changeset 12d75ca12ae7 by Serhiy Storchaka in branch 'default': |
The change in the 2.7 branch breaks some software, including a Django test (produce_xml_fragment from https://github.com/django/django/blob/1.4.5/tests/regressiontests/test_utils/tests.py). Before 010b455de0e0:
>>> from StringIO import StringIO
>>> from xml.sax.saxutils import XMLGenerator
>>> stream = StringIO()
>>> xml = XMLGenerator(stream, encoding='utf-8')
>>> xml.startElement("foo", {"aaa": "1.0", "bbb": "2.0"})
>>> xml.characters("Hello")
>>> xml.endElement("foo")
>>> xml.startElement("bar", {"ccc": "3.0", "ddd": "4.0"})
>>> xml.endElement("bar")
>>> stream.getvalue()
'<foo aaa="1.0" bbb="2.0">Hello</foo><bar ccc="3.0" ddd="4.0"></bar>'
>>>
After 010b455de0e0:
>>> from StringIO import StringIO
>>> from xml.sax.saxutils import XMLGenerator
>>> stream = StringIO()
>>> xml = XMLGenerator(stream, encoding='utf-8')
>>> xml.startElement("foo", {"aaa": "1.0", "bbb": "2.0"})
>>> xml.characters("Hello")
>>> xml.endElement("foo")
>>> xml.startElement("bar", {"ccc": "3.0", "ddd": "4.0"})
>>> xml.endElement("bar")
>>> stream.getvalue()
''
>>> |
Thank you for the report. Here is a patch which fixes this bug. |
This patch works for me. |
New changeset d707e3345a74 by Serhiy Storchaka in branch '2.7': |
New changeset 1c03e499cdc2 by Serhiy Storchaka in branch '3.2':
New changeset 5a4b3094903f by Serhiy Storchaka in branch '3.3':
New changeset 810d70fb17a2 by Serhiy Storchaka in branch 'default': |
I have been working with this in order to generate an RSS feed using web2py. I found that the XMLGenerator characters method does not check whether its argument is a unicode or a str object, and it does not decode it according to the encoding parameter of XMLGenerator. I changed the method to check whether the content is a unicode object and otherwise try to convert it using the desired encoding. Recall that the UnbufferedTextIOWrapper _write receives a unicode object as its parameter.

def characters(self, content):
    if isinstance(content, unicode):
        self._write(escape(content))
    else:
        self._write(escape(unicode(content, self._encoding))) |
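For readers on Python 3, the same idea can be sketched as a small subclass rather than a change to the module. `CoercingXMLGenerator` and its `_content_encoding` attribute are hypothetical names introduced only for this illustration; the sketch decodes bytes content before handing it to the stock `characters` method.

```python
import io
from xml.sax.saxutils import XMLGenerator

class CoercingXMLGenerator(XMLGenerator):
    """Hypothetical subclass: decode bytes content before it is escaped."""

    def __init__(self, out=None, encoding="iso-8859-1"):
        super().__init__(out, encoding)
        # Keep our own copy so the sketch does not rely on private attributes.
        self._content_encoding = encoding

    def characters(self, content):
        if isinstance(content, bytes):
            content = content.decode(self._content_encoding)
        super().characters(content)

out = io.StringIO()
gen = CoercingXMLGenerator(out, "utf-8")
gen.startElement("title", {})
gen.characters("caf\u00e9 & ".encode("utf-8"))  # bytes input
gen.characters("tea")                            # str input
gen.endElement("title")
result = out.getvalue()
```

Both kinds of input end up as escaped text in the output, so callers that still hand over encoded bytes do not need to decode them first.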
Sebastian Ortiz Vasquez: Please file a new issue and attach a patch (in unified format) instead of a whole Python module. |