xml.sax.saxutils.XMLGenerator cannot output UTF-16 #43215

Closed
ngrig mannequin opened this issue Apr 14, 2006 · 23 comments

Assignees
Labels
release-blocker topic-XML type-bug An unexpected behavior, bug, or error

Comments

@ngrig
Mannequin

ngrig mannequin commented Apr 14, 2006

BPO 1470548
Nosy @loewis, @doerwalter, @birkenfeld, @pitrou, @larryhastings, @benjaminp, @serhiy-storchaka
Files
  • saxutils.diff: Patch for bug XMLGenerator creates a mess with UTF-16 #43213
  • XMLGenerator.patch
  • XMLGenerator-2.patch
  • XMLGenerator-3.patch
  • XMLGenerator-4.patch
  • XMLGenerator-5.patch
  • XMLGenerator_fragment-2.7.patch
  • saxutils.py: The patched file
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2013-02-25.11:50:36.114>
    created_at = <Date 2006-04-14.20:21:23.000>
    labels = ['expert-XML', 'type-bug', 'release-blocker']
    title = 'xml.sax.saxutils.XMLGenerator cannot output UTF-16'
    updated_at = <Date 2013-03-31.22:03:43.896>
    user = 'https://bugs.python.org/ngrig'

    bugs.python.org fields:

    activity = <Date 2013-03-31.22:03:43.896>
    actor = 'Arfrever'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2013-02-25.11:50:36.114>
    closer = 'serhiy.storchaka'
    components = ['XML']
    creation = <Date 2006-04-14.20:21:23.000>
    creator = 'ngrig'
    dependencies = []
    files = ['7157', '25760', '26011', '26385', '28724', '28797', '29218', '29623']
    hgrepos = []
    issue_num = 1470548
    keywords = ['patch', 'needs review']
    message_count = 23.0
    messages = ['50009', '66684', '114654', '161764', '161767', '161933', '162851', '163740', '165509', '172205', '175472', '178326', '178369', '179942', '180297', '181797', '182819', '182861', '182892', '182930', '182931', '185644', '185682']
    nosy_count = 12.0
    nosy_names = ['loewis', 'doerwalter', 'georg.brandl', 'ngrig', 'pitrou', 'larry', 'benjamin.peterson', 'Arfrever', 'BreamoreBoy', 'python-dev', 'serhiy.storchaka', 'neoecos']
    pr_nums = []
    priority = 'release blocker'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue1470548'
    versions = ['Python 2.7', 'Python 3.2', 'Python 3.3', 'Python 3.4']

    @ngrig
    Mannequin Author

    ngrig mannequin commented Apr 14, 2006

    This is a patch to bug bpo-1470540. It enables
    xml.sax.saxutils.XMLGenerator to work correctly with
    UTF-16 (and other encodings not derived from US-ASCII).
    The proposed changes are as follows:

    • in XMLGenerator.__init__(), create a StreamWriter
      instead of a plain stream;

    • in XMLGenerator._write(), convert everything to
      Unicode before writing;

    • in XMLGenerator.endDocument(), flush the StreamWriter.
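
    The three proposed changes can be illustrated outside of XMLGenerator. The following is only a minimal sketch, written for Python 3 rather than the 2.4/2.5 code the patch targets; the helper name write_utf16 is hypothetical and is not taken from the patch.

```python
import codecs
import io


def write_utf16(chunks):
    """Encode text chunks to UTF-16 through a codecs StreamWriter.

    Hypothetical helper: it mirrors the proposed steps (wrap the stream,
    write text only, flush at the end) and is not part of saxutils.
    """
    raw = io.BytesIO()                        # stands in for the output file
    writer = codecs.getwriter("utf-16")(raw)  # StreamWriter wrapping the byte stream
    for chunk in chunks:
        writer.write(chunk)                   # text goes in, encoded bytes come out
    writer.flush()                            # counterpart of the endDocument() flush
    return raw.getvalue()


data = write_utf16(["<doc>", "h\u00e9llo", "</doc>"])
print(data.decode("utf-16"))
```

    The StreamWriter emits the byte order mark once, with the first chunk, which is what a stateless per-call encode cannot do.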

    The patch is applicable to xml/sax/saxutils.py in the
    stable release (2.4.3), as well as to
    xmlcore/sax/saxutils.py in the current release (2.5).

    The smoke test is attached to the bug description in
    the Bug Manager.

    Regards,
    Nikolai Grigoriev

    @ngrig ngrig mannequin added topic-XML labels Apr 14, 2006
    @birkenfeld
    Member

    Won't this present backwards-compatibility problems if non-ASCII str
    content is written?

    @devdanzin devdanzin mannequin added type-bug An unexpected behavior, bug, or error labels Mar 21, 2009
    @BreamoreBoy
    Mannequin

    BreamoreBoy mannequin commented Aug 22, 2010

    There are no unit tests or doc changes with the patch. Can anyone answer Georg's question in msg66684?

    @serhiy-storchaka
    Member

    See also bpo-1767933.

    Instead of codecs.StreamWriter, it would be better to use io.TextIOWrapper, because the former is slower and has numerous flaws.
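
    A minimal sketch of the io.TextIOWrapper approach, assuming Python 3 and an in-memory io.BytesIO as the binary target. The errors and newline arguments are illustrative choices for this sketch, not a statement of what the patch uses.

```python
import io

raw = io.BytesIO()  # binary destination; a real file opened in "wb" mode would also do

# Wrap the binary stream so that callers can write text and get UTF-16 bytes.
text = io.TextIOWrapper(raw, encoding="utf-16",
                        errors="xmlcharrefreplace", newline="\n")
text.write('<?xml version="1.0" encoding="utf-16"?>\n')
text.write("<doc>h\u00e9llo</doc>")
text.flush()        # push buffered text down to the binary stream

encoded = raw.getvalue()
text.detach()       # release raw so that discarding the wrapper does not close it
print(encoded.decode("utf-16"))
```

    Detaching the wrapper leaves the underlying stream open for its owner to close.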

    @doerwalter
    Contributor

    An alternative would be to use an incremental encoder instead of a StreamWriter (which is what TextIOWrapper does internally).
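
    A minimal sketch of the incremental-encoder alternative, assuming Python 3. The function name encode_incrementally is hypothetical and only shows how the encoder keeps state between chunks.

```python
import codecs


def encode_incrementally(chunks, encoding="utf-16"):
    """Encode text chunks one at a time with a stateful incremental encoder."""
    encoder = codecs.getincrementalencoder(encoding)(errors="xmlcharrefreplace")
    out = bytearray()
    for chunk in chunks:
        out += encoder.encode(chunk)       # the encoder remembers state between calls
    out += encoder.encode("", final=True)  # flush anything the encoder still holds
    return bytes(out)


encoded = encode_incrementally(["<a>", "x", "</a>"])
print(encoded.decode("utf-16"))
```

    Because the encoder is stateful, the UTF-16 byte order mark appears only at the start of the output rather than once per chunk.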

    @serhiy-storchaka
    Member

    Oh, I see that XMLGenerator is completely outdated. It has not even been ported to Python 3. See the _write function:

        def _write(self, text):
            if isinstance(text, str):
                self._out.write(text)
            else:
                self._out.write(text.encode(self._encoding, _error_handling))

    In Python 2 there was a choice between byte strings and unicode strings. But in Python 3 all text is str, so the first branch is always taken and encoding never happens.

    XMLGenerator does not distinguish between binary and text streams.
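
    The difference between the two kinds of stream can be seen without XMLGenerator at all. This is a small illustrative sketch, assuming Python 3; the variable names are made up for the example.

```python
import io

text_stream = io.StringIO()
binary_stream = io.BytesIO()

text_stream.write("<doc/>")          # a text stream accepts str

try:
    binary_stream.write("<doc/>")    # a binary stream rejects str
except TypeError as exc:
    error = exc
else:
    error = None

# A binary stream only works if somebody encodes the text first.
binary_stream.write("<doc/>".encode("utf-16"))

print(type(error).__name__)
```

    Code that always calls write() with str therefore needs either a text stream or an encoding layer in front of a binary one.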

    Here is a patch that fixes XMLGenerator in Python 3. Unfortunately, it is impossible to avoid some loss of backward compatibility. I tried to keep the code working for the most common cases, but some code which "worked" before may break (I also had to correct some tests).

    @serhiy-storchaka
    Member

    The patch is updated to reflect Martin's comments. I hope the old behavior is now preserved in the cases most common in practice. The tests are converted to work with bytes instead of strings.

    @serhiy-storchaka
    Member

    It would be nice to fix this bug before the 3.3.0b1 release clone is forked.

    @serhiy-storchaka
    Member

    Here is an updated patch with more careful handling of closing (as for bpo-1767933) and added comments.

    @serhiy-storchaka
    Member

    Ping.

    @serhiy-storchaka
    Member

    If nobody has any objections, why not apply this patch?

    @serhiy-storchaka
    Member

    If no one objects I will commit this next year.

    @birkenfeld
    Member

    I'd like Antoine to have a look at all that io stuff. It looks quite bloated.

    In your except clause, you're not calling self._close.

    @serhiy-storchaka
    Member

    Patch updated. Fixed the error that Georg found. Restored testing of XMLGenerator with StringIO, as Antoine pointed out. XMLGenerator is now tested with StringIO, BytesIO and a user-defined writer. Added tests for encoding.

    @serhiy-storchaka
    Member

    Patch updated. I got rid of __del__ to prevent hanging on reference cycles, as Antoine suggested on IRC. Added a test to check that XMLGenerator doesn't close the file passed as an argument.
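
    The "doesn't close the file" property can be checked along these lines. This is only a sketch of such a check, assuming a Python 3 release that includes the fix; it is not the test from the patch.

```python
import io
from xml.sax.saxutils import XMLGenerator

out = io.BytesIO()
gen = XMLGenerator(out, encoding="utf-8")
gen.startDocument()
gen.startElement("doc", {})
gen.endElement("doc")
gen.endDocument()

del gen                        # drop the generator and whatever wrapper it holds
still_open = not out.closed    # the caller's stream should remain usable
result = out.getvalue()
print(still_open, result)
```

    The caller stays responsible for closing the stream it passed in.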

    @python-dev
    Mannequin

    python-dev mannequin commented Feb 10, 2013

    New changeset 010b455de0e0 by Serhiy Storchaka in branch '2.7':
    Issue bpo-1470548: XMLGenerator now works with UTF-16 and UTF-32 encodings.
    http://hg.python.org/cpython/rev/010b455de0e0

    New changeset 66f92f76b2ce by Serhiy Storchaka in branch '3.2':
    Issue bpo-1470548: XMLGenerator now works with binary output streams.
    http://hg.python.org/cpython/rev/66f92f76b2ce

    New changeset 03b878d636cf by Serhiy Storchaka in branch '3.3':
    Issue bpo-1470548: XMLGenerator now works with binary output streams.
    http://hg.python.org/cpython/rev/03b878d636cf

    New changeset 12d75ca12ae7 by Serhiy Storchaka in branch 'default':
    Issue bpo-1470548: XMLGenerator now works with binary output streams.
    http://hg.python.org/cpython/rev/12d75ca12ae7
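
    As a rough illustration of what these changesets enable, here is a minimal sketch that writes UTF-16 to a binary stream. It assumes a Python 3 release that includes the changes above; the element name and text are made up for the example.

```python
import io
from xml.sax.saxutils import XMLGenerator

out = io.BytesIO()                         # binary output stream
gen = XMLGenerator(out, encoding="utf-16")
gen.startDocument()
gen.startElement("doc", {})
gen.characters("h\u00e9llo")               # non-ASCII text
gen.endElement("doc")
gen.endDocument()

xml_text = out.getvalue().decode("utf-16") # decode the bytes back for inspection
print(xml_text)
```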

    @Arfrever
    Mannequin

    Arfrever mannequin commented Feb 23, 2013

    The change in the 2.7 branch breaks some software, including a Django test (produce_xml_fragment from https://github.com/django/django/blob/1.4.5/tests/regressiontests/test_utils/tests.py).
    The problem does not seem to occur with Python 3.2, 3.3 and 3.4.

    Before 010b455de0e0:
    >>> from StringIO import StringIO
    >>> from xml.sax.saxutils import XMLGenerator
    >>> stream = StringIO()
    >>> xml = XMLGenerator(stream, encoding='utf-8')
    >>> xml.startElement("foo", {"aaa": "1.0", "bbb": "2.0"})
    >>> xml.characters("Hello")
    >>> xml.endElement("foo")
    >>> xml.startElement("bar", {"ccc": "3.0", "ddd": "4.0"})
    >>> xml.endElement("bar")
    >>> stream.getvalue()
    '<foo aaa="1.0" bbb="2.0">Hello</foo><bar ccc="3.0" ddd="4.0"></bar>'
    >>>
    
    After 010b455de0e0:
    >>> from StringIO import StringIO
    >>> from xml.sax.saxutils import XMLGenerator
    >>> stream = StringIO()
    >>> xml = XMLGenerator(stream, encoding='utf-8')
    >>> xml.startElement("foo", {"aaa": "1.0", "bbb": "2.0"})
    >>> xml.characters("Hello")
    >>> xml.endElement("foo")
    >>> xml.startElement("bar", {"ccc": "3.0", "ddd": "4.0"})
    >>> xml.endElement("bar")
    >>> stream.getvalue()
    ''
    >>>
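
    For comparison, the same fragment can be produced under Python 3, where the problem was reported not to occur. This is a minimal sketch assuming Python 3.7 or later (so that attribute order follows dict insertion order); the wrapper function produce_fragment is a hypothetical name for this example.

```python
import io
from xml.sax.saxutils import XMLGenerator


def produce_fragment(stream):
    """Write a document fragment and read it back without endDocument()."""
    xml = XMLGenerator(stream, encoding="utf-8")
    xml.startElement("foo", {"aaa": "1.0", "bbb": "2.0"})
    xml.characters("Hello")
    xml.endElement("foo")
    xml.startElement("bar", {"ccc": "3.0", "ddd": "4.0"})
    xml.endElement("bar")
    return stream.getvalue()   # output must already be in the stream here


text_fragment = produce_fragment(io.StringIO())
byte_fragment = produce_fragment(io.BytesIO())
print(text_fragment)
```

    Reading the stream before endDocument() is exactly the usage that buffered output breaks.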

    @Arfrever Arfrever mannequin reopened this Feb 23, 2013
    @Arfrever Arfrever mannequin added the release-blocker label Feb 23, 2013
    @serhiy-storchaka
    Member

    Thank you for the report. Here is a patch which fixes this bug.

    @Arfrever
    Mannequin

    Arfrever mannequin commented Feb 24, 2013

    This patch works for me.

    @python-dev
    Mannequin

    python-dev mannequin commented Feb 25, 2013

    New changeset d707e3345a74 by Serhiy Storchaka in branch '2.7':
    Issue bpo-1470548: Do not buffer XMLGenerator output.
    http://hg.python.org/cpython/rev/d707e3345a74

    @python-dev
    Mannequin

    python-dev mannequin commented Feb 25, 2013

    New changeset 1c03e499cdc2 by Serhiy Storchaka in branch '3.2':
    Issue bpo-1470548: Add test for fragment producing with XMLGenerator.
    http://hg.python.org/cpython/rev/1c03e499cdc2

    New changeset 5a4b3094903f by Serhiy Storchaka in branch '3.3':
    Issue bpo-1470548: Add test for fragment producing with XMLGenerator.
    http://hg.python.org/cpython/rev/5a4b3094903f

    New changeset 810d70fb17a2 by Serhiy Storchaka in branch 'default':
    Issue bpo-1470548: Add test for fragment producing with XMLGenerator.
    http://hg.python.org/cpython/rev/810d70fb17a2

    @neoecos
    Mannequin

    neoecos mannequin commented Mar 31, 2013

    I have been working with this in order to generate an RSS feed using web2py.

    I found that the XMLGenerator characters method does not check whether its argument is a unicode or str object, and it does not decode it according to the encoding parameter of the XMLGenerator.

    I changed the method to check whether the argument is a unicode object and, if it is not, to convert it to one using the desired encoding.

    Recall that the _write of UnbufferedTextIOWrapper receives a unicode object as its parameter.

        def characters(self, content):
            # Decode byte strings with the generator's encoding before escaping.
            if isinstance(content, unicode):
                self._write(escape(content))
            else:
                self._write(escape(unicode(content, self._encoding)))
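
    The same idea can be sketched for Python 3, where the text type is str and byte strings are bytes. The helper escaped_characters below is hypothetical and stands alone; it is not the method from the attached file.

```python
from xml.sax.saxutils import escape


def escaped_characters(content, encoding="utf-8"):
    """Return escaped text for str or bytes content (hypothetical helper)."""
    if not isinstance(content, str):
        # Decode byte strings with the encoding the generator was created with.
        content = str(content, encoding)
    return escape(content)


print(escaped_characters(b"a < b"))
print(escaped_characters("caf\u00e9 & tea"))
```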

    @neoecos neoecos mannequin changed the title Bugfix for #1470540 (XMLGenerator cannot output UTF-16) Bugfix for #1470540 (XMLGenerator cannot output UTF-16 or UTF-8) Mar 31, 2013
    @Arfrever
    Mannequin

    Arfrever mannequin commented Mar 31, 2013

    Sebastian Ortiz Vasquez: Please file a new issue and attach a patch (in unified format) instead of a whole Python module.

    @Arfrever Arfrever mannequin changed the title Bugfix for #1470540 (XMLGenerator cannot output UTF-16 or UTF-8) xml.sax.saxutils.XMLGenerator cannot output UTF-16 Mar 31, 2013
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022