
classification
Title: XMLGenerator creates a mess with UTF-16
Type: behavior
Stage: test needed
Components: XML
Versions: Python 2.6
process
Status: closed
Resolution: duplicate
Dependencies:
Superseder: xml.sax.saxutils.XMLGenerator cannot output UTF-16 (issue 1470548)
Assigned To:
Nosy List: ajaksu2, ngrig
Priority: normal
Keywords:

Created on 2006-04-14 20:07 by ngrig, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
enctest.py ngrig, 2006-04-14 20:07 Test for UTF-16 treatment in xml.sax.saxutils.XMLGenerator
Messages (3)
msg28244 - Author: Nikolai Grigoriev (ngrig) Date: 2006-04-14 20:07
When the output encoding in xml.sax.saxutils.XMLGenerator
is set to UTF-16, the result is a terrible mess. Namely:

- it does not encode the XML declaration at the very
top of the file (leaving it as single-byte characters);

- it leaves the closing '>' of each start tag unencoded
(that is, it always outputs a single byte);

- it inserts a spurious byte order mark for each tag,
each attribute, each text node, and each processing
instruction.

A test illustrating the issue is attached. The issue
affects both the stable (2.4.3) and the current (2.5)
versions of Python.
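
The attached enctest.py is not reproduced here. As a rough stand-in, the sketch below (written for Python 3, so it is an assumption that it mirrors the original Python 2 test) pushes a tiny document through XMLGenerator with encoding="utf-16" and counts the byte order marks in the output. The helper name generate_utf16_sample and the sample element names are made up for illustration. On an interpreter where XMLGenerator handles UTF-16 correctly, one BOM is expected and the whole output should decode cleanly; on an affected interpreter the count would be higher or decoding would fail.

```python
import codecs
import io
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl


def generate_utf16_sample():
    """Hypothetical helper: write a tiny document and return the raw bytes."""
    out = io.BytesIO()
    gen = XMLGenerator(out, encoding="utf-16")
    gen.startDocument()                                   # XML declaration
    gen.startElement("doc", AttributesImpl({"a": "1"}))   # tag + attribute
    gen.characters("text")                                # text node
    gen.processingInstruction("pi", "data")               # processing instruction
    gen.endElement("doc")
    gen.endDocument()
    return out.getvalue()


data = generate_utf16_sample()
# codecs.BOM_UTF16 is the BOM in native byte order, which is what the
# plain "utf-16" codec emits when encoding.
bom_count = data.count(codecs.BOM_UTF16)
decoded = data.decode("utf-16")
print("BOMs in output:", bom_count)
print(decoded)
```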

---------------------------------------------
Looking in xml/sax/saxutils.py, I see the problem in
XMLGenerator._write():
   - byte strings (str) aren't recoded at all (sic!);
   - Unicode strings are converted using
unicode.encode(); this results in a BOM for each call of
_write() on Unicode strings.
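
The per-call BOM can be seen without XMLGenerator at all. The following is a minimal Python 3 sketch, not the code from saxutils.py: the helper encode_pieces_separately is hypothetical and only mimics a writer that encodes every fragment independently with the "utf-16" codec.

```python
import codecs


def encode_pieces_separately(pieces, encoding="utf-16"):
    """Hypothetical helper: encode each fragment on its own, then concatenate."""
    return b"".join(piece.encode(encoding) for piece in pieces)


pieces = ["<doc>", "text", "</doc>"]
data = encode_pieces_separately(pieces)

# Each independent encode() call prepends its own BOM, so the BOM shows up
# once per fragment instead of once per document.
bom_count = data.count(codecs.BOM_UTF16)
print("fragments:", len(pieces), "BOMs:", bom_count)
```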

The issue is easy to fix by using a StreamWriter instead
of a plain stream as the output sink. I am going to
submit a patch shortly.
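
To illustrate the idea (this is a Python 3 sketch of the general approach, not the patch itself), a stream writer obtained from codecs.getwriter() keeps encoder state between writes, so the BOM is emitted only with the first write. The fragment list is made up for the example.

```python
import codecs
import io

raw = io.BytesIO()
# Wrap the byte sink in a stateful UTF-16 stream writer.
writer = codecs.getwriter("utf-16")(raw)

for piece in ["<doc>", "text", "</doc>"]:
    writer.write(piece)  # only the first write carries a BOM

data = raw.getvalue()
bom_count = data.count(codecs.BOM_UTF16)
decoded = data.decode("utf-16")
print("BOMs:", bom_count)
print(decoded)
```

Because all fragments pass through one encoder, the result decodes as a single well-formed UTF-16 stream.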

Regards,
Nikolai Grigoriev 
msg28245 - Author: Nikolai Grigoriev (ngrig) Date: 2006-04-16 07:42
FYI: I posted a patch (#1470548) that fixes the issue. 

Regards,
Nikolai Grigoriev
msg83907 - Author: Daniel Diniz (ajaksu2) * (Python triager) Date: 2009-03-21 02:02
Patch on issue 1470548.
History
Date User Action Args
2022-04-11 14:56:16 admin set github: 43213
2009-04-05 13:45:12 georg.brandl set status: open -> closed
resolution: duplicate
dependencies: - xml.sax.saxutils.XMLGenerator cannot output UTF-16
superseder: xml.sax.saxutils.XMLGenerator cannot output UTF-16
2009-03-21 02:02:11 ajaksu2 set dependencies: + xml.sax.saxutils.XMLGenerator cannot output UTF-16
type: behavior
versions: + Python 2.6, - Python 2.5
nosy: + ajaksu2
messages: + msg83907
stage: test needed
2006-04-14 20:07:46 ngrig create