
Author scoder
Recipients scoder
Date 2010-03-03.07:15:21
Message-id <1267600525.56.0.547982490868.issue8047@psf.upfronthosting.co.za>
Content
The xml.etree.ElementTree package in the Python 3.x standard library breaks compatibility with existing ET 1.2 code: when no encoding is passed, the serialiser returns a unicode string. Previously, the serialiser was guaranteed to return a byte string, which by default was 7-bit ASCII compatible.

This behavioural change breaks all code that relies on the default behaviour of ElementTree. Since Python 3 has no implicit default encoding, unicode strings cannot be mixed with byte strings. This means that, for example, the result of the serialisation can no longer be written to a file opened in binary mode.
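
A minimal sketch of the failure described above, assuming the serialiser has handed back a str. The markup string is illustrative, not actual ElementTree output, and io.BytesIO stands in for a file opened with mode "wb". Writing the str fails with a TypeError; the caller has to choose an encoding explicitly before the write succeeds.

```python
import io

# Illustrative stand-in for a str returned by the serialiser when no
# encoding is passed (not captured from a real ElementTree run).
serialised = "<root>caf\u00e9</root>"

out = io.BytesIO()  # behaves like a file opened in binary mode ("wb")
failed = False
try:
    out.write(serialised)  # Python 3 never converts str to bytes implicitly
except TypeError as exc:
    failed = True
    print("write failed:", exc)

# The caller now has to pick an encoding before writing.
out.write(serialised.encode("utf-8"))
print(out.getvalue())
```
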

XML is well defined as a stream of bytes. Redefining it as a unicode string *by default* is hard to understand at best.

Finally, it would have been good to look at the other ET implementations before introducing such a change. The lxml.etree package has supported serialising XML into a unicode string for years, and does so in a clear, safe and explicit way: it requires the user to pass the 'unicode' (Py3 'str') type as the encoding parameter, e.g.

    etree.tostring(tree, encoding=str)

which is explicit enough to make it clear that this is different from a normal encoding.
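
For comparison, current Python 3 releases of the standard library use the same kind of explicit opt-in: xml.etree.ElementTree.tostring() returns bytes by default (US-ASCII, with non-ASCII characters written as character references) and returns a str only when encoding="unicode" is passed. The sketch below shows both calls; the element name and text are arbitrary examples.

```python
import xml.etree.ElementTree as ET

root = ET.Element("root")
root.text = "caf\u00e9"  # contains one non-ASCII character

# Default: a byte string that stays 7-bit ASCII compatible;
# the non-ASCII character is emitted as a character reference.
data = ET.tostring(root)
print(type(data), data)

# Explicit opt-in to a text (str) result.
text = ET.tostring(root, encoding="unicode")
print(type(text), text)
```
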
History
Date                 User    Action  Args
2010-03-03 07:15:25  scoder  set     recipients: + scoder
2010-03-03 07:15:25  scoder  set     messageid: <1267600525.56.0.547982490868.issue8047@psf.upfronthosting.co.za>
2010-03-03 07:15:23  scoder  link    issue8047 messages
2010-03-03 07:15:22  scoder  create