
Author effbot
Recipients effbot, flox, georg.brandl, gvanrossum, r.david.murray, scoder
Date 2010-03-12.09:00:45
"'None' has always been the documented default for the encoding parameter"

That's probably mostly by accident, at least in the original ET, but the 1.3 draft docs do spell it out explicitly for the 'write' method:

   Output encoding. If omitted or set to None, defaults to US-ASCII.

Not sure I'd consider this text binding in itself, though (even if I'd argue that it's preferred to have the same interpretation of encoding everywhere).

"writing out the Unicode serialisation will result in an incorrect XML serialisation"

I think Guido meant the ElementTree.write method; is that broken too?

The file.write(et.tostring()) issue is probably my most pressing concern here; that's a common use case (e.g. when using "iterparse" to cut pieces from a big document), and the defaults were chosen to increase the chance that this automatically does the right thing for non-ASCII even if the programmer never tests it. In 3.X, that construct is suddenly dependent on the interpreter's default encoding.
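
A minimal sketch of that use case, assuming the behaviour of current CPython 3 releases, where tostring() with no encoding argument returns US-ASCII bytes (this may differ from the 3.X state under discussion here). The sample document and the in-memory buffers are made up for illustration:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical input: a small UTF-8 document with one non-ASCII item.
source = io.BytesIO(
    "<doc><item>caf\u00e9</item><item>plain</item></doc>".encode("utf-8")
)

# Stands in for a file opened in binary mode.
out = io.BytesIO()

# Cut the <item> pieces out of the document as they finish parsing.
for event, elem in ET.iterparse(source):
    if elem.tag == "item":
        # No encoding argument: the result is bytes, and non-ASCII
        # characters are written as numeric character references.
        out.write(ET.tostring(elem))
        elem.clear()

result = out.getvalue()
```

Because the output is pure ASCII bytes, writing it does not depend on the interpreter's default encoding, and the untested non-ASCII item still round-trips.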

I think I'd prefer old "tostring" behaviour and a separate "tounicode" function, and I'm still not convinced that the latter is required for the XML use case (which implies that maybe it should live in lxml.html for the HTML case, even if it ends up calling the same internal implementation).

Or should that be "tobytes" and "tounicode" to eliminate all ambiguity?
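
A rough sketch of what such a split could look like, written as thin wrappers over tostring() as it exists in current CPython 3. The names "tobytes" and "tounicode" are the hypothetical ones floated above, not an existing ElementTree API, and the use of encoding="unicode" assumes current CPython behaviour:

```python
import xml.etree.ElementTree as ET


def tobytes(elem, encoding="us-ascii"):
    """Hypothetical helper: always return an encoded byte string."""
    return ET.tostring(elem, encoding=encoding)


def tounicode(elem):
    """Hypothetical helper: always return a str, with no encoding applied."""
    return ET.tostring(elem, encoding="unicode")


elem = ET.Element("title")
elem.text = "caf\u00e9"

as_bytes = tobytes(elem)   # safe to pass to a binary file's write()
as_text = tounicode(elem)  # needs a text file; the file's encoding decides the bytes
```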