Message 100880 - Python tracker


Author	effbot
Recipients	effbot, flox, georg.brandl, pitrou, r.david.murray, scoder
Date	2010-03-11.19:01:10
Message-id	<1268334073.06.0.43535054637.issue8047@psf.upfronthosting.co.za>

Content
> if I don't specify an encoding, I get unicode.  If I do specify an encoding, I get encoded bytes.

You're confusing the XML document encoding with character set encoding.

A serialized (unparsed) XML document is a byte stream, not a string of Unicode characters. And the character set encoding is both embedded in that byte stream and affects how it's generated in more than one way; you cannot just recode XML documents willy-nilly and expect things to work.
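
To illustrate the point about recoding, here is a minimal sketch using the current Python 3 `xml.etree.ElementTree` API, which may behave differently from the version under discussion in this issue. The `greeting` element and its text are made-up sample data. The sketch serializes a tree to ISO-8859-1 bytes, transcodes those bytes to UTF-8 without touching the XML declaration, and parses both versions.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: one element with a non-ASCII character.
root = ET.Element("greeting")
root.text = "caf\u00e9"

# Serializing to a byte encoding returns bytes, and the encoding name is
# written into the XML declaration at the start of the byte stream.
latin1_doc = ET.tostring(root, encoding="iso-8859-1")

# Naive "recoding": transcode the bytes but leave the declaration alone.
recoded = latin1_doc.decode("iso-8859-1").encode("utf-8")

# The parser trusts the declaration, so the UTF-8 bytes are misread
# as ISO-8859-1 and the text comes back damaged.
original_text = ET.fromstring(latin1_doc).text
damaged_text = ET.fromstring(recoded).text
```

The recoded document is still well-formed, so nothing raises an error; the text is silently corrupted because the declaration and the actual bytes no longer agree.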

A parsed XML document (an infoset) -- for ET, that's the tree of Element objects -- does indeed contain Unicode strings, but the transformation from the byte stream to the Unicode string doesn't just involve character set decoding; there are several other constructs that are handled by the XML parser.
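
As a rough sketch of what "other constructs" can mean in practice, the example below compares plain character-set decoding with actual parsing, again using the current Python 3 `xml.etree.ElementTree` module. The sample bytes are invented for illustration; they contain a numeric character reference, a predefined entity reference, and a CDATA section, which are only some of the constructs a parser handles.

```python
import xml.etree.ElementTree as ET

# Hypothetical ASCII-only byte stream; every byte decodes trivially.
doc = b"<p>caf&#233; &amp; <![CDATA[<tea>]]></p>"

# Plain character-set decoding leaves the markup constructs untouched.
decoded_only = doc.decode("ascii")

# The parser resolves the character reference, the entity reference,
# and the CDATA section while building the element's text.
parsed_text = ET.fromstring(doc).text
```

Here `decoded_only` is still serialized XML with `&#233;` in it, while `parsed_text` is the Unicode string that ends up on the Element object.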

> Ha. There has been a very long temporal window

You should have had plenty of time to fix it, then, right?

History
Date	User	Action	Args
2010-03-11 19:01:13	effbot	set	recipients: + effbot, georg.brandl, pitrou, scoder, r.david.murray, flox
2010-03-11 19:01:13	effbot	set	messageid: <1268334073.06.0.43535054637.issue8047@psf.upfronthosting.co.za>
2010-03-11 19:01:10	effbot	link	issue8047 messages
2010-03-11 19:01:10	effbot	create