
classification
Title: XML codec
Type: enhancement Stage:
Components: Unicode Versions: Python 2.6
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: lemburg Nosy List: doerwalter, jafo, lemburg
Priority: normal Keywords: patch

Created on 2007-11-07 17:52 by doerwalter, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
diff.txt doerwalter, 2007-11-07 17:52
diff2.txt doerwalter, 2007-11-08 21:25
Messages (9)
msg57211 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2007-11-07 17:52
The patch adds an XML codec. It implements encoding detection as
specified in http://www.w3.org/TR/2004/REC-xml-20040204/#sec-guessing
and supports externally specified encodings for both encoding and decoding.
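
For readers without access to the attached diffs, the detection step described in that section of the XML recommendation can be sketched as follows. This is a minimal illustration and not the code from the patch: the function name guess_xml_encoding_family and both tables are hypothetical, and cp037 stands in for "some EBCDIC code page", since the declaration has to be read to learn which one is really in use.

```python
import codecs

# Byte order marks. The UTF-32 marks come first because the UTF-32 LE mark
# begins with the same two bytes as the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_BE, "utf-16"),
    (codecs.BOM_UTF16_LE, "utf-16"),
]

# What the opening "<" or "<?" or "<?xm" looks like when there is no byte order mark.
_SIGNATURES = [
    (b"\x00\x00\x00<", "utf-32-be"),
    (b"<\x00\x00\x00", "utf-32-le"),
    (b"\x00<\x00?", "utf-16-be"),
    (b"<\x00?\x00", "utf-16-le"),
    # "<?xm" in EBCDIC; cp037 is a placeholder (assumption), the real code
    # page is only known after reading the encoding declaration.
    (b"\x4c\x6f\xa7\x94", "cp037"),
]


def guess_xml_encoding_family(data: bytes) -> str:
    """Return a provisional codec name based on the first bytes of data.

    Hypothetical helper: the result is only good enough to read the XML
    declaration, which may then name a more specific encoding.
    """
    for prefix, name in _BOMS + _SIGNATURES:
        if data.startswith(prefix):
            return name
    # No recognizable signature: UTF-8 is the default the XML spec prescribes.
    return "utf-8"
```

An externally specified encoding (for example from an HTTP header) would take precedence over this guess, which matches the behaviour the message describes.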
msg57213 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-11-07 17:53
I think it's good to add this; I don't have time to review though.
msg57221 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2007-11-07 19:43
Nice codec !

The only nit I have is the name: "xml" isn't intuitive enough. I had to
read the code to figure out what the codec actually does. 

"xml" used as an encoding usually refers to having Unicode text converted to
ASCII with XML entity escapes for all non-ASCII characters.

How about "xml-auto-detect" or something along those lines ?!
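
For comparison, the conversion that "xml" usually suggests (Unicode to ASCII with escapes for non-ASCII characters) is already available through the standard xmlcharrefreplace error handler, without any codec of that name. A small illustration; note that it emits numeric character references rather than named entities:

```python
text = "Dörwald"

# Characters that do not fit into ASCII are replaced by numeric
# character references such as &#246;
escaped = text.encode("ascii", errors="xmlcharrefreplace")
print(escaped)  # b'D&#246;rwald'
```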
msg57222 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2007-11-07 21:42
"xml-auto-detect" sounds OK to me, it even makes sense for the encoder,
because it normally detects the encoding to use for writing from the XML
declaration.

We could put "xml-auto-detect" into the alias mapping and keep xml as
the module name.

But I noticed I have to rewrap a lot of lines before I check it in.
msg57224 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2007-11-07 21:54
Leaving the module name as "xml" would remove that name from the
namespace of possible encodings.

"xml" as encoding name is problematic, as many people regard writing
data in XML as "encoding the data in XML".

I'd simply not use it at all, not even for a codec that converts between
Unicode and ASCII+XML entities.
msg57280 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2007-11-08 21:25
OK, I've changed the name of the codec to xml_auto_detect and added
support for EBCDIC.
msg57281 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2007-11-08 21:37
Thanks, Walter !
msg63696 - (view) Author: Sean Reifschneider (jafo) * (Python committer) Date: 2008-03-17 17:52
Marc-Andre: Is this good to be committed, or does it need to be reviewed
further?
msg63703 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2008-03-17 18:14
There was resistance in python-dev against this patch (see the thread at
http://mail.python.org/pipermail/python-dev/2007-November/075138.html),
so this issue should probably be closed as rejected.

However, there was consensus that a detect_xml_encoding() function might
be useful.
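
As a rough idea of what such a function could look like: only the name detect_xml_encoding() comes from this thread, while the signature, the default argument and the behaviour below are assumptions. The sketch handles ASCII-compatible input only (no UTF-16, UTF-32 or EBCDIC) and simply reads the encoding pseudo-attribute of the XML declaration.

```python
import re

# Matches an XML declaration at the very start of the data and captures the
# value of its encoding pseudo-attribute. [^>]*? keeps the search inside the
# declaration, so attributes of later elements are never picked up.
_ENCODING_DECL = re.compile(
    rb"^<\?xml\s[^>]*?\bencoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)


def detect_xml_encoding(data: bytes, default: str = "utf-8") -> str:
    """Return the declared encoding of an XML byte string (hypothetical API).

    Falls back to default when there is no declaration or when the
    declaration does not name an encoding.
    """
    # Skip a UTF-8 byte order mark, if present.
    if data.startswith(b"\xef\xbb\xbf"):
        data = data[3:]
    match = _ENCODING_DECL.match(data)
    if match is None:
        return default
    return match.group(1).decode("ascii").lower()
```

A caller would then decode explicitly, for example data.decode(detect_xml_encoding(data)), which keeps the detection step separate from the codec machinery.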
History
Date User Action Args
2022-04-11 14:56:28 admin set github: 45740
2008-03-18 15:14:31 jafo set status: open -> closed; resolution: rejected
2008-03-17 18:14:25 doerwalter set messages: + msg63703
2008-03-17 17:52:30 jafo set priority: normal; assignee: lemburg; messages: + msg63696; nosy: + jafo
2007-11-08 21:37:27 lemburg set messages: + msg57281
2007-11-08 21:25:53 doerwalter set files: + diff2.txt; messages: + msg57280
2007-11-07 21:59:56 gvanrossum set nosy: - gvanrossum
2007-11-07 21:54:18 lemburg set messages: + msg57224
2007-11-07 21:42:20 doerwalter set messages: + msg57222
2007-11-07 19:43:06 lemburg set nosy: + lemburg; messages: + msg57221
2007-11-07 17:53:57 gvanrossum set nosy: + gvanrossum; messages: + msg57213
2007-11-07 17:52:18 doerwalter create