Author Tomalak
Recipients Tomalak, sechi_francesco
Date 2009-05-10.16:34:29
Message-id <1241973273.1.0.543750931833.issue5752@psf.upfronthosting.co.za>
Content
Francesco, I think you are missing the point. :-) The problem has two sides.

If I create an XML document using the DOM (not by parsing it from a
string!), then I can put newline characters into an attribute value.
This is allowed and conforms to the XML spec.

However, *literal* newlines in an attribute value (i.e. when the
document is parsed from a string) have no meaning. The parser treats
them as if they were insignificant whitespace: each one is converted to
a single space. This is also valid and conforms to the XML spec.
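
As a small sketch of the parsing side (the element and attribute names
are made up for illustration), a literal newline comes back as a space,
while a newline written as the character reference &#10; survives:

```python
from xml.dom import minidom

# Literal newline inside the attribute value: the parser normalizes it.
literal = minidom.parseString('<root attr="line1\nline2"/>')
literal_value = literal.documentElement.getAttribute("attr")

# Newline written as a character reference: the parser keeps it.
encoded = minidom.parseString('<root attr="line1&#10;line2"/>')
encoded_value = encoded.documentElement.getAttribute("attr")

print(repr(literal_value))
print(repr(encoded_value))
```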

The catch: This leads to actual data loss if I *wanted* to store
newline characters in an attribute, unless the newline characters are
properly encoded (for example as the character reference &#10;).
Encoding the newline characters is also valid and conforms to the spec,
so the DOM implementation should do it.
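
To illustrate what "properly encoded" could look like, here is a rough
sketch that writes the attribute by hand with quoteattr() from
xml.sax.saxutils, which emits newlines as &#10;. It is only meant to
show the round trip, not how minidom should implement it; the element
and attribute names are again made up:

```python
from xml.dom import minidom
from xml.sax.saxutils import quoteattr

original = "line1\nline2"

# quoteattr() adds the surrounding quotes and writes the newline
# as the character reference &#10; instead of a literal newline.
xml_text = "<root attr=%s/>" % quoteattr(original)

# Parsing the serialized text gives back the original value.
restored = minidom.parseString(xml_text).documentElement.getAttribute("attr")

print(xml_text)
print(restored == original)
```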

In other words, the parsing process you refer to is actually working
fine. If an attribute contains a literal newline, it is indeed okay to
convert it into a space. It's only the serialization of the document
that is broken.

Minidom is clearly missing functionality here, and it does not conform
to the XML spec. If I store a string of data in an XML document, I must
get the *same* data back when I read the document again. This is what I
check with my test script.