Message87528
Francesco, I think you are missing the point. :-) The problem has two sides.
If I create an XML document using the DOM (not by parsing it from a
string!), then I can put newline characters into an attribute value. This
is allowed and conforms to the XML spec.
However, *literal* newlines in an attribute value (i.e. when the
document is parsed from a string) have no meaning. The parser treats
them as insignificant whitespace -- attribute-value normalization
converts each one to a single space. This is also valid and conforms to
the XML spec.
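A minimal sketch of the parsing side, assuming Python's xml.dom.minidom (the element name "root" and attribute name "attr" are made up for illustration). It compares a literal newline with one written as a character reference:

```python
from xml.dom import minidom

# Literal newline inside the attribute value: the parser normalizes it to a space.
literal_doc = minidom.parseString('<root attr="line1\nline2"/>')
literal_value = literal_doc.documentElement.getAttribute("attr")

# Newline written as the character reference &#10;: it survives parsing.
encoded_doc = minidom.parseString('<root attr="line1&#10;line2"/>')
encoded_value = encoded_doc.documentElement.getAttribute("attr")

print(repr(literal_value))  # 'line1 line2'
print(repr(encoded_value))  # 'line1\nline2'
```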
The catch: this leads to actual data loss if I *wanted* to store
newline characters in an attribute -- unless the newline characters are
properly encoded as character references (e.g. &#10;). Encoding them is
also valid and conforms to the spec, so the DOM implementation should do
it when serializing.
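What such encoding could look like, as a rough sketch only: escape_attr_value is a hypothetical helper written for this example, not a minidom function, and the document is assembled by hand just to show that the encoded value parses back unchanged.

```python
from xml.dom import minidom

def escape_attr_value(value):
    """Hypothetical helper: escape text for a double-quoted XML attribute."""
    # Escape markup characters first; the ampersand has to go before the rest.
    value = value.replace("&", "&amp;").replace("<", "&lt;").replace('"', "&quot;")
    # Write whitespace the parser would normalize as character references.
    return value.replace("\r", "&#13;").replace("\n", "&#10;").replace("\t", "&#9;")

original = "first line\nsecond line\twith tab"
xml_text = '<root attr="%s"/>' % escape_attr_value(original)

# Parsing the hand-built document gives back the original string.
parsed = minidom.parseString(xml_text).documentElement.getAttribute("attr")
print(parsed == original)  # True
```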
In other words, the parsing process you refer to is actually working
fine. If an attribute contains a literal newline, it is indeed okay to
collapse it into a space. It's only the document serialization that is broken.
Minidom is clearly missing functionality here, and it does not conform
to the XML spec. If I store a string of data in an XML document, then
reading the document again must give me the *same* data back. This is
what I check with my test script.
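A round-trip check along these lines could look as follows. This is a sketch, not the script attached to the issue, and attribute_round_trips is a name invented here. The result for a value containing a newline depends on whether the minidom version in use encodes it when serializing, so no output is given for that case.

```python
from xml.dom import minidom

def attribute_round_trips(value):
    """Build a document via the DOM, serialize it, reparse it, compare the attribute."""
    doc = minidom.Document()
    root = doc.createElement("root")
    root.setAttribute("attr", value)
    doc.appendChild(root)

    serialized = doc.toxml()
    reparsed = minidom.parseString(serialized)
    return reparsed.documentElement.getAttribute("attr") == value

print(attribute_round_trips("no newline here"))  # True
print(attribute_round_trips("line1\nline2"))     # False if the serializer writes the newline literally
```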
Date | User | Action | Args
2009-05-10 16:34:34 | Tomalak | set | recipients: + Tomalak, sechi_francesco
2009-05-10 16:34:33 | Tomalak | set | messageid: <1241973273.1.0.543750931833.issue5752@psf.upfronthosting.co.za>
2009-05-10 16:34:31 | Tomalak | link | issue5752 messages
2009-05-10 16:34:30 | Tomalak | create |