Message195359
I'm not an expert, but from: http://www.w3.org/TR/REC-xml/#NT-AttValue
AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
which I read as: any character or Reference is valid, except & and < (plus the enclosing quote character), which are used for escaping and for starting markup.
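To make that reading concrete, here is a rough sketch of the AttValue production as a Python regular expression. Note that AttValue is the production for attribute values. The names NAME, REFERENCE, ATT_VALUE and is_att_value are made up for this illustration, and the Name pattern is an ASCII-only approximation of the real XML Name production, so treat it as an aid for reading the grammar rather than a validator.

```python
import re

# ASCII-only approximation of the XML Name production (an assumption:
# the real production also allows many non-ASCII characters).
NAME = r"[A-Za-z_:][A-Za-z0-9_.:\-]*"

# Reference ::= EntityRef | CharRef
REFERENCE = rf"&(?:{NAME}|#[0-9]+|#x[0-9A-Fa-f]+);"

# AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
ATT_VALUE = re.compile(
    rf'"(?:[^<&"]|{REFERENCE})*"'    # double-quoted form
    rf"|'(?:[^<&']|{REFERENCE})*'"   # single-quoted form
)


def is_att_value(text):
    """Return True if text (quotes included) matches the sketch above."""
    return ATT_VALUE.fullmatch(text) is not None
```

Under this sketch, is_att_value('"a&amp;b"') and is_att_value('"]]>"') are accepted, while is_att_value('"a<b"') and is_att_value('"a&b"') are rejected.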
The sequence <value>]]></value> also validates as well-formed at http://www.xmlvalidation.com/
The sequence <value>]></value> parses OK (so the problem only occurs with a double ] followed by >).
It's probably related to parsing <![CDATA[ ... ]]> (i.e. I guess that when the parser detects ]]> it
assumes or requires that it is inside a <![CDATA[ section, which is, of course, not the case here).
The sequence <value><![CDATA[foo]]></value> is parsed correctly:
>>> import xml.etree.ElementTree as ET
>>> ET.fromstring('<value><![CDATA[foo]]></value>').text
'foo'
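The cases above can be reproduced side by side with a small script. This is only a sketch: try_parse is a hypothetical helper written for this comparison, and the last case (escaping > as &gt; with xml.sax.saxutils.escape) is an extra check that is not part of the observations above.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def try_parse(doc):
    """Return the root element's text, or None if the parser rejects doc."""
    try:
        return ET.fromstring(doc).text
    except ET.ParseError:
        return None


# Bare "]]>" in element content: the case reported as failing.
literal = try_parse('<value>]]></value>')

# A single "]" followed by ">": reported as parsing OK.
single = try_parse('<value>]></value>')

# A proper CDATA section.
cdata = try_parse('<value><![CDATA[foo]]></value>')

# Same text as the first case, but with ">" escaped as "&gt;".
escaped = try_parse('<value>%s</value>' % escape(']]>'))

for name, result in [('literal', literal), ('single', single),
                     ('cdata', cdata), ('escaped', escaped)]:
    print(name, repr(result))
```

A result of None means the parser raised ParseError for that input.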
BTW, lxml.etree.fromstring also fails, and so does http://www.w3schools.com/xml/xml_validator.asp
I'll ask on the lxml mailing list what they think about this behavior.
Date | User | Action | Args
2013-08-16 16:48:27 | kees | set | recipients: + kees, r.david.murray
2013-08-16 16:48:27 | kees | set | messageid: <1376671707.02.0.77364460266.issue18753@psf.upfronthosting.co.za>
2013-08-16 16:48:27 | kees | link | issue18753 messages
2013-08-16 16:48:26 | kees | create |