Message 195359 - Python tracker


Author	kees
Recipients	kees, r.david.murray
Date	2013-08-16.16:48:26
Message-id	<1376671707.02.0.77364460266.issue18753@psf.upfronthosting.co.za>

Content
I'm not an expert, but from: http://www.w3.org/TR/REC-xml/#NT-AttValue

	AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"

which I read as: any character or Reference is valid, except & and <, which are used for escaping and closing the element.
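
To make that reading concrete, here is a rough sketch of the AttValue production as a Python regular expression. It is only an approximation under stated assumptions: the names `REFERENCE`, `ATT_VALUE` and `is_att_value` are made up for illustration, and the Name part of an entity reference is simplified to ASCII characters. Note that this production describes attribute values, not the text content of an element.

```python
import re

# Reference ::= EntityRef | CharRef
# Assumption: Name is simplified to ASCII letters, digits, '_', '.', ':' and '-'.
REFERENCE = r"&(?:[A-Za-z_:][A-Za-z0-9_.:-]*|#[0-9]+|#x[0-9A-Fa-f]+);"

# AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
DOUBLE_QUOTED = r'"(?:[^<&"]|' + REFERENCE + r')*"'
SINGLE_QUOTED = r"'(?:[^<&']|" + REFERENCE + r")*'"
ATT_VALUE = re.compile(DOUBLE_QUOTED + "|" + SINGLE_QUOTED)


def is_att_value(text):
    """Return True if text (including its quotes) matches the sketch of AttValue."""
    return ATT_VALUE.fullmatch(text) is not None


if __name__ == "__main__":
    # A quoted ]]>, a raw <, an entity reference, and a bare &.
    for candidate in ('"]]>"', '"a<b"', '"a&amp;b"', '"a & b"'):
        print(candidate, is_att_value(candidate))
```

Under this sketch a quoted ]]> is accepted, while a raw < or a bare & is rejected.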

The sequence <value>]]></value> also validates as well-formed at http://www.xmlvalidation.com/

The sequence <value>]></value> parses OK (so it fails only with a double ] followed by >).

It's probably related to parsing <![CDATA[ ... ]]> (i.e. I guess that when the parser detects ]]> it assumes / requires the state of <![CDATA[, which is, of course, not the case here).

The sequence <value><![CDATA[foo]]></value> is parsed correctly:
>>> import xml.etree.ElementTree as ET
>>> ET.fromstring('<value><![CDATA[foo]]></value>').text
'foo'
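
For anyone who wants to reproduce the cases above in one go, here is a minimal sketch using the standard-library xml.etree.ElementTree. The helper name `try_parse` is hypothetical, and the last sample (writing the > as &gt;) is my assumption about a possible workaround, not something tested in this message.

```python
import xml.etree.ElementTree as ET


def try_parse(xml_text):
    """Return the root element's text, or the ParseError if parsing fails."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError as exc:
        return exc


SAMPLES = (
    "<value>]]></value>",              # the failing case from this issue
    "<value>]></value>",               # a single ] followed by >
    "<value><![CDATA[foo]]></value>",  # ]]> closing a real CDATA section
    "<value>]]&gt;</value>",           # assumption: escaping > avoids the error
)

if __name__ == "__main__":
    for sample in SAMPLES:
        print(repr(sample), "->", repr(try_parse(sample)))
```

Returning the exception instead of raising it keeps all four results side by side in the output.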


BTW, lxml.etree.fromstring also fails, and so does http://www.w3schools.com/xml/xml_validator.asp

I'll ask on the lxml mailing list what they think about this behavior.
