Message 22920 - Python tracker


Author	neptune235
Date	2004-10-28.04:59:36

Content

HTMLParser doesn't seem to comply with the spec for
XHTML. What I am referring to can be read about here:
http://www.w3.org/TR/xhtml1/#h-4.8

In a nutshell, HTMLParser doesn't treat data inside
'script' or 'style' elements as #PCDATA. Instead it
behaves like an HTML 4 parser even for XHTML documents,
parsing only end tags there. As a result, entity
references in JavaScript are not converted as they
should be. XHTML authors writing to spec can expect
entities in the script sections of XHTML documents to
be converted unless the script is explicitly escaped as
a CDATA section.

That brings up problem two: sections explicitly escaped
as CDATA are also parsed like HTML 4 'script' and
'style' sections, so end tags inside them are parsed.
My understanding is that this is bad as well:
http://www.w3.org/TR/2004/REC-xml-20040204/#dt-cdsection
because CDEnd is the only thing that's supposed to be
recognized in a CDATA section in any XML document?
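
A minimal sketch of how both problems could be reproduced. It assumes Python 3's html.parser module (the successor to the HTMLParser module this report was filed against), so details may differ from the 2004 behavior and between Python versions. EventRecorder, record and text_of are hypothetical helper names made up for illustration; they are not part of the library.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Hypothetical helper: records the callbacks the parser fires."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def record(markup):
    """Parse markup and return the list of recorded events."""
    parser = EventRecorder()
    parser.feed(markup)
    parser.close()
    return parser.events


def text_of(events):
    """Join all data chunks, since data may arrive in several pieces."""
    return "".join(value for kind, value in events if kind == "data")


# Baseline: an entity reference in ordinary element content.
print(record("<p>a &lt; b</p>"))

# Problem one: the same entity reference inside a 'script' element.
print(record("<script>if (a &lt; b) {}</script>"))

# Problem two: an end tag inside an explicit CDATA section within 'script'.
print(record('<script><![CDATA[ document.write("</script>"); ]]></script>'))
```

Comparing the first two printouts shows whether the entity reference is decoded inside 'script' the way it is inside 'p'. In the third printout, an ("end", "script") event appearing before the data containing "]]>" means the parser ended the element inside the CDATA section instead of waiting for CDEnd.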

History
Date	User	Action	Args
2007-08-23 14:27:09	admin	link	issue1055864 messages
2007-08-23 14:27:09	admin	create