Message 98424 - Python tracker

Author	eric.araujo
Recipients	eric.araujo, once-off
Date	2010-01-27.13:40:34
Message-id	<1264599638.21.0.765850292385.issue5498@psf.upfronthosting.co.za>

Content
Hello,

XML-style empty tags of the form <tag/> are an SGML hack, or more precisely the combination of two features of SGML. In SGML, the forward slash closes the tag, and the following angle bracket is character data, not part of the tag.

The W3C validator uses a real SGML parser for HTML doctypes, and fails on XML-like /> constructs: http://validator.w3.org/check?uri=data%3Atext%2Fhtml%2C%3C!DOCTYPE+html+PUBLIC+%22-%2F%2FW3C%2F%2FDTD+HTML+4.01%2F%2FEN%22+%22http%3A%2F%2Fwww.w3.org%2FTR%2Fhtml4%2Fstrict.dtd%22%3E+%3Chtml%3E+%3Chead%3E+++%3Ctitle%3ETest%3C%2Ftitle%3E+++%3Cmeta+name%3Dtest+content%3Done%2F%3E+++%3Cmeta+name%3Dbug+content%3Dtwo%3E+%3C%2Fhead%3E+%3Cbody%3E+++%3Cp%3ETest%3C%2Fp%3E+%3C%2Fbody%3E+%3C%2Fhtml%3E&charset=%28detect+automatically%29&doctype=Inline&group=1&verbose=1

The complete explanation can be read at http://www.cs.tut.fi/~jkorpela/html/empty.html

In conclusion, sgmllib is right. Use an XML parser for XML or an HTML5 parser for HTML.
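
As a rough sketch of that advice, the snippet below parses XML input with xml.etree.ElementTree and HTML input with html.parser. The sample documents and the MetaCollector class are made up for illustration, loosely modelled on the validator test case above. Note that html.parser is the standard library's tolerant HTML parser, not a complete HTML5 implementation; a third-party library such as html5lib would probably be closer to the recommendation.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# XML input: an XML parser treats <meta .../> as an empty element.
# (Sample document is an assumption, not taken from the issue.)
xml_doc = (
    '<head>'
    '<meta name="test" content="one"/>'
    '<meta name="bug" content="two"/>'
    '</head>'
)
xml_names = [meta.get("name") for meta in ET.fromstring(xml_doc).iter("meta")]


class MetaCollector(HTMLParser):
    """Hypothetical helper: collect (name, content) pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.metas = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            self.metas.append((attributes.get("name"), attributes.get("content")))


# HTML input: plain HTML 4 style, with no XML-like "/>" at all.
html_doc = (
    "<head><title>Test</title>"
    "<meta name=test content=one>"
    "<meta name=bug content=two>"
    "</head>"
)
collector = MetaCollector()
collector.feed(html_doc)
collector.close()

print(xml_names)
print(collector.metas)
```

Each parser is given the syntax it was designed for, so neither has to guess what a trailing slash means.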

Kind regards

History
Date	User	Action	Args
2010-01-27 13:40:38	eric.araujo	set	recipients: + eric.araujo, once-off
2010-01-27 13:40:38	eric.araujo	set	messageid: <1264599638.21.0.765850292385.issue5498@psf.upfronthosting.co.za>
2010-01-27 13:40:36	eric.araujo	link	issue5498 messages
2010-01-27 13:40:35	eric.araujo	create