
Author eric.araujo
Recipients eric.araujo, once-off
Date 2010-01-27.13:40:34
Message-id <1264599638.21.0.765850292385.issue5498@psf.upfronthosting.co.za>
In-reply-to
Content
Hello

XML-style tags of the form <tag/> are an SGML hack, or more precisely the combination of two SGML features. Under SGML rules, the forward slash closes the tag, and the following angle bracket is character data, not part of the tag.
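To make that reading concrete, here is a toy sketch in Python. It is not a real SGML parser and does not reflect how sgmllib is implemented; the function name `sgml_style_split` and the regular expression are made up for illustration. It only mimics the split described above: the slash ends the tag, and whatever follows is character data.

```python
import re

# Toy illustration only: NOT a real SGML parser and not sgmllib's code.
# It matches "<name/" and treats everything after the slash as text.
_SLASH_TERMINATED_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)/")


def sgml_style_split(text):
    """Return (tag_name, trailing_character_data) for input such as '<br/>'."""
    match = _SLASH_TERMINATED_TAG.match(text)
    if match is None:
        raise ValueError("not a slash-terminated start tag: %r" % text)
    # The slash closes the tag; the rest (here '>') is character data.
    return match.group(1), text[match.end():]


print(sgml_style_split("<br/>"))  # ('br', '>')
```

Under this reading the stray '>' ends up in the document text, which is why an SGML-based tool does not treat <tag/> the way an XML parser does.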

The W3C validator uses a real SGML parser for HTML doctypes, and fails on XML-like /> constructs: http://validator.w3.org/check?uri=data%3Atext%2Fhtml%2C%3C!DOCTYPE+html+PUBLIC+%22-%2F%2FW3C%2F%2FDTD+HTML+4.01%2F%2FEN%22+%22http%3A%2F%2Fwww.w3.org%2FTR%2Fhtml4%2Fstrict.dtd%22%3E+%3Chtml%3E+%3Chead%3E+++%3Ctitle%3ETest%3C%2Ftitle%3E+++%3Cmeta+name%3Dtest+content%3Done%2F%3E+++%3Cmeta+name%3Dbug+content%3Dtwo%3E+%3C%2Fhead%3E+%3Cbody%3E+++%3Cp%3ETest%3C%2Fp%3E+%3C%2Fbody%3E+%3C%2Fhtml%3E&charset=%28detect+automatically%29&doctype=Inline&group=1&verbose=1

The complete explanation can be read at http://www.cs.tut.fi/~jkorpela/html/empty.html

In conclusion, sgmllib is right. Use an XML parser for XML or an HTML5 parser for HTML.
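As a minimal sketch of the first option, the standard library's xml.etree.ElementTree accepts the <tag/> empty-element syntax when the input is well-formed XML. The sample string and the helper name `meta_contents` below are assumptions for illustration; the sample is loosely modelled on the validator test case but rewritten with quoted attributes so that it is valid XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, rewritten as well-formed XML (quoted attributes,
# every element closed). It is not the HTML 4.01 test document itself.
SAMPLE = '<head><title>Test</title><meta name="test" content="one"/></head>'


def meta_contents(xml_text):
    """Return the content attribute of every <meta> element."""
    root = ET.fromstring(xml_text)
    return [meta.get("content") for meta in root.iter("meta")]


print(meta_contents(SAMPLE))  # ['one']
```

Here the slash is part of the XML empty-element syntax, so the attribute value comes back as 'one' with nothing left over as text.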

Kind regards