
Author ezio.melotti
Recipients eric.araujo, ezio.melotti, r.david.murray, teoryn
Date 2011-11-01.13:21:41
Message-id <1320153702.19.0.00632884125491.issue12629@psf.upfronthosting.co.za>
In-reply-to
Content
I think <x><y z=""o"" /></x> should be parsed as <x><y z="" /></x>, and the o"" should be ignored.
<x><y z="""" /></x> should be parsed as <x><y z="" /></x>, and the last two "" should be ignored.  This is what Firefox seems to do.

Currently the parser doesn't seem to handle extraneous data in the start tag too well, because the locatestarttagend_tolerant regex looks for (more or less) well-formed attributes.
I've attached a patch for test_htmlparser that adds the two examples provided by Kevin.
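
For anyone who wants to see what the parser does with these two inputs, here is a minimal sketch. It is not the attached patch. It assumes Python 3, where the module is html.parser; EventCollector and collect_events are hypothetical helper names made up for this illustration. The result depends on the Python version in use, so no expected output is given.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical helper: records every parser callback as a tuple."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None when absent.
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def collect_events(source):
    """Feed source to a fresh parser and return the recorded events."""
    parser = EventCollector()
    parser.feed(source)
    parser.close()
    return parser.events


if __name__ == "__main__":
    # The two malformed start tags from this issue.
    for sample in ('<x><y z=""o"" /></x>', '<x><y z="""" /></x>'):
        print(sample)
        for event in collect_events(sample):
            print("   ", event)
```

Comparing the attributes reported for <y> against the z="" result proposed above shows whether the extraneous data is dropped, kept as a separate attribute, or causes the parser to stop.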
History
Date                 User          Action  Args
2011-11-01 13:21:42  ezio.melotti  set     recipients: + ezio.melotti, eric.araujo, r.david.murray, teoryn
2011-11-01 13:21:42  ezio.melotti  set     messageid: <1320153702.19.0.00632884125491.issue12629@psf.upfronthosting.co.za>
2011-11-01 13:21:41  ezio.melotti  link    issue12629 messages
2011-11-01 13:21:41  ezio.melotti  create