Message 146774 - Python tracker

Author	ezio.melotti
Recipients	eric.araujo, ezio.melotti, r.david.murray, teoryn
Date	2011-11-01.13:21:41
SpamBayes Score	1.5006343e-09
Marked as misclassified	No
Message-id	<1320153702.19.0.00632884125491.issue12629@psf.upfronthosting.co.za>
In-reply-to

Content
I think <x><y z=""o"" /></x> should be parsed as <x><y z="" /></x>, and the o"" should be ignored.
<x><y z="""" /></x> should be parsed as <x><y z="" /></x>, and the last two "" should be ignored. This is what Firefox seems to do.

Currently the parser doesn't handle extraneous data in the start tag well, because the locatestarttagend_tolerant regex looks for (more or less) well-formed attributes.
I've attached a patch for test_htmlparser with the two examples provided by Kevin.
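
For anyone who wants to check what the parser reports for these two inputs, here is a minimal sketch, assuming Python 3's html.parser.HTMLParser. EventCollector and parse_events are illustrative names only, not part of the attached patch or the test suite. The exact attribute list for <y> depends on the Python version, so no expected output is shown.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical helper: records every parser callback as a tuple."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("starttag", tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.events.append(("startendtag", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("endtag", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def parse_events(source):
    """Feed one document to a fresh parser and return the recorded events."""
    parser = EventCollector()
    parser.feed(source)
    parser.close()
    return parser.events


if __name__ == "__main__":
    # The two malformed start tags discussed in this message.
    for sample in ('<x><y z=""o"" /></x>', '<x><y z="""" /></x>'):
        print(sample)
        for event in parse_events(sample):
            print("   ", event)
```

Comparing the attrs reported for <y> against [('z', '')] shows whether the stray characters after the first quoted value are dropped, as proposed above, or surface as an extra attribute.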

History
Date	User	Action	Args
2011-11-01 13:21:42	ezio.melotti	set	recipients: + ezio.melotti, eric.araujo, r.david.murray, teoryn
2011-11-01 13:21:42	ezio.melotti	set	messageid: <1320153702.19.0.00632884125491.issue12629@psf.upfronthosting.co.za>
2011-11-01 13:21:41	ezio.melotti	link	issue12629 messages
2011-11-01 13:21:41	ezio.melotti	create