Author: smroid
Date: 2003-06-17 03:09:17
HTML examples seen in the wild that cause parse errors
in HTMLParser include:

<a width="100%"cellspacing=0>
  -- note the missing space between the value and the next attribute name

<a foo=>
  -- trailing attribute has no value after =

<a href=javascript:popup('/popup/html.html')>
  -- unquoted javascript: value containing embedded quotes

My patch contains improvements to the 'attrfind' and
'locatestarttagend' regexps that allow these examples
to parse.
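The patched regexps are not reproduced here. As a rough illustration of the tolerance the three examples require, the following is a minimal Python sketch built on assumptions: ATTR_RE, TAG_RE and parse_start_tag are hypothetical names, and the pattern is a simplified stand-in rather than the patched 'attrfind' or 'locatestarttagend'. It makes whitespace between attributes optional, allows an empty value after '=', and lets an unquoted value contain quote characters. Entity references in values are not decoded.

```python
import re

# Hypothetical, simplified attribute pattern (NOT the regex from the patch).
ATTR_RE = re.compile(
    r"""
    \s*                          # whitespace is optional, so a quoted value may abut the next name
    (?P<name>[^\s"'<>/=]+)       # attribute name
    (?:
        \s*=\s*
        (?:
            '(?P<sq>[^']*)'      # single-quoted value
          | "(?P<dq>[^"]*)"      # double-quoted value
          | (?P<uq>[^\s>]*)      # unquoted value: may contain quotes, may be empty
        )
    )?                           # the whole '=value' part is optional
    """,
    re.VERBOSE,
)

# Hypothetical pattern for the tag name at the start of a start tag.
TAG_RE = re.compile(r"<([a-zA-Z][^\s/>]*)")


def parse_start_tag(tag):
    """Split a start tag into (tag_name, [(attr_name, value), ...]).

    value is None for a bare attribute such as 'disabled'.
    """
    m = TAG_RE.match(tag)
    if m is None:
        raise ValueError(f"not a start tag: {tag!r}")
    pos = m.end()
    attrs = []
    # Each match consumes at least one name character, so the loop terminates.
    while (a := ATTR_RE.match(tag, pos)) is not None:
        # Take whichever value alternative matched; None if there was no '='.
        value = next((v for v in a.group("sq", "dq", "uq") if v is not None), None)
        attrs.append((a.group("name").lower(), value))
        pos = a.end()
    rest = tag[pos:].strip()
    if rest not in (">", "/>"):
        raise ValueError(f"unparsed text in start tag: {rest!r}")
    return m.group(1).lower(), attrs


print(parse_start_tag('<a width="100%"cellspacing=0>'))
print(parse_start_tag("<a foo=>"))
print(parse_start_tag("<a href=javascript:popup('/popup/html.html')>"))
```

Trying the quoted alternatives before the unquoted one means a value that opens with a quote is cut at its closing quote, which is what lets the next attribute name follow with no space; a value that does not open with a quote runs to the next whitespace or '>', quotes included.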

The existing test_htmlparser.py unit tests continue to
pass, except for the one test case that expects
<a foo=> to be an error.

I commented out that case and added new test cases to
cover the examples above.
History
Date                 User   Action  Args
2007-08-23 15:27:46  admin  link    issue755670 messages
2007-08-23 15:27:46  admin  create