
Author dayveday
Recipients dayveday
Date 2009-08-07.01:25:08
Message-id <1249608310.39.0.325559183765.issue6662@psf.upfronthosting.co.za>
In-reply-to
Content
When HTMLParser.HTMLParser encounters a malformed charref (for example
&#bad;) it no longer parses the HTML that follows it correctly.

For example:
  <p>&#bad;</p>
The parser recognises the start tag "p" but treats everything after it,
including the end tag, as data.

To reproduce:
import HTMLParser

class MyParser(HTMLParser.HTMLParser):
  def handle_starttag(self, tag, attrs):
    print 'Start "%s"' % tag
  def handle_endtag(self, tag):
    print 'End "%s"' % tag
  def handle_charref(self, ref):
    print 'Charref "%s"' % ref
  def handle_data(self, data):
    print 'Data "%s"' % data
parser = MyParser()
parser.feed('<p>&#bad;</p>')
parser.close()

Expected output:
Start "p"
Data "&#bad;"
End "p"

Actual output:
Start "p"
Data "&#bad;</p>"
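
For comparison, the same check can be sketched against the Python 3 successor module. This is only an illustration under assumptions: it uses `html.parser.HTMLParser` rather than the Python 2 `HTMLParser` module the report is about, and the names `EventCollector` and `collect_events` are hypothetical helpers, not part of either library. It records callbacks in a list instead of printing them, so the result can be compared with the expected output above.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Record parser callbacks as (kind, value) tuples instead of printing."""

    def __init__(self):
        # Assumption: default settings of the Python 3 parser are used.
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_charref(self, name):
        self.events.append(("charref", name))

    def handle_data(self, data):
        # Merge adjacent text chunks so the result does not depend on
        # how the parser happens to split character data.
        if self.events and self.events[-1][0] == "data":
            self.events[-1] = ("data", self.events[-1][1] + data)
        else:
            self.events.append(("data", data))


def collect_events(markup):
    """Feed markup to a fresh parser and return the recorded events."""
    parser = EventCollector()
    parser.feed(markup)
    parser.close()
    return parser.events


if __name__ == "__main__":
    for kind, value in collect_events("<p>&#bad;</p>"):
        print('%s "%s"' % (kind.capitalize(), value))
```

If the end tag is parsed correctly, the recorded events end with ("end", "p"). If the behaviour described in this report is present, the final event is a data event that still contains "</p>".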
History
Date                 User      Action  Args
2009-08-07 01:25:10  dayveday  set     recipients: + dayveday
2009-08-07 01:25:10  dayveday  set     messageid: <1249608310.39.0.325559183765.issue6662@psf.upfronthosting.co.za>
2009-08-07 01:25:08  dayveday  link    issue6662 messages
2009-08-07 01:25:08  dayveday  create