Message 91392 - Python tracker


Author	dayveday
Recipients	dayveday
Date	2009-08-07.01:25:08
Message-id	<1249608310.39.0.325559183765.issue6662@psf.upfronthosting.co.za>
In-reply-to

Content

When HTMLParser.HTMLParser encounters a malformed charref (for example
&#bad;), it no longer parses the HTML that follows correctly.

For example, given
  <p>&#bad;</p>
the parser recognises the start tag "p" but treats everything after it as data.

To reproduce:
import HTMLParser  # Python 2 module name

class MyParser(HTMLParser.HTMLParser):
    def handle_starttag(self, tag, attrs):
        print 'Start "%s"' % tag
    def handle_endtag(self, tag):
        print 'End "%s"' % tag
    def handle_charref(self, ref):
        print 'Charref "%s"' % ref
    def handle_data(self, data):
        print 'Data "%s"' % data

parser = MyParser()
parser.feed('<p>&#bad;</p>')
parser.close()

Expected output:
Start "p"
Data "&#bad;"
End "p"

Actual output:
Start "p"
Data "&#bad;</p>"
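
For anyone checking this on Python 3, where the module lives at html.parser, here is a minimal sketch of the same reproduction. The EventCollector class and the collect_events helper are names made up for this illustration and are not part of the library. The sketch relies on the default convert_charrefs=True, under which handle_charref is not called and character references reach handle_data instead. It records events rather than printing them directly, so the result can be compared against the expected output above.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical helper: records parser callbacks as (kind, value) tuples."""

    def __init__(self):
        super().__init__()  # convert_charrefs is left at its default (True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(('start', tag))

    def handle_endtag(self, tag):
        self.events.append(('end', tag))

    def handle_data(self, data):
        self.events.append(('data', data))


def collect_events(markup):
    """Feed markup to a fresh parser and return the recorded events."""
    parser = EventCollector()
    parser.feed(markup)
    parser.close()
    return parser.events


if __name__ == '__main__':
    # Print events in the same style as the original report.
    for kind, value in collect_events('<p>&#bad;</p>'):
        print('%s "%s"' % (kind.capitalize(), value))
```

If the end tag shows up as its own event after the data, the parser in use does not exhibit the behaviour described in this report.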

History
Date	User	Action	Args
2009-08-07 01:25:10	dayveday	set	recipients: + dayveday
2009-08-07 01:25:10	dayveday	set	messageid: <1249608310.39.0.325559183765.issue6662@psf.upfronthosting.co.za>
2009-08-07 01:25:08	dayveday	link	issue6662 messages
2009-08-07 01:25:08	dayveday	create