Message 97177 - Python tracker

Author	stefan.schweizer
Recipients	stefan.schweizer
Date	2010-01-03.20:13:28
Message-id	<1262549609.89.0.245956516405.issue7626@psf.upfronthosting.co.za>
In-reply-to

Content

HTMLParser should only handle entity references that are terminated with a semicolon. I know that the semicolon can be omitted in some cases (http://www.w3.org/TR/html4/charset.html#h-5.3) and that some browsers are more tolerant, but the following example causes some odd output:

>>> import HTMLParser
>>> class EntityrefParser(HTMLParser.HTMLParser):
...     def handle_data(self, data):
...         print "handle_data '%s'" % data
...     def handle_entityref(self, name):
...         print "handle_entityref '%s'" % name
... 
>>> p = EntityrefParser()
>>> p.feed("<p>spam&eggs are delicious</p>")

Expected Result:
handle_data 'spam&eggs are delicious'

Actual Result:
handle_data 'spam'
handle_entityref 'eggs'
handle_data ' are delicious'
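
One possible workaround on the caller's side, sketched under assumptions: it targets Python 3's html.parser (the successor to the HTMLParser module used above) with convert_charrefs=False so that handle_entityref() is still called. The names LenientTextParser and extract_text are hypothetical, chosen only for illustration. The idea is to put the ampersand back whenever the reported name is not a known HTML entity. It does not change the parser itself, and since the callback only receives the name, it cannot tell "&eggs" from "&eggs;".

```python
from html.entities import name2codepoint
from html.parser import HTMLParser


class LenientTextParser(HTMLParser):
    """Collect text, treating unknown entity names as literal text.

    Hypothetical helper for illustration; not part of the standard library.
    """

    def __init__(self):
        # convert_charrefs=False keeps handle_entityref() in play (Python 3.4+).
        super().__init__(convert_charrefs=False)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

    def handle_entityref(self, name):
        if name in name2codepoint:
            # Known entity such as "amp": substitute the character it names.
            self.chunks.append(chr(name2codepoint[name]))
        else:
            # Unknown name such as "eggs": restore the ampersand as plain text.
            self.chunks.append("&" + name)


def extract_text(markup):
    """Return the text content of markup, joined into one string."""
    parser = LenientTextParser()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


print(extract_text("<p>spam&eggs are delicious</p>"))
```

Numeric references such as "&#38;" are not handled in this sketch; a handle_charref() override would be needed for those.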

History
Date	User	Action	Args
2010-01-03 20:13:30	stefan.schweizer	set	recipients: + stefan.schweizer
2010-01-03 20:13:29	stefan.schweizer	set	messageid: <1262549609.89.0.245956516405.issue7626@psf.upfronthosting.co.za>
2010-01-03 20:13:28	stefan.schweizer	link	issue7626 messages
2010-01-03 20:13:28	stefan.schweizer	create