Message97177
HTMLParser should only handle entity references that are terminated with a semicolon. I know that the semicolon can be omitted in some cases (http://www.w3.org/TR/html4/charset.html#h-5.3) and that some browsers are more tolerant, but the following example produces surprising output:
>>> import HTMLParser
>>> class EntityrefParser(HTMLParser.HTMLParser):
...     def handle_data(self, data):
...         print "handle_data '%s'" % data
...     def handle_entityref(self, name):
...         print "handle_entityref '%s'" % name
...
>>> p = EntityrefParser()
>>> p.feed("<p>spam&eggs are delicious</p>")
Expected result:
handle_data 'spam&eggs are delicious'

Actual result:
handle_data 'spam'
handle_entityref 'eggs'
handle_data ' are delicious'
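To make the proposed rule concrete, here is a minimal sketch written as a standalone tokenizer rather than a patch to HTMLParser. The names `split_entityrefs` and `STRICT_ENTITYREF` are hypothetical, and the entity-name syntax in the pattern is an assumption, not something taken from HTMLParser. The tokenizer reports an entity reference only for the form `&name;`. As a result, the input from the example above comes back as a single data chunk, while the Actual Result shows the current parser accepting a space as the terminator.

```python
import re

# Hypothetical pattern: an entity name only counts if it is followed by ';'.
# The name syntax [a-zA-Z][-.a-zA-Z0-9]* is an assumption, not HTMLParser's.
STRICT_ENTITYREF = re.compile(r"&([a-zA-Z][-.a-zA-Z0-9]*);")


def split_entityrefs(text):
    """Split text into ('data', str) and ('entityref', name) tokens.

    Only '&name;' is treated as an entity reference; a bare '&' such as
    the one in 'spam&eggs' stays part of the surrounding data.
    """
    tokens = []
    pos = 0
    for match in STRICT_ENTITYREF.finditer(text):
        # Emit any plain text that precedes this entity reference.
        if match.start() > pos:
            tokens.append(("data", text[pos:match.start()]))
        tokens.append(("entityref", match.group(1)))
        pos = match.end()
    # Emit whatever text follows the last entity reference.
    if pos < len(text):
        tokens.append(("data", text[pos:]))
    return tokens


if __name__ == "__main__":
    for kind, value in split_entityrefs("spam&eggs are delicious"):
        print("handle_%s '%s'" % (kind, value))
    for kind, value in split_entityrefs("spam&amp;eggs"):
        print("handle_%s '%s'" % (kind, value))
```

The first loop prints the "Expected result" line shown above. The second loop shows that a properly terminated reference such as `&amp;` is still split out. This sketch deliberately ignores the semicolon-less forms that the HTML 4 specification permits, which is exactly the trade-off this report proposes.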
Date | User | Action | Args
2010-01-03 20:13:30 | stefan.schweizer | set | recipients: + stefan.schweizer
2010-01-03 20:13:29 | stefan.schweizer | set | messageid: <1262549609.89.0.245956516405.issue7626@psf.upfronthosting.co.za>
2010-01-03 20:13:28 | stefan.schweizer | link | issue7626 messages
2010-01-03 20:13:28 | stefan.schweizer | create |