Message208336
Python 2.7 HTMLParser.py lines 185-199 (similar lines still exist in Python 3.4):
    match = charref.match(rawdata, i)
    if match:
        ...
    else:
        if ";" in rawdata[i:]:  # bail by consuming &#
            self.handle_data(rawdata[0:2])
            i = self.updatepos(i, 2)
        break
If you feed the parser a broken charref, i.e. one that is non-numeric, it passes whatever string happens to be at the start of rawdata to handle_data(). E.g.:
    import sys
    from HTMLParser import HTMLParser  # Python 2.7 module name

    p = HTMLParser()
    p.handle_data = lambda x: sys.stdout.write(x)
    p.feed('<p>&#foo;</p>')
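This prints '<p', which is clearly wrong: the slice rawdata[0:2] is taken from the start of the buffer rather than from the current position i. I think the intention of the code is to pass '&#', which seems saner.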
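
A minimal, self-contained sketch of the difference, assuming the fix is simply to slice relative to the current position i instead of from the start of the buffer. The helper names bail_buggy and bail_fixed are hypothetical, and the regex is a simplified stand-in for the module's charref pattern; none of this is taken verbatim from HTMLParser.py.

```python
import re

# Simplified stand-in for the parser's numeric charref pattern (assumption).
charref = re.compile(r'&#(?:[0-9]+|[xX][0-9a-fA-F]+)[^0-9a-fA-F]')


def bail_buggy(rawdata, i):
    """Mimic the reported branch: slice from the start of the buffer.

    Returns (text passed to handle_data, new position).
    """
    return rawdata[0:2], 2


def bail_fixed(rawdata, i):
    """Possible fix: consume the two characters '&#' at position i."""
    return rawdata[i:i + 2], i + 2


rawdata = '<p>&#foo;</p>'
i = rawdata.index('&#')          # position of the broken charref (3)

# The broken charref does not match, so the bail-out branch is taken.
if charref.match(rawdata, i) is None and ';' in rawdata[i:]:
    print(bail_buggy(rawdata, i))   # ('<p', 2)
    print(bail_fixed(rawdata, i))   # ('&#', 5)
```

Note that the buggy variant also moves the position backwards (from 3 to 2), so the error is not limited to the text handed to handle_data().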
Date | User | Action | Args
2014-01-17 14:06:13 | iko | set | recipients: + iko
2014-01-17 14:06:13 | iko | set | messageid: <1389967573.45.0.115549710544.issue20288@psf.upfronthosting.co.za>
2014-01-17 14:06:13 | iko | link | issue20288 messages
2014-01-17 14:06:13 | iko | create |