Message60380
sgmllib supports neither hexadecimal character references (for example
&#x41;) nor Unicode characters, both of which are commonly seen on web pages.
The following replacements fix both problems.
charref = re.compile('&#([xX]?[0-9a-fA-F]+)[^0-9a-fA-F]')
def handle_charref(self, ref):
    try:
        if ref[0] == 'x' or ref[0] == 'X':
            m = int(ref[1:], 16)
        else:
            m = int(ref)
        self.handle_data(unichr(m).encode('utf-8'))
    except ValueError:
        self.unknown_charref(ref)
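For anyone who wants to try the decoding logic outside of sgmllib, here is a minimal standalone sketch. It assumes Python 3, so `chr` stands in for `unichr`, and the names `CHARREF` and `decode_charref` are hypothetical helpers, not part of sgmllib. The pattern also assumes the optional `x`/`X` prefix must be captured along with the digits so that the hexadecimal branch can see it.

```python
import re

# Assumed pattern: optional x/X prefix, then hex digits, then one
# terminating character that is not a hex digit (usually ';').
CHARREF = re.compile(r'&#([xX]?[0-9a-fA-F]+)[^0-9a-fA-F]')


def decode_charref(ref):
    """Return the character for a reference body such as '65' or 'x41'.

    Returns None when the body is not a valid number or code point,
    which is where the handler above would call unknown_charref().
    """
    try:
        if ref[:1].lower() == 'x':
            code = int(ref[1:], 16)   # hexadecimal form, e.g. 'x41'
        else:
            code = int(ref)           # decimal form, e.g. '65'
        return chr(code)
    except (ValueError, OverflowError):
        # int() rejects malformed digits; chr() rejects out-of-range values.
        return None


match = CHARREF.match('&#x20AC;')
if match:
    print(repr(decode_charref(match.group(1))))
```

Note that a body such as `1f` matches the pattern but is rejected by the decimal branch, so it falls through to the error path rather than being silently treated as hexadecimal.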
Date | User | Action | Args
2008-01-20 09:56:21 | admin | link | issue803422 messages
2008-01-20 09:56:21 | admin | create |