Author aaronsw
Date 2003-09-09.20:53:13
Content
sgmllib supports neither hexadecimal character references (for 
example &#x41;) nor character references to Unicode characters, 
both of which are common on web pages. 
The following replacements fix both problems.

# Group 1 must capture the leading x/X so handle_charref can detect hex.
charref = re.compile('&#([xX][0-9a-fA-F]+|[0-9]+)[^0-9a-fA-F]')

	def handle_charref(self, ref):
		try:
			# A leading x or X marks a hexadecimal reference.
			if ref[0] == 'x' or ref[0] == 'X':
				m = int(ref[1:], 16)
			else:
				m = int(ref)
			# Pass the character on as UTF-8 encoded bytes.
			self.handle_data(unichr(m).encode('utf-8'))
		except ValueError:
			self.unknown_charref(ref)
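
For anyone who wants to try the decoding logic outside of sgmllib, here is a rough standalone sketch. It assumes Python 3, where sgmllib no longer exists and chr() takes the place of unichr(). The names CHARREF, decode_charref and replace_charrefs are made up for illustration and are not part of any library. Unlike the patch above, it returns text rather than UTF-8 bytes and treats the trailing ";" as optional.

```python
import re

# Hypothetical pattern: group 1 is either x/X followed by hex digits,
# or a run of decimal digits. The trailing ";" is optional.
CHARREF = re.compile(r'&#([xX][0-9a-fA-F]+|[0-9]+);?')


def decode_charref(ref):
    """Return the character for a reference body such as '65' or 'x41'.

    Returns None when the number is not a valid code point.
    """
    try:
        if ref[0] in 'xX':
            code = int(ref[1:], 16)
        else:
            code = int(ref)
        return chr(code)
    except (ValueError, OverflowError):
        # chr() rejects values outside the Unicode range.
        return None


def replace_charrefs(text):
    """Replace numeric character references, leaving invalid ones as-is."""
    def substitute(match):
        char = decode_charref(match.group(1))
        return match.group(0) if char is None else char
    return CHARREF.sub(substitute, text)
```

Calling replace_charrefs on a string that mixes both styles, such as 'caf&#xE9; &#8364;5', should decode the hexadecimal and the decimal reference alike, while an out-of-range reference is left in the text unchanged.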
History
Date                 User   Action  Args
2008-01-20 09:56:21  admin  link    issue803422 messages
2008-01-20 09:56:21  admin  create