Author jhylton
Date 2005-05-12.02:30:55
The HTML spec describes two ways to encode an attribute
value that contains a URI with an ampersand: the entity
reference &amp; and the character reference &#38;.

>>> from HTMLParser import *
>>> class P(HTMLParser):
...   def handle_starttag(self, tag, attrs):
...     print attrs
>>> P().feed("<tag attr=\"&amp;\">")
[('attr', '&')]
>>> P().feed("<tag attr=\"&#38;\">")
[('attr', '&#38;')]

It seems that each string should produce the same
parsed value.  I would hazard a guess that the easiest
way to make this happen is to extend the current
unescape() to unescape character references.  Is there
any reason not to do that?  I'll provide a fix if that
sounds like a reasonable answer.
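
To make the proposal concrete, here is a minimal sketch of what
unescaping both kinds of reference could look like. It is a
standalone illustration, not the module's actual unescape(): the
names unescape_refs, NAMED and _REF are hypothetical, and the
entity table is deliberately tiny.

```python
import re

try:
    unichr            # Python 2
except NameError:
    unichr = chr      # Python 3: chr() already returns text

# Hypothetical, deliberately small table of named entities.
NAMED = {"amp": u"&", "lt": u"<", "gt": u">", "quot": u'"', "apos": u"'"}

# Matches &#38; (decimal), &#x26; (hex) and &name; references.
_REF = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")

def unescape_refs(s):
    """Replace entity and character references in s in a single pass."""
    def replace(match):
        ref = match.group(1)
        if ref.startswith("#"):
            digits = ref[1:]
            try:
                if digits[0] in "xX":
                    return unichr(int(digits[1:], 16))
                return unichr(int(digits))
            except (ValueError, OverflowError):
                # Code point out of range: leave the text untouched.
                return match.group(0)
        # Unknown names are left as written.
        return NAMED.get(ref, match.group(0))
    return _REF.sub(replace, s)
```

With something along these lines applied to attribute values, both
inputs from the session above would come out as '&'. Doing the
substitution in one pass also means that "&amp;#38;" yields the
literal text "&#38;" rather than being unescaped twice.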
