This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author jhylton
Date 2005-05-12.02:30:55
The HTML spec describes two ways to encode an ampersand in
a URI attribute value: the entity reference &amp; and the
numeric character reference &#38;.

http://www.w3.org/TR/REC-html40/appendix/notes.html#h-B.2.2


>>> from HTMLParser import *
>>> class P(HTMLParser):
...   def handle_starttag(self, tag, attrs):
...     print attrs
...
>>> P().feed("<tag attr=\"&amp;\">")
[('attr', '&')]
>>> P().feed("<tag attr=\"&#38;\">")
[('attr', '&#38;')]
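For comparison, here is a sketch of the same experiment ported to Python 3's html.parser module, a later API than the HTMLParser module used above. The class and helper names are hypothetical. Assuming Python 3.4 or later, where the parser passes attribute values through html.unescape, both spellings should come back as a plain ampersand:

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical parser that records the attribute list of every start tag."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; in Python 3 the values
        # have already had entity and character references unescaped.
        self.seen.append(attrs)


def first_start_tag_attrs(markup):
    """Feed markup to a fresh parser and return the first tag's attributes."""
    parser = AttrCollector()
    parser.feed(markup)
    parser.close()
    return parser.seen[0]


entity_form = first_start_tag_attrs('<tag attr="&amp;">')   # named entity reference
charref_form = first_start_tag_attrs('<tag attr="&#38;">')  # numeric character reference
print(entity_form, charref_form)
```

Both calls yield the same parsed value, which is the behaviour this report argues for.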

It seems that each string should produce the same
parsed value.  I would hazard a guess that the easiest
way to make this happen is to extend the current
unescape() to unescape character references.  Is there
any reason not to do that?  I'll provide a fix if that
sounds like a reasonable answer.
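A minimal sketch of what the proposed extension might look like, written as a standalone Python 3 function rather than against the 2.4 code base this report concerns. The name unescape mirrors the method under discussion, but this is a hypothetical helper, not the patch that was eventually applied. The set of named entities it recognizes (lt, gt, quot, apos, amp) is an assumption about what the existing unescape() handled:

```python
import re

# Assumed set of named entities the existing unescape() handles.
_NAMED = {"lt": "<", "gt": ">", "quot": '"', "apos": "'", "amp": "&"}

# Matches decimal (&#38;), hex (&#x26;) and named (&amp;) references.
_REF = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z]+);")


def _replace(match):
    name = match.group(1)
    if name.startswith("#"):
        digits = name[1:]
        try:
            if digits[0] in "xX":
                return chr(int(digits[1:], 16))  # hex character reference
            return chr(int(digits))              # decimal character reference
        except (ValueError, OverflowError):
            return match.group(0)  # code point out of range: leave untouched
    # Unknown named references (e.g. &nbsp;) are left as-is.
    return _NAMED.get(name, match.group(0))


def unescape(s):
    """Hypothetical: unescape entity and character references in one pass."""
    if "&" not in s:
        return s
    return _REF.sub(_replace, s)
```

Doing the replacement in a single left-to-right pass, instead of a chain of str.replace() calls, means the output of one substitution is never rescanned, so "&amp;#38;" correctly becomes the literal text "&#38;" rather than "&".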
History
Date                 User   Action  Args
2008-01-20 09:57:49  admin  link    issue1200313 messages
2008-01-20 09:57:49  admin  create