Message 60736 - Python tracker

Author	jhylton
Date	2005-05-12.02:30:55

Content

The HTML spec describes two ways to encode an attribute
value that contains a URI with an ampersand.

http://www.w3.org/TR/REC-html40/appendix/notes.html#h-B.2.2


>>> from HTMLParser import *
>>> class P(HTMLParser):
...   def handle_starttag(self, tag, attrs):
...     print attrs
...
>>> P().feed("<tag attr=\"&amp;\">")
[('attr', '&')]
>>> P().feed("<tag attr=\"&#38;\">")
[('attr', '&#38;')]

Both strings should produce the same parsed value, but
only the entity reference (&amp;) is decoded; the
character reference (&#38;) is passed through unchanged.
I would guess that the easiest way to fix this is to
extend the current unescape() to also unescape character
references.  Is there any reason not to do that?  I'll
provide a fix if that sounds like a reasonable answer.
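
As a rough illustration of the idea, here is a minimal sketch of an
unescape step that decodes character references as well as named
entity references. It is written for Python 3 and is not the actual
HTMLParser code or the eventual fix; the helper name unescape_attr
and the regular expression are assumptions made for this example.

```python
import re
from html.entities import name2codepoint

# Matches named references (&amp;), decimal character references (&#38;)
# and hexadecimal character references (&#x26;).
_REF = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def unescape_attr(value):
    """Hypothetical helper: decode entity and character references."""

    def replace(match):
        ref = match.group(1)
        if ref.startswith("#"):
            digits = ref[1:]
            if digits[0] in "xX":
                code = int(digits[1:], 16)  # hexadecimal form
            else:
                code = int(digits)          # decimal form
        else:
            code = name2codepoint.get(ref)
            if code is None:
                return match.group(0)       # unknown name: leave as-is
        try:
            return chr(code)
        except (ValueError, OverflowError):
            return match.group(0)           # out-of-range code point

    return _REF.sub(replace, value)


print(unescape_attr("&amp;"))   # prints: &
print(unescape_attr("&#38;"))   # prints: &
```

With a step like this applied to attribute values, both spellings
from the HTML spec would reach handle_starttag as the same string.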

History
Date	User	Action	Args
2008-01-20 09:57:49	admin	link	issue1200313 messages
2008-01-20 09:57:49	admin	create