Message60736
The HTML spec describes two ways to encode an attribute
value that contains a URI with an ampersand.
http://www.w3.org/TR/REC-html40/appendix/notes.html#h-B.2.2
>>> from HTMLParser import *
>>> class P(HTMLParser):
...     def handle_starttag(self, tag, attrs):
...         print attrs
...
>>> P().feed("<tag attr=\"&amp;\">")
[('attr', '&')]
>>> P().feed("<tag attr=\"&#38;\">")
[('attr', '&#38;')]
It seems that each string should produce the same
parsed value. I would hazard a guess that the easiest
way to make this happen is to extend the current
unescape() to unescape character references. Is there
any reason not to do that? I'll provide a fix if that
sounds like a reasonable answer.
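
To make the proposal concrete, here is a rough sketch of what unescaping both forms could look like. It is not the actual HTMLParser code: `unescape_attr`, `_NAMED` and `_REF` are hypothetical names, the entity table is deliberately tiny, and the sketch is written for Python 3 (it relies on `chr` returning a text character), unlike the Python 2 session above.

```python
import re

# Hypothetical helper for illustration only; not the HTMLParser implementation.
# A deliberately small table of named entities (an assumption of this sketch).
_NAMED = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'"}

# Matches decimal references (&#38;), hex references (&#x26;) and named ones (&amp;).
_REF = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def unescape_attr(value):
    """Replace entity and character references in an attribute value."""

    def replace(match):
        ref = match.group(1)
        if ref.startswith("#"):
            body = ref[1:]
            try:
                if body[0] in "xX":
                    code = int(body[1:], 16)
                else:
                    code = int(body)
                return chr(code)
            except (ValueError, OverflowError):
                # Out-of-range code point: leave the reference untouched.
                return match.group(0)
        # Unknown named entities are left as written.
        return _NAMED.get(ref, match.group(0))

    # A single pass, so text produced by one replacement is not unescaped again.
    return _REF.sub(replace, value)


print(unescape_attr("&amp;"))
print(unescape_attr("&#38;"))
```

With something along these lines, both encodings from the spec would yield the same attribute value. Doing the substitution in one pass matters: unescaping named entities first and character references second would turn `&amp;#38;` into `&` instead of the literal text `&#38;`.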
Date | User | Action | Args
2008-01-20 09:57:49 | admin | link | issue1200313 messages
2008-01-20 09:57:49 | admin | create |