This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: HTMLParser fails to handle charref in attribute value
Type: enhancement
Stage: resolved
Components: Library (Lib)
Versions: Python 3.2, Python 3.3, Python 2.7
process
Status: closed
Resolution: out of date
Dependencies:
Superseder:
Assigned To: ezio.melotti
Nosy List: ajaksu2, ezio.melotti, fdrake, jhylton, python-dev
Priority: normal
Keywords:

Created on 2005-05-12 02:30 by jhylton, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (5)
msg60736 - Author: Jeremy Hylton (jhylton) (Python triager) Date: 2005-05-12 02:30
The HTML spec describes two ways to encode an attribute
value that contains a URI with an ampersand.

http://www.w3.org/TR/REC-html40/appendix/notes.html#h-B.2.2


>>> from HTMLParser import *
>>> class P(HTMLParser):
...   def handle_starttag(self, tag, attrs):
...     print attrs
...
>>> P().feed("<tag attr=\"&\">")
[('attr', '&')]
>>> P().feed("<tag attr=\"&\">")
[('attr', '&')]

It seems that each string should produce the same
parsed value.  I would hazard a guess that the easiest
way to make this happen is to extend the current
unescape() to unescape character references.  Is there
any reason not to do that?  I'll provide a fix if that
sounds like a reasonable answer.
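
For reference, a minimal sketch of the same comparison on Python 3. It assumes the two encodings meant here are the ones from the linked appendix, `&amp;` and `&#38;`; the character references did not survive in the transcript above, so that is a guess. `AttrCollector` and `parse_attrs` are illustrative names, not library API.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: records the attrs of every start tag it sees."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append(attrs)


def parse_attrs(markup):
    # Feed one complete snippet and return the collected attribute lists.
    parser = AttrCollector()
    parser.feed(markup)
    parser.close()
    return parser.seen


# Assumed inputs: named entity vs. decimal character reference.
named = parse_attrs('<tag attr="&amp;">')
numeric = parse_attrs('<tag attr="&#38;">')
print(named, numeric)
```

If both forms are handled, the two printed lists should be identical, which is the behaviour requested in this report.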
msg82199 - Author: Daniel Diniz (ajaksu2) * (Python triager) Date: 2009-02-16 01:01
Maybe the charrefs were lost in the SF -> Roundup transition?
msg147268 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-11-08 01:11
unescape() already converts named, decimal and hexadecimal entities, so this can be closed.
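
A short sketch of the three forms mentioned, using the module-level `html.unescape()` function from Python 3.4+ as a stand-in for the parser's `unescape()` method discussed here (treating the two as equivalent is an assumption). The sample strings are made up for illustration.

```python
import html

# One made-up sample per reference style; each encodes a single ampersand.
samples = {
    "named": "a=1&amp;b=2",
    "decimal": "a=1&#38;b=2",
    "hexadecimal": "a=1&#x26;b=2",
}

# Decode every sample and show the result next to its style.
decoded = {kind: html.unescape(text) for kind, text in samples.items()}
for kind, text in decoded.items():
    print(kind, text)
```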
msg147614 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-11-14 16:57
New changeset 3c3009f63700 by Ezio Melotti in branch '2.7':
#1745761, #755670, #13357, #12629, #1200313: improve attribute handling in HTMLParser.
http://hg.python.org/cpython/rev/3c3009f63700

New changeset 16ed15ff0d7c by Ezio Melotti in branch '3.2':
#1745761, #755670, #13357, #12629, #1200313: improve attribute handling in HTMLParser.
http://hg.python.org/cpython/rev/16ed15ff0d7c

New changeset 426f7a2b1826 by Ezio Melotti in branch 'default':
#1745761, #755670, #13357, #12629, #1200313: merge with 3.2.
http://hg.python.org/cpython/rev/426f7a2b1826
msg147621 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-11-14 17:16
There was actually a bug with entities in unquoted attribute values. I fixed it and added tests for all the cases (quoted and unquoted).
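
A rough sketch of the quoted and unquoted cases, written against Python 3's `html.parser`. It is not the test added in the changesets above; `HrefCollector` and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Hypothetical helper: keeps the value of every href attribute."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        self.values.extend(value for name, value in attrs if name == "href")


cases = [
    '<a href="x&amp;y">',  # double-quoted value
    "<a href='x&amp;y'>",  # single-quoted value
    "<a href=x&amp;y>",    # unquoted value
]

results = []
for markup in cases:
    parser = HrefCollector()
    parser.feed(markup)
    parser.close()
    results.append(parser.values)

print(results)
```

All three spellings should yield the same decoded value if entities are handled consistently regardless of quoting.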
History
Date | User | Action | Args
2022-04-11 14:56:11 | admin | set | github: 41975
2011-11-14 17:16:53 | ezio.melotti | set | messages: + msg147621; versions: + Python 2.7, Python 3.3
2011-11-14 16:57:15 | python-dev | set | nosy: + python-dev; messages: + msg147614
2011-11-08 01:11:30 | ezio.melotti | set | status: open -> closed; assignee: fdrake -> ezio.melotti; nosy: + ezio.melotti; messages: + msg147268; resolution: out of date; stage: test needed -> resolved
2010-08-21 14:33:38 | BreamoreBoy | set | versions: + Python 3.2, - Python 2.7
2009-02-16 01:01:58 | ajaksu2 | set | nosy: + ajaksu2; stage: test needed; type: enhancement; messages: + msg82199; versions: + Python 2.7
2005-05-12 02:30:55 | jhylton | create |