
HTMLParser fails to handle charref in attribute value #41975

Closed
jhylton mannequin opened this issue May 12, 2005 · 5 comments
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@jhylton
Mannequin

jhylton mannequin commented May 12, 2005

BPO 1200313
Nosy @freddrake, @devdanzin, @ezio-melotti

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = 'https://github.com/ezio-melotti'
closed_at = <Date 2011-11-08.01:11:30.655>
created_at = <Date 2005-05-12.02:30:55.000>
labels = ['type-feature', 'library']
title = 'HTMLParser fails to handle charref in attribute value'
updated_at = <Date 2011-11-14.17:16:53.339>
user = 'https://bugs.python.org/jhylton'

bugs.python.org fields:

activity = <Date 2011-11-14.17:16:53.339>
actor = 'ezio.melotti'
assignee = 'ezio.melotti'
closed = True
closed_date = <Date 2011-11-08.01:11:30.655>
closer = 'ezio.melotti'
components = ['Library (Lib)']
creation = <Date 2005-05-12.02:30:55.000>
creator = 'jhylton'
dependencies = []
files = []
hgrepos = []
issue_num = 1200313
keywords = []
message_count = 5.0
messages = ['60736', '82199', '147268', '147614', '147621']
nosy_count = 5.0
nosy_names = ['jhylton', 'fdrake', 'ajaksu2', 'ezio.melotti', 'python-dev']
pr_nums = []
priority = 'normal'
resolution = 'out of date'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue1200313'
versions = ['Python 2.7', 'Python 3.2', 'Python 3.3']

@jhylton
Mannequin Author

jhylton mannequin commented May 12, 2005

The HTML spec describes two ways to encode an attribute
value that contains a URI with an ampersand.

http://www.w3.org/TR/REC-html40/appendix/notes.html#h-B.2.2

>>> from HTMLParser import *
>>> class P(HTMLParser):
...   def handle_starttag(self, tag, attrs):
...     print attrs
...
>>> P().feed("<tag attr=\"&\">")
[('attr', '&')]
>>> P().feed("<tag attr=\"&\">")
[('attr', '&')]

It seems that each string should produce the same
parsed value. I would hazard a guess that the easiest
way to make this happen is to extend the current
unescape() to unescape character references. Is there
any reason not to do that? I'll provide a fix if that
sounds like a reasonable answer.
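The report proposes extending unescape() so that it also decodes character references. A minimal sketch of that idea in Python 3, assuming nothing about HTMLParser's actual internals: `unescape_refs` and `_REF` are hypothetical names, and the named-entity table is taken from the standard `html.entities.name2codepoint` mapping. It treats `&amp;` and `&#38;`, the two forms HTML 4.01 gives for an ampersand inside a URI attribute, identically.

```python
import re
from html.entities import name2codepoint

# &name;  |  &#NNN;  |  &#xHHH;  (this sketch requires the trailing semicolon)
_REF = re.compile(r"&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);")


def unescape_refs(value):
    """Hypothetical helper: decode named entities and numeric character references."""

    def replace(match):
        ref = match.group(1)
        if ref[:2] in ("#x", "#X"):
            codepoint = int(ref[2:], 16)   # hexadecimal character reference
        elif ref.startswith("#"):
            codepoint = int(ref[1:])       # decimal character reference
        elif ref in name2codepoint:
            codepoint = name2codepoint[ref]  # named entity reference
        else:
            return match.group(0)  # unknown entity name: leave it as written
        if codepoint > 0x10FFFF:
            return match.group(0)  # outside Unicode: leave it as written
        return chr(codepoint)

    return _REF.sub(replace, value)


# Both spellings now produce the same parsed value
print(unescape_refs("&amp;"), unescape_refs("&#38;"))  # → & &
```

Leaving unknown names and out-of-range numbers untouched is a design choice of this sketch; a real parser might instead replace them with U+FFFD or follow the HTML5 rules.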

@jhylton jhylton mannequin assigned freddrake May 12, 2005
@jhylton jhylton mannequin added the stdlib Python modules in the Lib dir label May 12, 2005
@devdanzin
Mannequin

devdanzin mannequin commented Feb 16, 2009

Maybe the charrefs were lost in the SF -> Roundup transition?

@devdanzin devdanzin mannequin added type-feature A feature request or enhancement labels Feb 16, 2009
@ezio-melotti
Member

unescape() already converts named, decimal and hexadecimal entities, so this can be closed.
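In Python 2.7 this helper was the `HTMLParser.unescape()` method; in Python 3 the public equivalent is `html.unescape()` (the method was deprecated in 3.4 and removed in 3.9). A small sketch using `html.unescape()` as a stand-in, with a hypothetical link target containing `&`:

```python
import html

# A hypothetical link target containing '&', spelled three ways
spellings = [
    "http://example.com/?x=1&amp;y=2",   # named entity reference
    "http://example.com/?x=1&#38;y=2",   # decimal character reference
    "http://example.com/?x=1&#x26;y=2",  # hexadecimal character reference
]

# Every spelling decodes to the same string
decoded = {html.unescape(s) for s in spellings}
print(decoded)  # → {'http://example.com/?x=1&y=2'}
```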

@python-dev
Mannequin

python-dev mannequin commented Nov 14, 2011

New changeset 3c3009f63700 by Ezio Melotti in branch '2.7':
bpo-1745761, bpo-755670, bpo-13357, bpo-12629, bpo-1200313: improve attribute handling in HTMLParser.
http://hg.python.org/cpython/rev/3c3009f63700

New changeset 16ed15ff0d7c by Ezio Melotti in branch '3.2':
bpo-1745761, bpo-755670, bpo-13357, bpo-12629, bpo-1200313: improve attribute handling in HTMLParser.
http://hg.python.org/cpython/rev/16ed15ff0d7c

New changeset 426f7a2b1826 by Ezio Melotti in branch 'default':
bpo-1745761, bpo-755670, bpo-13357, bpo-12629, bpo-1200313: merge with 3.2.
http://hg.python.org/cpython/rev/426f7a2b1826

@ezio-melotti
Member

There was actually a bug with entities in unquoted attribute values. I fixed it and added tests for all the cases (quoted and unquoted).
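The quoted and unquoted cases can be checked against Python 3's `html.parser.HTMLParser`. This is a sketch assuming a current Python 3 release, where attribute values passed to `handle_starttag` are already unescaped; `FirstTagAttrs`, `attrs_of` and the sample markup are hypothetical, and this is not the test suite added in the changesets above.

```python
from html.parser import HTMLParser


class FirstTagAttrs(HTMLParser):
    """Keep the (name, value) pairs of the first start tag seen (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        if self.attrs is None:
            self.attrs = attrs


def attrs_of(markup):
    parser = FirstTagAttrs()
    parser.feed(markup)
    parser.close()
    return parser.attrs


# The same decimal character reference in each quoting style
cases = {
    "double-quoted": '<a href="?x=1&#38;y=2">',
    "single-quoted": "<a href='?x=1&#38;y=2'>",
    "unquoted": "<a href=?x=1&#38;y=2>",
}

for label, markup in cases.items():
    print(label, attrs_of(markup))
```

Each case should report `[('href', '?x=1&y=2')]`, i.e. the reference is decoded regardless of how the value is quoted.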

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022