Title: HTMLParser improperly handling open tags when strict is False
msg146479 - (view) Author: Christopher Allen-Poole (Christopher.Allen-Poole) Date: 2011-10-27 07:56
This is encountered when extending html.parser.HTMLParser and running with strict mode False.

Expected behavior:
When '''<div style=""    ><b>The <a href="some_url">rain</a> <br /> in <span>Spain</span></b></div>''' is passed to the feed method, div, b, a, br, and span should all be passed to the handle_starttag method.

Actual behavior:
The handle_data method receives <div style=""    >, <b>, <a href="some_url">, <br /> and <span> in addition to the regular text.
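A minimal reproduction sketch, written for current Python 3. The strict argument was removed in Python 3.5, so this uses the default (tolerant) parser, which already contains the fix discussed below. TagCollector is a hypothetical helper name, not part of the library. On an affected 3.2 build you would construct the parser with strict=False and, per the report, see the tags arrive in handle_data instead.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records start tags (with attributes) and text chunks (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.starttags = []
        self.data = []

    def handle_starttag(self, tag, attrs):
        # <br /> also lands here: the default handle_startendtag()
        # calls handle_starttag() followed by handle_endtag().
        self.starttags.append((tag, attrs))

    def handle_data(self, data):
        self.data.append(data)


SAMPLE = ('<div style=""    ><b>The <a href="some_url">rain</a> '
          '<br /> in <span>Spain</span></b></div>')

parser = TagCollector()
parser.feed(SAMPLE)
parser.close()

print([tag for tag, _ in parser.starttags])  # ['div', 'b', 'a', 'br', 'span']
# With the bug, markup such as '<div style=""    >' would show up in parser.data.
print(any('<' in chunk for chunk in parser.data))  # False
```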

This can be fixed by changing this line (inside the parse_starttag method):

m = hparse.attrfind_tolerant.search(rawdata, k)

to this:

m = hparse.attrfind_tolerant.match(rawdata, k)
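Why match instead of search matters: the attribute regex is applied repeatedly, starting at the current position k. match() only succeeds if an attribute begins exactly at k, while search() is free to skip ahead, even past the '>' that closes the tag, and pick up something attribute-like in the following markup. That would push k beyond the end of the tag, which is consistent with the tag being handed to handle_data. The sketch below uses a deliberately simplified, hypothetical ATTR pattern, not the real attrfind_tolerant, so it only illustrates the mechanism.

```python
import re

# Simplified, hypothetical attribute pattern (NOT the real attrfind_tolerant).
ATTR = re.compile(r'\s*([a-zA-Z_][-.:a-zA-Z0-9_]*)(?:\s*=\s*"[^"]*")?')

rawdata = '<div style=""    ><b>The'
endpos = rawdata.index('>') + 1  # one past the '>' that closes <div ...>
k = len('<div')                  # position just after the tag name

first = ATTR.match(rawdata, k)   # consumes ' style=""'
k = first.end()                  # k now points at '    ><b>The'

# match() is anchored at k: nothing attribute-like starts there, so the loop would stop.
print(ATTR.match(rawdata, k))    # None

# search() scans forward and finds 'b' inside the next tag, past endpos.
m = ATTR.search(rawdata, k)
print(m.group(1), m.end() > endpos)  # b True
```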
msg146481 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-10-27 08:31
Incidentally I was just investigating this very same issue, and your suggestion seems to work for me too.
I'll see if the change has any downside and come up with a patch + test.
Thanks for the report!
msg146490 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-10-27 14:49
The attached patch replaces search with match as you suggested and tweaks a regex to make the old tests pass.
msg146550 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-10-28 10:24
New changeset 41d41776aa6d by Ezio Melotti in branch '3.2':
#13273: fix a bug that prevented HTMLParser to properly detect some tags when strict=False.

New changeset b194117f176c by Ezio Melotti in branch 'default':
#13273: merge with 3.2.
msg146552 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-10-28 10:27
Fixed, thanks a lot for the report!
