HTMLParser improperly handling open tags when strict is False #57482

Closed
ChristopherAllen-Poole mannequin opened this issue Oct 27, 2011 · 5 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

@ChristopherAllen-Poole
Mannequin

ChristopherAllen-Poole mannequin commented Oct 27, 2011

BPO 13273
Nosy @ezio-melotti
Files
  • issue13273.diff

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = 'https://github.com/ezio-melotti'
    closed_at = <Date 2011-10-28.10:27:48.407>
    created_at = <Date 2011-10-27.07:56:01.477>
    labels = ['type-bug', 'library']
    title = 'HTMLParser improperly handling open tags when strict is False'
    updated_at = <Date 2011-10-28.10:27:48.405>
    user = 'https://bugs.python.org/ChristopherAllen-Poole'

    bugs.python.org fields:

    activity = <Date 2011-10-28.10:27:48.405>
    actor = 'ezio.melotti'
    assignee = 'ezio.melotti'
    closed = True
    closed_date = <Date 2011-10-28.10:27:48.407>
    closer = 'ezio.melotti'
    components = ['Library (Lib)']
    creation = <Date 2011-10-27.07:56:01.477>
    creator = 'Christopher.Allen-Poole'
    dependencies = []
    files = ['23535']
    hgrepos = []
    issue_num = 13273
    keywords = ['patch']
    message_count = 5.0
    messages = ['146479', '146481', '146490', '146550', '146552']
    nosy_count = 3.0
    nosy_names = ['ezio.melotti', 'python-dev', 'Christopher.Allen-Poole']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue13273'
    versions = ['Python 3.2', 'Python 3.3']

    @ChristopherAllen-Poole
    Mannequin Author

    ChristopherAllen-Poole mannequin commented Oct 27, 2011

    This is encountered when extending html.parser.HTMLParser and running with strict set to False.

    Expected behavior:
    When '''<div style="" ><b>The <a href="some_url">rain</a> <br /> in <span>Spain</span></b></div>''' is passed to the feed method, div, b, a, br, and span should all be passed to the handle_starttag method.

    Actual behavior:
    The handle_data method receives the values <div style="" >,<b>,<a href="some_url">,<br />,<span> in addition to the regular text.
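
For reference, here is a minimal sketch of the kind of subclass described above. The class name `TagCollector` and its `tags` and `data` attributes are made up for illustration. The `strict` argument is left out because it was later removed from `HTMLParser` (Python 3.5), so on a current interpreter this shows the expected behavior rather than reproducing the bug.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical subclass that records what each handler receives."""

    def __init__(self):
        super().__init__()
        self.tags = []   # tag names seen by handle_starttag
        self.data = []   # text chunks seen by handle_data

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.data.append(data)


collector = TagCollector()
collector.feed('<div style="" ><b>The <a href="some_url">rain</a> '
               '<br /> in <span>Spain</span></b></div>')
collector.close()

# Expected: every opening tag reaches handle_starttag, none leaks into data.
print(collector.tags)
print(collector.data)
```

With the bug present, the tag markup would show up in `collector.data` instead of `collector.tags`.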

    This can be fixed by changing this (inside the parse_starttag method):

    m = hparse.attrfind_tolerant.search(rawdata, k)

    to

    m = hparse.attrfind_tolerant.match(rawdata, k)
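
The difference between the two calls can be sketched with a simplified pattern. The regex below is an assumption for illustration only, not the actual `attrfind_tolerant` from `html.parser`. With a start position, `search` scans forward until it finds a match anywhere in the rest of the string, while `match` only succeeds if the pattern matches exactly at that position.

```python
import re

# Hypothetical, simplified attribute pattern (NOT the real attrfind_tolerant).
attr = re.compile(r'\s*([a-zA-Z_][-.:\w]*)\s*=\s*"([^"]*)"')

rawdata = '<b>The <a href="some_url">rain</a>'
k = 2  # index just after the tag name in "<b", where attributes would start

# search() keeps scanning and picks up an attribute from a later tag.
found = attr.search(rawdata, k)

# match() is anchored at k, so it reports that <b> has no attributes.
anchored = attr.match(rawdata, k)

print(found.group(1) if found else None)
print(anchored)
```

Here `search` reports `href`, which belongs to the following `<a>` tag, while `match` returns `None` for the attribute-less `<b>`.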

    @ChristopherAllen-Poole ChristopherAllen-Poole mannequin added the stdlib (Python modules in the Lib dir) and type-bug (An unexpected behavior, bug, or error) labels Oct 27, 2011
    @ezio-melotti
    Member

    Incidentally I was just investigating this very same issue, and your suggestion seems to work for me too.
    I'll see if the change has any downside and come up with a patch + test.
    Thanks for the report!

    @ezio-melotti ezio-melotti self-assigned this Oct 27, 2011
    @ezio-melotti
    Member

    The attached patch replaces search with match as you suggested and tweaks a regex to make the old tests pass.

    @python-dev
    Mannequin

    python-dev mannequin commented Oct 28, 2011

    New changeset 41d41776aa6d by Ezio Melotti in branch '3.2':
    bpo-13273: fix a bug that prevented HTMLParser to properly detect some tags when strict=False.
    http://hg.python.org/cpython/rev/41d41776aa6d

    New changeset b194117f176c by Ezio Melotti in branch 'default':
    bpo-13273: merge with 3.2.
    http://hg.python.org/cpython/rev/b194117f176c

    @ezio-melotti
    Member

    Fixed, thanks a lot for the report!

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022