html.HTMLParser raises UnboundLocalError: #62002

Closed
bmispelon mannequin opened this issue Apr 20, 2013 · 6 comments
Labels
easy stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments


bmispelon mannequin commented Apr 20, 2013

BPO 17802
Nosy @ezio-melotti, @bitdancer, @bmispelon
Files
  • issue17802-unittest.patch: Patch for unit tests to reproduce issue 17802
  • issue17802.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/ezio-melotti'
    closed_at = <Date 2013-05-01.13:25:05.887>
    created_at = <Date 2013-04-20.10:58:16.423>
    labels = ['easy', 'type-bug', 'library']
    title = 'html.HTMLParser raises UnboundLocalError:'
    updated_at = <Date 2013-05-01.13:25:05.886>
    user = 'https://github.com/bmispelon'

    bugs.python.org fields:

    activity = <Date 2013-05-01.13:25:05.886>
    actor = 'ezio.melotti'
    assignee = 'ezio.melotti'
    closed = True
    closed_date = <Date 2013-05-01.13:25:05.887>
    closer = 'ezio.melotti'
    components = ['Library (Lib)']
    creation = <Date 2013-04-20.10:58:16.423>
    creator = 'bmispelon'
    dependencies = []
    files = ['29979', '29986']
    hgrepos = []
    issue_num = 17802
    keywords = ['patch', 'easy']
    message_count = 6.0
    messages = ['187414', '187416', '187582', '187608', '188222', '188224']
    nosy_count = 5.0
    nosy_names = ['ezio.melotti', 'r.david.murray', 'python-dev', 'bmispelon', 'Thomas.Barlow']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue17802'
    versions = ['Python 3.3', 'Python 3.4']


    When parsing the string a&b, the parser raises an UnboundLocalError when close() is called:

    {{{
    >>> from html.parser import HTMLParser
    >>> p = HTMLParser()
    >>> p.feed('a&b')
    >>> p.close()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python3.3/html/parser.py", line 149, in close
        self.goahead(1)
      File "/usr/lib/python3.3/html/parser.py", line 252, in goahead
        if k <= i:
    UnboundLocalError: local variable 'k' referenced before assignment
    }}}

    Granted, the HTML is invalid, but this error looks like it might have been an oversight.
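A minimal sketch for checking whether a given interpreter is affected, using only the feed()/close() calls from the traceback above. The helper name close_raises_unbound is hypothetical. On a Python that includes the fix (changeset 9cb90c1a1a46 and its merge), it is expected to return False for all three samples.

```python
from html.parser import HTMLParser


def close_raises_unbound(text: str) -> bool:
    """Feed `text` to a fresh parser and report whether close() hits the bug.

    Hypothetical helper: returns True if close() raises UnboundLocalError
    (the behaviour reported in this issue), False if it completes normally.
    """
    parser = HTMLParser()
    parser.feed(text)
    try:
        # In the traceback, the error surfaces here, inside goahead(1).
        parser.close()
    except UnboundLocalError:
        return True
    return False


if __name__ == "__main__":
    for sample in ("a&b", "a&b ", "a&b;"):
        print(repr(sample), "raises UnboundLocalError:", close_raises_unbound(sample))
```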

    bmispelon mannequin added type-crash A hard crash of the interpreter, possibly with a core dump stdlib Python modules in the Lib dir labels Apr 20, 2013
    @bitdancer (Member)

    Thanks for the report. Yes, that's in a complicated bit of error recovery code, and clearly you found a path through it that doesn't have a corresponding test :)

    bitdancer added easy type-bug An unexpected behavior, bug, or error and removed type-crash A hard crash of the interpreter, possibly with a core dump labels Apr 20, 2013
    ezio-melotti self-assigned this Apr 20, 2013

    ThomasBarlow mannequin commented Apr 22, 2013

    I'm adding a patch with a few unit tests that demonstrate the issue; comments are welcome. This is my first patch, and I believe I have put the tests in the correct place.
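A sketch of how such regression tests might look, written against the public API only. The class and method names are hypothetical and are not taken from issue17802-unittest.patch. The tests only assert that close() completes, because the exact events emitted for these inputs are a separate question.

```python
import unittest
from html.parser import HTMLParser


class EOFInEntityRefTests(unittest.TestCase):
    """Hypothetical regression tests: input that ends at or inside an entity reference."""

    # A bare '&', an unterminated name, and two terminated forms.
    CASES = ("a&", "a&b", "a&b ", "a&b;")

    def test_close_does_not_raise(self):
        for text in self.CASES:
            with self.subTest(text=text):
                parser = HTMLParser()
                parser.feed(text)
                # Before the fix, "a&b" made close() raise UnboundLocalError.
                parser.close()
```

Saved as a module, this can be run with python -m unittest <module name>.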

    The problem appears to occur only when the input ends with an incomplete entity reference: an & followed by a run of characters that are all valid in an entity name, continuing up to the end of the input.

    The test case for "a&b " passes, because the parser recognizes the space as a character that cannot appear in an entity name.
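The condition above can be written as a regular expression anchored at the end of the input. This is a sketch: it assumes an entity name is a letter followed by letters, digits, '.' or '-' (an approximation of the name syntax html.parser accepts), and ends_in_unterminated_entity is a hypothetical helper, not part of html.parser.

```python
import re

# Assumed entity-name syntax: a letter, then letters, digits, '.' or '-'.
# \Z anchors the match so it only succeeds when the name runs to end-of-input.
_UNTERMINATED_ENTITY = re.compile(r"&[a-zA-Z][-.a-zA-Z0-9]*\Z")


def ends_in_unterminated_entity(text: str) -> bool:
    """Return True if `text` ends with '&' plus name characters and no terminator."""
    return _UNTERMINATED_ENTITY.search(text) is not None


if __name__ == "__main__":
    for sample in ("a&b", "a&b ", "a&b;", "a&"):
        print(repr(sample), ends_in_unterminated_entity(sample))
```

Of these samples only 'a&b' matches: in 'a&b ' and 'a&b;' the name is terminated, and a bare '&' has no name characters at all.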

    @ezio-melotti (Member)

    Thanks for the patch, Thomas!
    Starting from your work, I made an updated patch that fixes the bug, but the tests also revealed another possible issue.
    For invalid character references, HTMLParser still calls handle_entityref instead of reporting them as 'data'. I'm not sure what the preferable behavior is, but that is a separate issue anyway.
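A sketch for observing which callback receives an invalid reference, by recording every callback in a subclass. It assumes convert_charrefs=False; since Python 3.5 the default is True, which makes the parser convert character references in text automatically instead of calling the reference handlers. RecordingParser is a hypothetical name. The name b is not a defined HTML entity, yet it is expected to arrive through handle_entityref rather than as data.

```python
from html.parser import HTMLParser


class RecordingParser(HTMLParser):
    """Hypothetical helper that logs which callback handles each piece of input."""

    def __init__(self):
        # convert_charrefs=False keeps references as separate callbacks
        # instead of unescaping them into the surrounding text.
        super().__init__(convert_charrefs=False)
        self.events = []

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_entityref(self, name):
        self.events.append(("entityref", name))

    def handle_charref(self, name):
        self.events.append(("charref", name))


parser = RecordingParser()
parser.feed("a&b;")  # "b" is not a defined HTML entity name
parser.close()
print(parser.events)
```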


    python-dev mannequin commented May 1, 2013

    New changeset 9cb90c1a1a46 by Ezio Melotti in branch '3.3':
    bpo-17802: Fix an UnboundLocalError in html.parser. Initial tests by Thomas Barlow.
    http://hg.python.org/cpython/rev/9cb90c1a1a46

    New changeset 20be90a3a714 by Ezio Melotti in branch 'default':
    bpo-17802: merge with 3.3.
    http://hg.python.org/cpython/rev/20be90a3a714

    @ezio-melotti (Member)

    Fixed, thanks for the report!

    ezio-melotti transferred this issue from another repository Apr 10, 2022