html.parser.HTMLParser.unescape works only with the first 128 entities #57097

Closed
yveszioupcom mannequin opened this issue Sep 2, 2011 · 5 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@yveszioupcom
Mannequin

yveszioupcom mannequin commented Sep 2, 2011

BPO 12888
Nosy @ezio-melotti
Files
  • unescape_bug.patch
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/ezio-melotti'
    closed_at = <Date 2011-09-05.14:26:55.245>
    created_at = <Date 2011-09-02.21:08:40.271>
    labels = ['type-bug', 'library']
    title = 'html.parser.HTMLParser.unescape works only with the first 128 entities'
    updated_at = <Date 2011-09-05.14:26:55.242>
    user = 'https://bugs.python.org/yveszioupcom'

    bugs.python.org fields:

    activity = <Date 2011-09-05.14:26:55.242>
    actor = 'ezio.melotti'
    assignee = 'ezio.melotti'
    closed = True
    closed_date = <Date 2011-09-05.14:26:55.245>
    closer = 'ezio.melotti'
    components = ['Library (Lib)']
    creation = <Date 2011-09-02.21:08:40.271>
    creator = 'yves@zioup.com'
    dependencies = []
    files = ['23092']
    hgrepos = ['65']
    issue_num = 12888
    keywords = ['patch']
    message_count = 5.0
    messages = ['143434', '143457', '143459', '143512', '143513']
    nosy_count = 4.0
    nosy_names = ['peter.otten', 'ezio.melotti', 'yves@zioup.com', 'python-dev']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue12888'
    versions = ['Python 3.2', 'Python 3.3']

    @yveszioupcom
    Mannequin Author

    yveszioupcom mannequin commented Sep 2, 2011

    html.parser.HTMLParser.unescape only unescapes the first 128 entities in its input; the remaining ones are left as they are.

    @yveszioupcom
    Mannequin Author

    yveszioupcom mannequin commented Sep 3, 2011

    Added a test case:
    http://hg.zioup.org/cpython/rev/4accd3181061

    The test passes if the loop bound is set below 128; it is currently set to 1000.
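The contents of the linked test changeset are not reproduced here. As a hypothetical check in the same spirit, the sketch below feeds many entity references through `html.unescape`, the public function that superseded `HTMLParser.unescape` (deprecated in Python 3.4 and removed in 3.9). The helper name `check_unescape_many` is made up for illustration.

```python
import html


def check_unescape_many(n):
    """Return True if html.unescape decodes all n repetitions (hypothetical helper)."""
    # Mix a named reference and a numeric one so both kinds are exercised.
    s = "&lt;&#62;" * n
    return html.unescape(s) == "<>" * n


# 1000 mirrors the loop bound mentioned in the report; the buggy
# HTMLParser.unescape stopped replacing after a fixed number of matches.
print(check_unescape_many(1000))
```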

    @peterotten
    Mannequin

    peterotten mannequin commented Sep 3, 2011

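    The unescape() method passes re.ASCII to re.sub positionally, but in that position re.sub expects count, not flags, so the flag value is treated as a maximum number of replacements. The fix is easy: pass it by keyword, i.e.

    re.sub(regex, sub, string, flags=re.ASCII).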
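A minimal sketch of the pitfall, assuming Python 3's `re.sub(pattern, repl, string, count=0, flags=0)` signature. The pattern `ENTITY_RE` and the table `ENTITIES` are simplified, hypothetical stand-ins rather than the code in `html/parser.py`.

```python
import re
import warnings

# Simplified stand-ins for illustration; not the real HTMLParser pattern/table.
ENTITY_RE = r"&(lt|gt|amp);"
ENTITIES = {"lt": "<", "gt": ">", "amp": "&"}


def repl(match):
    # Map the captured entity name to its character.
    return ENTITIES[match.group(1)]


text = "&lt;" * 1000

with warnings.catch_warnings():
    # Python 3.13+ emits a DeprecationWarning when count/flags are positional.
    warnings.simplefilter("ignore", DeprecationWarning)
    # Buggy call: re.ASCII occupies the count slot, capping the number of
    # replacements at int(re.ASCII) instead of enabling ASCII-only matching.
    buggy = re.sub(ENTITY_RE, repl, text, re.ASCII)

# Fixed call: the flag is passed by keyword, so count keeps its default of 0
# (replace every match).
fixed = re.sub(ENTITY_RE, repl, text, flags=re.ASCII)

print(buggy.count("<"), fixed.count("<"))
```

In `buggy`, only the first `int(re.ASCII)` references are replaced and the rest stay as `&lt;`; in `fixed`, all 1000 are replaced.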

    @ezio-melotti ezio-melotti self-assigned this Sep 3, 2011
    @ezio-melotti ezio-melotti added the type-bug An unexpected behavior, bug, or error label Sep 3, 2011
    @python-dev
    Mannequin

    python-dev mannequin commented Sep 5, 2011

    New changeset 9896fc2a8167 by Ezio Melotti in branch '3.2':
    bpo-12888: Fix a bug in HTMLParser.unescape that prevented it from unescaping more than 128 entities. Patch by Peter Otten.
    http://hg.python.org/cpython/rev/9896fc2a8167

    New changeset 7b6096852665 by Ezio Melotti in branch 'default':
    bpo-12888: merge with 3.2.
    http://hg.python.org/cpython/rev/7b6096852665

    @ezio-melotti
    Member

    Fixed, thanks for the report and the patch!

    @ezio-melotti ezio-melotti added the stdlib Python modules in the Lib dir label Sep 5, 2011
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022