classification
Title: html.parser.HTMLParser.unescape works only with the first 128 entities
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.2, Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: ezio.melotti Nosy List: ezio.melotti, peter.otten, python-dev, yves@zioup.com
Priority: normal Keywords: patch

Created on 2011-09-02 21:08 by yves@zioup.com, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
unescape_bug.patch peter.otten, 2011-09-03 10:15 review
Repositories containing patches
http://hg.zioup.org/cpython/
Messages (5)
msg143434 - Author: Yves Dorfsman (yves@zioup.com) Date: 2011-09-02 21:08
html.parser.HTMLParser.unescape works only with the first 128 entities; it leaves the remaining ones as they are.
msg143457 - Author: Yves Dorfsman (yves@zioup.com) Date: 2011-09-03 08:35
Added a test case:
http://hg.zioup.org/cpython/rev/4accd3181061

If you set the loop bound below 128, the test passes (it is set to 1000 right now).
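
For readers without access to the linked changeset, here is a minimal sketch of what such a check could look like. It is not the test from the repository above. It assumes a newer Python in which the standalone function html.unescape (added in 3.4) is available, because HTMLParser.unescape was later deprecated and removed; the helper name count_unescaped is made up for illustration.

```python
import html


def count_unescaped(n):
    """Return how many '&amp;' entities are left after unescaping n of them.

    Hypothetical helper; html.unescape stands in for the old
    HTMLParser.unescape method discussed in this issue.
    """
    text = "&amp;" * n
    return html.unescape(text).count("&amp;")


# With a correct implementation no entity should survive,
# no matter how many the input contains.
remaining = count_unescaped(1000)
print(remaining)
```

On a fixed interpreter the count should be zero for any n, including values well above the threshold mentioned in this report.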
msg143459 - Author: Peter Otten (peter.otten) * Date: 2011-09-03 10:15
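The unescape() method calls re.sub(regex, sub, string, re.ASCII), but the fourth positional argument of re.sub is count, not flags, so the flag value ends up limiting the number of substitutions. The fix is easy: pass the flag by keyword,

re.sub(regex, sub, string, flags=re.ASCII).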
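
The effect of binding the flag to count can be reproduced in isolation. The sketch below is not the code from html.parser: PATTERN, TABLE and replace_entity are simplified, hypothetical stand-ins, and the buggy call spells out count=int(re.ASCII) by keyword so that the example shows what the positional call amounted to without relying on positional count, which newer Python versions warn about.

```python
import re

# Simplified, hypothetical stand-ins for the pattern and callback in unescape().
PATTERN = r"&(\w+);"
TABLE = {"amp": "&", "lt": "<", "gt": ">"}


def replace_entity(match):
    """Replace a known named entity; leave unknown ones untouched."""
    return TABLE.get(match.group(1), match.group(0))


total = 300
text = "&amp;" * total

# Signature: re.sub(pattern, repl, string, count=0, flags=0).
# A flag passed positionally after the string is interpreted as count,
# which is equivalent to the explicit keyword used here.
buggy = re.sub(PATTERN, replace_entity, text, count=int(re.ASCII))

# Passing the flag by keyword leaves count at 0, meaning "replace all".
fixed = re.sub(PATTERN, replace_entity, text, flags=re.ASCII)

leftover_buggy = buggy.count("&amp;")
leftover_fixed = fixed.count("&amp;")
print(leftover_buggy, leftover_fixed)
```

In the buggy variant the number of replacements is capped at the integer value of the flag, so any entities beyond that cap are left as they are; the keyword variant replaces all of them.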
msg143512 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-09-05 14:16
New changeset 9896fc2a8167 by Ezio Melotti in branch '3.2':
#12888: Fix a bug in HTMLParser.unescape that prevented it from unescaping more than 128 entities. Patch by Peter Otten.
http://hg.python.org/cpython/rev/9896fc2a8167

New changeset 7b6096852665 by Ezio Melotti in branch 'default':
#12888: merge with 3.2.
http://hg.python.org/cpython/rev/7b6096852665
msg143513 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-09-05 14:26
Fixed, thanks for the report and the patch!
History
Date User Action Args
2022-04-11 14:57:21 admin set github: 57097
2011-09-05 14:26:55 ezio.melotti set status: open -> closed; versions: + Python 3.3; messages: + msg143513; components: + Library (Lib), - None; resolution: fixed; stage: commit review -> resolved
2011-09-05 14:16:31 python-dev set nosy: + python-dev; messages: + msg143512
2011-09-03 12:48:26 ezio.melotti set assignee: ezio.melotti; type: behavior; nosy: + ezio.melotti; stage: commit review
2011-09-03 10:15:20 peter.otten set files: + unescape_bug.patch; nosy: + peter.otten; messages: + msg143459; keywords: + patch
2011-09-03 08:35:20 yves@zioup.com set hgrepos: + hgrepo65; messages: + msg143457
2011-09-02 21:08:40 yves@zioup.com create