
classification
Title: Something wrong with html.unescape()
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 3.8
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: christian.heimes, corona10, ezio.melotti, hongweipeng, serhiy.storchaka, Валентин Dreyk
Priority: normal
Keywords:

Created on 2020-06-05 16:07 by Валентин Dreyk, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (3)
msg370765 - Author: Валентин Dreyk (Валентин Dreyk) Date: 2020-06-05 16:07
import html
import xml.sax.saxutils as saxutils

print(saxutils.unescape("&reghard"))  # &reghard
print(html.unescape("&reghard"))  # ®hard



html.unescape() replaces "&reg" with "®" even without a ";" at the end.
msg371847 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2020-06-19 07:33
According to https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#cite_ref-semicolon_1-64 the trailing semicolon can be omitted for the named entity "reg". That means "&reg" and "&reg;" are equivalent.
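
As a small sketch of the point above (not part of the original message): the standard library's `html.entities.html5` mapping lists "reg" both with and without the trailing semicolon, which is why html.unescape() treats the two spellings alike. The variable names here are only illustrative.

```python
import html
from html.entities import html5

# The html5 mapping has an entry for each spelling of this entity.
has_bare = "reg" in html5
has_terminated = "reg;" in html5
print(has_bare, has_terminated)  # True True

# Both spellings therefore decode to the same character (U+00AE).
with_semicolon = html.unescape("&reg;hard")
without_semicolon = html.unescape("&reghard")
print(with_semicolon == without_semicolon)  # True

# Only some names have a semicolon-less entry; list how many there are.
bare_names = sorted(name for name in html5 if not name.endswith(";"))
print(len(bare_names), "names may omit the semicolon")
```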

saxutils.unescape() only handles '&lt;', '&gt;', and '&amp;' by default. You have to pass in a dictionary to unescape other entities.
msg371849 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-06-19 08:04
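
A minimal sketch of passing such a dictionary (the `extra_entities` name is made up for illustration). The keys are the full entity strings, and replacement is a plain substring substitution, so "&reghard" without the semicolon is still left alone:

```python
import xml.sax.saxutils as saxutils

# By default only &lt;, &gt; and &amp; are unescaped.
default_result = saxutils.unescape("&lt;b&gt; &amp; &reg;")
print(default_result)  # <b> & &reg;

# Extra entities are supplied as a mapping from entity text to replacement.
extra_entities = {"&reg;": "\u00ae"}
custom_result = saxutils.unescape("&reg; &reghard", extra_entities)
print(custom_result)  # ® &reghard
```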
Concur with Christian. It works as designed, in accordance with the standard.
History
Date | User | Action | Args
2022-04-11 14:59:32 | admin | set | github: 85050
2020-06-19 08:07:02 | ezio.melotti | set | nosy: + ezio.melotti
2020-06-19 08:04:47 | serhiy.storchaka | set | status: open -> closed; nosy: + serhiy.storchaka; messages: + msg371849; resolution: not a bug; stage: resolved
2020-06-19 07:33:55 | christian.heimes | set | nosy: + christian.heimes; messages: + msg371847
2020-06-19 06:34:22 | hongweipeng | set | nosy: + hongweipeng
2020-06-09 14:57:55 | corona10 | set | stage: needs patch -> (no value); versions: - Python 3.9, Python 3.10
2020-06-09 14:51:50 | corona10 | set | versions: + Python 3.9, Python 3.10
2020-06-09 14:51:43 | corona10 | set | stage: needs patch
2020-06-09 14:46:15 | corona10 | set | nosy: + corona10
2020-06-05 16:07:08 | Валентин Dreyk | create |