classification
Title: small disadvantage of htmlentitydefs
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.2, Python 3.3, Python 3.4, Python 2.7
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: ezio.melotti Nosy List: WitcherGeralt, ezio.melotti
Priority: normal Keywords:

Created on 2012-12-24 14:39 by WitcherGeralt, last changed 2012-12-25 16:42 by ezio.melotti. This issue is now closed.

Messages (2)
msg178060 - Author: Al Korgun (WitcherGeralt) Date: 2012-12-24 14:39
>>> import htmlentitydefs
>>> htmlentitydefs.name2codepoint.get("quot")  # ok
34
>>> htmlentitydefs.name2codepoint.get("apos", "null")  # &apos; -> chr(39)
'null'
msg178148 - Author: Ezio Melotti (ezio.melotti) (Python committer) Date: 2012-12-25 16:42
That's because &apos; is not a valid character reference in HTML 4, only in HTML5/XML/XHTML.  A mapping that contains the HTML 5 entities was added in Python 3.3.  Modules like HTMLParser also include &apos; among the entities while parsing.
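As a rough illustration of the point above, the following Python 3 sketch looks a name up in the HTML 4 table first and falls back to the HTML 5 mapping. It assumes Python 3.4 or later, where the module is named html.entities rather than htmlentitydefs; lookup_entity is a hypothetical helper written for this example, not part of the standard library.

```python
from html import unescape
from html.entities import html5, name2codepoint


def lookup_entity(name, default=None):
    """Hypothetical helper: resolve an entity name to its character(s).

    Tries the HTML 4 table (name2codepoint) first, then the HTML 5
    mapping (html5), whose keys include the trailing semicolon.
    """
    codepoint = name2codepoint.get(name)
    if codepoint is not None:
        return chr(codepoint)
    return html5.get(name + ";", default)


# "quot" is defined in HTML 4, so the first table already has it.
print(repr(lookup_entity("quot")))

# "apos" is missing from the HTML 4 table but present in the HTML 5 one.
print("apos" in name2codepoint)
print(repr(lookup_entity("apos")))

# html.unescape() consults the HTML 5 mapping when decoding text.
print(repr(unescape("&apos;")))
```

When the goal is only to decode text, calling html.unescape() directly is probably simpler than consulting either table by hand.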
History
Date User Action Args
2012-12-25 16:42:03 ezio.melotti set status: open -> closed
type: behavior
messages: + msg178148
assignee: ezio.melotti
resolution: not a bug
stage: resolved
2012-12-24 15:06:13 serhiy.storchaka set nosy: + ezio.melotti
versions: + Python 3.2, Python 3.3, Python 3.4, - Python 2.6
2012-12-24 14:44:50 WitcherGeralt set versions: + Python 2.6
2012-12-24 14:39:58 WitcherGeralt create