classification
Title: HTML5 named character references not consistent
Type: Stage: needs patch
Components: Documentation, Library (Lib) Versions: Python 3.9, Python 3.8, Python 3.7
process
Status: open Resolution: not a bug
Dependencies: Superseder:
Assigned To: docs@python Nosy List: docs@python, ezio.melotti, mikeraider, terry.reedy
Priority: normal Keywords:

Created on 2019-11-08 16:09 by mikeraider, last changed 2019-11-09 02:17 by terry.reedy.

Messages (2)
msg356246 - Author: Mike Raider (mikeraider) Date: 2019-11-08 16:09
In the file 
cpython/blob/master/Lib/html/entities.py

the HTML5 named character references (line 264) do not look consistent.

Some references end with a semicolon, some do not, and some appear in both variants.

Is there a reason for this?
msg356282 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2019-11-09 02:17
Questions should usually be asked on python-list or elsewhere.

To answer: the html5 dict was created from
https://html.spec.whatwg.org/multipage/named-characters.html
with these issues and patches.
#11113 dc44f55cc9dc1d016799362c344958baab328ff4
       518dbfd7b5a4614b095befc62d1abf1588c7c14a
#16245 e6e96eea5157650be77306b15b28bc815e14c2f3

The peculiarities in the dict keys reflect peculiarities in the standard. For instance, msg163706 of #11113 says "the standard allows some charref to end without a ';', but not all of them."

I am leaving this open to add a link to the source file in both entities.py and the doc.  It shows examples of the entities.  A new one for me is smashp; (U+02A33, ⨳).
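
To illustrate the point about the dict keys, here is a minimal sketch that splits the keys of html.entities.html5 by whether they end in ';'. The variable names (with_semicolon, without_semicolon, both) are made up for this example, and it assumes the html5 dict as shipped in the Python versions listed on this issue.

```python
from html.entities import html5

# Split the keys by whether the name carries the trailing ';'.
with_semicolon = {name for name in html5 if name.endswith(";")}
without_semicolon = {name for name in html5 if not name.endswith(";")}

# Names that appear in both variants, e.g. "amp" and "amp;".
both = {name for name in without_semicolon if name + ";" in with_semicolon}

print(len(html5), len(with_semicolon), len(without_semicolon), len(both))

# "amp" is one of the names the standard allows without ';'.
print("amp" in html5, "amp;" in html5)

# "smashp" is only valid with the ';'.
print("smashp" in html5, "smashp;" in html5)
print(repr(html5["smashp;"]))
```

The semicolon-free keys are the subset of names the standard accepts without the terminator, which is why the dict looks inconsistent at first glance.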
History
Date User Action Args
2019-11-09 02:17:40 terry.reedy set assignee: docs@python
components: + Documentation
versions: + Python 3.7, Python 3.8, Python 3.9
nosy: + terry.reedy, docs@python

messages: + msg356282
resolution: not a bug
stage: needs patch
2019-11-09 01:13:20 terry.reedy set nosy: + ezio.melotti
2019-11-08 16:09:16 mikeraider create