classification
Title: Update html.entities.html5 dictionary and parseentities.py
Type: behavior Stage: committed/rejected
Components: Library (Lib) Versions: Python 3.4
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: ezio.melotti Nosy List: Ramchandra Apte, eric.araujo, ezio.melotti, georg.brandl, iuliia.proskurnia, kushaldas, larry, python-dev
Priority: release blocker Keywords: patch

Created on 2012-10-16 09:41 by ezio.melotti, last changed 2013-02-20 14:19 by Ramchandra Apte.

Files
File name Uploaded Description
issue16245.diff ezio.melotti, 2012-10-16 11:51 New Tools/scripts/parse_html5_entities.py
issue16245-2.diff ezio.melotti, 2012-10-23 11:26
issue16245-3.diff iuliia.proskurnia, 2012-10-23 12:11
Messages (9)
msg173021 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-10-16 09:41
A JSON file containing all the HTML5 entities is now available at http://dev.w3.org/html5/spec/entities.json.
I tested from the interpreter to see if it matches the values in html.entities.html5, and there are a dozen entities that need to be updated:

>>> import json
>>> s = json.load(open('entities.json'))
>>> from html.entities import html5
>>> for (k1,i1),(k2,i2) in zip(sorted(s.items()), sorted(html5.items())):
...   if i1['characters'] != i2: (k1, k2, i1['characters'], i2, i1['codepoints'], list(map(ord, i2)))
... 
('&DotDot;', 'DotDot;', '⃜', '◌⃜', [8412], [9676, 8412])
('&DownBreve;', 'DownBreve;', '̑', '◌̑', [785], [9676, 785])
('&LeftAngleBracket;', 'LeftAngleBracket;', '⟨', '〈', [10216], [9001])
('&NewLine;', 'NewLine;', '\n', '␊', [10], [9226])
('&RightAngleBracket;', 'RightAngleBracket;', '⟩', '〉', [10217], [9002])
('&Tab;', 'Tab;', '\t', '␉', [9], [9225])
('&TripleDot;', 'TripleDot;', '⃛', '◌⃛', [8411], [9676, 8411])
('&lang;', 'lang;', '⟨', '〈', [10216], [9001])
('&langle;', 'langle;', '⟨', '〈', [10216], [9001])
('&rang;', 'rang;', '⟩', '〉', [10217], [9002])
('&rangle;', 'rangle;', '⟩', '〉', [10217], [9002])
('&tdot;', 'tdot;', '⃛', '◌⃛', [8411], [9676, 8411])

The Tools/scripts/parseentities.py script should also be updated (or possibly a new, separate script should be added), so it can be used to generate the html5 dict. I'm setting this as a release blocker so that the update gets done before the release (other values might change in the meantime).
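
The interpreter check above pairs the two sorted item lists with zip(), which only lines up if both mappings hold exactly the same names. A minimal sketch of a per-name comparison follows; the function name diff_entities and the inline sample data are hypothetical, and the only thing assumed about entities.json is the layout visible in the session above ('&name;' keys with 'characters' and 'codepoints' fields).

```python
from html.entities import html5


def diff_entities(spec, current=html5):
    """Return {name: (spec_chars, current_chars)} for names that differ.

    `spec` uses the entities.json layout: '&name;' -> {'characters': ...}.
    `current` uses the html.entities.html5 layout: 'name;' -> str.
    A value of None marks a name missing on that side.
    """
    diffs = {}
    for key, info in spec.items():
        name = key.lstrip('&')  # '&lang;' -> 'lang;'
        if current.get(name) != info['characters']:
            diffs[name] = (info['characters'], current.get(name))
    # Names present in the dict but absent from the spec data.
    for name in current.keys() - {key.lstrip('&') for key in spec}:
        diffs[name] = (None, current[name])
    return diffs


# Tiny inline sample in the entities.json layout (not the real file).
sample = {
    '&lang;': {'codepoints': [10216], 'characters': '\u27e8'},
    '&Tab;': {'codepoints': [9], 'characters': '\t'},
}
old = {'lang;': '\u2329', 'Tab;': '\t'}  # stale value for 'lang;'
result = diff_entities(sample, old)
```

With the real file, the call would be diff_entities(json.load(f)) on an open entities.json; an empty result means the dictionary matches.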
msg173345 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-10-19 16:22
I say replace the code.  HTML 4.01 won’t be updated.
msg173589 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-10-23 10:34
I think it's ok to have a separate file rather than patching the existing one (see attached patch).  If the old script is not used anymore it could be removed, otherwise we could just leave it there.
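
The attached patch is not reproduced here. As a rough sketch of what such a generator could do, assuming only the entities.json layout shown in the first message, the code below turns the JSON mapping into Python source for an html5 dict. The function names build_html5_dict and format_html5_dict are hypothetical and are not taken from the committed Tools/scripts/parse_html5_entities.py.

```python
import json


def build_html5_dict(entities):
    """Map 'name;' -> characters from an entities.json-style mapping."""
    return {key.lstrip('&'): info['characters']
            for key, info in entities.items()}


def format_html5_dict(html5, indent=4):
    """Render the mapping as Python source, one sorted entry per line."""
    lines = ['html5 = {']
    for name in sorted(html5):
        # %a keeps the generated source ASCII-only ('\u27e8', not the glyph).
        lines.append('%s%r: %a,' % (' ' * indent, name, html5[name]))
    lines.append('}')
    return '\n'.join(lines) + '\n'


# Inline sample in the entities.json layout (not the real file).
sample = json.loads(
    '{"&lang;": {"codepoints": [10216], "characters": "\\u27e8"},'
    ' "&amp": {"codepoints": [38], "characters": "&"}}'
)
source = format_html5_dict(build_html5_dict(sample))
```

Writing escapes rather than literal characters keeps combining marks and control characters such as Tab and NewLine readable in the generated file.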
msg173601 - (view) Author: Iuliia Proskurnia (iuliia.proskurnia) Date: 2012-10-23 12:11
Version with --patch to modify Lib/html/entities.py automatically
msg173618 - (view) Author: Roundup Robot (python-dev) Date: 2012-10-23 13:46
New changeset dd8b969d7459 by Ezio Melotti in branch 'default':
#16245: add a script to generate the html.entities.html5 dict.
http://hg.python.org/cpython/rev/dd8b969d7459
msg173619 - (view) Author: Roundup Robot (python-dev) Date: 2012-10-23 13:54
New changeset 1eb1c6942ac8 by Ezio Melotti in branch '3.3':
#16245: Fix the value of a few entities in html.entities.html5.
http://hg.python.org/cpython/rev/1eb1c6942ac8

New changeset 70fab10cd542 by Ezio Melotti in branch 'default':
#16245: merge with 3.3.
http://hg.python.org/cpython/rev/70fab10cd542
msg173629 - (view) Author: Roundup Robot (python-dev) Date: 2012-10-23 18:14
New changeset fb80df16c4ff by Ezio Melotti in branch 'default':
Add Misc/NEWS entry for dd8b969d7459/#16245.
http://hg.python.org/cpython/rev/fb80df16c4ff
msg173631 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-10-23 18:22
I have now committed an improved version of the script (thanks Iuliia!) and updated the html.entities.html5 dictionary accordingly.

I'm leaving this open because we will have to check whether the dictionary is still up to date before the release of Python 3.4.
msg182506 - (view) Author: Ramchandra Apte (Ramchandra Apte) * Date: 2013-02-20 14:19
Shouldn't this be a deferred blocker?
History
Date User Action Args
2013-02-20 14:19:58 Ramchandra Apte set nosy: + Ramchandra Apte; messages: + msg182506
2013-02-10 18:27:42 pitrou set versions: - Python 3.3
2012-11-01 16:28:07 serhiy.storchaka set nosy: - serhiy.storchaka
2012-10-23 18:22:35 ezio.melotti set messages: + msg173631; components: + Library (Lib); stage: patch review -> committed/rejected
2012-10-23 18:14:54 python-dev set messages: + msg173629
2012-10-23 13:54:36 python-dev set messages: + msg173619
2012-10-23 13:46:42 python-dev set nosy: + python-dev; messages: + msg173618
2012-10-23 12:11:48 iuliia.proskurnia set files: + issue16245-3.diff; nosy: + iuliia.proskurnia; messages: + msg173601
2012-10-23 11:26:01 ezio.melotti set files: + issue16245-2.diff
2012-10-23 10:34:43 ezio.melotti set messages: + msg173589; stage: needs patch -> patch review
2012-10-19 16:22:10 eric.araujo set nosy: + larry, eric.araujo, georg.brandl; messages: + msg173345
2012-10-16 12:00:57 serhiy.storchaka set nosy: + serhiy.storchaka
2012-10-16 11:51:41 ezio.melotti set files: + issue16245.diff; keywords: + patch
2012-10-16 09:57:06 kushaldas set nosy: + kushaldas
2012-10-16 09:41:51 ezio.melotti create