Author Brian.Jones
Recipients Brian.Jones
Date 2011-02-04.03:43:53
In Python 3.2b2, html.entities.codepoint2name and name2codepoint only support the 252 HTML entity names defined in the HTML 4 spec from 1997. I'm wondering if there's a reason not to support the W3C Recommendation 'XML Entity Definitions for Characters'.
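
To make the limitation concrete, here is a minimal sketch of how the two dictionaries behave today. The helper name entity_to_char is made up for illustration, and "check" is only used as an example of a name that is in the HTML 5 draft list but not in the HTML 4 set.

```python
from html.entities import codepoint2name, name2codepoint


def entity_to_char(name):
    """Return the character for an entity name, or None if the table lacks it.

    Hypothetical helper for illustration; not part of html.entities.
    """
    codepoint = name2codepoint.get(name)
    return None if codepoint is None else chr(codepoint)


# "eacute" is one of the HTML 4 names, so the lookup succeeds.
print(repr(entity_to_char("eacute")))

# "check" is not an HTML 4 name, so the lookup misses and returns None.
print(repr(entity_to_char("check")))

# The reverse table maps a code point back to its HTML 4 name.
print(codepoint2name[0x26])
```

Any name outside the HTML 4 set falls through the same way, so callers currently have to carry their own table for the newer entities.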

This standard defines significantly more characters, and the spec itself notes that the HTML 5 drafts use its entities. The current HTML 5 draft lists them under 'Named character references'.

If this is just a matter of somebody going in to do the grunt work, let me know. 

If startup costs associated with importing a huge dictionary are a concern, perhaps a more efficient type that enables the same lookup interface can be defined. 

If other reasons exist to not move in this direction, please do let me know!
Date                 User         Action  Args
2011-02-04 03:43:55  Brian.Jones  set     recipients: + Brian.Jones
2011-02-04 03:43:55  Brian.Jones  set     messageid: <>
2011-02-04 03:43:54  Brian.Jones  link    issue11113 messages
2011-02-04 03:43:53  Brian.Jones  create