Message 67102 - Python tracker



Author	thomaspinckney3
Recipients	thomaspinckney3
Date	2008-05-20.05:43:48
SpamBayes Score	0.0009664701
Marked as misclassified	No
Message-id	<1211262235.18.0.705526453605.issue2927@psf.upfronthosting.co.za>
In-reply-to

Content

There is currently a private method inside html.parser.HTMLParser for
unescaping HTML character references of the form &...;. It would be
useful to expose it publicly for other users who want to unescape a
piece of HTML.
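For comparison, Python 3.4 and later provide a public html.unescape() function in the standard library. The sketch below only illustrates that later API; it is not the code from this patch.

```python
import html

# html.unescape() replaces named (&eacute;), decimal (&#169;) and
# hexadecimal (&#x263A;) character references with Unicode characters.
print(html.unescape("caf&eacute; &lt;b&gt; &#169; &#x263A;"))
# prints: café <b> © ☺
```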

Additionally, many websites don't use proper Unicode or ISO-8859-1
encodings and accidentally use characters from Microsoft's Code Page
1252 (Windows-1252) extensions. I added code to map these to the
corresponding Unicode characters.
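A minimal sketch of such a mapping, assuming the problem shows up as numeric character references in the range &#128; to &#159;. Unicode assigns those code points to C1 control characters, while cp1252 uses the same byte values for characters such as curly quotes, dashes and the euro sign. The names cp1252_fixup and unescape_numeric are hypothetical and not taken from the patch; named references such as &amp; are deliberately not handled here.

```python
import re

# Matches decimal (&#150;) and hexadecimal (&#x96;) numeric references.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def cp1252_fixup(codepoint: int) -> str:
    """Hypothetical helper: reinterpret 0x80-0x9F as cp1252 bytes."""
    if 0x80 <= codepoint <= 0x9F:
        try:
            return bytes([codepoint]).decode("cp1252")
        except UnicodeDecodeError:
            # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined in cp1252,
            # so keep the original C1 control character.
            return chr(codepoint)
    return chr(codepoint)


def unescape_numeric(text: str) -> str:
    """Hypothetical helper: unescape numeric references only."""
    def repl(match: re.Match) -> str:
        hex_digits, dec_digits = match.groups()
        codepoint = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if codepoint > 0x10FFFF:
            # Outside the Unicode range: leave the reference untouched.
            return match.group(0)
        return cp1252_fixup(codepoint)

    return _NUMERIC_REF.sub(repl, text)


print(unescape_numeric("&#147;quoted&#148; &#x96; &#8364;"))
# prints: “quoted” – €
```

Decoding a single byte through the cp1252 codec avoids hand-maintaining a 32-entry table, at the cost of needing a fallback for the five byte values that cp1252 leaves undefined.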

The unescaping logic was slightly simplified too.

This is my first Python patch submission, so please let me know if I've 
done anything wrong.

A new test case was also added for this functionality.
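The patch's own test is not reproduced here. As a hypothetical illustration of what such a test might check, the sketch below targets the public html.unescape() available in Python 3.4 and later, including its handling of references in the cp1252 range; the class and method names are made up for this example.

```python
import html
import unittest


class UnescapeTests(unittest.TestCase):
    """Hypothetical tests against html.unescape() (Python 3.4+)."""

    def test_named_and_numeric_references(self):
        self.assertEqual(html.unescape("&amp; &#60; &#x3e;"), "& < >")

    def test_cp1252_range_is_remapped(self):
        # 0x80 and 0x96 are C1 controls in Unicode; html.unescape()
        # maps them to the cp1252 characters (euro sign, en dash).
        self.assertEqual(html.unescape("&#128;"), "\u20ac")
        self.assertEqual(html.unescape("&#x96;"), "\u2013")

    def test_text_without_references_is_unchanged(self):
        self.assertEqual(html.unescape("plain text"), "plain text")
```

Saved to a module, these can be run with python -m unittest followed by the module name.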

History
Date	User	Action	Args
2008-05-20 05:43:55	thomaspinckney3	set	spambayes_score: 0.00096647 -> 0.0009664701 recipients: + thomaspinckney3
2008-05-20 05:43:55	thomaspinckney3	set	spambayes_score: 0.00096647 -> 0.00096647 messageid: <1211262235.18.0.705526453605.issue2927@psf.upfronthosting.co.za>
2008-05-20 05:43:53	thomaspinckney3	link	issue2927 messages
2008-05-20 05:43:52	thomaspinckney3	create