Author orsenthil
Recipients Martin.Potthast, orsenthil, r.david.murray
Date 2010-12-23.14:58:45
Message-id <1293116330.49.0.334906315275.issue10759@psf.upfronthosting.co.za>
In-reply-to
Content
Yes, I too agree that HTMLParser.unescape() should pass malformed character references through unchanged, just as browsers do.

But as the unescape function has been undocumented and unexposed across releases, I am not sure that exposing it is a good idea. HTMLParser is meant for event-based parsing of tags, and unescape is just a helper function in that context.

Given that reasoning, if you look at the malformed charref test, you will see that event-based parsing does return the malformed data properly, e.g. ("data", "&#bad;").

Only an explicit call to unescape fails to exhibit this behavior.
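
For anyone who wants to reproduce the event-based behavior, here is a minimal sketch. It is written against current Python 3 rather than the code under discussion, so treat it as an illustration under assumptions: EventCollector and collect_events are hypothetical names, and html.unescape is used as a stand-in because HTMLParser.unescape() is not available as a method in recent Python 3 releases. It is not the test attached to this issue.

```python
from html import unescape
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical helper: records each parser event as a tuple."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("starttag", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("endtag", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def collect_events(source):
    """Feed `source` to a fresh parser and return the recorded events."""
    parser = EventCollector()
    parser.feed(source)
    parser.close()
    return parser.events


# Event-based parsing: the malformed charref arrives as plain data.
events = collect_events("<p>&#bad;</p>")
data_text = "".join(event[1] for event in events if event[0] == "data")
print(events)
print(repr(data_text))

# Explicit unescaping (stand-in for the helper discussed in this issue).
print(repr(unescape("&#bad;")))
```

The data events are joined before comparing because a parser is free to deliver text in more than one handle_data call.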

Martin: I am not sure that changing something in line 168 would solve the issue. In that particular block of code, the else branch is responsible for emitting the malformed charref as an event. If you would like to elaborate a bit more on your suggestion, it would be helpful.

However, I do agree that unescape can be changed as per your patch, and I have added a simple test to exercise that change. I think this can go in.
History
Date                 User       Action  Args
2010-12-23 14:58:50  orsenthil  set     recipients: + orsenthil, r.david.murray, Martin.Potthast
2010-12-23 14:58:50  orsenthil  set     messageid: <1293116330.49.0.334906315275.issue10759@psf.upfronthosting.co.za>
2010-12-23 14:58:46  orsenthil  link    issue10759 messages
2010-12-23 14:58:45  orsenthil  create