Title: HTMLParser cannot deal with mixture of arbitrary data and character reference
Components: Library (Lib) Versions: Python 2.6
msg91128 - (view) Author: Liu DongMiao ( Date: 2009-07-31 07:45
HTMLParser (Python 2.6.2) cannot handle a mixture of arbitrary data and
character references.

In lines 365-373, replaceEntities(s) returns unichr(charref) as unicode,
which cannot be mixed with arbitrary (non-ASCII) data held in a str.

One possible fix: replace unichr(c) with unichr(c).encode('utf-8').
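
A minimal sketch of this idea, written for Python 3, where bytes plays the role of the 2.x str and chr() replaces unichr(). The helper replace_charrefs is hypothetical and only handles decimal references; it is not the HTMLParser code itself, and UTF-8 is an assumed encoding for the data.

```python
# Python 3 sketch: bytes stands in for the Python 2 str type,
# and chr() stands in for unichr().
import re

def replace_charrefs(data, encoding="utf-8"):
    """Hypothetical helper: expand decimal &#NNN; references in a byte string."""
    def repl(match):
        codepoint = int(match.group(1).decode("ascii"))
        # Encode the character so the result stays a byte string
        # and can sit next to the surrounding raw bytes.
        return chr(codepoint).encode(encoding)
    return re.sub(rb"&#(\d+);", repl, data)

# Non-ASCII byte data mixed with a character reference.
raw = "caf\u00e9 &#8364;5".encode("utf-8")
result = replace_charrefs(raw)
print(result.decode("utf-8"))
```

Concatenating raw + chr(8364) directly would raise TypeError in Python 3; in Python 2.6 the equivalent str + unicode mix raises UnicodeDecodeError when the str holds non-ASCII bytes.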
msg91158 - (view) Author: bones7456 (bones7456) Date: 2009-08-01 06:11
Another possible fix:
add these three lines to the head of the file:

import sys
msg91164 - (view) Author: Liu DongMiao ( Date: 2009-08-01 16:20
I don't think this should be considered a bug.

Since we don't know the encoding of the str, we cannot handle str and
unicode together.
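
To illustrate the point about unknown encodings, here is a small Python 3 sketch (bytes standing in for the 2.x str; the sample character is made up for illustration). The same text has different byte forms under different encodings, so decoding with the wrong one silently gives different text.

```python
# Python 3 sketch: bytes plays the role of the Python 2 str type.
text = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

utf8_bytes = text.encode("utf-8")      # two bytes
latin1_bytes = text.encode("latin-1")  # one byte

# Decoding with the matching codec round-trips the text.
roundtrip = utf8_bytes.decode("utf-8")

# Decoding UTF-8 bytes as Latin-1 does not fail, but yields other characters.
misread = utf8_bytes.decode("latin-1")

print(repr(utf8_bytes), repr(latin1_bytes), repr(misread))
```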

In my example, the str is in UTF-8, so I need to convert unicode to str in

I will take bones' suggestion.
