Message73571
It seems that HTMLParser.feed raises an exception whenever an attribute
value contains both an ampersand ('&') and non-ASCII characters.
Running the attached test file with Python 2.5 succeeds, but with Python
2.6, the result is:
C:\Python26>python.exe test.py
Without & in attribute
OK
With & in attribute
Traceback (most recent call last):
  File "test.py", line 18, in <module>
    HP().feed(s)
  File "C:\Python26\lib\HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "C:\Python26\lib\HTMLParser.py", line 148, in goahead
    k = self.parse_starttag(i)
  File "C:\Python26\lib\HTMLParser.py", line 249, in parse_starttag
    attrvalue = self.unescape(attrvalue)
  File "C:\Python26\lib\HTMLParser.py", line 386, in unescape
    return re.sub(r"&(#?[xX]?(?:[0-9a-fA-F]+|\w{1,8}));", replaceEntities, s)
  File "C:\Python26\lib\re.py", line 150, in sub
    return _compile(pattern, 0).sub(repl, string, count)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
I am running:
Python 2.6rc2 (r26rc2:66507, Sep 18 2008, 14:27:33) [MSC v.1500 32 bit (Intel)] on win32
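The attached test.py is not reproduced in this message, so the following is only a rough sketch of the kind of input involved, not the original test. It assumes Python 3, where the HTMLParser module is named html.parser, and the names AttrCollector, parse_attrs and the sample markup are made up for illustration. It shows that byte 0xc3 (the first byte of a UTF-8 encoded non-ASCII character) is rejected by the ascii codec, and that decoding the input to text before calling feed lets a value containing both an entity reference and a non-ASCII character through.

```python
from html.parser import HTMLParser  # Python 3 name of the HTMLParser module


class AttrCollector(HTMLParser):
    """Hypothetical helper that records every attribute it sees."""

    def __init__(self):
        super().__init__()
        self.attrs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with entities already unescaped.
        self.attrs.extend(attrs)


def parse_attrs(raw_bytes, encoding="utf-8"):
    # Decode first so the parser only ever handles text, never raw bytes.
    parser = AttrCollector()
    parser.feed(raw_bytes.decode(encoding))
    parser.close()
    return parser.attrs


# Made-up input: an entity reference and a non-ASCII character in one value.
raw = '<a title="caf\u00e9 &amp; bar">'.encode("utf-8")

# The UTF-8 encoding of "é" starts with byte 0xc3, which the ascii codec rejects.
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

attrs = parse_attrs(raw)
```

Under Python 2.6 the comparable step would presumably be to pass a unicode object rather than a byte string to feed; that is an assumption and has not been checked against the attached file.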
Date | User | Action | Args
2008-09-22 12:33:10 | yanne | set | recipients: + yanne
2008-09-22 12:33:10 | yanne | set | messageid: <1222086790.7.0.800001604957.issue3932@psf.upfronthosting.co.za>
2008-09-22 12:32:10 | yanne | link | issue3932 messages
2008-09-22 12:32:08 | yanne | create |