Message31176
I had a similar problem recently and did not have time to file a bug report. Thanks for doing that.
The problem is in the code that handles entity and character references in SGMLParser.parse_starttag. It seems that it is not careful about unicode/str issues.
(But maybe BeautifulSoup needs to tell it to be?)
My quick'n'dirty workaround was to remove the offending character entity from the page text before feeding it to BeautifulSoup::

    text = text.replace('®', '') # remove rights reserved sign entity
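
A slightly more general version of the same workaround, as a rough sketch only: it assumes the trouble comes from numeric character references above 127, which I have not confirmed against the sgmllib source. The helper name `strip_non_ascii_char_refs` is made up for illustration, and named entities are deliberately left alone.

```python
import re

# Matches decimal (&#174;) and hexadecimal (&#xAE;) character references.
_CHAR_REF = re.compile(r'&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));')


def strip_non_ascii_char_refs(text):
    """Remove numeric character references whose code point is above 127.

    Hypothetical helper: references to ASCII code points are kept as-is,
    and named entities are not touched at all.
    """
    def _replace(match):
        hex_digits, dec_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Keep ASCII references unchanged, drop everything else.
        return match.group(0) if code < 128 else ''

    return _CHAR_REF.sub(_replace, text)
```

This would be called on the raw page text before it goes to the parser, in place of the single `replace` call. It throws the characters away rather than converting them, so it is only suitable where losing those signs is acceptable.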
cheers,
stefan
Date | User | Action | Args
2007-08-23 14:51:43 | admin | link | issue1651995 messages
2007-08-23 14:51:43 | admin | create |