Author wrstlprmpft
Date 2007-02-05.07:16:58
Content
I had a similar problem recently but did not have time to file a bug report. Thanks for doing that.

The problem is in the code that handles entity and character references in SGMLParser.parse_starttag. It does not seem to be careful about unicode/str issues.
(But maybe BeautifulSoup needs to tell it to?)
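
A minimal sketch of the kind of failure described above, assuming the cause is a byte string holding a non-ASCII byte (such as the one for ®) being mixed with unicode text. Python 2 decoded such a str implicitly as ASCII; the snippet performs the same decode explicitly so that it also runs on Python 3. The variable names are illustrative only and are not taken from sgmllib.

```python
# Byte 0xAE is the Latin-1 encoding of the registered sign.
converted = bytes([174])

error = None
try:
    # Python 2 did this decode implicitly when mixing str with unicode.
    converted.decode('ascii')
except UnicodeDecodeError as exc:
    error = exc

# Decoding with a codec that covers the byte succeeds.
sign = converted.decode('latin-1')
```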

My quick-and-dirty workaround was to remove the offending character entity from the page source before feeding it to BeautifulSoup::

  text = text.replace('&#174;', '')  # remove the registered sign (®) character reference
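
The same idea can be generalized beyond a single character. The following is only a sketch, assuming that every numeric character reference above the ASCII range is a potential trigger; the helper name strip_non_ascii_charrefs and the sample markup are hypothetical and not part of sgmllib or BeautifulSoup.

```python
import re

# Matches decimal (&#174;) and hexadecimal (&#xAE;) character references.
CHARREF = re.compile(r'&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));')

def strip_non_ascii_charrefs(text):
    """Remove numeric character references above the ASCII range."""
    def _replace(match):
        hex_digits, dec_digits = match.groups()
        codepoint = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Keep ASCII references untouched; drop everything else.
        return match.group(0) if codepoint < 128 else ''
    return CHARREF.sub(_replace, text)

# Hypothetical input with the reference inside an attribute value.
sample = '<a title="Acme&#174; &#38; Co&#xAE;">x</a>'
cleaned = strip_non_ascii_charrefs(sample)
```

Like the one-line workaround, this throws the characters away rather than converting them, so it only suits cases where losing them is acceptable.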

cheers,
stefan
History
Date                 User   Action  Args
2007-08-23 14:51:43  admin  link    issue1651995 messages
2007-08-23 14:51:43  admin  create