
Author Rhamphoryncus
Recipients Rhamphoryncus, ezio.melotti, lemburg
Date 2008-07-12.19:03:49
Marc, perhaps Unicode has refined their definitions since you last looked?

Valid UTF-8 *cannot* contain surrogates[1].  If it does, you have
CESU-8[2][3], not UTF-8.
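To make the UTF-8 vs. CESU-8 distinction concrete, here is a minimal sketch assuming a modern Python 3 interpreter. `to_cesu8` is a hypothetical helper written for illustration; it is not a standard codec. It converts text to UTF-16 and then writes each 16-bit code unit separately. For a character above U+FFFF, that produces two 3-byte surrogate sequences, where UTF-8 uses a single 4-byte sequence.

```python
import struct


def to_cesu8(text: str) -> bytes:
    """Hypothetical helper: encode text as CESU-8.

    CESU-8 first converts text to UTF-16, then writes every 16-bit code
    unit (including each half of a surrogate pair) as a UTF-8-style
    sequence of 1 to 3 bytes.
    """
    out = bytearray()
    for (unit,) in struct.iter_unpack(">H", text.encode("utf-16-be")):
        # chr(unit) may be a lone surrogate; "surrogatepass" lets the
        # utf-8 codec write it as a 3-byte sequence instead of failing.
        out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


smiley = "\U0001F600"          # a supplementary-plane character
utf8 = smiley.encode("utf-8")  # one 4-byte sequence: f0 9f 98 80
cesu8 = to_cesu8(smiley)       # two 3-byte surrogates: ed a0 bd ed b8 80
```

For text that stays in the BMP, the two encodings produce identical bytes. They diverge only for characters above U+FFFF, which is why the difference is easy to miss.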

So there are two bugs: first, the UTF-8 codec should refuse to load
surrogates.  Second, since the original bug showed up before the .pyc is
created, something in the parse/compilation/whatever stage is producing

[1] 4th bullet point of D92 in