
Author mortenlj
Recipients akuchling, belopolsky, benjamin.peterson, donlorenzo, mortenlj, pitrou, rsc, zanella
Date 2008-06-26.14:45:06
Message-id <1214491512.86.0.543795928041.issue2650@psf.upfronthosting.co.za>
In-reply-to
Content
One issue with the current implementation, which I can't see has been 
commented on here, is that it breaks UTF-8 characters (and probably 
characters in every other multi-byte encoding).

An é character in a UTF-8 encoded string is represented by two 
bytes. When the string is passed through re.escape, those two bytes are 
checked individually; both are considered non-alphanumeric and are 
consequently escaped, which turns the UTF-8 string into complete 
gibberish.
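
To make the problem concrete, here is a minimal sketch of the behaviour described above. It does not call re.escape itself, since the result depends on the Python version; instead it uses a hypothetical helper, escape_bytewise, that I am assuming approximates the rule "put a backslash before every byte that is not an ASCII letter or digit". The helper name and the is_valid_utf8 check are made up for illustration only.

```python
import string

# Byte values of the ASCII letters and digits; every other byte gets escaped.
_ALPHANUM = frozenset((string.ascii_letters + string.digits).encode("ascii"))


def escape_bytewise(data: bytes) -> bytes:
    """Hypothetical stand-in for a byte-by-byte escape routine.

    Assumption: each byte is examined on its own, with no knowledge of
    the encoding, and a backslash is inserted before anything that is
    not an ASCII letter or digit.
    """
    out = bytearray()
    for b in data:
        if b not in _ALPHANUM:
            out += b"\\"
        out.append(b)
    return bytes(out)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


encoded = "é".encode("utf-8")        # two bytes: 0xC3 0xA9
escaped = escape_bytewise(encoded)   # a backslash lands between the two bytes

print(encoded, is_valid_utf8(encoded))
print(escaped, is_valid_utf8(escaped))
```

In this sketch the backslash inserted between the lead byte and its continuation byte is what stops the result from decoding as UTF-8. Working on text (unicode) strings rather than encoded bytes would sidestep this, because the é would then be seen as one character.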
History
Date                 User      Action  Args
2008-06-26 14:45:13  mortenlj  set     spambayes_score: 0.0703619 -> 0.070361935; recipients: + mortenlj, akuchling, belopolsky, pitrou, rsc, benjamin.peterson, zanella, donlorenzo
2008-06-26 14:45:12  mortenlj  set     spambayes_score: 0.0703619 -> 0.0703619; messageid: <1214491512.86.0.543795928041.issue2650@psf.upfronthosting.co.za>
2008-06-26 14:45:07  mortenlj  link    issue2650 messages
2008-06-26 14:45:06  mortenlj  create