Author loewis
Recipients lemburg, loewis, rhettinger, vvro
Date 2008-05-23.05:59:12
I'm rejecting this idea, for the reasons already given by others: the
same string can have different hash values, depending on which
encoding is chosen. Users will have to be explicit about the encoding
when hashing, just as they need to be explicit when they choose a hash
algorithm (e.g. md5, sha1, or sha256: they all do the same kind of
thing, but still produce different output).
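
To illustrate the point, here is a minimal sketch using hashlib. The
variable names and the choice of UTF-8 versus Latin-1 are only
assumptions for the example; any two encodings that produce different
bytes for the same text behave the same way.

```python
import hashlib

text = u"jo\u00e3o"  # the string 'joão', written with an escape for portability

# The same text becomes different byte sequences under different encodings.
utf8_bytes = text.encode("utf-8")      # 5 bytes: the a-tilde takes two bytes
latin1_bytes = text.encode("latin-1")  # 4 bytes: the a-tilde takes one byte

# Hash algorithms operate on bytes, so the digests differ as well.
utf8_digest = hashlib.sha256(utf8_bytes).hexdigest()
latin1_digest = hashlib.sha256(latin1_bytes).hexdigest()

# A different algorithm over the same bytes also gives a different result.
sha1_digest = hashlib.sha1(utf8_bytes).hexdigest()

print(utf8_digest == latin1_digest)  # the two encodings do not agree
print(len(utf8_digest), len(sha1_digest))
```

Since there is no single "right" encoding to pick on the user's behalf,
the caller has to choose one, just as the caller chooses the algorithm.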

If you want a hash algorithm that abstracts from these details, use the
built-in hash() function:

py> hash(u'joão')
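
As a rough sketch of what that looks like in practice (the variable
names are assumptions for the example): hash() takes the text object
directly, so no encoding has to be chosen. Note that in later Python
versions string hashes are randomized per process by default, so the
value is only meaningful within a single interpreter run.

```python
text = u"jo\u00e3o"  # the string 'joão'

# hash() works on the text itself; no encoding step is involved.
first = hash(text)
second = hash(u"jo" + u"\u00e3o")  # an equal string built separately

# Equal strings hash equally within the same interpreter process.
print(first == second)
```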
Date                 User    Action  Args
2008-05-23 05:59:17  loewis  set     spambayes_score: 0.195433 -> 0.195433
                                     recipients: + loewis, lemburg, rhettinger, vvro
2008-05-23 05:59:16  loewis  set     spambayes_score: 0.195433 -> 0.195433
                                     messageid: <>
2008-05-23 05:59:15  loewis  link    issue2948 messages
2008-05-23 05:59:13  loewis  create