Author lemburg
Recipients lemburg, loewis, rhettinger, vvro
Date 2008-05-23.08:32:23
On 2008-05-23 05:38, Raymond Hettinger wrote:
> Raymond Hettinger added the comment:
> I don't think this is the right thing to do.  The hash algorithms are 
> defined in terms of bytes, but Unicode is abstracted from a byte 
> level encoding.  It doesn't make sense to convert using an arbitrary 
> encoding (such as UTF-8) because someone else might hash the same text 
> using a different encoding.
> Marc, do you concur?


We could fix one encoding for converting Unicode to bytes, e.g.
UTF-8, but hash values clearly need to be portable across platforms,
programming languages and implementations.

Other languages or implementations might choose UTF-16 or some
other encoding, so it's not clear which encoding to choose and
there doesn't seem to be a standard for this either.
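To illustrate the point, here is a minimal sketch, assuming Python 3's hashlib (which rejects str input outright). The sample text is an arbitrary, hypothetical choice. It shows that the same Unicode text yields different digests depending on which encoding is used to turn it into bytes:

```python
import hashlib

# Hypothetical sample text containing non-ASCII characters.
text = "Grüße"

# In Python 3, hashlib refuses str input: the text must be encoded first.
try:
    hashlib.sha1(text)
    rejected = False
except TypeError:
    rejected = True

# Two equally valid encodings produce different byte sequences...
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")

# ...and therefore different digests for the "same" Unicode text.
utf8_digest = hashlib.sha1(utf8_bytes).hexdigest()
utf16_digest = hashlib.sha1(utf16_bytes).hexdigest()

print(rejected)                     # True
print(utf8_digest == utf16_digest)  # False
```

Because the caller has to choose the encoding explicitly, two parties hashing the same text only agree on the digest if they also agree on the encoding.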

-1 on the idea. Martin has already closed the issue and rejected it.

Marc-Andre Lemburg


:::: Try mxODBC.Zope.DA for Windows, Linux, Solaris, MacOSX for free! ::::

     Software, Skills and Services GmbH  Pastor-Loeh-Str.48
     D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
            Registered at Amtsgericht Duesseldorf: HRB 46611
Date                 User     Action  Args
2008-05-23 08:32:58  lemburg  set     spambayes_score: 0.000493089 -> 0.000493089
                                      recipients: + lemburg, loewis, rhettinger, vvro
2008-05-23 08:32:50  lemburg  link    issue2948 messages
2008-05-23 08:32:44  lemburg  create