Message 67222 - Python tracker


Author	lemburg
Recipients	lemburg, loewis, rhettinger, vvro
Date	2008-05-23.08:32:23
Message-id	<4836810C.5020905@egenix.com>
In-reply-to	<1211513909.49.0.869721572381.issue2948@psf.upfronthosting.co.za>

Content

On 2008-05-23 05:38, Raymond Hettinger wrote:
> Raymond Hettinger <rhettinger@users.sourceforge.net> added the comment:
> 
> I don't think this is the right thing to do.  The hash algorithms are 
> defined in terms of bytes, but Unicode is abstracted from a byte 
> level encoding.  It doesn't make sense to convert using an arbitrary 
> encoding (such as UTF-8) because someone else might hash the same text 
> using a different encoding.
> 
> Marc, do you concur?

Yes.

While we could fix an encoding to use for converting Unicode to
bytes, e.g. UTF-8, you clearly want hash functions to be portable
across platforms, programming languages and implementations.

Other languages or implementations might choose UTF-16 or some
other encoding, so it's not clear which encoding to choose and
there doesn't seem to be a standard for this either.
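
To illustrate the portability concern, here is a minimal sketch using
Python 3's hashlib. The sample string, the helper name
digest_per_encoding and the choice of SHA-256 are assumptions made for
this example only; they do not come from the issue itself. The sketch
hashes the same text under three encodings and shows that each one
yields a different digest, and that hashlib refuses a str outright
rather than picking an encoding on the caller's behalf.

```python
import hashlib


def digest_per_encoding(text, encodings=("utf-8", "utf-16-le", "latin-1")):
    """Return a dict mapping each encoding name to the SHA-256 hex digest
    of `text` encoded with it. Hypothetical helper, for illustration only."""
    return {
        enc: hashlib.sha256(text.encode(enc)).hexdigest()
        for enc in encodings
    }


# Hypothetical sample: "Gruesse" spelled with u-umlaut and sharp s,
# written with escapes so the source file itself stays ASCII.
sample = "Gr\u00fc\u00dfe"

results = digest_per_encoding(sample)
for enc, digest in results.items():
    # Same text, different bytes, therefore different digests.
    print(f"{enc:10} {len(sample.encode(enc)):2d} bytes  {digest}")

# Passing the str itself is rejected: the caller has to choose the bytes.
try:
    hashlib.sha256(sample)
except TypeError as exc:
    print("TypeError:", exc)
```

Two parties who want matching digests for the same text therefore have
to agree on the encoding out of band; the hash function cannot decide
that for them.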

-1 on the idea. Martin already closed and rejected the idea for me.

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com


History
Date	User	Action	Args
2008-05-23 08:32:58	lemburg	set	spambayes_score: 0.000493089 -> 0.00049308926 recipients: + lemburg, loewis, rhettinger, vvro
2008-05-23 08:32:50	lemburg	link	issue2948 messages
2008-05-23 08:32:44	lemburg	create