
classification
Title: _sha256 et al. encode to UTF-8 by default
Type: behavior
Stage:
Components: Extension Modules
Versions: Python 2.7

process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: gregory.p.smith
Nosy List: ebfe, gregory.p.smith, hagen, kmtracey, lemburg, pitrou, rpetrov, vstinner
Priority: normal
Keywords: 26backport

Created on 2008-09-01 09:27 by hagen, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (23)
msg72220 - (view) Author: Hagen Fürstenau (hagen) Date: 2008-09-01 09:27
Whereas openssl-based _hashlib refuses to accept unencoded strings:

>>> _hashlib.openssl_sha256("\xff")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object supporting the buffer API required

the _sha256 version encodes to UTF-8 by default:

>>> _sha256.sha256("\xff").digest() == _sha256.sha256("\xff".encode("utf-8")).digest()
True

I think refusing is better, but at least the behaviour should be
consistent. Same for the other algorithms in hashlib.
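
A minimal sketch of why an implicit encoding is risky, assuming a Python 3 interpreter where the public hashlib constructors reject str. The helper name sha256_text is hypothetical and not part of hashlib. The same text produces different digests under different encodings, so the caller has to pick one explicitly:

```python
import hashlib


def sha256_text(text, encoding):
    """Hash text after encoding it explicitly (hypothetical helper)."""
    return hashlib.sha256(text.encode(encoding)).hexdigest()


# The same character encodes to different bytes, hence different digests.
utf8_digest = sha256_text("\xff", "utf-8")      # hashes b"\xc3\xbf"
latin1_digest = sha256_text("\xff", "latin-1")  # hashes b"\xff"
digests_differ = utf8_digest != latin1_digest

# hashlib refuses unencoded text instead of guessing an encoding.
try:
    hashlib.sha256("\xff")
    rejected = False
except TypeError:
    rejected = True
```

With a silent UTF-8 default, the caller would get the first digest without ever having chosen it.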
msg73550 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-09-22 01:01
agreed.  most platforms should be using the openssl version, i will
update the non-openssl implementations to behave the same.

I don't think this is worth being a release blocker.  I'll do it for 3.0.1.
msg79112 - (view) Author: Hagen Fürstenau (hagen) Date: 2009-01-05 08:48
Seems that this problem is being taken care of in issue #4751.
msg79115 - (view) Author: Lukas Lueg (ebfe) Date: 2009-01-05 09:47
solved in #4818 and #4821
msg81726 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2009-02-12 07:36
Fixed in py3k branch r69524.

needs porting to release30-maint.

possibly also release26-maint and trunk.
msg81738 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-02-12 11:18
I don't think backporting to 2.6 is fine, people may be relying on the
current behaviour.
As for 3.0.1, you'd better be quick, it's scheduled for tomorrow.
msg81739 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-02-12 11:22
Wooops, my mouse clicked on Remove!? I removed Message73550, sorry 
gregory. Here was the content of the message:
---
agreed.  most platforms should be using the openssl version, i will
update the non-openssl implementations to behave the same.

I don't think this is worth being a release blocker.  I'll do it for 
3.0.1.
---
msg81740 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-02-12 11:25
I agree with pitrou: leave python 2.6 unchanged, but please backport 
to 3.0.1 ;-)
msg81741 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-02-12 11:32
gpolo gave me the solution to restore a deleted message:
http://bugs.python.org/issueXXXX?@action=edit&@add@messages=MSGNUM
msg81820 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2009-02-12 21:15
fixed in release30-maint r69555.

sounds like it's out of the question for 2.6.  i will backport it to 
trunk.
msg81858 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2009-02-13 03:01
fixed in trunk r69561.
msg96431 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2009-12-15 11:22
Gregory, this patch should not have been backported to Python 2.7. See issue

Could you please revert the change on trunk ? Thanks.

A much better solution would be to issue a -3 warning in case a Unicode
object is passed to the hash functions. However, this is major work to
get right, since the "s#" parser marker also accepts buffer interfaces.
msg96501 - (view) Author: Roumen Petrov (rpetrov) * Date: 2009-12-16 23:41
What about the inconsistent module build? As reported, some platforms
build the sha256 module, which supports unicode, but on most it is not
built if openssl is version 0.8+. The same goes for the sha512 module.
If unicode for hashlib is not acceptable for trunk, then why not always
build sha{256|512}, without checking the openssl version number?
msg96934 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2009-12-28 02:13
lemburg - see which issue #?

Anyways perhaps the right thing to do instead of trunk r69561 would have 
been to change the s# to an s*.

Undoing it will be more painful now as several changes have gone in since 
that require undoing and possibly redoing differently.
msg96935 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2009-12-28 02:25
rpetrov - I couldn't really understand your message so I'm not sure if I'm 
answering the right things:  yes both the openssl and non-openssl modules 
need to behave identically.  the reason openssl is used when possible is 
that its optimized hash functions are several times faster than the plain 
C versions in the individual modules.
msg96936 - (view) Author: Karen Tracey (kmtracey) Date: 2009-12-28 03:04
I think the missing issue reference is to this thread on python-dev:

http://mail.python.org/pipermail/python-dev/2009-December/094574.html
msg96991 - (view) Author: Roumen Petrov (rpetrov) * Date: 2009-12-29 10:35
gregory - refer to setup.py logic to build modules
msg96995 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2009-12-29 13:37
Gregory P. Smith wrote:
> 
> Gregory P. Smith <greg@krypto.org> added the comment:
> 
> lemburg - see which issue #?

Sorry, the message got truncated for some reason.

I was referring to http://bugs.python.org/issue3745

This was discussed on python-dev: http://mail.python.org/pipermail/python-dev/2009-December/094593.html

> Anyways perhaps the right thing to do instead of trunk r69561 would have 
> been to change the s# to an s*.

That would have worked as well.

> Undoing it will be more painful now as several changes have gone in since 
> that require undoing and possibly redoing differently.

Using s* should pretty much avoid the need to use GET_BUFFER_VIEW_OR_ERROUT(),
so if you want to keep the other changes, removing the use of the
macro should be fairly straight-forward, unless I'm missing something.
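
As a Python-level illustration only (this is not the C argument-parsing code discussed above), the sketch below assumes Python 3 and shows the buffer-interface side of the behaviour: any object exposing the same bytes through the buffer protocol hashes to the same digest. Unlike 's*' in Python 2.7, Python 3 does not additionally accept unicode here.

```python
import hashlib
from array import array

payload = b"\xff\x00abc"

# Each object exposes the same five bytes through the buffer protocol.
candidates = [
    payload,
    bytearray(payload),
    memoryview(payload),
    array("B", payload),
]

# Collect the distinct digests; identical input bytes should give exactly one.
digests = {hashlib.sha256(obj).hexdigest() for obj in candidates}
```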
msg97151 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2010-01-02 22:33
trunk r77252 switches python 2.7 to use 's*' for argument parsing.  unicodes can be hashed (encoded to the system default encoding by s*) again.

This change has been blocked from being merged into py3k unless someone decides we actually want this magic unicode encoding behavior to exist there as well.

setup.py has also been updated to compile all versions of the hash algorithm modules when Py_DEBUG is defined.  I'll update the tests to run on all implementations next, so that it is easier for developers to maintain identical behavior across all implementations without needing to explicitly remember to reconfigure their setup and test those.
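
A rough sketch of such a cross-implementation check, assuming a CPython 3 interpreter. The private module and attribute names in CANDIDATES are assumptions that vary between versions and builds, so any that are missing are skipped. The function names are hypothetical, and this is not the test code that was actually committed.

```python
import hashlib
import importlib

# Private module/attribute pairs; assumptions that differ between
# CPython versions and builds. Missing ones are silently skipped.
CANDIDATES = (
    ("_hashlib", "openssl_sha256"),
    ("_sha256", "sha256"),
    ("_sha2", "sha256"),
)


def sha256_constructors():
    """Return every SHA-256 constructor this interpreter exposes."""
    found = {"hashlib.sha256": hashlib.sha256}
    for module_name, attr in CANDIDATES:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue
        ctor = getattr(module, attr, None)
        if ctor is not None:
            found[module_name + "." + attr] = ctor
    return found


def check_consistency(data=b"\xff"):
    """Return the distinct digests seen and the names that accept str."""
    digests = set()
    accepts_str = []
    for name, ctor in sha256_constructors().items():
        digests.add(ctor(data).hexdigest())
        try:
            ctor("\xff")
        except TypeError:
            pass  # unencoded text is refused, as expected on Python 3
        else:
            accepts_str.append(name)
    return digests, accepts_str
```

If the implementations agree, check_consistency() returns a single digest and an empty list.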
msg97152 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2010-01-02 22:38
In order to get a -3 PyErr_WarnPy3k warning for unicode being passed to hashlib objects (a nice idea) I suggest creating an additional 's*' like thing ('s3' perhaps?) in Python/getargs.c for that purpose rather than modifying all of the hashlib modules to accept an O, type check it and warn, and then re-parse it as an s* (that'd be a lot of tedious code duplication).
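
The "type check it, warn, then parse as bytes" flow can be sketched in pure Python. This is only an illustration: sha256_compat is a hypothetical wrapper, the UTF-8 default is an assumption (Python 2 would use the system default encoding), and the real proposal concerns a C-level format code in Python/getargs.c.

```python
import hashlib
import warnings


def sha256_compat(data, encoding="utf-8"):
    """Hash bytes-like data; warn and encode if text is passed (hypothetical)."""
    if isinstance(data, str):
        warnings.warn(
            "passing text to a hash function is not supported in 3.x; "
            "encode it first",
            DeprecationWarning,
            stacklevel=2,
        )
        # Assumed encoding, chosen here only for the sake of the example.
        data = data.encode(encoding)
    return hashlib.sha256(data)
```

Bytes-like input passes through without a warning, so only callers relying on the implicit encoding are notified.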
msg97153 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2010-01-03 00:48
I believe everything in here has been addressed.  Please open new issues with details for anything that isn't quite right.
msg97202 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-01-04 11:43
Gregory P. Smith wrote:
> 
> Gregory P. Smith <greg@krypto.org> added the comment:
> 
> trunk r77252 switches python 2.7 to use 's*' for argument parsing.  unicodes can be hashed (encoded to the system default encoding by s*) again.
> 
> This change has been blocked from being merged into py3k unless someone decides we actually want this magic unicode encoding behavior to exist there as well.

Thanks for updating the implementation.
msg97203 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-01-04 11:49
Gregory P. Smith wrote:
> 
> Gregory P. Smith <greg@krypto.org> added the comment:
> 
> In order to get a -3 PyErr_WarnPy3k warning for unicode being passed to hashlib objects (a nice idea) I suggest creating an additional 's*' like thing ('s3' perhaps?) in Python/getargs.c for that purpose rather than modifying all of the hashlib modules to accept an O, type check it and warn, and then re-parse it as an s* (that'd be a lot of tedious code duplication).

Good idea. We're likely going to need this in more places, so I'm +1 on
adding an "s3" parser marker.
History
Date                 User             Action  Args
2022-04-11 14:56:38  admin            set     github: 47995
2010-01-04 11:49:27  lemburg          set     messages: + msg97203
2010-01-04 11:43:58  lemburg          set     messages: + msg97202
2010-01-03 00:48:11  gregory.p.smith  set     status: open -> closed; resolution: fixed; messages: + msg97153
2010-01-02 22:38:53  gregory.p.smith  set     messages: + msg97152
2010-01-02 22:33:31  gregory.p.smith  set     messages: + msg97151
2009-12-29 13:37:13  lemburg          set     messages: + msg96995
2009-12-29 10:35:09  rpetrov          set     messages: + msg96991
2009-12-28 03:04:50  kmtracey         set     messages: + msg96936
2009-12-28 02:25:52  gregory.p.smith  set     messages: + msg96935
2009-12-28 02:13:53  gregory.p.smith  set     messages: + msg96934; versions: - Python 3.0, Python 3.1
2009-12-16 23:41:39  rpetrov          set     nosy: + rpetrov; messages: + msg96501
2009-12-15 14:44:52  kmtracey         set     nosy: + kmtracey
2009-12-15 11:22:55  lemburg          set     status: closed -> open; nosy: + lemburg; messages: + msg96431; resolution: fixed -> (no value)
2009-02-13 03:01:29  gregory.p.smith  set     status: open -> closed; messages: + msg81858; resolution: fixed; components: + Extension Modules, - Library (Lib); versions: + Python 3.0, Python 3.1
2009-02-12 21:16:00  gregory.p.smith  set     keywords: + 26backport; messages: + msg81820; versions: - Python 2.6, Python 3.0
2009-02-12 11:32:34  vstinner         set     messages: + msg81741
2009-02-12 11:31:46  vstinner         set     messages: + msg73550
2009-02-12 11:25:24  vstinner         set     messages: + msg81740
2009-02-12 11:22:04  vstinner         set     nosy: + vstinner; messages: + msg81739
2009-02-12 11:21:05  vstinner         set     messages: - msg73550
2009-02-12 11:18:03  pitrou           set     nosy: + pitrou; messages: + msg81738
2009-02-12 07:36:56  gregory.p.smith  set     priority: normal; messages: + msg81726; versions: + Python 2.6, Python 2.7
2009-01-05 09:47:36  ebfe             set     nosy: + ebfe; messages: + msg79115
2009-01-05 08:48:17  hagen            set     messages: + msg79112
2008-09-22 01:01:44  gregory.p.smith  set     assignee: gregory.p.smith; messages: + msg73550; nosy: + gregory.p.smith
2008-09-01 09:27:05  hagen            create