classification
Title: sha1module: Switch sha1 implementation to sha1dc/hardened sha1
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.8
process
Status: closed Resolution:
Dependencies: Superseder:
Assigned To: christian.heimes Nosy List: antoine.pietri, christian.heimes, loewis, rhettinger, vstinner
Priority: normal Keywords:

Created on 2018-10-08 11:37 by antoine.pietri, last changed 2018-10-16 14:55 by antoine.pietri. This issue is now closed.

Messages (7)
msg327343 - (view) Author: Antoine Pietri (antoine.pietri) * Date: 2018-10-08 11:37
SHA-1 was broken a while ago. While the general recommendation is to migrate to more recent hashes (like SHA-2 and SHA-3), a lot of industry applications (notably Merkle DAG implementations like Git or blockchains) require backwards compatibility with SHA-1, at least for the time it takes all their users to transition.

The SHAttered authors published along with their paper a reference implementation of a "hardened SHA-1" algorithm, a SHA-1 implementation that uses counter-cryptanalysis to detect inputs that were forged to produce a hash collision. What that means is that Hardened SHA-1 is a secure hash function that produces the same output as SHA-1 in 99.999999...% of cases, and only differs when two inputs were specifically made to generate collisions. The reference implementation is here: https://github.com/cr-marcstevens/sha1collisiondetection

A large part of the industry has adopted Hardened SHA-1 as a temporary replacement for SHA-1, most notably Git under the name "sha1dc": https://github.com/git/git/commit/28dc98e343ca4eb370a29ceec4c19beac9b5c01e

Since CPython has its own implementation of SHA-1, I think it would be a good idea to provide a hardened SHA-1 implementation. So either:

1. we replace the current implementation of sha1 by sha1dc completely, which might be a problem for people who write scripts to detect whether two files collide with classic sha1

2. we replace the current implementation but we keep the old one under a new name, like "sha1_broken" or "sha1_classic", which breaks backwards compatibility in a few marginal cases but the functionality can be trivially restored by changing the name of the hash

3. we keep the current implementation but add a new one under a new name "sha1dc", which probably means most people will stay on a broken implementation for no good reason, but it will be fully backwards-compatible even in the marginal cases

4. we don't implement Hardened SHA-1 at all, and we advise people to change their hash algorithm, while realizing that this solution is not feasible in a lot of cases.

I'd suggest going with either 1. or 2. What would be your favorite option?
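
As a rough sketch of what option 3 could look like from the caller's side: the name "sha1dc" is only the one proposed above and is not something hashlib provides, and `new_sha1` is a hypothetical helper. Code would ask for the hardened variant by name and fall back to classic SHA-1 when it is unavailable.

```python
import hashlib


def new_sha1(prefer_hardened: bool = True):
    """Return a hash object, using "sha1dc" only if this build offers it."""
    if prefer_hardened:
        try:
            # "sha1dc" is the name proposed in option 3 (an assumption).
            # hashlib.new() raises ValueError for names it does not know.
            return hashlib.new("sha1dc")
        except ValueError:
            pass
    # Fall back to the classic implementation.
    return hashlib.sha1()


h = new_sha1()
h.update(b"abc")
print(h.name, h.hexdigest())
```

This also shows the drawback mentioned in option 3: callers have to opt in explicitly, so code that keeps calling `hashlib.sha1()` is unaffected.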

Not sure whether this should go in security or enhancement, so I put it in the latter category to be more conservative in issue prioritization. I added the devs who worked the most on Modules/sha1module.c in the Nosy list.
msg327494 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-10-10 21:27
I dislike modifying a hash function so that it returns different output but keeps the same name. For me, "SHA1" must remain "SHA1". If you want a variant, it should have a different name, and I would expect the existing sha1 function to remain unchanged. How do you keep compatibility between different programming languages and applications if one uses SHA1 and the other uses "hardened SHA-1"?

One alternative is to stop using sha1 :-D

> A large part of the industry has adopted Hardened SHA-1 ...

Do you have examples?
msg327500 - (view) Author: Antoine Pietri (antoine.pietri) * Date: 2018-10-10 22:11
On Wed, Oct 10, 2018 at 11:27 PM STINNER Victor <report@bugs.python.org> wrote:
> I dislike modifying a hash function so that it returns different output but keeps the same name. For me, "SHA1" must remain "SHA1". If you want a variant, it should have a different name, and I would expect the existing sha1 function to remain unchanged. How do you keep compatibility between different programming languages and applications if one uses SHA1 and the other uses "hardened SHA-1"?

Well, as I said we could almost consider both algorithms to be
"compatible", in that they only differ in an infinitesimally small
number of cases that were specifically *designed* to break SHA1. I
agree it's not ideal to just replace the function directly, and that's
why I suggested 4 possible alternatives. But you have to understand
that the decision is not as simple as just "it doesn't give the same
outputs so it should have a different name", because it *does* give
the same outputs in *all of the cases that weren't designed to break
it*, and the tradeoff for not making that the default is that most
people who don't care about seeing the collisions happen will keep
using a broken implementation for no reason.

I'm not saying I disagree with you here, I'm just making sure you're
aware of the tradeoff. If we make it the default, it's a *very slight*
break of backwards compatibility, but it will be a positive change for
99.99% of users. The only affected people will be the ones that were
writing scripts to check whether collisions did exist in the old
algorithm, and if we change the name of the "classic sha1" they could
trivially change it themselves.

That said, if you'd rather have another name for it, it also works for
me, it's better than having nothing.

> One alternative is to stop using sha1 :-D

Totally agree with you here, but it's not always an option, so I'd
argue we should do our best to mitigate the problem.

> Do you have examples?

I already gave the Git example:

https://github.com/git/git/commit/28dc98e343ca4eb370a29ceec4c19beac9b5c01e#diff-a44b837d82653a78649b57443ba99460

Fossil also migrated to it:

https://www.fossil-scm.org/xfer/doc/trunk/www/hashpolicy.wiki

The truth is, most of the other Merkle Tree implementations (like
Bitcoin) were using a different hash in the first place, and that
seems to be the main application where you have to keep backward
compatibility with your hashes. So the fact that two of the main SHA-1
Merkle tree implementations moved to Hardened SHA-1 is huge, IMO.
msg327511 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-10-11 03:40
Assigning to Christian to make the call.

+1 for option #1, replacing the sha1 implementation with the hardened version, helping us move closer to more-secure-by-default.
msg327832 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2018-10-16 14:05
I wouldn't call SHA1 a secure hash function any more. SHA1DC is both an incompatible implementation and a bandaid for legacy applications that can't easily update to a proper hashing algorithm. Also it's rather pointless to update our SHA1 implementation since OpenSSL still uses the standardized SHA1 implementation. CPython prefers OpenSSL's implementation because it's much, much faster than libtomcrypt's implementation.

I need to study SHA1DC first and get some advice before I can make an educated statement. But I'm leaning towards -1 to even support SHA1DC in the standard library, because I don't want to promote SHA1 any more. Applications should move to SHA2, SHA3 and blake2.
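
For reference, a minimal sketch of what moving to those algorithms looks like with hashlib. The payload is a placeholder; all three constructors are in the standard library.

```python
import hashlib

data = b"example payload"  # placeholder input for illustration

# Same calling convention as hashlib.sha1(), only the constructor changes.
digests = {
    "sha256": hashlib.sha256(data).hexdigest(),
    "sha3_256": hashlib.sha3_256(data).hexdigest(),
    "blake2b": hashlib.blake2b(data).hexdigest(),
}

for name, hexdigest in digests.items():
    print(name, hexdigest)
```

The digests are longer than SHA-1's 40 hex characters, so any storage format or protocol that embeds the hash needs to change too, which is the hard part for applications like Git.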
msg327835 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2018-10-16 14:42
I talked to some experts (Alex Gaynor, Simo Sorce). They all share my sentiment and are against SHA1DC. The algorithm is just a poor bandaid for a gaping security issue. Everybody was strongly against replacing SHA1 with SHA1DC by default, because it's an incompatible implementation. SHA1DC is only able to counteract some of the known flaws, too. Even git doesn't replace SHA1 with SHA1DC directly. Instead it turns a detected collision into a fatal error [1].

I'm -1 on adding it to the Python standard library. Alex pointed out that the lack of SHA1DC in OpenSSL is a clear sign that it's not generally useful. SHA1DC may be useful for a few applications like git. In general it's not a fool-proof safety net for SHA1.

[1] https://github.com/git/git/blob/master/sha1dc_git.c#L17-L23
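
The "fatal error" approach can be sketched roughly as follows. This is not git's code: `checked_sha1`, `CollisionDetected` and the `looks_like_attack` callback are hypothetical names, and the callback merely stands in for a real collision detector such as the sha1collisiondetection library.

```python
import hashlib
from typing import Callable


class CollisionDetected(Exception):
    """Raised instead of returning a digest for a suspicious input."""


def checked_sha1(data: bytes, looks_like_attack: Callable[[bytes], bool]) -> str:
    # looks_like_attack is a placeholder for a real detector (assumption).
    if looks_like_attack(data):
        # Refuse the input rather than hand back a modified digest.
        raise CollisionDetected("input appears to be a SHA-1 collision attack")
    # Ordinary inputs get the standard, unmodified SHA-1 digest.
    return hashlib.sha1(data).hexdigest()


# Usage with a dummy detector that never flags anything.
print(checked_sha1(b"abc", lambda data: False))
```

With this shape the digest of every accepted input is still plain SHA-1, so interoperability is preserved and only flagged inputs are rejected.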
msg327836 - (view) Author: Antoine Pietri (antoine.pietri) * Date: 2018-10-16 14:55
Thanks, those arguments are convincing. I guess for applications that really can't move to a more secure hash, it would be better for them to rely on third-party libraries that implement the "band-aid".

I'm closing this for now.
History
Date                 User              Action  Args
2018-10-16 14:55:49  antoine.pietri    set     status: open -> closed; messages: + msg327836; stage: resolved
2018-10-16 14:42:29  christian.heimes  set     messages: + msg327835
2018-10-16 14:05:07  christian.heimes  set     messages: + msg327832
2018-10-11 03:40:23  rhettinger        set     assignee: christian.heimes; messages: + msg327511; nosy: + rhettinger
2018-10-10 22:11:54  antoine.pietri    set     messages: + msg327500
2018-10-10 21:27:54  vstinner          set     messages: + msg327494
2018-10-08 11:37:02  antoine.pietri    create