Classification
Title: Deprecation of MD5
Type:
Stage:
Components: Extension Modules
Versions: Python 3.1, Python 2.7

Process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To:
Nosy List: ebfe, gregory.p.smith, gvanrossum, lemburg, loewis, rhettinger
Priority: normal
Keywords:

Created on 2009-01-06 20:06 by ebfe, last changed 2009-01-07 15:39 by gvanrossum. This issue is now closed.

Messages (12)
msg79281 - Author: Lukas Lueg (ebfe) Date: 2009-01-06 20:06
MD5 is one of the most popular cryptographic hash functions around,
mainly for its good performance and availability throughout
applications and libraries. The MD5 algorithm is currently implemented
in Python as part of the hashlib module and (in more general terms) as
part of SSL in the ssl module. However, concerns about the security of
MD5 have risen during the last few years. In 2007 a practical attack to
create collisions in the compression function was published, and on
12/31/2008 US-CERT issued a note warning about the general insecurity of
MD5 (http://www.kb.cert.org/vuls/id/836068).


I propose and strongly suggest that we start deprecating direct support
for MD5 during this year and completely remove support for it afterwards.

 * MD5 is a cryptographic hash function; its reason for being is
security. With current hardware and attack vectors it is a matter
of hours to create collisions and fool MD5 hashes. That reason for being
has come to an end.
 * Python runs an uncountable number of exposed user interfaces on the
web. Programmers usually rely on the security of the backing
libraries. Python can't provide this with MD5.
 * The functionality of MD5 can easily be replaced by other hashes
that are supported by Python (e.g. SHA1). They offer comparable
performance but are not binary-compatible (yay).
 * Programmers use MD5 in Python without needing its cryptographic
properties (e.g. for creating unique indexes). Keeping MD5 for this use,
however, devalues the overall security of Python for the good of a few.


I'd like to start a discussion about this. Please keep in mind that,
although MD5 is currently still very popular and Python's support for it
is justified by demand, its existence will come to an end soon.

We should act now and give people time to update their implementations.


In a rough cut:

 - Patch hashlib to emit a DeprecationWarning, starting during the first
half of 2009.
 - Update the documentation to advise against using MD5 for security
purposes.
 - Remove MD5 from Python in 2010.
 - Proceed in accordance with PEP 4.
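
The first step of this rough cut could look roughly like the sketch below. It is only an illustration of the proposal: `deprecated_md5` is a hypothetical wrapper invented here, the warning text is made up, and hashlib itself emits no such warning.

```python
import hashlib
import warnings


def deprecated_md5(data=b""):
    """Hypothetical wrapper: warn, then delegate to hashlib.md5."""
    warnings.warn(
        "MD5 is not collision resistant; consider a SHA-2 hash",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller
    )
    return hashlib.md5(data)


# Record warnings so the effect of the wrapper can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    digest = deprecated_md5(b"abc").hexdigest()

warned = [w.category for w in caught]
```

Callers would keep getting a working hash object during the deprecation period, while the warning gives them time to migrate.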


Goodbye MD5 and thanks for all the fish.
msg79282 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2009-01-06 20:17
On 2009-01-06 21:06, Lukas Lueg wrote:
> MD5 is one of the most popular cryptographic hash functions around,
> mainly for its good performance and availability throughout
> applications and libraries. The MD5 algorithm is currently implemented
> in Python as part of the hashlib module and (in more general terms) as
> part of SSL in the ssl module. However, concerns about the security of
> MD5 have risen during the last few years. In 2007 a practical attack to
> create collisions in the compression function was published, and on
> 12/31/2008 US-CERT issued a note warning about the general insecurity of
> MD5 (http://www.kb.cert.org/vuls/id/836068).
> 
> 
> I propose and strongly suggest that we start deprecating direct support
> for MD5 during this year and completely remove support for it afterwards.

A strong -1 on that idea.

MD5 is in widespread use as a hash function. It can no longer
be considered a cryptographic hash function, but it still serves its
purpose well as a fast, easy-to-use, general-purpose hash function.

Removing it from Python would cripple Python for no apparent reason.
msg79283 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-01-06 20:20
Because MD5 is used widely, Python needs to support it, if only to be
able to verify MD5 signatures when offered.
msg79285 - Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2009-01-06 20:33
The hashlib docs already mention the problems with md5 et al via a
bright red:

"Warning

Some algorithms have known hash collision weaknesses, see the FAQ at the
end."

Thanks for closing this. Not gonna happen.
msg79291 - Author: Lukas Lueg (ebfe) Date: 2009-01-06 21:42
As I already said to Raymond: At least we should update the
documentation. The "FAQ" currently linked is from 2005.

The CERT advisory uses clear and simple language: "In 2008,
researchers demonstrated the practical vulnerability [...] We are
currently unaware of a practical solution to this problem. *Do not use
the MD5 algorithm*."
msg79293 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2009-01-06 21:59
On 2009-01-06 22:42, Lukas Lueg wrote:
> Lukas Lueg <knabberknusperhaus@yahoo.de> added the comment:
> 
> As I already said to Raymond: At least we should update the
> documentation. The "FAQ" currently linked is from 2005.
>
> The CERT advisory uses clear and simple language: "In 2008,
> researchers demonstrated the practical vulnerability [...] We are
> currently unaware of a practical solution to this problem. *Do not use
> the MD5 algorithm*."

That's a correct statement for cryptographic work based on MD5.

However, it's not true with respect to using MD5 as a fast
general-purpose hash algorithm in non-crypto applications, so I think the
warning on http://docs.python.org/library/hashlib.html is sufficient.
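
As one illustration of such a non-crypto use, the sketch below maps keys to bucket indexes. The helper name `bucket_for`, the bucket count, and the example key are assumptions made up for this example; nothing here depends on MD5 being collision resistant against an attacker.

```python
import hashlib


def bucket_for(key: str, buckets: int = 16) -> int:
    """Map a key to a stable bucket index, using MD5 as a plain hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce.
    return int.from_bytes(digest[:8], "big") % buckets


first = bucket_for("user-42")
second = bucket_for("user-42")
print(first == second)
```

Because the digest depends only on the input bytes, the same key lands in the same bucket on every run and on every machine.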

Note that the various SHA implementations are also starting to
get some heat lately, so it's only a question of time until these
get excluded from the set of cryptographic hash functions:

http://en.wikipedia.org/wiki/SHA1
http://en.wikipedia.org/wiki/Cryptographic_hash_function

also see:

http://en.wikipedia.org/wiki/Hash_function

"""
Hash functions are related to (and often confused with) checksums, check digits,
fingerprints, randomizing functions, error correcting codes, and cryptographic
hash functions. Although these concepts overlap to some extent, each has its own
uses and requirements.
"""

It might be a good idea to remove the word "secure" from the
hashlib documentation, since security of these algorithms is
always limited to a certain period of time.
msg79295 - Author: Lukas Lueg (ebfe) Date: 2009-01-06 22:10
> It might be a good idea to remove the word "secure" from the
> hashlib documentation, since security of these algorithms is
> always limited to a certain period of time.

I'm sorry, was that a boy attempted humor ? [Misuse quote from DH3: Check]

Anyway, in fact that might be a good idea: Reflect that the hashlib
module includes hash functions for the sake of compatibility and
interoperability and not everlasting security.
msg79296 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-01-06 22:13
Secure hash or cryptographic hash is the correct term and I think we
should leave that in, if only to make the original intent clear and to
make them easier to search for.

I propose adding a sentence to the first paragraph noting that the level
of security varies by algorithm and that over time some of the
algorithms are being found to have possible cryptographic weaknesses or
exploits.
msg79297 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2009-01-06 22:39
On 2009-01-06 23:10, Lukas Lueg wrote:
> Lukas Lueg <knabberknusperhaus@yahoo.de> added the comment:
> 
>> It might be a good idea to remove the word "secure" from the
>> hashlib documentation, since security of these algorithms is
>> always limited to a certain period of time.
> 
> I'm sorry, was that a boy attempted humor ? [Misuse quote from DH3: Check]

No, it's the reality of life and one of the reasons why digitally
signed data needs to be re-signed every few years in order to keep
the data secured and the legal status of the signature intact.

Note that SHA-0 and -1 were broken in 2005:

    http://www.schneier.com/blog/archives/2005/08/new_cryptanalyt.html

In Germany, the BSI, which corresponds to the NSA in the US, publishes
a list of algorithms each year that are deemed safe, including their
expiration year:

    http://www.bundesnetzagentur.de/enid/Veroeffentlichungen/Algorithmen_sw.html
    (in German)

They regard SHA-1 as expired by the end of this year. For SHA-2 functions
they give 2015 as expiry date.

NIST has similar guidelines:

    http://csrc.nist.gov/groups/ST/hash/statement.html

They currently suggest using SHA-2 functions for crypto applications,
but are also running a new contest for SHA-3:

    http://csrc.nist.gov/groups/ST/hash/sha-3/Round1/submissions_rnd1.html

> Anyway, in fact that might be a good idea: Reflect that the hashlib
> module includes hash functions for the sake of compatibility and
> interoperability and not everlasting security.

BTW: Not sure what Deer Hunter 3 has to do with all this ;-)

    http://www.planetdeerhunter.com/dh3
msg79298 - Author: Lukas Lueg (ebfe) Date: 2009-01-06 22:54
Actually, I smelled irony and was referring to Die Hard 3 :-\

> No, it's the reality of life and one of the reasons why digitally
> signed data needs to be re-signed every few years in order to keep
> the data secured and the legal status of the signature intact.

I know that of course and that's why I brought this all up.


> I propose adding a sentence to the first paragraph noting that the level
> of security varies by algorithm and that over time some of the
> algorithms are being found to have possible cryptographic weaknesses or
> exploits.

Fine.
msg79315 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-01-07 10:14
> I propose and strongly suggest that we start deprecating direct support
> for MD5 during this year and completely remove support for it afterwards.

-1. Stopping usage of md5 should be the user's choice, not Python's.

>  * MD5 is a cryptographic hash function; its reason for being is
> security. With current hardware and attack vectors it is a matter
> of hours to create collisions and fool MD5 hashes. That reason for being
> has come to an end.

I think you misunderstand the kind of problem that has been detected.
It is still *not* possible to produce a colliding text within
reasonable time when given only the md5 hash. So when md5 is used as the
trap function for password storage, its use remains perfectly safe.

Likewise, md5 is still well capable of detecting corruption of binary
files (e.g. during downloads), and will remain in use for this
application for many more years.
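
A minimal sketch of that corruption check follows. The helper name `md5_of_file`, the chunk size, and the temporary file standing in for a download are all assumptions made for illustration; in practice the expected digest would come from the publisher of the file.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# A temporary file stands in for a downloaded file.
payload = b"example payload\n" * 1000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

try:
    expected = hashlib.md5(payload).hexdigest()  # stand-in for a published checksum
    intact = md5_of_file(path) == expected

    # Simulate accidental corruption by overwriting a single byte.
    with open(path, "r+b") as f:
        f.seek(10)
        f.write(b"X")
    corruption_detected = md5_of_file(path) != expected
finally:
    os.remove(path)
```

This guards against accidental damage only; it says nothing about a file that was deliberately crafted to collide.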

It is only in the context of digital signatures that the "chosen prefix"
attack can be demonstrated successfully.

>  * Python runs an uncountable number of exposed user interfaces on the
> web. Usually the programmers rely on the security of the backing
> libraries. Python can't provide this with MD5.

That's like saying "Mercedes drivers rely on efficient operation of the
motor. By putting water into the tank, the motor fails to deliver. So
let's put a ban on the usage of water in cars."

>  * The functionality of MD5 can easily be replaced by other hashes
> that are supported by Python (e.g. SHA1). They offer comparable
> performance but are not binary-compatible (yay).

In some cases, yes, replacement is easy. In other cases, replacement is
not so easy. For example, for password hashes, you cannot simply rehash
all passwords, because you typically don't know what they are.
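
The sketch below illustrates why: the stored digest cannot be converted directly, so a replacement hash can only be computed at a moment when the user supplies the password, such as a successful login. The `users` store, the `check_and_upgrade` helper, and the choice of SHA-256 are hypothetical and exist only for this example; unsalted digests are used purely to keep it short.

```python
import hashlib
import hmac

# Hypothetical user store: only digests are kept, never plain passwords.
users = {"alice": ("md5", hashlib.md5(b"s3cret").hexdigest())}


def check_and_upgrade(name, password):
    """Verify a login and re-hash the password while it is known."""
    algo, stored = users[name]
    candidate = hashlib.new(algo, password).hexdigest()
    if not hmac.compare_digest(candidate, stored):
        return False
    if algo == "md5":
        # The plain password is available only here, at login time.
        users[name] = ("sha256", hashlib.sha256(password).hexdigest())
    return True


ok = check_and_upgrade("alice", b"s3cret")
upgraded_algo = users["alice"][0]
```

Accounts whose owners never log in again keep their old digests, which is exactly the part of the migration that cannot be forced.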
msg79341 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2009-01-07 15:38
For the record, I'm with Martin -- there are many existing uses that we
can't just legislate away.
History
Date                 User             Action  Args
2009-01-07 15:39:00  gvanrossum       set     nosy: + gvanrossum; messages: + msg79341
2009-01-07 10:14:50  loewis           set     nosy: + loewis; messages: + msg79315
2009-01-06 22:54:34  ebfe             set     messages: + msg79298
2009-01-06 22:39:23  lemburg          set     messages: + msg79297
2009-01-06 22:13:36  rhettinger       set     messages: + msg79296
2009-01-06 22:10:04  ebfe             set     messages: + msg79295
2009-01-06 21:59:13  lemburg          set     messages: + msg79293
2009-01-06 21:42:21  ebfe             set     messages: + msg79291
2009-01-06 20:33:14  gregory.p.smith  set     nosy: + gregory.p.smith; messages: + msg79285
2009-01-06 20:20:34  rhettinger       set     status: open -> closed; resolution: rejected; messages: + msg79283; nosy: + rhettinger
2009-01-06 20:17:53  lemburg          set     nosy: + lemburg; messages: + msg79282
2009-01-06 20:06:15  ebfe             create