classification
Title: FIPS support for hashlib
Type: enhancement Stage: patch review
Components: Library (Lib) Versions: Python 3.7, Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: bkabrda, christian.heimes, cstratak, dholth, dmalcolm, doughellmann, icordasc, jpokorny, lukecarrier, pitrou, rbcollins, rpetrov, yolanda.robla
Priority: normal Keywords: patch

Created on 2010-07-10 00:22 by dmalcolm, last changed 2017-05-25 13:26 by cstratak.

Files
File name Uploaded Description
py3k-hashlib-fips-issue9216.patch dmalcolm, 2010-07-12 18:48 Patch against py3k branch
0001-Rework-_hashlib-caching-moving-per-hash-cached-data-.patch dmalcolm, 2011-09-16 19:45
0002-Add-error-handling-to-initialization-of-_hashlib.patch dmalcolm, 2011-09-16 19:46
0003-Add-optional-usedforsecurity-argument-in-various-pla.patch dmalcolm, 2011-09-16 19:46
0004-_hashlib-Add-selftest-for-FIPS-mode-and-usedforsecur.patch dmalcolm, 2011-09-16 19:46
virtualenv_distribute lukecarrier, 2012-06-29 00:39
Messages (28)
msg109808 - Author: Dave Malcolm (dmalcolm) (Python committer) Date: 2010-07-10 00:22
(taking the liberty of adding gregory.p.smith to the "nosy" list; hope that's OK)

This is a higher-level take on issue 9146.

Some versions of OpenSSL have a FIPS mode that can refuse the use of non-certified hashes.

The idea is that FIPS mode should prevent the use of non-certified hashes for security uses.  For example, MD5 shouldn't be used for signatures these days (see e.g. http://www.kb.cert.org/vuls/id/836068).

However, there are legitimate non-security uses of these hashes.  For example, one might use MD5 hashes of objects to place them in bins for later retrieval, purely as a speed optimization (e.g. files in directories on a filesystem).
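Here is a minimal sketch of the kind of non-security use described above: an MD5 digest of a key picks a nested subdirectory, so files spread evenly across bins. The helper name `bin_path` and the two-level, two-hex-digit layout are illustrative assumptions. They are not part of the patch.

```python
import hashlib
from pathlib import PurePosixPath


def bin_path(key: str, levels: int = 2, width: int = 2) -> PurePosixPath:
    """Map a key to a nested bin such as 'ab/cd/<key>'.

    MD5 only spreads keys across directories (a speed optimization);
    nothing here relies on it being collision-resistant.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    # Use `levels` consecutive slices of `width` hex digits as directory names.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(*parts, key)
```

For example, `bin_path("report.txt")` returns a path of the form `xx/yy/report.txt`, where `xx` and `yy` are the first four hex digits of the key's MD5 digest. A FIPS-enforcing OpenSSL may refuse exactly this kind of plain `hashlib.md5()` call.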

I'm working on a patch to hashlib which would better support this, but it involves an API expansion, and I wanted to sound things out first.

The API idea is to introduce a new keyword argument, say "usedforsecurity", to hashlib.new() and to the named hashlib constructors, such as hashlib.md5().  This would default to True.  If code is using these hashes in FIPS mode, the developer needs to override this with usedforsecurity=False to mark the callsite as a non-security-sensitive location.  Internally, this would lead to the EVP_MD_CTX being initialized with EVP_MD_CTX_FLAG_NON_FIPS_ALLOW.

This way, if you run unaudited code in an environment that cares about FIPS, the code will raise exceptions if it uses a non-valid hash, but during code audit the callsites can be marked clearly as "usedforsecurity=False", and be used as before.

In non-FIPS environments, the flag would be ignored.
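For reference, a keyword-only `usedforsecurity` argument along these lines did eventually land in the hashlib constructors and `hashlib.new()` in Python 3.9. It defaults to True, as proposed here. The following is a minimal sketch of marking a call site and assumes Python 3.9 or later. Outside FIPS mode, the flag does not change the digest.

```python
import hashlib

data = b"cache key"

# Explicitly mark this call site as non-security (Python 3.9+ keyword).
marked = hashlib.md5(data, usedforsecurity=False).hexdigest()

# hashlib.new() accepts the same keyword-only argument.
via_new = hashlib.new("md5", data, usedforsecurity=False).hexdigest()

# An unmarked hashlib.md5(data) keeps the default usedforsecurity=True,
# which is the call a FIPS-enforcing build may refuse.
```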

Am I right in thinking that the _hashlib module should be treated as an implementation detail here?  The entry points within _hashlib are likely to double, with a pair of pre-initialized contexts, one with the flag, one without.

Does this sound reasonable?  Thanks.
msg109891 - Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2010-07-10 17:00
That sounds fine to me and I do like the usedforsecurity annotation on the API.  I'll gladly review any patches.
msg110124 - Author: Dave Malcolm (dmalcolm) (Python committer) Date: 2010-07-12 18:48
Attached is a patch against the py3k branch which implements this.

I've checked that it builds against openssl-0.9.8o.tar.gz, openssl-1.0.0a.tar.gz, and against Fedora 12 and 13's heavily-patched openssl-1.0.0. The bulk of my testing has been against Fedora's openssl.

I've added selftests to try to verify the new API.  I try to detect whether OpenSSL enforces FIPS by running "openssl md5" as a subprocess and seeing whether I can trigger an error.

With FIPS enforcement off, all tests pass when built against 0.9.8o and 1.0.0a and F13's 1.0.0, other than those for FIPS enforcement itself, which skip.

With FIPS enforcement on, all tests pass when built against F13's openssl.  (I haven't yet figured out how to get the fips selftest to pass for the other builds: for some reason it's testing checksums against the wrong libcrypto; see caveat below):
$ ./python Lib/test/test_hashlib.py
$ OPENSSL_FORCE_FIPS_MODE=1 ./python Lib/test/test_hashlib.py

For all of the various contexts stored in _hashopenssl.c, we now store two: one with the override flag, one without.  This required some reworking of the various preprocessor magic in that file, so I've gathered everything related to an algorithm into a structure, and moved most of the logic into functions, rather than macros.  I'm assuming that these will get inlined under optimization, and that the bulk of the time that you're trying to optimize out are the EVP lookups and initializations, rather than function call overhead.

How's this looking?

Do I need to add a dummy "usedforsecurity" arg to all of the non-openssl message digest implementations within the tree?


Unfortunately, if fips mode is on, and the fips selftest fails for the openssl library, every hash use will fail, both with and without the flag:
  ValueError: error:2D07D06A:FIPS routines:EVP_DigestInit_ex:fips selftest failed
and this leads to a crippled hashlib module.  It's not clear to me if there's a good way to handle this.  (Having said that, a site that has the technical expertise to opt in to FIPS mode is hopefully able to diagnose this and fix their openssl library.)
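A small diagnostic along these lines can show how crippled a given hashlib is. It tries each advertised algorithm and records the ValueError that a refusing backend raises. The sketch relies only on `hashlib.algorithms_available` and `hashlib.new()`. The helper name `probe_algorithms` is hypothetical.

```python
import hashlib


def probe_algorithms() -> dict:
    """Return {'usable': [...], 'refused': {name: error}} for this interpreter."""
    usable, refused = [], {}
    for name in sorted(hashlib.algorithms_available):
        try:
            hashlib.new(name, b"probe")
        except ValueError as exc:
            # e.g. blocked by FIPS enforcement, or a failed FIPS selftest
            refused[name] = str(exc)
        else:
            usable.append(name)
    return {"usable": usable, "refused": refused}
```

A library whose FIPS selftest fails would show every algorithm under `refused`. Note that `refused` can be non-empty even without FIPS, for example when a build lists legacy digests it cannot actually construct.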
msg144152 - Author: Dave Malcolm (dmalcolm) (Python committer) Date: 2011-09-16 19:45
I've refreshed this patch against the latest version of the code in hg.

In an attempt to make it easier to review, I've split it up into four (so far) thematic patches, which apply in sequence.
msg144153 - Author: Dave Malcolm (dmalcolm) (Python committer) Date: 2011-09-16 19:48
[and yes, I used git to generate the 4 patches; sorry]
msg144154 - Author: Dave Malcolm (dmalcolm) (Python committer) Date: 2011-09-16 19:57
The cumulative effect of the above patches (to _hashlib) is equivalent to what I've applied downstream to python 2 in RHEL 6.0 and Fedora 17 onwards, and python 3 in Fedora 17 onwards.

In those environments I've additionally patched hashlib to only use _hashlib, rather than falling back on _md5 etc, since otherwise you get confusing error messages from hashlib.md5() when it defers to _md5 due to FIPS enforcement.  In my downstream builds we can be sure of building against OpenSSL, but this other part of the patch seems less appropriate for upstream python, given that upstream python tries to be flexible in terms of its dependencies.

Hope this makes sense.
msg155688 - Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2012-03-13 23:15
quick summary of comments from pycon sprints discussion:

this looks pretty good.  i like the 0001 refactoring cleanup.  a couple things to fix in error handling (better messages and some bogus handling in the test). dmalcolm has the notes on what to do.

do it and commit away or ask for more review as you see fit.
msg155741 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-03-14 10:23
Patch 0002:

- cached_info->error_msg doesn't seem deallocated anywhere?

Patch 0003:

- "usedforsecurity" is a poor name IMO; make it shorter and/or PEP8-ize it ("used_for_security")
- the 2-element context array thing is obscure: why not distinct "ctx" and "ctx_non_fips" members?
- "this could fail, e.g. low on memory, or encodings": doesn't it lack an error-handling path, then?

Patch 0004:

- openssl_can_enforce_fips(): instead of calling OpenSSL in a subprocess, perhaps it's possible to expose a public flag in the hashlib module (e.g. "hashlib.HAS_FIPS")? or is this info not fetchable programmatically?
- openssl_can_enforce_fips() needs to check the subprocess return code, in case another error happened
- run_command_with_fips_enforcement() should use the assert_python_ok() and assert_python_failure() functions from Lib/test/script_helper.py
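Below is a sketch of the subprocess probe, reworked along the lines of the second point. It checks the exit status and treats a failure as FIPS enforcement only when the error output mentions FIPS. It assumes an `openssl` binary on PATH and an OpenSSL build that honours `OPENSSL_FORCE_FIPS_MODE`, as the patch's tests did. The stderr match is a heuristic, because error wording varies between OpenSSL versions.

```python
import os
import shutil
import subprocess


def openssl_can_enforce_fips() -> bool:
    """Return True if `openssl md5` is refused when FIPS mode is forced."""
    openssl = shutil.which("openssl")
    if openssl is None:
        return False  # no CLI available to probe with
    env = dict(os.environ, OPENSSL_FORCE_FIPS_MODE="1")
    proc = subprocess.run(
        [openssl, "md5"],
        input=b"probe",
        capture_output=True,
        env=env,
    )
    if proc.returncode == 0:
        return False  # MD5 still allowed: FIPS mode not enforced
    # Non-zero exit: count it as FIPS enforcement only if the error says so,
    # rather than treating any failure (broken install, etc.) as a FIPS refusal.
    return b"fips" in proc.stderr.lower()
```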

Overall:

- please put back the unconditional tests for the "usedforsecurity" argument (even when FIPS can't be enforced)
- the patches lack docs (Doc/library/hashlib.rst)
- please commit all this as a single commit, not 4 different ones
msg155846 - Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2012-03-15 03:17
My summary of our discussion was pretty terse. :)  dmalcolm has more detailed TODO list notes that include things like the error cases and .rst documentation.

As for how to commit it, i'd make 0001 its own commit as it is a useful refactoring otherwise unrelated to this change.

I'll leave it entirely up to dmalcolm how many commits he wants 0002 onward to be.  No need to be picky.

usedforsecurity vs used_for_security, agreed, used_for_security is better.

dmalcolm was going to make an enum to index the two element array, that'd give meaningful names instead of 0 or 1.  simply using two named variables would also work but it would require the loop in 0003 to be expanded or turned into a small static method for the body (not a bad idea) instead.  i'm fine with either.
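This is a Python model, not the actual C in `_hashopenssl.c`, of the layout under discussion. Each algorithm gets one record holding a two-element context array. The array is indexed through an enum, so code reads `contexts[CtxKind.NON_FIPS_ALLOW]` instead of `contexts[1]`. All names are hypothetical, and strings stand in for the cached OpenSSL contexts.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Any, List, Optional


class CtxKind(IntEnum):
    """Names for the two slots of each algorithm's context array."""
    DEFAULT = 0         # ordinary context, subject to FIPS enforcement
    NON_FIPS_ALLOW = 1  # context created with EVP_MD_CTX_FLAG_NON_FIPS_ALLOW


@dataclass
class CachedAlgorithm:
    """Everything cached for one algorithm, gathered into one record."""
    name: str
    contexts: List[Any] = field(default_factory=lambda: [None, None])
    error_msg: Optional[str] = None  # set if initialization failed

    def context_for(self, used_for_security: bool) -> Any:
        """Pick the pre-initialized context matching the caller's intent."""
        kind = CtxKind.DEFAULT if used_for_security else CtxKind.NON_FIPS_ALLOW
        return self.contexts[kind]


# Strings stand in for the pre-initialized EVP_MD_CTX objects.
md5 = CachedAlgorithm("md5")
md5.contexts[CtxKind.DEFAULT] = "ctx(md5)"
md5.contexts[CtxKind.NON_FIPS_ALLOW] = "ctx(md5, NON_FIPS_ALLOW)"
```

Two named members (`ctx` and `ctx_non_fips`, as suggested in the review) would work equally well. The enum keeps the array form, so a loop over both slots stays a loop.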
msg164308 - Author: Luke Carrier (lukecarrier) Date: 2012-06-29 00:39
I've not done enough digging on the issue I'm presently experiencing to draw any conclusions or make any suggestions, but this change seems to break the present distribute module (version 0.6.27). It appears it will likely break a great deal of other code too.

I've pasted the relevant output here and attached the full traceback.
  File "/usr/lib64/python3.2/hashlib.py", line 112, in __get_openssl_constructor
    f(usedforsecurity=False)
TypeError: openssl_md5() takes no keyword arguments

Whilst I agree with the notion behind this change, Fedora's quick actions have led to me spending the best part of an hour of the night before ship day diagnosing issues caused by undocumented (or at least under-documented) changes to code I haven't written or interfaced with. _Please_ publicise the change a little better? Pretty please!?
msg164328 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-06-29 12:25
> _Please_ publicise the change a little better? Pretty please!?

These changes haven't been committed in Python, so you probably want to post on the Fedora bug tracker instead.
msg166575 - Author: Daniel Holth (dholth) Date: 2012-07-27 15:38
While you are at it, can you edit the docs to put md5() at the bottom of the page at the back of the list in a 2-point font and raise a DeprecationWarning("This function is totally lame, and it is slower than SHA-3, get with the program.") the first time it is used? I don't agree that md5 has a legitimate place in systems designed after 1996.
msg166576 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-07-27 15:41
> While you are at it, can you edit the docs to put md5() at the bottom
> of the page at the back of the list in a 2-point font and raise a
> DeprecationWarning("This function is totally lame, and it is slower
> than SHA-3, get with the program.") the first time it is used?

Please... don't make suggestions unrelated to the issue. Open a new issue instead.
msg172302 - Author: Roumen Petrov (rpetrov) * Date: 2012-10-07 13:55
Everything in this issue posted until now has to be managed as a vendor patch.
msg191130 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2013-06-14 14:07
It's out of scope for 3.3 but I'd love to see the feature in 3.4.
msg285593 - Author: Robert Collins (rbcollins) * (Python committer) Date: 2017-01-17 00:01
A few thoughts;

usedforsecurity=xxx seems awkward: I wouldn't want, as a user of hashlib, to have to put that in literally every use I make of it.

If I understand the situation correctly, the goal is to identify the intended purpose of a call to md5, both for linters and at runtime: e.g. whether its use has security implications (as far as FIPS is concerned).
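To make the linter side concrete: with the keyword-argument approach, an audit only needs to find MD5/SHA-1 call sites that are not explicitly marked. Here is a minimal sketch using the standard `ast` module. It only recognizes the literal forms `hashlib.md5(...)`, `hashlib.sha1(...)` and `hashlib.new("md5"/"sha1", ...)`, so aliased imports are missed. The function names are hypothetical.

```python
import ast

WEAK = {"md5", "sha1"}


def _is_weak_hash_call(call: ast.Call) -> bool:
    """True for hashlib.md5/sha1(...) or hashlib.new("md5"/"sha1", ...)."""
    func = call.func
    if not (isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.value.id == "hashlib"):
        return False
    if func.attr in WEAK:
        return True
    if func.attr == "new" and call.args:
        first = call.args[0]
        return (isinstance(first, ast.Constant)
                and isinstance(first.value, str)
                and first.value.lower() in WEAK)
    return False


def _is_marked(call: ast.Call) -> bool:
    """True if the call passes a literal usedforsecurity=False."""
    return any(kw.arg == "usedforsecurity"
               and isinstance(kw.value, ast.Constant)
               and kw.value.value is False
               for kw in call.keywords)


def find_unmarked_weak_hashes(source: str) -> list:
    """Line numbers of weak-hash calls not marked as non-security."""
    return sorted(node.lineno
                  for node in ast.walk(ast.parse(source))
                  if isinstance(node, ast.Call)
                  and _is_weak_hash_call(node)
                  and not _is_marked(node))


SAMPLE = '''import hashlib
hashlib.md5(b"x")
hashlib.md5(b"x", usedforsecurity=False)
hashlib.new("sha1", b"x")
hashlib.sha256(b"x")
'''
print(find_unmarked_weak_hashes(SAMPLE))  # [2, 4]
```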

Perhaps having two separate implementations of the interfaces, one general purpose and one FIPS would be decent.

e.g. from hashlib.fips import sha1 
etc
etc
and hashlib.fips simply wouldn't contain md5.

Then the md5 that's in hashlib is by definition not FIPS ready and any code using it should be fixed.
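Here is a minimal sketch of such a restricted namespace, built from the existing constructors. `fips` is a hypothetical `SimpleNamespace` standing in for the proposed `hashlib.fips` module, which does not exist in the stdlib. The algorithm list follows the `sha1` example above and adds the SHA-2 family.

```python
import hashlib
from types import SimpleNamespace

# Hypothetical stand-in for the proposed hashlib.fips: expose only
# selected constructors and deliberately omit md5.
fips = SimpleNamespace(
    sha1=hashlib.sha1,
    sha224=hashlib.sha224,
    sha256=hashlib.sha256,
    sha384=hashlib.sha384,
    sha512=hashlib.sha512,
)

# Code written against the namespace cannot reach md5 by accident.
digest = fips.sha256(b"payload").hexdigest()
```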
msg285640 - Author: Doug Hellmann (doughellmann) * (Python committer) Date: 2017-01-17 13:54
@rbcollins, I don't think providing a hashlib.fips module without md5() solves the problem. The idea is to have a way to call md5() in non-secure situations, and to signal to the FIPS system that the call is OK. A separate module would work if it included an md5() function that always did that signaling. But creating a separate module just to wrap one function like that seems like overkill, doesn't it?
msg285643 - Author: Yolanda (yolanda.robla) Date: 2017-01-17 14:27
I agree with Doug. From my understanding, the intention of the patch is to allow the usage of md5 for non-security purposes, without being blocked by FIPS.
Right now, md5() calls running on a FIPS-enabled kernel are blocked without regard to how they are used, even though non-security hashing should be fine and MD5 is more performant than other methods. This patch makes it possible to keep using md5 simply by flagging it properly.
msg285676 - Author: Robert Collins (rbcollins) * (Python committer) Date: 2017-01-17 20:04
@doug - I don't see how a separate fips module *wouldn't* solve it:
 - code that uses md5 in security contexts wouldn't be able to call it from the fips module, which is the needed outcome
 - code that uses md5 and isn't fips compliant would be importing from the non-fips module, and that's as auditable as looking for a 'usedforsecurity=False' flag
 - auditors can assume that code that doesn't use the fips module


And it's way less messy: remember we're going to have this flag passed to every hashlib invocation from every project in order to *opt out* of the FIPS restrictions. Because FIPS will change over time, no one can assume that any given function is and will remain FIPS compatible; and this flag is going to percolate up into e.g. the HMAC module.

I think that's pretty ugly: want to calculate the sha of a blob to look it up in git? sha1sum(file.read(), usedforsecurity=False)
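The git case is a good example of a non-security use. In a SHA-1 repository, a blob's object ID is the SHA-1 of a `blob <size>\0` header followed by the content. The sketch below uses the keyword spelling that eventually shipped (`usedforsecurity`, Python 3.9+). `git_blob_id` is a hypothetical helper name.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute git's object ID for a blob, as `git hash-object` does."""
    header = b"blob %d\x00" % len(content)
    # Content addressing, not a security decision: mark it as such.
    return hashlib.sha1(header + content, usedforsecurity=False).hexdigest()


print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```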

Separately I wonder about the impact on higher layers - are they ready to be parameterised by objects, or do they look things up by name - and thus need to start accepting this new parameter and passing it down?
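On the higher-layer question: the `hmac` module already accepts `digestmod` either as a name or as a constructor object. A caller can therefore pass a constructor with the flag pre-bound instead of `hmac` growing a new parameter. The sketch assumes Python 3.9+ for `usedforsecurity`. Binding the flag via `functools.partial` is an illustrative choice, and whether a given HMAC use is truly non-security remains the caller's decision.

```python
import functools
import hashlib
import hmac

key, msg = b"secret-key", b"message"

# A constructor object with the flag already bound (Python 3.9+);
# hmac just calls it and never needs to know about usedforsecurity.
sha256_marked = functools.partial(hashlib.sha256, usedforsecurity=False)
by_object = hmac.new(key, msg, digestmod=sha256_marked).hexdigest()

# Looking the digest up by name yields the same MAC.
by_name = hmac.new(key, msg, digestmod="sha256").hexdigest()
```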
msg285677 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-01-17 20:19
The separate module idea is an interesting one, though I wonder if it aligns with users' goals.  Perhaps some users simply want to set the OPENSSL_FORCE_FIPS_MODE environment variable and then run existing Python code with it to ensure that code is FIPS-compliant.  A separate module assumes that the developer is the one who makes the decision of running in FIPS compliance mode or not.
msg285678 - Author: Ian Cordasco (icordasc) * Date: 2017-01-17 20:26
So I see the argument on both sides of this discussion. Having those optional arguments for all the functions seems like an obvious blocker. If a submodule is a blocker, what if we provide a context-manager to signal this?
msg285679 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-01-17 20:29
AFAICT from David's patch, there isn't a new argument in all hashlib functions but only in the digest constructors.  Someone might want to correct me.
msg285681 - Author: Yolanda (yolanda.robla) Date: 2017-01-17 20:45
@rbcollins, so you mean the apps using it should be "fips aware"? Is that the point of your separate module?
So...

if fips_enabled then
    use fips.md5
else
    use normal.md5
msg285683 - Author: Doug Hellmann (doughellmann) * (Python committer) Date: 2017-01-17 21:07
@Robert, I thought you were proposing a hashlib.fips module that did not include md5() at all. If it does include the function, and the function does whatever is needed to disable the "die when using MD5" on a FIPS system, then I agree it would work. 

Your point about the FIPS standard changing and needing to include more hash types in the future is good.
msg285684 - Author: Doug Hellmann (doughellmann) * (Python committer) Date: 2017-01-17 21:08
@Antoine - The idea behind introducing some API mechanism is exactly as you say, to let the developer say "this use of this algorithm is not related to security" to tell FIPS systems to not be pedantic.
msg285706 - Author: Yolanda (yolanda.robla) Date: 2017-01-18 08:24
@rbcollins, even if we go with a FIPS-aware module, we'd still need to detect whether md5 was used for security purposes.
If we build a system that detects FIPS enablement and calls md5 for, say, generating a password, and the Python fips_md5 call masks that, we'd be breaking FIPS rules.
I still see the point of the used_for_security flag. Maybe invert it: default used_for_security to False, since the normal usage of md5 should be hashing and other non-security purposes?
msg285895 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2017-01-20 10:14
Objection from hashlib maintainer: I will reject a used_for_security flag with default of False. I'm slowly moving Python to a secure-by-default policy. Therefore used_for_security must be an explicit opt-out.

I'm aware that the policy will require modifications to all software that uses MD5. To be honest, that's my goal. If you care about FIPS, then any use of MD5 must be a conscious and careful decision. I want developers to move away from MD5 and replace it with SipHash24, Blake2 or SHA-2. MD5 should *only* remain when backwards incompatibility prevents migration.
msg285897 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2017-01-20 10:20
PS: I'm also against a hashlib.fips module in stdlib. FIPS mode is irrelevant for the majority of users and countries. I neither want to confuse people nor introduce more maintenance and documentation burden than necessary. Antoine gave another good reason against a fips module, too.

I'm fine with a used_for_security flag and functions to get/set FIPS state. Something like hashlib.get_fips_mode() is useful for testing.
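For testing, a FIPS-state query can be approximated with what is already available. The sketch first looks for `get_fips_mode()` on the private `_hashlib` extension module, which newer CPython releases provide; it checks for the function at runtime rather than assuming it exists. It then falls back to the Linux kernel's `/proc/sys/crypto/fips_enabled` flag. The wrapper name `fips_mode_enabled` is hypothetical.

```python
try:
    import _hashlib  # private OpenSSL-backed module; may be missing
except ImportError:
    _hashlib = None


def fips_mode_enabled() -> bool:
    """Best-effort FIPS check; returns False when the state cannot be determined."""
    get_mode = getattr(_hashlib, "get_fips_mode", None)
    if get_mode is not None:
        try:
            return bool(get_mode())
        except ValueError:
            pass  # OpenSSL error while querying; fall back to the kernel flag
    try:
        with open("/proc/sys/crypto/fips_enabled") as fh:
            return fh.read().strip() == "1"
    except OSError:
        return False  # not Linux, or a kernel built without FIPS support
```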
History
Date                 User              Action  Args
2017-05-25 13:26:51  cstratak          set     nosy: + cstratak
2017-01-20 10:20:58  christian.heimes  set     messages: + msg285897
2017-01-20 10:14:43  christian.heimes  set     messages: + msg285895
2017-01-18 08:24:07  yolanda.robla     set     messages: + msg285706
2017-01-17 21:08:42  doughellmann      set     messages: + msg285684
2017-01-17 21:07:29  doughellmann      set     messages: + msg285683
2017-01-17 20:45:33  yolanda.robla     set     messages: + msg285681
2017-01-17 20:29:46  pitrou            set     messages: + msg285679
2017-01-17 20:26:40  icordasc          set     nosy: + icordasc
                                               messages: + msg285678
2017-01-17 20:19:00  pitrou            set     messages: + msg285677
2017-01-17 20:04:23  rbcollins         set     messages: + msg285676
2017-01-17 14:27:26  yolanda.robla     set     nosy: + yolanda.robla
                                               messages: + msg285643
2017-01-17 13:54:40  doughellmann      set     messages: + msg285640
2017-01-17 00:01:34  rbcollins         set     nosy: + rbcollins
                                               messages: + msg285593
2016-11-23 17:56:05  gregory.p.smith   set     nosy: - gregory.p.smith
2016-11-23 17:47:18  doughellmann      set     nosy: + doughellmann
2016-09-08 14:50:28  christian.heimes  set     versions: + Python 3.6, Python 3.7, - Python 3.4
2013-11-06 14:10:29  bkabrda           set     nosy: + bkabrda
2013-07-20 17:26:43  jpokorny          set     nosy: + jpokorny
2013-06-14 14:07:08  christian.heimes  set     messages: + msg191130
                                               versions: - Python 3.3
2012-10-07 13:55:43  rpetrov           set     nosy: + rpetrov
                                               messages: + msg172302
2012-10-06 23:49:16  christian.heimes  set     nosy: + christian.heimes
                                               versions: + Python 3.4
2012-07-27 15:41:51  pitrou            set     messages: + msg166576
2012-07-27 15:38:22  dholth            set     nosy: + dholth
                                               messages: + msg166575
2012-06-29 12:25:20  pitrou            set     messages: + msg164328
2012-06-29 00:39:38  lukecarrier       set     files: + virtualenv_distribute
                                               nosy: + lukecarrier
                                               messages: + msg164308
2012-03-15 03:17:02  gregory.p.smith   set     messages: + msg155846
2012-03-14 10:23:47  pitrou            set     nosy: + pitrou
                                               messages: + msg155741
2012-03-13 23:15:45  gregory.p.smith   set     messages: + msg155688
2011-09-16 19:57:49  dmalcolm          set     messages: + msg144154
2011-09-16 19:48:02  dmalcolm          set     messages: + msg144153
2011-09-16 19:46:48  dmalcolm          set     files: + 0004-_hashlib-Add-selftest-for-FIPS-mode-and-usedforsecur.patch
2011-09-16 19:46:39  dmalcolm          set     files: + 0003-Add-optional-usedforsecurity-argument-in-various-pla.patch
2011-09-16 19:46:24  dmalcolm          set     files: + 0002-Add-error-handling-to-initialization-of-_hashlib.patch
2011-09-16 19:45:46  dmalcolm          set     files: + 0001-Rework-_hashlib-caching-moving-per-hash-cached-data-.patch
                                               messages: + msg144152
2011-01-03 19:53:51  pitrou            set     versions: + Python 3.3, - Python 3.2
2010-12-14 19:07:37  r.david.murray    set     type: enhancement
2010-07-12 18:48:15  dmalcolm          set     files: + py3k-hashlib-fips-issue9216.patch
                                               keywords: + patch
                                               messages: + msg110124
                                               stage: needs patch -> patch review
2010-07-10 17:00:04  gregory.p.smith   set     messages: + msg109891
2010-07-10 00:22:13  dmalcolm          create