classification
Title: FIPS support for hashlib
Type: enhancement Stage: patch review
Components: Library (Lib) Versions: Python 3.4, Python 3.3
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: christian.heimes, dholth, dmalcolm, gregory.p.smith, lukecarrier, pitrou, rpetrov
Priority: normal Keywords: patch

Created on 2010-07-10 00:22 by dmalcolm, last changed 2012-10-07 13:55 by rpetrov.

Files
File name Uploaded Description Edit
py3k-hashlib-fips-issue9216.patch dmalcolm, 2010-07-12 18:48 Patch against py3k branch review
0001-Rework-_hashlib-caching-moving-per-hash-cached-data-.patch dmalcolm, 2011-09-16 19:45
0002-Add-error-handling-to-initialization-of-_hashlib.patch dmalcolm, 2011-09-16 19:46
0003-Add-optional-usedforsecurity-argument-in-various-pla.patch dmalcolm, 2011-09-16 19:46
0004-_hashlib-Add-selftest-for-FIPS-mode-and-usedforsecur.patch dmalcolm, 2011-09-16 19:46
virtualenv_distribute lukecarrier, 2012-06-29 00:39
Messages (14)
msg109808 - (view) Author: Dave Malcolm (dmalcolm) (Python committer) Date: 2010-07-10 00:22
(taking the liberty of adding gregory.p.smith to the "nosy" list; hope that's OK)

This is a higher-level take on issue 9146.

Some versions of OpenSSL have a FIPS mode that can refuse the use of non-certified hashes.

The idea is that FIPS mode should prevent the use of non-certified hashes for security uses.  For example, MD5 shouldn't be used for signatures these days (see e.g. http://www.kb.cert.org/vuls/id/836068).

However, there are legitimate non-security uses of these hashes.  For example, one might use MD5 hashes of objects to places them in bins for later retrieval, purely as a speed optimization (e.g. files in directories on a filesystem).

I'm working on a patch to hashlib which would better support this, but it involves an API expansion, and I wanted to sound things out first.

The API idea is to introduce a new keyword argument, say "usedforsecurity" to hashlib.new() and to the named hashlib constructors, such as hashlib.md5().  This would default to True.  If code is using these hashes in FIPS mode, the developer needs to override this: usedforsecurity=False to mark the callsite as a non-security-sensitive location.  Internally, this would lead to the EVP_MD_CTX being initialized with EVP_MD_CTX_FLAG_NON_FIPS_ALLOW.

This way, if you run unaudited code in an environment that cares about FIPS, the code will raise exceptions if it uses a non-valid hash, but during code audit the callsites can be marked clearly as "usedforsecurity=False", and be used as before.

In non-FIPS environments, the flag would be ignored.

Am I right in thinking that the _hashlib module should be treated as an implementation detail here?  The entry points within _hashlib are likely to double, with a pair of pre-initialized contexts, one with the flag, one without.

Does this sound reasonable?  Thanks.
msg109891 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2010-07-10 17:00
That sounds fine to me and I do like the usedforsecurity annotation on the API.  I'll gladly review any patches.
msg110124 - (view) Author: Dave Malcolm (dmalcolm) (Python committer) Date: 2010-07-12 18:48
Attached is a patch against the py3k branch which implements this.

I've checked that it builds against openssl-0.9.8o.tar.gz, openssl-1.0.0a.tar.gz, and against Fedora 12 and 13's heavily-patched openssl-1.0.0. The bulk of my testing has been against Fedora's openssl.

I've added selftests to try to verify the new API.  I try to detect if the OpenSSL enforces FIPS, via trying to run "openssl md5" as a subprocess, and seeing if I can trigger an error.

With FIPS enforcement off, all tests pass when built against 0.9.8o and 1.0.0a and F13's 1.0.0, other than those for FIPS enforcement itself, which skip.

With FIPS enforcement on, all tests pass when built against F13's openssl.  (I haven't yet figured out how to get the fips selftest to pass for the other builds, it's testing checksums against the wrong libcrypto for some reason; see caveat below):
$ ./python Lib/test/test_hashlib.py
$ OPENSSL_FORCE_FIPS_MODE=1 ./python Lib/test/test_hashlib.py

For all of the various contexts stored in _hashopenssl.c, we now store two: one with the override flag, one without.  This required some reworking of the various preprocessor magic in that file, so I've gathered everything related to an algorithm into a structure, and moved most of the logic into functions, rather than macros.  I'm assuming that these will get inlined under optimization, and that the bulk of the time that you're trying to optimize out are the EVP lookups and initializations, rather than function call overhead.

How's this looking?

Do I need to add a dummy "usedforsecurity" arg to all of the non-openssl message digest implementations within the tree?


Unfortunately, if fips mode is on, and the fips selftest fails for the openssl library, every hash use will fail, both with and without the flag:
  ValueError: error:2D07D06A:FIPS routines:EVP_DigestInit_ex:fips selftest failed
and this leads to a crippled hashlib module.  It's not clear to me if there's a good way to handle this.  (Having said that, a site that has the technical expertise to opt-in to FIPS mode is hopefully able to diagnose this, and fix their openssl library)
msg144152 - (view) Author: Dave Malcolm (dmalcolm) (Python committer) Date: 2011-09-16 19:45
I've refreshed this patch against the latest version of the code in hg.

In an attempt to make it easier to review, I've split it up into four (so far) thematic patches, which apply in sequence.
msg144153 - (view) Author: Dave Malcolm (dmalcolm) (Python committer) Date: 2011-09-16 19:48
[and yes, I used git to generate the 4 patches; sorry ]
msg144154 - (view) Author: Dave Malcolm (dmalcolm) (Python committer) Date: 2011-09-16 19:57
The cumulative effect of the above patches (to _hashlib) are equivalent to what I've applied downstream to python 2 in RHEL 6.0 and Fedora 17 onwards, and python 3 in Fedora 17 onwards.

In those environments I've additionally patched hashlib to only use _hashlib, rather than falling back on _md5 etc, since otherwise you get confusing error messages from hashlib.md5() when it defers to _md5 due to FIPS enforcement.  In my downstream builds we can be sure of building against OpenSSL, but this other part of the patch seems less appropriate for upstream python, given that upstream python tries to be flexible in terms of its dependencies.

Hope this makes sense.
msg155688 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2012-03-13 23:15
quick summary of comments from pycon sprints discussion:

this looks pretty good.  i like the 0001 refactoring cleanup.  a couple things to fix in error handling (better messages and some bogus handling in the test). dmalcolm has the notes on what to do.

do it and commit away or ask for more review as you see fit.
msg155741 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-03-14 10:23
Patch 0002:

- cached_info->error_msg doesn't seem deallocated anywhere?

Patch 0003:

- "usedforsecurity" is a poor name IMO; make it shorter and/or PEP8-ize it ("used_for_security")
- the 2-element context array thing is obscure: why not distinct "ctx" and "ctx_non_fips" members?
- "this could fail, e.g. low on memory, or encodings": doesn't it lack an error-handling path, then?

Patch 0004:

- openssl_can_enforce_fips(): instead of calling OpenSSL in a subprocess, perhaps it's possible to expose a public flag in the hashlib module (e.g. "hashlib.HAS_FIPS")? or is this info not fetchable programmatically?
- openssl_can_enforce_fips() needs to check the subprocess return code, in case another error happened
- run_command_with_fips_enforcement() should use the assert_python_ok() and assert_python_failure() functions from Lib/test/script_helper.py

Overall:

- please put back the unconditional tests for the "usedforsecurity" argument (even when FIPS can't be enforced)
- the patches lack docs (Doc/library/hashlib.rst)
- please commit all this as a single commit, not 4 different ones
msg155846 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2012-03-15 03:17
My summary of our discussion was pretty terse. :)  dmalcolm has more detailed TODO list notes that include things like the error cases and .rst documentation.

As for how to commit it, i'd make 0001 its own commit as it is a useful refactoring otherwise unrelated to this change.

I'll leave it entirely up to dmalcolm how many commits he wants 0002 onward to be.  No need to be picky.

usedforsecurity vs used_for_security, agreed, used_for_security is better.

dmalcolm was going to make an enum to index the two element array, that'd give meaningful names instead of 0 or 1.  simply using two named variables would also work but it would require the loop in 0003 to be expanded or turned into a small static method for the body (not a bad idea) instead.  i'm fine with either.
msg164308 - (view) Author: Luke Carrier (lukecarrier) Date: 2012-06-29 00:39
I've not done enough digging on the issue I'm presently experiencing to draw any conclusions make any suggestions, but this change seems to break the present distribute module (version 0.6.27). It appears it will likely break a great deal of other code too.

I've pasted the relevant output here and attached the full traceback.
  File "/usr/lib64/python3.2/hashlib.py", line 112, in __get_openssl_constructor
    f(usedforsecurity=False)
TypeError: openssl_md5() takes no keyword arguments

Whilst I agree with the notion behind this change, Fedora's quick actions have led to me spending the best part of an hour of the night before ship day diagnosing issues caused by undocumented (or at least under-documented) changes to code I haven't written or interfaced with. _Please_ publicise the change a little better? Pretty please!?
msg164328 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-06-29 12:25
> _Please_ publicise the change a little better? Pretty please!?

This changes haven't been committed in Python, so you probably want to post on the Fedora bug tracker instead.
msg166575 - (view) Author: Daniel Holth (dholth) (Python committer) Date: 2012-07-27 15:38
While you are at it, can you edit the docs to put md5() at the bottom of the page at the back of the list in a 2-point font and raise a DeprecationWarning("This function is totally lame, and it is slower than SHA-3, get with the program.") the first time it is used? I don't agree that md5 has a legitimate place in systems designed after 1996.
msg166576 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-07-27 15:41
> While you are at it, can you edit the docs to put md5() at the bottom
> of the page at the back of the list in a 2-point font and raise a
> DeprecationWarning("This function is totally lame, and it is slower
> than SHA-3, get with the program.") the first time it is used?

Please... don't make suggestions unrelated to the issue. Open a new issue instead.
msg172302 - (view) Author: Roumen Petrov (rpetrov) Date: 2012-10-07 13:55
Everything in this issue posted until now has to be managed as vendor patch.
History
Date User Action Args
2012-10-07 13:55:43rpetrovsetnosy: + rpetrov
messages: + msg172302
2012-10-06 23:49:16christian.heimessetnosy: + christian.heimes

versions: + Python 3.4
2012-07-27 15:41:51pitrousetmessages: + msg166576
2012-07-27 15:38:22dholthsetnosy: + dholth
messages: + msg166575
2012-06-29 12:25:20pitrousetmessages: + msg164328
2012-06-29 00:39:38lukecarriersetfiles: + virtualenv_distribute
nosy: + lukecarrier
messages: + msg164308

2012-03-15 03:17:02gregory.p.smithsetmessages: + msg155846
2012-03-14 10:23:47pitrousetnosy: + pitrou
messages: + msg155741
2012-03-13 23:15:45gregory.p.smithsetmessages: + msg155688
2011-09-16 19:57:49dmalcolmsetmessages: + msg144154
2011-09-16 19:48:02dmalcolmsetmessages: + msg144153
2011-09-16 19:46:48dmalcolmsetfiles: + 0004-_hashlib-Add-selftest-for-FIPS-mode-and-usedforsecur.patch
2011-09-16 19:46:39dmalcolmsetfiles: + 0003-Add-optional-usedforsecurity-argument-in-various-pla.patch
2011-09-16 19:46:24dmalcolmsetfiles: + 0002-Add-error-handling-to-initialization-of-_hashlib.patch
2011-09-16 19:45:46dmalcolmsetfiles: + 0001-Rework-_hashlib-caching-moving-per-hash-cached-data-.patch

messages: + msg144152
2011-01-03 19:53:51pitrousetversions: + Python 3.3, - Python 3.2
2010-12-14 19:07:37r.david.murraysettype: enhancement
2010-07-12 18:48:15dmalcolmsetfiles: + py3k-hashlib-fips-issue9216.patch
keywords: + patch
messages: + msg110124

stage: needs patch -> patch review
2010-07-10 17:00:04gregory.p.smithsetmessages: + msg109891
2010-07-10 00:22:13dmalcolmcreate