Issue 42982: Update suggested number of iterations for pbkdf2_hmac()

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/87148

classification

Title:	Update suggested number of iterations for pbkdf2_hmac()
Type:		Stage:	commit review
Components:	Documentation	Versions:

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	gregory.p.smith	Nosy List:	alex, april, christian.heimes, docs@python, gregory.p.smith, illia-v, miss-islington, ned.deily, reaperhulk, rhettinger, zach.ware
Priority:	normal	Keywords:	patch

Created on 2021-01-20 20:06 by illia-v, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL	Status	Linked
PR 24276	merged	illia-v, 2021-01-20 20:16
PR 30951	merged	miss-islington, 2022-01-27 08:39
PR 30966	merged	gregory.p.smith, 2022-01-27 19:34
PR 30968	merged	miss-islington, 2022-01-27 20:18

Messages (20)
msg385365 - (view)	Author: Illia Volochii (illia-v) *	Date: 2021-01-20 20:06
Documentation [1] suggests using at least 100,000 iterations of SHA-256 as of 2013. It is now 2021, and it is common to use many more iterations. For example, Django will use 260,000 by default in the next 3.2 LTS release and 320,000 in 4.0 [2][3]. I suggest recommending at least 250,000 iterations, a somewhat round number close to the ones used by modern libraries.
[1] https://docs.python.org/3/library/hashlib.html#hashlib.pbkdf2_hmac
[2] https://github.com/django/django/commit/f2187a227f7a3c80282658e699ae9b04023724e5
[3] https://github.com/django/django/commit/a948d9df394aafded78d72b1daa785a0abfeab48
msg385442 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2021-01-21 19:14
Is there any scientific research or mathematical proof for 250,000 iterations?
msg385455 - (view)	Author: Illia Volochii (illia-v) *	Date: 2021-01-21 22:39
I didn't find any. I think it is based on some benchmarks like `openssl speed sha`.
msg385939 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2021-01-29 20:59
FWIW, OnePass uses 100,000. https://support.1password.com/pbkdf2/ Also, I don't think an additional time factor of 2.5x would make substantial difference in security, but it may make a noticeable difference in user authentication time.
msg385944 - (view)	Author: Illia Volochii (illia-v) *	Date: 2021-01-29 21:40
> FWIW, OnePass uses 100,000. https://support.1password.com/pbkdf2/
There is a history section on that page. The current 100,000 is ten times more than 1Password used in 2013, when the suggestion was added to the documentation.
> Also, I don't think an additional time factor of 2.5x would make substantial difference in security, but it may make a noticeable difference in user authentication time.
A 2.5x difference can be substantial if x is hours, days, or years :)
msg385992 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2021-01-30 18:30
PBKDF2-HMAC is a serialized algorithm. It cannot be parallelized. That means the runtime depends on single-core performance. The single-core performance of desktop and server CPUs hasn't improved much in the last decade. Modern CPUs have more cores, larger caches, and better IPC. Intel's Nehalem architecture from 2009 had up to 3.33 GHz. Fast 2020 Comet Lake CPUs have up to 3.7 GHz base frequency and about 5 GHz turbo.
msg386605 - (view)	Author: Illia Volochii (illia-v) *	Date: 2021-02-07 20:32
Clock rate is not the only indicator. Some new instructions supporting SHA were introduced during the last decade.
https://software.intel.com/content/www/us/en/develop/articles/intel-sha-extensions.html
https://software.intel.com/content/www/us/en/develop/articles/improving-openssl-performance.html
https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/sha-256-implementations-paper.pdf
msg411524 - (view)	Author: April King (april)	Date: 2022-01-24 22:42
Django uses 390,000 iterations as of late 2021, as does the Python Cryptography project. We should be aligned with their recommendations, or at least a good deal closer than we are now. 390,000 actually makes it a conservative recommendation for key derivation, as that number of rounds takes ~133ms to compute on my M1 versus 36ms. Usually you're shooting for ~250ms. Being off by ~50% is probably okay, being off by this much is considerably worse. Anyways, I'd be happy to make such a PR if folks are amenable to it.
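A minimal sketch of how timing figures like the ones quoted above could be measured on a given machine. It uses only `hashlib.pbkdf2_hmac` from the standard library; the helper name `time_pbkdf2`, the placeholder password, and the choice to report the best of a few runs are assumptions for illustration and are not taken from the thread.

```python
import hashlib
import os
import time


def time_pbkdf2(iterations, hash_name="sha256", repeats=3):
    """Return the best wall-clock time (seconds) for one PBKDF2-HMAC derivation."""
    password = b"correct horse battery staple"  # placeholder input, not a real secret
    salt = os.urandom(16)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hashlib.pbkdf2_hmac(hash_name, password, salt, iterations)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Iteration counts mentioned in this thread; results vary by CPU and OpenSSL build.
    for count in (100_000, 260_000, 390_000):
        print(f"{count:>7,} iterations: {time_pbkdf2(count) * 1000:.1f} ms")
```

The numbers printed depend entirely on the hardware and the OpenSSL build, so they are only comparable with measurements taken on the same machine.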
msg411566 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2022-01-25 08:50
My question from last year has not been answered yet. Is there any valid scientific research on the number of rounds or duration? I neither know nor understand how Django came up with the numbers. PyCA cryptography copied the numbers without questioning them. Where does 250ms come from? 250ms at 100% CPU load sounds way too costly for a website login and too fast for a password manager. For comparison, Argon2's default runtime on my laptop is 50ms.
msg411610 - (view)	Author: April King (april)	Date: 2022-01-25 15:14
Django probably stores and computes more passwords than every other Python framework combined, and it doesn't provide you any control over the number of iterations. And it hasn't for years. If this were truly a problem, wouldn't their users be complaining about it constantly? Werkzeug was doing 150,000 iterations as of 0.15.x, released three years ago, and does 260,000 iterations today. Again, no complaints or issues. In practice, this is almost never a problem: user logins and password changes are extremely rare events compared to all other activity, so the computation time is essentially irrelevant outside the response time for that individual user. No matter how many users, the systems are scaling such that the computation time of that rare event remains a fraction of overall CPU use.
msg411623 - (view)	Author: Paul Kehrer (reaperhulk)	Date: 2022-01-25 15:46
NIST provides no official guidance on iteration count other than NIST SP 800-132 Appendix A.2.2, which states "The number of iterations should be set as high as can be tolerated for the environment, while maintaining acceptable performance." I can think of no better resource for what constitutes acceptable performance at the highest iteration count than popular packages like Django. Django's choice (and lack of evidence that they've had any cause to revert due to performance issues) argues that 390k iterations is a reasonable number in 2022. Certainly the 100k suggested in these docs as of 2013 is no longer best practice as we've seen 9 years of computational improvement in the intervening time. I would, additionally, suggest that the documentation recommend the use of scrypt where possible over any iteration count of PBKDF2, but increasing the iteration count is still a useful improvement to the docs!
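One way to read "as high as can be tolerated" is to pick a time budget and derive the count from a measurement. The sketch below assumes PBKDF2 cost scales linearly with the iteration count and extrapolates from a single probe run; the helper name `calibrate_iterations` and the probe size are hypothetical, and the time budget is an application-specific choice that neither NIST nor this thread fixes.

```python
import hashlib
import os
import time


def calibrate_iterations(target_seconds, probe_iterations=50_000, hash_name="sha256"):
    """Estimate an iteration count that takes roughly target_seconds on this machine."""
    if target_seconds <= 0 or probe_iterations <= 0:
        raise ValueError("target_seconds and probe_iterations must be positive")
    salt = os.urandom(16)
    start = time.perf_counter()
    hashlib.pbkdf2_hmac(hash_name, b"probe password", salt, probe_iterations)
    # Guard against a zero reading from a very coarse clock.
    elapsed = max(time.perf_counter() - start, 1e-9)
    # Assumption: cost is linear in the iteration count, so scale the probe result.
    return max(int(probe_iterations * target_seconds / elapsed), 1)
```

Calling `calibrate_iterations(0.25)` would target the ~250ms figure mentioned earlier in the thread. The result should be measured on the hardware that will actually perform logins, and a single probe is noisy, so repeating the measurement is advisable.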
msg411624 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2022-01-25 15:56
You are arguing from the perspective of a Django/werkzeug developer and you are using experiential domain knowledge to argue for a higher recommendation. I'm asking for a scientific answer. Based on my experience, 100k PBKDF2 HMAC-SHA256 rounds is already a DoS issue for some use cases. For other use cases even 500k rounds is not the right answer, because the application should rather use a different algorithm altogether. If you are concerned about PBKDF2's strength, then better switch to Scrypt or Argon2. They are better suited against GPU-based crackers. PBKDF2 is still required for FIPS compliance, but most people can (and should!) ignore FIPS.
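For readers who want to try the scrypt alternative raised in the last two messages, here is a minimal sketch using `hashlib.scrypt` from the standard library (available when Python is built against an OpenSSL that supports scrypt). The helper names and the cost parameters `n=2**14, r=8, p=1` are illustrative assumptions, not values recommended anywhere in this thread.

```python
import hashlib
import hmac
import os

# Upper bound on scrypt's working memory; 64 MiB is an assumption for this sketch.
MAXMEM = 64 * 1024 * 1024


def scrypt_hash(password, salt=None, n=2**14, r=8, p=1):
    """Derive a 32-byte key from password bytes; returns (salt, key)."""
    if salt is None:
        salt = os.urandom(16)
    key = hashlib.scrypt(password, salt=salt, n=n, r=r, p=p, maxmem=MAXMEM, dklen=32)
    return salt, key


def scrypt_verify(password, salt, expected_key, n=2**14, r=8, p=1):
    """Recompute the key with the same parameters and compare in constant time."""
    _, key = scrypt_hash(password, salt=salt, n=n, r=r, p=p)
    return hmac.compare_digest(key, expected_key)
```

Unlike PBKDF2, the cost here is controlled by memory as well as time (roughly `128 * n * r` bytes), which is what makes GPU-based cracking more expensive.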
msg411644 - (view)	Author: Alex Gaynor (alex) *	Date: 2022-01-25 17:48
Sticking with 100k is not scientific though ;-) Empiricism is science! I'm probably the person responsible for Django's process, which is to increase by some % (10% or 20% IIRC) every release. As you point out, the exact value one should use is a function of context, which we don't have as documentation authors. However, what we can do is try to select a value that's most likely to be practical for many users and will in turn protect their users' data most. 100k isn't that value, and taking inspiration from places that have had their values tested by many users is intuitive to me.
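As a quick arithmetic check on the "some % every release" description, the sketch below computes the step-to-step growth between the Django defaults cited in this thread (260,000, then 320,000, then 390,000). The function name `percent_increases` is hypothetical, and the list contains only figures quoted above.

```python
# Django default iteration counts as cited in this thread, in order.
cited_counts = [260_000, 320_000, 390_000]


def percent_increases(counts):
    """Return the percentage growth between each consecutive pair of counts."""
    return [(new - old) / old * 100 for old, new in zip(counts, counts[1:])]


if __name__ == "__main__":
    for (old, new), pct in zip(zip(cited_counts, cited_counts[1:]),
                               percent_increases(cited_counts)):
        print(f"{old:,} -> {new:,}: +{pct:.2f}%")
```

Both steps come out a little above 20%, which is in line with a roughly 20% increase per release followed by rounding to a convenient number.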
msg411789 - (view)	Author: Zachary Ware (zach.ware) *	Date: 2022-01-26 20:12
Rather than suggesting an actual number, perhaps we should link to an external resource that covers how to choose the number? Or we leave it vague and say "The number of iterations should be chosen based on the hash algorithm and computing power; there is no universal recommendation, but hundreds of thousands of iterations may be reasonable." This avoids bikeshedding a specific number, but still gives a general idea of the magnitude of number involved.
msg411844 - (view)	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2022-01-27 08:39
New changeset 897ce9018775bcd679fb49aa17258f8f6e818e23 by Illia Volochii in branch 'main': bpo-42982: Improve the text on suggested number of iterations of PBKDF2 (GH-24276) https://github.com/python/cpython/commit/897ce9018775bcd679fb49aa17258f8f6e818e23
msg411845 - (view)	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2022-01-27 08:41
I reworked the PR and went with less specific text and linking to the NIST SP 800-132 appendix as guidance on how people should determine what is right for them. There is no one right number. It is application specific. Thanks for everyone's valuable input!
msg411846 - (view)	Author: miss-islington (miss-islington)	Date: 2022-01-27 09:02
New changeset 1ecc98dedb7ae77c2d806a70b52dfecdac39ff5b by Miss Islington (bot) in branch '3.10': bpo-42982: Improve the text on suggested number of iterations of PBKDF2 (GH-24276) https://github.com/python/cpython/commit/1ecc98dedb7ae77c2d806a70b52dfecdac39ff5b
msg411879 - (view)	Author: April King (april)	Date: 2022-01-27 14:16
The code snippet still uses 100000. Given that many people will simply copy-and-paste without questioning, should we update that too?
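Because copied snippets tend to freeze whatever constant they contain, one mitigation is to store the iteration count next to each hash so it can be raised later without invalidating existing records. The sketch below is an assumption-laden illustration: the `$`-separated record format and the helper names `hash_password` and `verify_password` are hypothetical and are not what the hashlib documentation or any library in this thread prescribes.

```python
import hashlib
import hmac
import os


def hash_password(password, iterations, salt=None):
    """Return a self-describing record: algorithm$iterations$salt_hex$key_hex."""
    if salt is None:
        salt = os.urandom(16)
    dk = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"pbkdf2_sha256${iterations}${salt.hex()}${dk.hex()}"


def verify_password(password, stored):
    """Recompute the key with the parameters stored in the record and compare."""
    algorithm, iterations, salt_hex, dk_hex = stored.split("$")
    if algorithm != "pbkdf2_sha256":
        raise ValueError(f"unsupported algorithm: {algorithm}")
    dk = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    # Constant-time comparison avoids leaking how many leading bytes matched.
    return hmac.compare_digest(dk, bytes.fromhex(dk_hex))
```

With the count stored per record, an application can re-hash with a higher count the next time a user logs in successfully, instead of being tied to the number that happened to be in the example it was copied from.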
msg411916 - (view)	Author: miss-islington (miss-islington)	Date: 2022-01-27 20:18
New changeset ace0aa2a2793ba4a2b03e56c4ec375c5470edee8 by Gregory P. Smith in branch 'main': bpo-42982: update pbkdf2 example & add another link (GH-30966) https://github.com/python/cpython/commit/ace0aa2a2793ba4a2b03e56c4ec375c5470edee8
msg414293 - (view)	Author: Ned Deily (ned.deily) *	Date: 2022-03-01 20:56
New changeset 7dbb2f8eaf07c105f4d2bb0fe61763463e68372d by Miss Islington (bot) in branch '3.10': bpo-42982: update pbkdf2 example & add another link (GH-30966) (#30968) https://github.com/python/cpython/commit/7dbb2f8eaf07c105f4d2bb0fe61763463e68372d

History
Date	User	Action	Args
2022-04-11 14:59:40	admin	set	github: 87148
2022-03-01 20:56:40	ned.deily	set	nosy: + ned.deily messages: + msg414293
2022-01-27 20:18:46	miss-islington	set	pull_requests: + pull_request29146
2022-01-27 20:18:36	miss-islington	set	messages: + msg411916
2022-01-27 19:34:10	gregory.p.smith	set	pull_requests: + pull_request29145
2022-01-27 14:16:30	april	set	messages: + msg411879
2022-01-27 09:02:02	miss-islington	set	messages: + msg411846
2022-01-27 08:41:18	gregory.p.smith	set	status: open -> closed messages: + msg411845 assignee: docs@python -> gregory.p.smith resolution: fixed stage: patch review -> commit review
2022-01-27 08:39:22	gregory.p.smith	set	nosy: + gregory.p.smith messages: + msg411844
2022-01-27 08:39:19	miss-islington	set	nosy: + miss-islington pull_requests: + pull_request29130
2022-01-26 20:12:02	zach.ware	set	nosy: + zach.ware messages: + msg411789
2022-01-25 17:48:06	alex	set	nosy: + alex messages: + msg411644
2022-01-25 15:56:53	christian.heimes	set	messages: + msg411624
2022-01-25 15:46:07	reaperhulk	set	nosy: + reaperhulk messages: + msg411623
2022-01-25 15:14:07	april	set	messages: + msg411610
2022-01-25 08:50:50	christian.heimes	set	messages: + msg411566
2022-01-24 22:42:04	april	set	nosy: + april messages: + msg411524
2021-02-07 20:32:14	illia-v	set	messages: + msg386605
2021-01-30 18:30:14	christian.heimes	set	messages: + msg385992
2021-01-29 21:40:15	illia-v	set	messages: + msg385944
2021-01-29 20:59:48	rhettinger	set	nosy: + rhettinger messages: + msg385939
2021-01-21 22:39:09	illia-v	set	messages: + msg385455
2021-01-21 19:14:30	christian.heimes	set	nosy: + christian.heimes messages: + msg385442
2021-01-20 20:16:38	illia-v	set	keywords: + patch stage: patch review pull_requests: + pull_request23099
2021-01-20 20:06:39	illia-v	create