Update suggested number of iterations for pbkdf2_hmac() #87148
Documentation [1] suggests using at least 100,000 iterations of SHA-256 as of 2013. It is now 2021, and it is common to use many more iterations. I suggest recommending at least 250,000 iterations, a somewhat round number close to the values used by modern libraries. [1] https://docs.python.org/3/library/hashlib.html#hashlib.pbkdf2_hmac
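For concreteness, a minimal sketch of what a call with the proposed count might look like (250,000 is the suggestion in this issue, not an official recommendation; the password and salt handling here are purely illustrative):

```python
import hashlib
import os

# Illustrative only: derive a key with the iteration count proposed
# in this issue rather than the 100,000 currently in the docs.
password = b"correct horse battery staple"
salt = os.urandom(16)  # a fresh random salt per stored password

key = hashlib.pbkdf2_hmac("sha256", password, salt, 250_000)
print(key.hex())
```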
Is there any scientific research or mathematical proof for 250,000 iterations?
I didn't find any. I think it is based on some benchmarks.
FWIW, 1Password uses 100,000. https://support.1password.com/pbkdf2/ Also, I don't think an additional time factor of 2.5x would make a substantial difference in security, but it may make a noticeable difference in user authentication time.
There is a history section on that page. And current 100,000 is ten times more than 1Password used in 2013 when the suggestion was added to the documentation.
2.5x difference can be substantial if x is hours, days, or years :)
PBKDF2-HMAC is a serialized algorithm. It cannot be parallelized. That means the runtime depends on single-core performance. The single-core performance of desktop and server CPUs hasn't improved much in the last decade. Modern CPUs have more cores, larger caches, and better IPC, but the Intel Nehalem architecture from 2009 already reached up to 3.33 GHz, while fast 2020 Comet Lake CPUs have up to 3.7 GHz base frequency and about 5 GHz turbo.
Clock rate is not the only indicator. Some new instructions supporting SHA were introduced during the last decade. https://software.intel.com/content/www/us/en/develop/articles/intel-sha-extensions.html
Django uses 390,000 iterations as of late 2021, as does the Python Cryptography project. We should be aligned with their recommendations, or at least a good deal closer than we are now. 390,000 is actually a conservative recommendation for key derivation: that number of rounds takes ~133 ms to compute on my M1, versus 36 ms for the current 100,000. Usually you're shooting for ~250 ms. Being off by ~50% is probably okay; being off by this much is considerably worse. Anyway, I'd be happy to make such a PR if folks are amenable to it.
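The timing claims above are easy to reproduce locally; here is a rough benchmark sketch (the helper name and the iteration counts compared are just for illustration, and absolute numbers will differ per CPU):

```python
import hashlib
import os
import time

# Rough benchmark: best-of-N wall-clock time for a given iteration
# count on this machine. Results vary widely across hardware.
def time_pbkdf2(iterations: int, rounds: int = 3) -> float:
    salt = os.urandom(16)
    best = float("inf")
    for _ in range(rounds):
        start = time.perf_counter()
        hashlib.pbkdf2_hmac("sha256", b"benchmark-password", salt, iterations)
        best = min(best, time.perf_counter() - start)
    return best

for n in (100_000, 390_000):
    print(f"{n:>7} iterations: {time_pbkdf2(n) * 1000:.1f} ms")
```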
My question from last year has not been answered yet. Is there any valid scientific research on the number of rounds or duration? I neither know nor understand how Django came up with the numbers, and PyCA cryptography copied the numbers without questioning them. Where does 250 ms come from? 250 ms at 100% CPU load sounds way too costly for a website login and too fast for a password manager. For comparison, Argon2's default runtime on my laptop is 50 ms.
Django probably stores and computes more passwords than every other Python framework combined, and it doesn't provide you any control over the number of iterations. And it hasn't for years. If this were truly a problem, wouldn't their users be complaining about it constantly? Werkzeug was doing 150,000 iterations as of 0.15.x, released three years ago, and does 260,000 iterations today. Again, no complaints or issues. In practice, this is almost never a problem: user logins and password changes are extremely rare events compared to all other activity, so the computation time is essentially irrelevant outside response time for that individual user. No matter how many users there are, systems are scaled such that the computation time of that rare event remains a fraction of overall CPU use.
NIST provides no official guidance on iteration count other than NIST SP 800-132 Appendix A.2.2, which states "The number of iterations should be set as high as can be tolerated for the environment, while maintaining acceptable performance." I can think of no better resource for what constitutes acceptable performance at the highest iteration count than popular packages like Django. Django's choice (and lack of evidence that they've had any cause to revert due to performance issues) argues that 390k iterations is a reasonable number in 2022. Certainly the 100k suggested in these docs as of 2013 is no longer best practice as we've seen 9 years of computational improvement in the intervening time. I would, additionally, suggest that the documentation recommend the use of scrypt where possible over any iteration count of PBKDF2, but increasing the iteration count is still a useful improvement to the docs! |
You are arguing from the perspective of a Django/Werkzeug developer, and you are using experiential domain knowledge to argue for a higher recommendation. I'm asking for a scientific answer. Based on my experience, 100k PBKDF2 HMAC-SHA256 rounds is already a DoS issue for some use cases. For other use cases even 500k rounds is not the right answer, because the application should rather use a different algorithm altogether. If you are concerned about PBKDF2's strength, then better to switch to scrypt or Argon2. They are better suited against GPU-based crackers. PBKDF2 is still required for FIPS compliance, but most people can (and should!) ignore FIPS.
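For reference, the scrypt alternative mentioned here is also available in hashlib (Python 3.6+ when built against OpenSSL 1.1+). A minimal sketch; the cost parameters below are commonly cited interactive-login settings, not values taken from this thread:

```python
import hashlib
import os

# Illustrative scrypt key derivation. n=2**14, r=8, p=1 uses
# 128 * r * n = 16 MiB of memory, which fits within hashlib's
# default maxmem limit; raising n further requires passing maxmem.
password = b"correct horse battery staple"
salt = os.urandom(16)

key = hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1, dklen=32)
print(key.hex())
```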
Sticking with 100k is not scientific though ;-) Empiricism is science! I'm probably the person responsible for Django's process, which is to increase by some % (10% or 20% IIRC) every release. As you point out, the exact value one should use is a function of context, which we don't have as documentation authors. However, what we can do is try to select a value that's most likely to be practical for many users and will in turn protect their users' data most. 100k isn't that value, and taking inspiration from places that have had their values tested by many users is intuitive to me.
Rather than suggesting an actual number, perhaps we should link to an external resource that covers how to choose the number? Or we leave it vague and say "The number of iterations should be chosen based on the hash algorithm and computing power; there is no universal recommendation, but hundreds of thousands of iterations may be reasonable." This avoids bikeshedding a specific number, but still gives a general idea of the magnitude of number involved.
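One way to make "chosen based on computing power" concrete is to calibrate empirically on the deployment hardware. A sketch under assumed parameters (the function name, the starting count, and the target duration are arbitrary placeholders, not anything from the docs):

```python
import hashlib
import os
import time

# Grow the iteration count until one PBKDF2 computation takes at
# least the target duration on this machine. Purely illustrative;
# real deployments would average several runs and pin the result.
def calibrate_iterations(target_seconds: float = 0.1) -> int:
    salt = os.urandom(16)
    iterations = 50_000
    while True:
        start = time.perf_counter()
        hashlib.pbkdf2_hmac("sha256", b"calibration", salt, iterations)
        elapsed = time.perf_counter() - start
        if elapsed >= target_seconds:
            return iterations
        # Scale proportionally toward the target (at most doubling),
        # always increasing so the loop terminates.
        scaled = int(iterations * min(2.0, target_seconds / elapsed))
        iterations = max(iterations + 1_000, scaled)

print(calibrate_iterations(0.05))
```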
I reworked the PR and went with less specific text, linking to the NIST SP 800-132 appendix as guidance on how people should determine what is right for them. There is no one right number; it is application specific. Thanks for everyone's valuable input!
The code snippet still uses 100000. Given that many people will simply copy-and-paste without questioning, should we update that too? |
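A hypothetical version of what an updated snippet might look like (the 390,000 count mirrors the Django value cited earlier in this thread; the wording and number actually chosen for the docs may differ):

```python
import hashlib
import os

# Hypothetical revision of the docs example: same shape as the
# existing snippet, but with a random salt and a higher iteration
# count than the 100000 currently shown.
salt = os.urandom(16)
dk = hashlib.pbkdf2_hmac("sha256", b"password", salt, 390_000)
print(dk.hex())
```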