Update suggested number of iterations for pbkdf2_hmac() #87148

Closed
illia-v mannequin opened this issue Jan 20, 2021 · 20 comments
Labels
docs Documentation in the Doc dir

Comments

@illia-v
Mannequin

illia-v mannequin commented Jan 20, 2021

BPO 42982
Nosy @rhettinger, @gpshead, @april, @tiran, @ned-deily, @alex, @zware, @miss-islington, @illia-v
PRs
  • bpo-42982: Increase suggested number of iterations of PBKDF2 to 250,000 #24276
  • [3.10] bpo-42982: Improve the text on suggested number of iterations of PBKDF2 (GH-24276) #30951
  • bpo-42982: update pbkdf2 example & add another link #30966
  • [3.10] bpo-42982: update pbkdf2 example & add another link (GH-30966) #30968

    Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = 'https://github.com/gpshead'
    closed_at = <Date 2022-01-27.08:41:18.359>
    created_at = <Date 2021-01-20.20:06:39.244>
    labels = ['docs']
    title = 'Update suggested number of iterations for pbkdf2_hmac()'
    updated_at = <Date 2022-03-01.20:56:40.688>
    user = 'https://github.com/illia-v'

    bugs.python.org fields:

    activity = <Date 2022-03-01.20:56:40.688>
    actor = 'ned.deily'
    assignee = 'gregory.p.smith'
    closed = True
    closed_date = <Date 2022-01-27.08:41:18.359>
    closer = 'gregory.p.smith'
    components = ['Documentation']
    creation = <Date 2021-01-20.20:06:39.244>
    creator = 'illia-v'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 42982
    keywords = ['patch']
    message_count = 20.0
    messages = ['385365', '385442', '385455', '385939', '385944', '385992', '386605', '411524', '411566', '411610', '411623', '411624', '411644', '411789', '411844', '411845', '411846', '411879', '411916', '414293']
    nosy_count = 11.0
    nosy_names = ['rhettinger', 'gregory.p.smith', 'april', 'christian.heimes', 'ned.deily', 'alex', 'docs@python', 'zach.ware', 'reaperhulk', 'miss-islington', 'illia-v']
    pr_nums = ['24276', '30951', '30966', '30968']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'commit review'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue42982'
    versions = []

    @illia-v
    Mannequin Author

    illia-v mannequin commented Jan 20, 2021

    Documentation [1] suggests using at least 100,000 iterations of SHA-256 as of 2013.

    It is now 2021, and it is common to use many more iterations.
    For example, Django will use 260,000 by default in the next 3.2 LTS release and 320,000 in 4.0 [2][3].

    I suggest recommending at least 250,000 iterations, a reasonably round number close to the ones used by modern libraries.

    [1] https://docs.python.org/3/library/hashlib.html#hashlib.pbkdf2_hmac
    [2] django/django@f2187a2
    [3] django/django@a948d9d
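
For reference, a minimal sketch of how the iteration count enters a call to hashlib.pbkdf2_hmac(). The helper names hash_password and verify_password, the 16-byte salt, and the ITERATIONS constant are assumptions made for illustration; 250,000 is simply the figure proposed in this issue, not an official recommendation.

```python
import hashlib
import hmac
import os

# Illustrative value only: the count proposed in this issue, not an official recommendation.
ITERATIONS = 250_000


def hash_password(password, salt=None, iterations=ITERATIONS):
    """Derive a key from a password with PBKDF2-HMAC-SHA256 (hypothetical helper)."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, derived


def verify_password(password, salt, expected, iterations=ITERATIONS):
    """Recompute the key with the stored salt and compare in constant time."""
    _, derived = hash_password(password, salt, iterations)
    return hmac.compare_digest(derived, expected)


salt, derived = hash_password("correct horse battery staple")
print(len(derived))  # 32: the default key length is the SHA-256 digest size
```

Storing the iteration count next to the salt and the derived key would make it possible to raise the count later without invalidating existing entries.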

    @illia-v illia-v mannequin assigned docspython Jan 20, 2021
    @illia-v illia-v mannequin added the docs Documentation in the Doc dir label Jan 20, 2021
    @tiran
    Member

    tiran commented Jan 21, 2021

    Is there any scientific research or mathematical proof for 250,000 iterations?

    @illia-v
    Mannequin Author

    illia-v mannequin commented Jan 21, 2021

    I didn't find any. I think it is based on some benchmarks like openssl speed sha.

    @rhettinger
    Contributor

    FWIW, 1Password uses 100,000. https://support.1password.com/pbkdf2/

    Also, I don't think an additional time factor of 2.5x would make a substantial difference in security, but it may make a noticeable difference in user authentication time.

    @illia-v
    Mannequin Author

    illia-v mannequin commented Jan 29, 2021

    > FWIW, 1Password uses 100,000. https://support.1password.com/pbkdf2/

    There is a history section on that page. The current 100,000 is ten times more than 1Password used in 2013, when the suggestion was added to the documentation.

    > Also, I don't think an additional time factor of 2.5x would make a substantial difference in security, but it may make a noticeable difference in user authentication time.

    A 2.5x difference can be substantial if x is hours, days, or years :)

    @tiran
    Member

    tiran commented Jan 30, 2021

    PBKDF2-HMAC is a serialized algorithm. It cannot be parallelized. That means the runtime depends on single-core performance. The single-core performance of desktop and server CPUs hasn't improved much in the last decade. Modern CPUs have more cores, larger caches, and better IPC. Intel's Nehalem architecture from 2009 ran at up to 3.33 GHz. Fast 2020 Comet Lake CPUs have a base frequency of up to 3.7 GHz and a turbo frequency of about 5 GHz.
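
To illustrate the serial dependency described above, here is a rough sketch of how a single PBKDF2 output block is computed, checked against hashlib.pbkdf2_hmac(). The function name pbkdf2_block is hypothetical, and the sketch only covers the case where the requested key length equals the digest size (one block). Each HMAC call takes the previous call's output as its input, so the chain for one block has to run in order.

```python
import hashlib
import hmac


def pbkdf2_block(hash_name, password, salt, iterations, block_index=1):
    """Compute one PBKDF2 output block by hand (illustrative, not optimized)."""
    # U_1 = HMAC(password, salt || INT_32_BE(block_index))
    u = hmac.new(password, salt + block_index.to_bytes(4, "big"), hash_name).digest()
    result = int.from_bytes(u, "big")
    for _ in range(iterations - 1):
        # U_i = HMAC(password, U_{i-1}): this step needs the previous output,
        # which is why the chain cannot be split across cores.
        u = hmac.new(password, u, hash_name).digest()
        result ^= int.from_bytes(u, "big")
    # The block is the XOR of all U_i values.
    return result.to_bytes(len(u), "big")


manual = pbkdf2_block("sha256", b"password", b"salt", 1000)
builtin = hashlib.pbkdf2_hmac("sha256", b"password", b"salt", 1000)
print(manual == builtin)  # True
```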

    @illia-v
    Mannequin Author

    illia-v mannequin commented Feb 7, 2021

    @april
    Mannequin

    april mannequin commented Jan 24, 2022

    Django uses 390,000 iterations as of late 2021, as does the Python Cryptography project. We should be aligned with their recommendations, or at least a good deal closer than we are now.

    390,000 actually makes it a conservative recommendation for key derivation, as that number of rounds takes ~133 ms to compute on my M1, versus ~36 ms for the currently suggested 100,000. Usually you're shooting for ~250 ms.

    Being off by ~50% is probably okay; being off by this much is considerably worse.

    Anyways, I'd be happy to make such a PR if folks are amenable to it.
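
Timings like the ones quoted above can be reproduced locally with a small benchmark. This is only a sketch: the function name time_pbkdf2, the fixed test password and salt, and the best-of-three approach are assumptions, and the results depend entirely on the machine and the OpenSSL build.

```python
import hashlib
import time


def time_pbkdf2(iterations, hash_name="sha256", repeats=3):
    """Return the best wall-clock time in seconds for one PBKDF2 derivation."""
    password = b"benchmark-password"  # fixed inputs: only the timing matters here
    salt = b"0123456789abcdef"
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hashlib.pbkdf2_hmac(hash_name, password, salt, iterations)
        best = min(best, time.perf_counter() - start)
    return best


for count in (100_000, 390_000):
    print(f"{count:>7,} iterations: {time_pbkdf2(count) * 1000:.1f} ms")
```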

    @tiran
    Member

    tiran commented Jan 25, 2022

    My question from last year has not been answered yet. Is there any valid scientific research on the number of rounds or duration? I neither know nor do I understand how Django came up with the numbers. PyCA cryptography copied the numbers without questioning them.

    Where does 250 ms come from? 250 ms at 100% CPU load sounds way too costly for a website login and too fast for a password manager. For comparison, Argon2's default runtime on my laptop is 50 ms.

    @april
    Mannequin

    april mannequin commented Jan 25, 2022

    Django probably stores and computes more passwords than every other Python framework combined, and it doesn't provide you any control over the number of iterations. And it hasn't for years. If this were truly a problem, wouldn't their users be complaining about it constantly?

    Werkzeug was doing 150,000 iterations as of 0.15.x, released three years ago, and does 260,000 iterations today. Again, no complaints or issues.

    In practice, this is almost never a problem: user logins and password changes are extremely rare events compared to all other activity, so the computation time is essentially irrelevant outside the response time for that individual user. No matter how many users there are, systems scale such that the computation time of that rare event remains a fraction of overall CPU use.

    @reaperhulk
    Mannequin

    reaperhulk mannequin commented Jan 25, 2022

    NIST provides no official guidance on iteration count other than NIST SP 800-132 Appendix A.2.2, which states "The number of iterations should be set as high as can be tolerated for the environment, while maintaining acceptable performance."

    I can think of no better resource for what constitutes acceptable performance at the highest iteration count than popular packages like Django. Django's choice (and lack of evidence that they've had any cause to revert due to performance issues) argues that 390k iterations is a reasonable number in 2022. Certainly the 100k suggested in these docs as of 2013 is no longer best practice as we've seen 9 years of computational improvement in the intervening time.

    I would, additionally, suggest that the documentation recommend the use of scrypt where possible over any iteration count of PBKDF2, but increasing the iteration count is still a useful improvement to the docs!
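
One way to act on the "as high as can be tolerated" guidance quoted above is to measure the target machine and scale the count to a time budget. The sketch below assumes the cost grows linearly with the iteration count; the function name calibrate_iterations, the probe size, and the 250 ms budget are illustrative choices, not values taken from NIST or from the Python documentation.

```python
import hashlib
import time


def calibrate_iterations(target_seconds, hash_name="sha256", probe=50_000):
    """Estimate how many iterations fit into target_seconds on this machine."""
    start = time.perf_counter()
    hashlib.pbkdf2_hmac(hash_name, b"probe-password", b"probe-salt-16byt", probe)
    elapsed = time.perf_counter() - start
    # Assumption: runtime scales linearly with the iteration count.
    estimate = int(probe * target_seconds / elapsed)
    return max(estimate, 1)


# Hypothetical budget of 250 ms per derivation.
print(calibrate_iterations(0.25))
```

The result is a single noisy measurement; running it several times on production-like hardware and picking a rounded value below the estimates would be more robust.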

    @tiran
    Member

    tiran commented Jan 25, 2022

    You are arguing from the perspective of a Django/werkzeug developer and you are using experiential domain knowledge to argue for higher recommendation.

    I'm asking for a scientific answer. Based on my experience, 100k PBKDF2 HMAC-SHA256 rounds is already a DoS issue for some use cases. For other use cases even 500k rounds is not the right answer, because the application should rather use a different algorithm altogether.

    If you are concerned about PBKDF2's strength, then better switch to Scrypt or Argon2. They are better suited against GPU-based crackers. PBKDF2 is still required for FIPS compliance, but most people can (and should!) ignore FIPS.
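
For comparison, a minimal sketch of the scrypt alternative using hashlib.scrypt(). The helper name scrypt_hash and the cost parameters (n=2**14, r=8, p=1, roughly 16 MiB of memory) are assumptions chosen for illustration, not values endorsed in this thread. hashlib.scrypt() is only available when Python is built against a suitable OpenSSL; Argon2 is not in the standard library and would need a third-party package.

```python
import hashlib
import os


def scrypt_hash(password, salt=None):
    """Derive a 32-byte key with scrypt (illustrative parameters)."""
    if salt is None:
        salt = os.urandom(16)
    # n is the CPU/memory cost, r the block size, p the parallelization factor.
    key = hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1, dklen=32)
    return salt, key


salt, key = scrypt_hash(b"correct horse battery staple")
print(len(key))  # 32
```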

    @alex
    Member

    alex commented Jan 25, 2022

    Sticking with 100k is not scientific though ;-) Empiricism is science!

    I'm probably the person responsible for Django's process, which is to increase by some % (10% or 20% IIRC) every release.

    As you point out, the exact value one should use is a function of context, which we don't have as documentation authors. However, what we can do is try to select a value that's most likely to be practical for many users and will in turn protect their users' data most. 100k isn't that value, and taking inspiration from places that have had their values tested by many users is intuitive to me.
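
The compounding effect of a fixed percentage increase per release can be sketched as follows. The 20% step and the rounding up to the next 10,000 are assumptions for illustration; they happen to reproduce the 260,000, 320,000 and 390,000 figures mentioned in this thread, but that does not confirm Django's actual procedure. The function name project_iterations is hypothetical.

```python
def project_iterations(start, releases, percent=20, round_to=10_000):
    """Grow an iteration count by a fixed percentage per release, rounding up."""
    count = start
    for _ in range(releases):
        grown = count * (100 + percent) // 100  # integer math avoids float drift
        count = -(-grown // round_to) * round_to  # round up to a multiple of round_to
    return count


print(project_iterations(260_000, 1))  # 320000
print(project_iterations(260_000, 2))  # 390000
```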

    @zware
    Member

    zware commented Jan 26, 2022

    Rather than suggesting an actual number, perhaps we should link to an external resource that covers how to choose the number?

    Or we leave it vague and say "The number of iterations should be chosen based on the hash algorithm and computing power; there is no universal recommendation, but hundreds of thousands of iterations may be reasonable." This avoids bikeshedding a specific number, but still gives a general idea of the magnitude of number involved.

    @gpshead
    Member

    gpshead commented Jan 27, 2022

    New changeset 897ce90 by Illia Volochii in branch 'main':
    bpo-42982: Improve the text on suggested number of iterations of PBKDF2 (GH-24276)
    897ce90

    @gpshead
    Member

    gpshead commented Jan 27, 2022

    I reworked the PR and went with less specific text, linking to the NIST SP 800-132 appendix as guidance on how people should determine what is right for them.

    there is no one right number. it is application specific.

    thanks for everyone's valuable input!

    @gpshead gpshead closed this as completed Jan 27, 2022
    @gpshead gpshead assigned gpshead and unassigned docspython Jan 27, 2022
    @miss-islington
    Contributor

    New changeset 1ecc98d by Miss Islington (bot) in branch '3.10':
    bpo-42982: Improve the text on suggested number of iterations of PBKDF2 (GH-24276)
    1ecc98d

    @april
    Mannequin

    april mannequin commented Jan 27, 2022

    The code snippet still uses 100000. Given that many people will simply copy-and-paste without questioning, should we update that too?

    @miss-islington
    Contributor

    New changeset ace0aa2 by Gregory P. Smith in branch 'main':
    bpo-42982: update pbkdf2 example & add another link (GH-30966)
    ace0aa2

    @ned-deily
    Member

    New changeset 7dbb2f8 by Miss Islington (bot) in branch '3.10':
    bpo-42982: update pbkdf2 example & add another link (GH-30966) (GH-30968)
    7dbb2f8

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022