Issue 47175: Allow applications to tune the condition that triggers a GIL release and implementation choice in hashlib

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/91331

classification

Title:	Allow applications to tune the condition that triggers a GIL release and implementation choice in hashlib
Type:	enhancement	Stage:	needs patch
Components:		Versions:

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	gregory.p.smith
Priority:	low	Keywords:

Created on 2022-03-30 21:28 by gregory.p.smith, last changed 2022-04-11 14:59 by admin.

Messages (1)
msg416401 - (view)	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2022-03-30 21:28
## Background All `hashlib` computations and `binascii.crc32` and `zlib.crc32` release the GIL around their computational core. But they use a hard coded length check to determine when to do so, or always do it. That already accomplishes the larger good of releasing the GIL on big computations. But _probably_ just serves to slow down smaller ones when a GIL release adds more overhead than a context switch to another thread could meaningfully provide in terms of forward progress. ## Desire 1 Determine if a threshold should exist at all (should we just always release the GIL for these?) and if so, allow it to be tuned on a per algorithm basis. This comes at the same time as other enhancements like bpo-47102 and its windows and macos cousins could shift us towards using OS kernel APIs for a subset of algorithms where available - which may effectively "always release" the GIL on OS APIs that are virtual IO call based such as bpo-47102's. ## Desire 2 When multiple implementations of an algorithm may be available, allow the user to configure data length thresholds for when each one is triggered. Without meaningfully slowing most things down by adding such logic. Different implementations have different setup and finalization time which can make them more useful for large data rather than tiny. --- I'm marking this low priority as it veers towards over-optimization. :) Creating benchmarks and a thing to live in Tools/ that people could run on their target platform to provide a tuning suggestion of what thresholds work best for their needs. Related inspiring work: OSes often benchmark several algorithm implementations up front to pick a "best" to use for a given platform (ex: see what the Linux kernel does for hashes and raid algorithms). This extends that idea and acknowledges latency as important, not exclusively thru-put.

msg416401 - (view)

Author: Gregory P. Smith (gregory.p.smith) * (Python committer)

Date: 2022-03-30 21:28

## Background

All `hashlib` computations and `binascii.crc32` and `zlib.crc32` release the GIL around their computational core. But they use a hard coded length check to determine when to do so, or always do it.

That already accomplishes the larger good of releasing the GIL on big computations. But _probably_ just serves to slow down smaller ones when a GIL release adds more overhead than a context switch to another thread could meaningfully provide in terms of forward progress.

## Desire 1

Determine if a threshold should exist at all (should we just always release the GIL for these?) and if so, allow it to be tuned on a per algorithm basis.

This comes at the same time as other enhancements like bpo-47102 and its windows and macos cousins could shift us towards using OS kernel APIs for a subset of algorithms where available - which may effectively "always release" the GIL on OS APIs that are virtual IO call based such as bpo-47102's.

## Desire 2

When multiple implementations of an algorithm may be available, allow the user to configure data length thresholds for when each one is triggered. Without meaningfully slowing most things down by adding such logic. Different implementations have different setup and finalization time which can make them more useful for large data rather than tiny.

---

I'm marking this low priority as it veers towards over-optimization. :) Creating benchmarks and a thing to live in Tools/ that people could run on their target platform to provide a tuning suggestion of what thresholds work best for their needs.

Related inspiring work: OSes often benchmark several algorithm implementations up front to pick a "best" to use for a given platform (ex: see what the Linux kernel does for hashes and raid algorithms). This extends that idea and acknowledges latency as important, not exclusively thru-put.

History
Date	User	Action	Args
2022-04-11 14:59:57	admin	set	github: 91331
2022-03-30 21:28:38	gregory.p.smith	create