classification
Title: new method random.getrandbytes()
Type: enhancement Stage: patch review
Components: Versions: Python 3.3
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: amaury.forgeotdarc, jcea, nadeem.vawda, pitrou, rhettinger
Priority: low Keywords: patch

Created on 2011-11-13 16:43 by amaury.forgeotdarc, last changed 2013-07-12 06:33 by rhettinger. This issue is now closed.

Files
File name Uploaded Description Edit
getrandbytes.patch amaury.forgeotdarc, 2011-11-13 21:14 review
Messages (9)
msg147557 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2011-11-13 16:43
I noticed that several usages of random.getrandbits() actually need bytes.  A few examples:
- Lib/test/test_zlib.py calls "random.getrandbits(8 * _1M).to_bytes()"
- Twisted uses the %x format and then call .decode('hex')
Another examples found with Code Search:
- struct.pack("Q", random.getrandbits(64))
- for i in range(8): ClientChallenge+= chr(random.getrandbits(8))
- key = sha1(str(random.getrandbits(8*20))).digest()

random.getrandbits() itself is not a cheap call: it ends with a call to _PyLong_FromByteArray, and transformation to bytes will involve more copy, conversions from 30bit digits (or 15bit digits) to bytes, or worse.

This patch adds random.getrandbytes(), which creates a bytes string directly from calls to genrand_int32().
msg147562 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-11-13 17:02
Good idea, IMO.

+            cum = set()
+            for i in range(100):
+                val = getbytes(span)
+                cum |= set(i for i in range(span) if val[i])
+            self.assertEqual(len(cum), span)

I find this test a bit strange. Also, perhaps it should be bitwise rather than bytewise.
msg147567 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-11-13 19:01
How would this work for other random number generators that don't supply genrand_int32()?

The API for random is supposed to be easily subclassable and reusable for other random number generators.  The requirements for those generators is intentionally kept minimal (i.e. they only need to supply random(), seed(), getstate() and setstate() and optionally getrandbits()).

I'm -0 on making this API fatter.
msg147569 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2011-11-13 19:15
> How would this work for other random number generators that don't
> supply genrand_int32()?

genrand_int32 is an internal function, only available in C for the Mersenne Twister generator.
random.SystemRandom() should provide getrandbytes as well, it would just call os.urandom(); I don't know other generators.
msg147570 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-11-13 19:50
> genrand_int32 is an internal function, 
> only available in C for the Mersenne Twister generator.

Yes, I know.  I'm the one added that code ;-)

>  I don't know other generators.

The problem is that we need an API that will accommodate other random number generators and not be specific to the MersenneTwister.  Right now, the starting point for everything in the random module is an underlying generator supplying a random() method returning a float and an optional getrandombits() method which returns an int (possibly a long int).  The latter is easily convertible to bytes with to_bytes() which uses a fast O(n) algorithm. ISTM, the getrandbytes() proposal amounts to a micro-optimization of existing capabilities, and it comes at the expense of fattening the API.
msg147571 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-11-13 20:41
> The problem is that we need an API that will accommodate other random
> number generators and not be specific to the MersenneTwister.  Right
> now, the starting point for everything in the random module is an
> underlying generator supplying a random() method returning a float and
> an optional getrandombits() method which returns an int (possibly a
> long int).

Well, you can provide a default getrandombytes() which calls into
getrandombits(), and specialize it in the case genrand_int32() exists.

>   The latter is easily convertible to bytes with to_bytes() which uses
> a fast O(n) algorithm.

Well, O(n) doesn't necessarily equate fast. Can Amaury post benchmark
numbers of getrandbits().to_bytes() vs getrandbytes()?
msg147573 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2011-11-13 21:14
./python -m timeit -s "from random import getrandbits" "getrandbits(8000000).to_bytes(1000000, 'little')"
10 loops, best of 3: 25 msec per loop

./python -m timeit -s "from random import getrandbytes" "getrandbytes(1000000)"
100 loops, best of 3: 9.66 msec per loop

For the Mersenne Twister, getrandbytes() is ~2.5 times faster (A length of 1000 gives exactly the same ratio)
When applied to the SytemRandom object, the difference is less impressive (probably because os.urandom is slow) but getrandbytes is still 20ms faster.

Updated patch to add getrandbytes at the module level, and to SystemRandom.
msg147583 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-11-14 02:57
The differential cost of generating n random bytes is negligible compared to actually doing anything with the bytes once their generated.  

This optimization is close to being a total waste (saving 15 milliseconds for the abnormal case of generating 1 million random bytes).
msg192924 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-07-12 06:33
Closing for the reasons listed above.
History
Date User Action Args
2013-07-12 06:33:29rhettingersetstatus: open -> closed
resolution: rejected
messages: + msg192924
2011-11-14 17:47:49rhettingersetpriority: normal -> low
2011-11-14 12:50:50jceasetnosy: + jcea
2011-11-14 02:57:30rhettingersetmessages: + msg147583
2011-11-13 21:15:12amaury.forgeotdarcsetfiles: - getrandbytes.patch
2011-11-13 21:14:57amaury.forgeotdarcsetfiles: + getrandbytes.patch

messages: + msg147573
2011-11-13 20:41:34pitrousetmessages: + msg147571
2011-11-13 19:50:03rhettingersetmessages: + msg147570
2011-11-13 19:15:38amaury.forgeotdarcsetmessages: + msg147569
2011-11-13 19:01:54rhettingersetmessages: + msg147567
2011-11-13 18:55:51rhettingersetassignee: rhettinger
2011-11-13 17:02:54pitrousetversions: - Python 3.2
nosy: + pitrou

messages: + msg147562

stage: patch review
2011-11-13 16:48:37nadeem.vawdasetnosy: + nadeem.vawda
2011-11-13 16:43:41amaury.forgeotdarccreate