Message 310007 - Python tracker

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Author	rhettinger
Recipients	Eric Appelt, berker.peksag, christian.heimes, martin.panter, python-dev, rhettinger, serhiy.storchaka, tim.peters, vstinner
Date	2018-01-15.18:42:46
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1516041766.27.0.467229070634.issue26163@psf.upfronthosting.co.za>
In-reply-to

Content

Messages (3)
msg309956 - (view)	Author: Johnny Dude (JohnnyD)	Date: 2018-01-15 01:08
When using a tuple that includes a string, the results are not consistent when invoking a new interpreter or process.

For example, executing the following on a Linux machine yields different results on each run:
python3.6 -c 'import random; random.seed(("a", 1)); print(random.random())'

Please note that the docstring of random.seed states: "Initialize internal state from hashable object."

The Python documentation does not. (https://docs.python.org/3.6/library/random.html#random.seed)

This is very confusing. I hope you can fix the behavior, not the docstring.
msg309957 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2018-01-15 01:13
random.seed(str) uses:

        if version == 2 and isinstance(a, (str, bytes, bytearray)):
            if isinstance(a, str):
                a = a.encode()
            a += _sha512(a).digest()
            a = int.from_bytes(a, 'big')

Whereas for other types, random.seed(obj) uses hash(obj), and hash is randomized by default in Python 3.

Yeah, the random.seed() documentation should describe the implementation and explain that hash(obj) is used and that the hash function is randomized by default:
https://docs.python.org/dev/library/random.html#random.seed
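
The difference between the two code paths can be checked from a script. The following is a minimal sketch, not part of the original report: the helper name run_snippet and the two snippet constants are hypothetical, and seeding with repr() of the tuple is only one possible workaround that routes the seed through the str path quoted above. It compares hash(("a", 1)) directly rather than calling random.seed() with a tuple, because later Python releases restrict the accepted seed types.

```python
import os
import subprocess
import sys


def run_snippet(code, hash_seed):
    """Run `code` in a fresh interpreter with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# hash() of a tuple containing a str depends on the per-process hash seed.
HASH_CODE = 'print(hash(("a", 1)))'

# Assumption: seeding with a str takes the SHA-512 path, so the repr() of
# the tuple gives a seed that does not depend on hash randomization.
SEED_CODE = 'import random; random.seed(repr(("a", 1))); print(random.random())'


if __name__ == "__main__":
    print("tuple hash, seed 1:", run_snippet(HASH_CODE, 1))
    print("tuple hash, seed 2:", run_snippet(HASH_CODE, 2))
    print("str-seeded random, seed 1:", run_snippet(SEED_CODE, 1))
    print("str-seeded random, seed 2:", run_snippet(SEED_CODE, 2))
```

Setting PYTHONHASHSEED to a fixed integer is another way to get repeatable hashes, at the cost of disabling the randomization for the whole process.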
msg310006 - (view)	Author: Raymond Hettinger (rhettinger) * (Python committer)	Date: 2018-01-15 10:41
I'm getting a nice improvement in dispersion statistics by shuffling in higher bits right at the end:

     /* Disperse patterns arising in nested frozensets */
  +  hash ^= (hash >> 11) ^ (~hash >> 25);
     hash = hash * 69069U + 907133923UL;
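
For experimenting outside of C, the two lines above can be mirrored in Python. This is a rough sketch under stated assumptions: it treats the hash as a 64-bit unsigned value (the real width depends on the platform), the names MASK and disperse are hypothetical, and it reproduces only this final mixing step, not the full frozenset hash.

```python
MASK = (1 << 64) - 1  # assumption: 64-bit unsigned hash type


def disperse(h):
    """Emulate the proposed final mixing step with unsigned wraparound."""
    h &= MASK
    # Fold higher bits into lower ones; ~h is taken as an unsigned complement.
    h ^= (h >> 11) ^ ((~h & MASK) >> 25)
    # Existing multiply-and-add step, truncated to the assumed word size.
    return (h * 69069 + 907133923) & MASK


if __name__ == "__main__":
    for value in (0, 1, 2, 1 << 40):
        print(f"{value:#x} -> {disperse(value):#018x}")
```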

Results for range() check:

                     range       range
                    baseline      new
  1st percentile     35.06%      40.63%
  1st decile         48.03%      51.34%
  mean               61.47%      63.24%      
  median             63.24%      65.58% 

Results for letter_range() check:

                     letter      letter
                    baseline      new
  1st percentile     39.59%      40.14%
  1st decile         50.90%      51.07%
  mean               63.02%      63.04%
  median             65.21%      65.23%

Test code for both checks:

    import itertools
    import string

    def letter_range(n):
        # First n ASCII letters; str hashes are randomized per process.
        return string.ascii_letters[:n]

    def powerset(s):
        # Yield every subset of s as a frozenset, smallest subsets first.
        for i in range(len(s)+1):
            yield from map(frozenset, itertools.combinations(s, i))

    # range() check: percentage of the 2**n low-bit slots actually hit
    for i in range(10000):
        for n in range(5, 19):
            t = 2 ** n
            mask = t - 1
            u = len({h & mask for h in map(hash, powerset(range(i, i+n)))})
            print(u/t*100)

    # letter_range() check needs to be restarted (reseeded on every run)
    for n in range(5, 19):
        t = 2 ** n
        mask = t - 1
        u = len({h & mask for h in map(hash, powerset(letter_range(n)))})
        print(u/t*100)
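
The tables above report a 1st percentile, 1st decile, mean and median over the printed values, but the message does not show how those were computed. One plausible way to get such a summary is sketched below. The names dispersion and summarize are hypothetical, the use of statistics.quantiles with its default method is an assumption, and the sweep is far smaller than the original one, so the numbers it prints are not expected to match the tables.

```python
import itertools
import statistics


def powerset(s):
    """Yield every subset of s as a frozenset (same helper as above)."""
    for i in range(len(s) + 1):
        yield from map(frozenset, itertools.combinations(s, i))


def dispersion(items):
    """Percentage of the 2**n low-bit slots hit by the subset hashes."""
    n = len(items)
    t = 2 ** n
    mask = t - 1
    return len({hash(fs) & mask for fs in powerset(items)}) / t * 100


def summarize(values):
    """Summary rows as in the tables; the quantile method is an assumption."""
    return {
        "1st percentile": statistics.quantiles(values, n=100)[0],
        "1st decile": statistics.quantiles(values, n=10)[0],
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


if __name__ == "__main__":
    # Much smaller sweep than the original 10000 x 14 runs, to keep it quick.
    samples = [dispersion(range(i, i + n))
               for i in range(20) for n in range(5, 12)]
    for name, value in summarize(samples).items():
        print(f"{name:15s} {value:6.2f}%")
```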

History
Date	User	Action	Args
2018-01-15 18:43:56	rhettinger	unlink	issue26163 messages
2018-01-15 18:42:46	rhettinger	set	recipients: + rhettinger, tim.peters, vstinner, christian.heimes, python-dev, berker.peksag, martin.panter, serhiy.storchaka, Eric Appelt
2018-01-15 18:42:46	rhettinger	set	messageid: <1516041766.27.0.467229070634.issue26163@psf.upfronthosting.co.za>
2018-01-15 18:42:46	rhettinger	link	issue26163 messages
2018-01-15 18:42:46	rhettinger	create