Author rhettinger
Recipients Eric Appelt, berker.peksag, christian.heimes, martin.panter, python-dev, rhettinger, serhiy.storchaka, tim.peters, vstinner
Date 2018-01-15.18:42:46
Message-id <1516041766.27.0.467229070634.issue26163@psf.upfronthosting.co.za>
Messages (3)
msg309956 - Author: Johnny Dude (JohnnyD) - Date: 2018-01-15 01:08
When seeding with a tuple that includes a string, the results are not consistent across new interpreter invocations or processes.

For example, running the following more than once on a Linux machine yields different results:
python3.6 -c 'import random; random.seed(("a", 1)); print(random.random())'

Please note that the docstring of random.seed states: "Initialize internal state from hashable object."

The Python documentation does not say this (https://docs.python.org/3.6/library/random.html#random.seed).

This is very confusing; I hope you can fix the behavior, not the docstring.
msg309957 - Author: STINNER Victor (vstinner) (Python committer) - Date: 2018-01-15 01:13
random.seed(str) uses:

        if version == 2 and isinstance(a, (str, bytes, bytearray)):
            if isinstance(a, str):
                a = a.encode()
            a += _sha512(a).digest()
            a = int.from_bytes(a, 'big')

Whereas for other types, random.seed(obj) uses hash(obj), and hash() of str and bytes objects (and therefore of tuples containing them) is randomized by default in Python 3.
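A minimal sketch of the difference, assuming CPython 3.7 or later (for `subprocess.run(capture_output=...)`). `str_seed_to_int` and `run_with_hash_seed` are hypothetical helper names. The first mirrors the version-2 branch quoted above, so seeding with its result should match seeding with the string itself. The second runs a snippet in a fresh interpreter under a fixed PYTHONHASHSEED. The sketch prints hash(("a", 1)) rather than calling random.seed(("a", 1)), because Python 3.11 and later reject tuple seeds with a TypeError.

```python
import hashlib
import os
import random
import subprocess
import sys


def str_seed_to_int(a):
    """Hypothetical helper mirroring the version-2 str/bytes branch of random.seed."""
    if isinstance(a, str):
        a = a.encode()
    a += hashlib.sha512(a).digest()
    return int.from_bytes(a, 'big')


def run_with_hash_seed(code, hash_seed):
    """Run `code` in a fresh interpreter with a fixed PYTHONHASHSEED; return stdout."""
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    result = subprocess.run([sys.executable, '-c', code], env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


# str seeds go through SHA-512, so the derived int does not depend on the process.
same_stream = random.Random("a").random() == random.Random(str_seed_to_int("a")).random()

# hash() of a tuple containing a str changes with the per-process hash seed.
tuple_hashes = {run_with_hash_seed('print(hash(("a", 1)))', s) for s in (1, 2)}

# Seeding with a plain str gives the same draw regardless of the hash seed.
str_seed_draws = {
    run_with_hash_seed('import random; random.seed("a"); print(random.random())', s)
    for s in (1, 2)
}

print(same_stream)          # True
print(len(tuple_hashes))    # 2
print(len(str_seed_draws))  # 1
```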

Yeah, the random.seed() documentation should describe the implementation and explain that hash(obj) is used and that the hash function is randomized by default:
https://docs.python.org/dev/library/random.html#random.seed
msg310006 - Author: Raymond Hettinger (rhettinger) (Python committer) - Date: 2018-01-15 10:41
I'm getting a nice improvement in dispersion statistics by shuffling in higher bits right at the end:

     /* Disperse patterns arising in nested frozensets */
  +  hash ^= (hash >> 11) ^ (~hash >> 25);
     hash = hash * 69069U + 907133923UL;
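To experiment with the proposed change without rebuilding CPython, the two finishing lines of the diff can be modelled in pure Python. This is a minimal sketch assuming a 64-bit Py_uhash_t, so C's unsigned wraparound becomes masking with 2**64 - 1. `finalize_baseline` and `finalize_new` are hypothetical names, and only the last step of the frozenset hash is modelled, not the loop over entries.

```python
MASK64 = (1 << 64) - 1  # assumption: Py_uhash_t is 64 bits wide


def finalize_baseline(h):
    """Last step before the patch: multiply-add with unsigned 64-bit wraparound."""
    h &= MASK64
    return (h * 69069 + 907133923) & MASK64


def finalize_new(h):
    """Patched last step: fold higher bits down, then multiply-add.

    In C, `~hash >> 25` parses as `(~hash) >> 25`; on an unsigned 64-bit value
    the complement stays within 64 bits, which the mask reproduces here.
    The right-hand side uses the old value of h, as in the C compound assignment.
    """
    h &= MASK64
    h ^= (h >> 11) ^ ((~h & MASK64) >> 25)
    return (h * 69069 + 907133923) & MASK64
```

In the baseline, a multiply-add modulo 2**64 makes the low k bits of the result depend only on the low k bits of the input, so a mask such as `h & mask` in the test code never sees differences confined to higher bits. The added xor-shift folds bits from 11 and 25 positions higher into the low bits first. For example, finalize_baseline(1 << 11) and finalize_baseline(0) agree in their low 8 bits, while finalize_new(1 << 11) and finalize_new(0) do not.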

Results for range() check:

                     range       range
                    baseline      new
  1st percentile     35.06%      40.63%
  1st decile         48.03%      51.34%
  mean               61.47%      63.24%      
  median             63.24%      65.58% 
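The message does not say how the summary rows were computed. One way to produce rows like these, assuming the percentages printed by one run of the test code are collected into a list, is the standard library's `statistics` module (Python 3.8+ for `quantiles`). `summarize` is a hypothetical name, and `quantiles` uses its default "exclusive" interpolation, which may differ from whatever was used originally.

```python
import statistics


def summarize(percentages):
    """Summary rows like the tables in this message, from a list of percentages."""
    return {
        "1st percentile": statistics.quantiles(percentages, n=100)[0],
        "1st decile": statistics.quantiles(percentages, n=10)[0],
        "mean": statistics.mean(percentages),
        "median": statistics.median(percentages),
    }
```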

Results for the letter_range() check:

                     letter      letter
                    baseline      new
  1st percentile     39.59%      40.14%
  1st decile         50.90%      51.07%
  mean               63.02%      63.04%
  median             65.21%      65.23%

Test code:

    import itertools
    import string

    def letter_range(n):
        return string.ascii_letters[:n]

    def powerset(s):
        for i in range(len(s)+1):
            yield from map(frozenset, itertools.combinations(s, i))

    # range() check
    for i in range(10000):
        for n in range(5, 19):
            t = 2 ** n
            mask = t - 1
            u = len({h & mask for h in map(hash, powerset(range(i, i+n)))})
            print(u/t*100)

    # letter_range() check: str hashes are randomized per process, so rerun the
    # script (which picks a new hash seed) to collect each additional sample
    for n in range(5, 19):
        t = 2 ** n
        mask = t - 1
        u = len({h & mask for h in map(hash, powerset(letter_range(n)))})
        print(u/t*100)
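For context when reading the percentages: if the 2**n hashes of a powerset landed in 2**n buckets independently and uniformly at random, the expected fraction of occupied buckets would be 1 - (1 - 2**-n)**(2**n), which approaches 1 - 1/e (about 63.2%) as n grows. A minimal sketch of that reference value under the idealised assumption of uniformly random hashes (`expected_occupancy` is a hypothetical name):

```python
import math


def expected_occupancy(n):
    """Expected fraction of 2**n buckets hit by 2**n uniformly random hashes."""
    t = 2 ** n
    return 1 - (1 - 1 / t) ** t


for n in range(5, 19):
    print(n, f"{expected_occupancy(n) * 100:.2f}%")

print(f"limit: {(1 - math.exp(-1)) * 100:.2f}%")  # limit: 63.21%
```

This gives a yardstick for the mean and median rows above, assuming random placement is the target behaviour.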
History
Date                 User        Action  Args
2018-01-15 18:43:56  rhettinger  unlink  issue26163 messages
2018-01-15 18:42:46  rhettinger  set     recipients: + rhettinger, tim.peters, vstinner, christian.heimes, python-dev, berker.peksag, martin.panter, serhiy.storchaka, Eric Appelt
2018-01-15 18:42:46  rhettinger  set     messageid: <1516041766.27.0.467229070634.issue26163@psf.upfronthosting.co.za>
2018-01-15 18:42:46  rhettinger  link    issue26163 messages
2018-01-15 18:42:46  rhettinger  create