Simplify and optimize random_seed() #60700

serhiy-storchaka · 2012-11-17T21:32:45Z

BPO	16496
Nosy	@loewis, @rhettinger, @jcea, @mdickinson, @pitrou, @asvetlov, @skrah, @serhiy-storchaka
Files	random_seed.patch random_seed_2.patch random_seed_3.patch random_seed_metd.patch random_seed_metd2.patch

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/mdickinson'
closed_at = <Date 2012-12-21.21:53:37.176>
created_at = <Date 2012-11-17.21:32:44.652>
labels = ['extension-modules', 'performance']
title = 'Simplify and optimize random_seed()'
updated_at = <Date 2012-12-21.21:53:37.176>
user = 'https://github.com/serhiy-storchaka'

bugs.python.org fields:

activity = <Date 2012-12-21.21:53:37.176>
actor = 'mark.dickinson'
assignee = 'mark.dickinson'
closed = True
closed_date = <Date 2012-12-21.21:53:37.176>
closer = 'mark.dickinson'
components = ['Extension Modules']
creation = <Date 2012-11-17.21:32:44.652>
creator = 'serhiy.storchaka'
dependencies = []
files = ['28014', '28019', '28021', '28027', '28107']
hgrepos = []
issue_num = 16496
keywords = ['patch']
message_count = 21.0
messages = ['175812', '175820', '175821', '175827', '175846', '175847', '175848', '175849', '175850', '175851', '175852', '175881', '175883', '175884', '175886', '175887', '175888', '175897', '175899', '176341', '177900']
nosy_count = 9.0
nosy_names = ['loewis', 'rhettinger', 'jcea', 'mark.dickinson', 'pitrou', 'asvetlov', 'skrah', 'python-dev', 'serhiy.storchaka']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'patch review'
status = 'closed'
superseder = None
type = 'performance'
url = 'https://bugs.python.org/issue16496'
versions = ['Python 3.4']

serhiy-storchaka · 2012-11-17T21:32:44Z

Now random_seed() use an ineffective quadratic-time algorithm. It can reuse _PyLong_AsByteArray(), decreasing code size and increasing performance.

loewis · 2012-11-17T22:42:55Z

Why is this a problem?

serhiy-storchaka · 2012-11-17T22:48:38Z

This is not a problem at all. This just a code cleanup. Removing a code duplicate. Performance boost is a side effect.

I confused by an old code comment about _PyLong_AsByteArray(). I don't see any difficulties.

loewis · 2012-11-17T23:16:48Z

Ah. The cleanup looks good, from a shallow glance. I haven't verified yet that the change is actually correct.

mdickinson · 2012-11-18T08:32:41Z

Patch looks correct and looks good to me, modulo a couple of nitpicks (see Rietveld comments). This seems like a nice cleanup.

The patch introduces a new dependence on PY_UINT32_T, which is something we haven't so far used elsewhere in Python beyond the float<->string conversions. That's quite a big change: it means that it's conceivable that the random module could now fail to build on platforms where it used to work.

It also changes the signature of 'init_by_array', which is one of the core routines taken directly from the MT implementation.

serhiy-storchaka · 2012-11-18T08:41:38Z

Are there any platforms without 32-bit integers (PY_UINT32_T can be uint32_t, unsigned int or long)? PyUCS4 also should be 32-bit, therefore Python requires such type.

serhiy-storchaka · 2012-11-18T09:00:10Z

Patch updated to conform with Mark's nitpicks.

What I really doubt is that now same integer seed on little-endian and big-endian give different random sequences. Is this important? If yes, I can add bytes-swapping. On other hand, non-integer seed already give platform-depending results (the hashing used).

What you think about using a buffer instead a hash for bytes and bytearrays (and all other seeds supported buffer protocol)? It can increase entropy. Of course, it should be another issue if you approve it.

mdickinson · 2012-11-18T09:08:38Z

PyUCS4 also should be 32-bit, therefore Python requires such type.

Hmm, okay. I wonder whether PY_UINT32_T should have been used there, to avoid doing the same checks in multiple places.

What I really doubt is that now same integer seed on little-endian and
big-endian give different random sequences. Is this important?

Yes, I think it's important that this code change doesn't change the random sequence if the seed is unchanged, on any platform (32 / 64-bit, big versus little endian).

mdickinson · 2012-11-18T09:14:43Z

Thanks for the updated patch. A couple more comments:

you've got potential overflow when computing keysize from bits, on platforms where sizeof(size_t) > sizeof(unsigned long).
please could you move the check for PY_UINT32_T nearer the top of the file, along with the other #defines.

serhiy-storchaka · 2012-11-18T09:36:17Z

Patch updated. Now random_seed() is platform-independed for integer arguments.

serhiy-storchaka · 2012-11-18T09:39:28Z

Oops, typo.

mdickinson · 2012-11-18T15:30:24Z

I'm still uncomfortable with the init_by_array signature changes and the use of PY_UINT32_T. How about something like the attached instead? It keeps the central idea (use _PyLong_NumBits and _PyLong_AsByteArray) but doesn't require any signature changes or special casing for bigendian machines.

serhiy-storchaka · 2012-11-18T15:55:54Z

I don't think that the preservation of the signature of the auxiliary private static function is worth it. I'm uncomfortable with such patch. But do as you feel comfortable.

mdickinson · 2012-11-18T16:19:21Z

I'm uncomfortable with such patch.

Any particular reason? It's direct and straightforward, and eliminates the quadratic behaviour.

serhiy-storchaka · 2012-11-18T16:49:04Z

The code is larger. There is one additional allocation. CPU tacts wasted for uint32->ulong conversion (and in any case all numbers in the generator are 32-bit). One additional ValeError/OverflowError. Apparently my feeling of comfort is different from your own. ;)

Hmm, reviewing your code I found errors in my code too. I guess I'm more captious as a critic than as an author.

mdickinson · 2012-11-18T17:06:25Z

Apparently my feeling of comfort is different from your own. ;)

Yes: I tend to favour direct, readable, and portable code over unnecessarily optimized code. To address the specific points:

The code is larger.

Very slightly. It's (IMO) more readable and comprehensible though.

There is one additional allocation.

Yep. Is this a problem?

CPU tacts wasted for uint32->ulong conversion.

random.seed is hardly going to be a bottleneck in most applications. Again, I don't see that as a problem.

One additional ValeError/OverflowError.

That's not really additional: it should really have already been there in the original code.

pitrou · 2012-11-18T17:07:38Z

I agree with Mark, we don't need to micro-optimize here. Also, +1 for not needing PY_UINT32_T.

serhiy-storchaka · 2012-11-18T17:45:11Z

I tend to favour direct, readable, and portable code over unnecessarily optimized code.

And my feeling of directness, readability, and portability also slightly differs. I agree that code size and additional operations not of importance here. I say only that it makes me feel less comfortable. It doesn't matter.

I were left some nitpicks on Rietveld.

skrah · 2012-11-18T18:09:44Z

Is PY_UINT32_T a big problem? I hope that one day we can use the
C99 types directly. Visual Studio finally supports stdint.h, and
I'm not aware of any compiler that does not.

Consider cdecimal as a trial balloon: It compiles on all obscure
snakebite platforms, and I'm almost willing to bet that there won't
be any reports due to the C99 types.

mdickinson · 2012-11-25T11:40:29Z

Updated patch: adds in the missing checks for PyMem_Malloc errors.

python-dev · 2012-12-21T21:53:08Z

New changeset db75553ff333 by Mark Dickinson in branch 'default':
Simplify random_seed to use _PyLong_AsByteArray. Closes issue bpo-16496.
http://hg.python.org/cpython/rev/db75553ff333

serhiy-storchaka added extension-modules C modules in the Modules dir performance Performance or resource usage labels Nov 17, 2012

mdickinson self-assigned this Nov 19, 2012

mdickinson closed this as completed Dec 21, 2012

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Simplify and optimize random_seed() #60700

Simplify and optimize random_seed() #60700

serhiy-storchaka commented Nov 17, 2012

serhiy-storchaka commented Nov 17, 2012

loewis mannequin commented Nov 17, 2012

serhiy-storchaka commented Nov 17, 2012

loewis mannequin commented Nov 17, 2012

mdickinson commented Nov 18, 2012

serhiy-storchaka commented Nov 18, 2012

serhiy-storchaka commented Nov 18, 2012

mdickinson commented Nov 18, 2012

mdickinson commented Nov 18, 2012

serhiy-storchaka commented Nov 18, 2012

serhiy-storchaka commented Nov 18, 2012

mdickinson commented Nov 18, 2012

serhiy-storchaka commented Nov 18, 2012

mdickinson commented Nov 18, 2012

serhiy-storchaka commented Nov 18, 2012

mdickinson commented Nov 18, 2012

pitrou commented Nov 18, 2012

serhiy-storchaka commented Nov 18, 2012

skrah mannequin commented Nov 18, 2012

mdickinson commented Nov 25, 2012

python-dev mannequin commented Dec 21, 2012

Simplify and optimize random_seed() #60700

Simplify and optimize random_seed() #60700

Comments

serhiy-storchaka commented Nov 17, 2012

serhiy-storchaka commented Nov 17, 2012

loewis mannequin commented Nov 17, 2012

serhiy-storchaka commented Nov 17, 2012

loewis mannequin commented Nov 17, 2012

mdickinson commented Nov 18, 2012

serhiy-storchaka commented Nov 18, 2012

serhiy-storchaka commented Nov 18, 2012

mdickinson commented Nov 18, 2012

mdickinson commented Nov 18, 2012

serhiy-storchaka commented Nov 18, 2012

serhiy-storchaka commented Nov 18, 2012

mdickinson commented Nov 18, 2012

serhiy-storchaka commented Nov 18, 2012

mdickinson commented Nov 18, 2012

serhiy-storchaka commented Nov 18, 2012

mdickinson commented Nov 18, 2012

pitrou commented Nov 18, 2012

serhiy-storchaka commented Nov 18, 2012

skrah mannequin commented Nov 18, 2012

mdickinson commented Nov 25, 2012

python-dev mannequin commented Dec 21, 2012