classification
Title: Simplify and optimize random_seed()
Type: performance Stage: patch review
Components: Extension Modules Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: mark.dickinson Nosy List: asvetlov, jcea, loewis, mark.dickinson, pitrou, python-dev, rhettinger, serhiy.storchaka, skrah
Priority: normal Keywords: patch

Created on 2012-11-17 21:32 by serhiy.storchaka, last changed 2012-12-21 21:53 by mark.dickinson. This issue is now closed.

Files
File name Uploaded Description Edit
random_seed.patch serhiy.storchaka, 2012-11-17 21:32 review
random_seed_2.patch serhiy.storchaka, 2012-11-18 09:00 review
random_seed_3.patch serhiy.storchaka, 2012-11-18 09:39 review
random_seed_metd.patch mark.dickinson, 2012-11-18 15:30 review
random_seed_metd2.patch mark.dickinson, 2012-11-25 11:40 review
Messages (21)
msg175812 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-17 21:32
Now random_seed() use an ineffective quadratic-time algorithm.  It can reuse _PyLong_AsByteArray(), decreasing code size and increasing performance.
msg175820 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-11-17 22:42
Why is this a problem?
msg175821 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-17 22:48
This is not a problem at all. This just a code cleanup. Removing a code duplicate. Performance boost is a side effect.

I confused by an old code comment about _PyLong_AsByteArray(). I don't see any difficulties.
msg175827 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-11-17 23:16
Ah. The cleanup looks good, from a shallow glance. I haven't verified yet that the change is actually correct.
msg175846 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-11-18 08:32
Patch looks correct and looks good to me, modulo a couple of nitpicks (see Rietveld comments).  This seems like a nice cleanup.

The patch introduces a new dependence on PY_UINT32_T, which is something we haven't so far used elsewhere in Python beyond the float<->string conversions.  That's quite a big change: it means that it's conceivable that the random module could now fail to build on platforms where it used to work.

It also changes the signature of 'init_by_array', which is one of the core routines taken directly from the MT implementation.
msg175847 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-18 08:41
Are there any platforms without 32-bit integers (PY_UINT32_T can be uint32_t, unsigned int or long)? PyUCS4 also should be 32-bit, therefore Python requires such type.
msg175848 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-18 09:00
Patch updated to conform with Mark's nitpicks.

What I really doubt is that now same integer seed on little-endian and big-endian give different random sequences. Is this important? If yes, I can add bytes-swapping. On other hand, non-integer seed already give platform-depending results (the hashing used).

What you think about using a buffer instead a hash for bytes and bytearrays (and all other seeds supported buffer protocol)? It can increase entropy. Of course, it should be another issue if you approve it.
msg175849 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-11-18 09:08
> PyUCS4 also should be 32-bit, therefore Python requires such type.

Hmm, okay.  I wonder whether PY_UINT32_T should have been used there, to avoid doing the same checks in multiple places.

> What I really doubt is that now same integer seed on little-endian and
> big-endian give different random sequences. Is this important?

Yes, I think it's important that this code change doesn't change the random sequence if the seed is unchanged, on any platform (32 / 64-bit, big versus little endian).
msg175850 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-11-18 09:14
Thanks for the updated patch.  A couple more comments:

- you've got potential overflow when computing keysize from bits, on platforms where sizeof(size_t) > sizeof(unsigned long).

- please could you move the check for PY_UINT32_T nearer the top of the file, along with the other #defines.
msg175851 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-18 09:36
Patch updated. Now random_seed() is platform-independed for integer arguments.
msg175852 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-18 09:39
Oops, typo.
msg175881 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-11-18 15:30
I'm still uncomfortable with the init_by_array signature changes and the use of PY_UINT32_T.  How about something like the attached instead?  It keeps the central idea (use _PyLong_NumBits and _PyLong_AsByteArray) but doesn't require any signature changes or special casing for bigendian machines.
msg175883 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-18 15:55
I don't think that the preservation of the signature of the auxiliary private static function is worth it.  I'm uncomfortable with such patch.  But do as you feel comfortable.
msg175884 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-11-18 16:19
> I'm uncomfortable with such patch.

Any particular reason?  It's direct and straightforward, and eliminates the quadratic behaviour.
msg175886 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-18 16:49
The code is larger.  There is one additional allocation.  CPU tacts wasted for uint32->ulong conversion (and in any case all numbers in the generator are 32-bit).  One additional ValeError/OverflowError.  Apparently my feeling of comfort is different from your own. ;)

Hmm, reviewing your code I found errors in my code too.  I guess I'm more captious as a critic than as an author.
msg175887 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-11-18 17:06
> Apparently my feeling of comfort is different from your own. ;)

Yes:  I tend to favour direct, readable, and portable code over unnecessarily optimized code.  To address the specific points:

> The code is larger.

Very slightly.  It's (IMO) more readable and comprehensible though.

> There is one additional allocation.

Yep.  Is this a problem?

> CPU tacts wasted for uint32->ulong conversion.

random.seed is hardly going to be a bottleneck in most applications.  Again, I don't see that as a problem.

> One additional ValeError/OverflowError.

That's not really additional:  it should really have already been there in the original code.
msg175888 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-11-18 17:07
I agree with Mark, we don't need to micro-optimize here. Also, +1 for not needing PY_UINT32_T.
msg175897 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-18 17:45
> I tend to favour direct, readable, and portable code over unnecessarily optimized code.

And my feeling of directness, readability, and portability also slightly differs.  I agree that code size and additional operations not of importance here.  I say only that it makes me feel less comfortable.  It doesn't matter.

I were left some nitpicks on Rietveld.
msg175899 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-11-18 18:09
Is PY_UINT32_T a big problem? I hope that one day we can use the
C99 types directly. Visual Studio finally supports stdint.h, and
I'm not aware of any compiler that does not.

Consider cdecimal as a trial balloon: It compiles on all obscure
snakebite platforms, and I'm almost willing to bet that there won't
be any reports due to the C99 types.
msg176341 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-11-25 11:40
Updated patch: adds in the missing checks for PyMem_Malloc errors.
msg177900 - (view) Author: Roundup Robot (python-dev) Date: 2012-12-21 21:53
New changeset db75553ff333 by Mark Dickinson in branch 'default':
Simplify random_seed to use _PyLong_AsByteArray.  Closes issue #16496.
http://hg.python.org/cpython/rev/db75553ff333
History
Date User Action Args
2012-12-21 21:53:37mark.dickinsonsetstatus: open -> closed
resolution: fixed
2012-12-21 21:53:08python-devsetnosy: + python-dev
messages: + msg177900
2012-12-03 15:39:03asvetlovsetnosy: + asvetlov
2012-11-26 17:13:59jceasetnosy: + jcea
2012-11-25 11:40:29mark.dickinsonsetfiles: + random_seed_metd2.patch

messages: + msg176341
2012-11-19 18:18:01mark.dickinsonsetassignee: mark.dickinson
2012-11-18 18:09:44skrahsetnosy: + skrah
messages: + msg175899
2012-11-18 17:45:10serhiy.storchakasetmessages: + msg175897
2012-11-18 17:07:38pitrousetnosy: + pitrou
messages: + msg175888
2012-11-18 17:06:24mark.dickinsonsetmessages: + msg175887
2012-11-18 16:49:03serhiy.storchakasetmessages: + msg175886
2012-11-18 16:19:21mark.dickinsonsetmessages: + msg175884
2012-11-18 15:55:53serhiy.storchakasetmessages: + msg175883
2012-11-18 15:30:24mark.dickinsonsetfiles: + random_seed_metd.patch

messages: + msg175881
2012-11-18 09:39:28serhiy.storchakasetfiles: + random_seed_3.patch

messages: + msg175852
2012-11-18 09:38:26serhiy.storchakasetfiles: - random_seed_3.patch
2012-11-18 09:36:17serhiy.storchakasetfiles: + random_seed_3.patch

messages: + msg175851
2012-11-18 09:14:42mark.dickinsonsetmessages: + msg175850
2012-11-18 09:08:38mark.dickinsonsetmessages: + msg175849
2012-11-18 09:00:10serhiy.storchakasetfiles: + random_seed_2.patch

messages: + msg175848
2012-11-18 08:41:38serhiy.storchakasetmessages: + msg175847
2012-11-18 08:32:41mark.dickinsonsetmessages: + msg175846
2012-11-17 23:16:47loewissetmessages: + msg175827
2012-11-17 22:48:38serhiy.storchakasetmessages: + msg175821
2012-11-17 22:42:55loewissetnosy: + loewis
messages: + msg175820
2012-11-17 21:32:44serhiy.storchakacreate