New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify and optimize random_seed() #60700
Comments
Now random_seed() use an ineffective quadratic-time algorithm. It can reuse _PyLong_AsByteArray(), decreasing code size and increasing performance. |
Why is this a problem? |
This is not a problem at all. This just a code cleanup. Removing a code duplicate. Performance boost is a side effect. I confused by an old code comment about _PyLong_AsByteArray(). I don't see any difficulties. |
Ah. The cleanup looks good, from a shallow glance. I haven't verified yet that the change is actually correct. |
Patch looks correct and looks good to me, modulo a couple of nitpicks (see Rietveld comments). This seems like a nice cleanup. The patch introduces a new dependence on PY_UINT32_T, which is something we haven't so far used elsewhere in Python beyond the float<->string conversions. That's quite a big change: it means that it's conceivable that the random module could now fail to build on platforms where it used to work. It also changes the signature of 'init_by_array', which is one of the core routines taken directly from the MT implementation. |
Are there any platforms without 32-bit integers (PY_UINT32_T can be uint32_t, unsigned int or long)? PyUCS4 also should be 32-bit, therefore Python requires such type. |
Patch updated to conform with Mark's nitpicks. What I really doubt is that now same integer seed on little-endian and big-endian give different random sequences. Is this important? If yes, I can add bytes-swapping. On other hand, non-integer seed already give platform-depending results (the hashing used). What you think about using a buffer instead a hash for bytes and bytearrays (and all other seeds supported buffer protocol)? It can increase entropy. Of course, it should be another issue if you approve it. |
Hmm, okay. I wonder whether PY_UINT32_T should have been used there, to avoid doing the same checks in multiple places.
Yes, I think it's important that this code change doesn't change the random sequence if the seed is unchanged, on any platform (32 / 64-bit, big versus little endian). |
Thanks for the updated patch. A couple more comments:
|
Patch updated. Now random_seed() is platform-independed for integer arguments. |
Oops, typo. |
I'm still uncomfortable with the init_by_array signature changes and the use of PY_UINT32_T. How about something like the attached instead? It keeps the central idea (use _PyLong_NumBits and _PyLong_AsByteArray) but doesn't require any signature changes or special casing for bigendian machines. |
I don't think that the preservation of the signature of the auxiliary private static function is worth it. I'm uncomfortable with such patch. But do as you feel comfortable. |
Any particular reason? It's direct and straightforward, and eliminates the quadratic behaviour. |
The code is larger. There is one additional allocation. CPU tacts wasted for uint32->ulong conversion (and in any case all numbers in the generator are 32-bit). One additional ValeError/OverflowError. Apparently my feeling of comfort is different from your own. ;) Hmm, reviewing your code I found errors in my code too. I guess I'm more captious as a critic than as an author. |
Yes: I tend to favour direct, readable, and portable code over unnecessarily optimized code. To address the specific points:
Very slightly. It's (IMO) more readable and comprehensible though.
Yep. Is this a problem?
random.seed is hardly going to be a bottleneck in most applications. Again, I don't see that as a problem.
That's not really additional: it should really have already been there in the original code. |
I agree with Mark, we don't need to micro-optimize here. Also, +1 for not needing PY_UINT32_T. |
And my feeling of directness, readability, and portability also slightly differs. I agree that code size and additional operations not of importance here. I say only that it makes me feel less comfortable. It doesn't matter. I were left some nitpicks on Rietveld. |
Is PY_UINT32_T a big problem? I hope that one day we can use the Consider cdecimal as a trial balloon: It compiles on all obscure |
Updated patch: adds in the missing checks for PyMem_Malloc errors. |
New changeset db75553ff333 by Mark Dickinson in branch 'default': |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: