Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify and optimize random_seed() #60700

Closed
serhiy-storchaka opened this issue Nov 17, 2012 · 21 comments
Closed

Simplify and optimize random_seed() #60700

serhiy-storchaka opened this issue Nov 17, 2012 · 21 comments
Assignees
Labels
extension-modules C modules in the Modules dir performance Performance or resource usage

Comments

@serhiy-storchaka
Copy link
Member

BPO 16496
Nosy @loewis, @rhettinger, @jcea, @mdickinson, @pitrou, @asvetlov, @skrah, @serhiy-storchaka
Files
  • random_seed.patch
  • random_seed_2.patch
  • random_seed_3.patch
  • random_seed_metd.patch
  • random_seed_metd2.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/mdickinson'
    closed_at = <Date 2012-12-21.21:53:37.176>
    created_at = <Date 2012-11-17.21:32:44.652>
    labels = ['extension-modules', 'performance']
    title = 'Simplify and optimize random_seed()'
    updated_at = <Date 2012-12-21.21:53:37.176>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2012-12-21.21:53:37.176>
    actor = 'mark.dickinson'
    assignee = 'mark.dickinson'
    closed = True
    closed_date = <Date 2012-12-21.21:53:37.176>
    closer = 'mark.dickinson'
    components = ['Extension Modules']
    creation = <Date 2012-11-17.21:32:44.652>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = ['28014', '28019', '28021', '28027', '28107']
    hgrepos = []
    issue_num = 16496
    keywords = ['patch']
    message_count = 21.0
    messages = ['175812', '175820', '175821', '175827', '175846', '175847', '175848', '175849', '175850', '175851', '175852', '175881', '175883', '175884', '175886', '175887', '175888', '175897', '175899', '176341', '177900']
    nosy_count = 9.0
    nosy_names = ['loewis', 'rhettinger', 'jcea', 'mark.dickinson', 'pitrou', 'asvetlov', 'skrah', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'patch review'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue16496'
    versions = ['Python 3.4']

    @serhiy-storchaka
    Copy link
    Member Author

    Now random_seed() use an ineffective quadratic-time algorithm. It can reuse _PyLong_AsByteArray(), decreasing code size and increasing performance.

    @serhiy-storchaka serhiy-storchaka added extension-modules C modules in the Modules dir performance Performance or resource usage labels Nov 17, 2012
    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Nov 17, 2012

    Why is this a problem?

    @serhiy-storchaka
    Copy link
    Member Author

    This is not a problem at all. This just a code cleanup. Removing a code duplicate. Performance boost is a side effect.

    I confused by an old code comment about _PyLong_AsByteArray(). I don't see any difficulties.

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Nov 17, 2012

    Ah. The cleanup looks good, from a shallow glance. I haven't verified yet that the change is actually correct.

    @mdickinson
    Copy link
    Member

    Patch looks correct and looks good to me, modulo a couple of nitpicks (see Rietveld comments). This seems like a nice cleanup.

    The patch introduces a new dependence on PY_UINT32_T, which is something we haven't so far used elsewhere in Python beyond the float<->string conversions. That's quite a big change: it means that it's conceivable that the random module could now fail to build on platforms where it used to work.

    It also changes the signature of 'init_by_array', which is one of the core routines taken directly from the MT implementation.

    @serhiy-storchaka
    Copy link
    Member Author

    Are there any platforms without 32-bit integers (PY_UINT32_T can be uint32_t, unsigned int or long)? PyUCS4 also should be 32-bit, therefore Python requires such type.

    @serhiy-storchaka
    Copy link
    Member Author

    Patch updated to conform with Mark's nitpicks.

    What I really doubt is that now same integer seed on little-endian and big-endian give different random sequences. Is this important? If yes, I can add bytes-swapping. On other hand, non-integer seed already give platform-depending results (the hashing used).

    What you think about using a buffer instead a hash for bytes and bytearrays (and all other seeds supported buffer protocol)? It can increase entropy. Of course, it should be another issue if you approve it.

    @mdickinson
    Copy link
    Member

    PyUCS4 also should be 32-bit, therefore Python requires such type.

    Hmm, okay. I wonder whether PY_UINT32_T should have been used there, to avoid doing the same checks in multiple places.

    What I really doubt is that now same integer seed on little-endian and
    big-endian give different random sequences. Is this important?

    Yes, I think it's important that this code change doesn't change the random sequence if the seed is unchanged, on any platform (32 / 64-bit, big versus little endian).

    @mdickinson
    Copy link
    Member

    Thanks for the updated patch. A couple more comments:

    • you've got potential overflow when computing keysize from bits, on platforms where sizeof(size_t) > sizeof(unsigned long).

    • please could you move the check for PY_UINT32_T nearer the top of the file, along with the other #defines.

    @serhiy-storchaka
    Copy link
    Member Author

    Patch updated. Now random_seed() is platform-independed for integer arguments.

    @serhiy-storchaka
    Copy link
    Member Author

    Oops, typo.

    @mdickinson
    Copy link
    Member

    I'm still uncomfortable with the init_by_array signature changes and the use of PY_UINT32_T. How about something like the attached instead? It keeps the central idea (use _PyLong_NumBits and _PyLong_AsByteArray) but doesn't require any signature changes or special casing for bigendian machines.

    @serhiy-storchaka
    Copy link
    Member Author

    I don't think that the preservation of the signature of the auxiliary private static function is worth it. I'm uncomfortable with such patch. But do as you feel comfortable.

    @mdickinson
    Copy link
    Member

    I'm uncomfortable with such patch.

    Any particular reason? It's direct and straightforward, and eliminates the quadratic behaviour.

    @serhiy-storchaka
    Copy link
    Member Author

    The code is larger. There is one additional allocation. CPU tacts wasted for uint32->ulong conversion (and in any case all numbers in the generator are 32-bit). One additional ValeError/OverflowError. Apparently my feeling of comfort is different from your own. ;)

    Hmm, reviewing your code I found errors in my code too. I guess I'm more captious as a critic than as an author.

    @mdickinson
    Copy link
    Member

    Apparently my feeling of comfort is different from your own. ;)

    Yes: I tend to favour direct, readable, and portable code over unnecessarily optimized code. To address the specific points:

    The code is larger.

    Very slightly. It's (IMO) more readable and comprehensible though.

    There is one additional allocation.

    Yep. Is this a problem?

    CPU tacts wasted for uint32->ulong conversion.

    random.seed is hardly going to be a bottleneck in most applications. Again, I don't see that as a problem.

    One additional ValeError/OverflowError.

    That's not really additional: it should really have already been there in the original code.

    @pitrou
    Copy link
    Member

    pitrou commented Nov 18, 2012

    I agree with Mark, we don't need to micro-optimize here. Also, +1 for not needing PY_UINT32_T.

    @serhiy-storchaka
    Copy link
    Member Author

    I tend to favour direct, readable, and portable code over unnecessarily optimized code.

    And my feeling of directness, readability, and portability also slightly differs. I agree that code size and additional operations not of importance here. I say only that it makes me feel less comfortable. It doesn't matter.

    I were left some nitpicks on Rietveld.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Nov 18, 2012

    Is PY_UINT32_T a big problem? I hope that one day we can use the
    C99 types directly. Visual Studio finally supports stdint.h, and
    I'm not aware of any compiler that does not.

    Consider cdecimal as a trial balloon: It compiles on all obscure
    snakebite platforms, and I'm almost willing to bet that there won't
    be any reports due to the C99 types.

    @mdickinson mdickinson self-assigned this Nov 19, 2012
    @mdickinson
    Copy link
    Member

    Updated patch: adds in the missing checks for PyMem_Malloc errors.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Dec 21, 2012

    New changeset db75553ff333 by Mark Dickinson in branch 'default':
    Simplify random_seed to use _PyLong_AsByteArray. Closes issue bpo-16496.
    http://hg.python.org/cpython/rev/db75553ff333

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    extension-modules C modules in the Modules dir performance Performance or resource usage
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants