This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Py3 BIF random.choices() is O(N**2) but I've written O(N) code for the same task
Type: performance Stage: resolved
Components: Library (Lib) Versions: Python 3.8, Python 3.7
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: ammar2, eamanu, mark.dickinson, rhettinger, shawn_berry, steven.daprano
Priority: normal Keywords:

Created on 2019-02-13 01:08 by shawn_berry, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
Improved_Py3_BIF_random_dot_choices.py shawn_berry, 2019-02-13 01:08 Improves Py3 BIF random.choices() from O(N**2) to O(N)
Repositories containing patches
https://github.com/shawnberry/Improved_random.choices/blob/master/Improved_Py3_BIF_random_dot_choices.py
Messages (5)
msg335379 - (view) Author: shawnberry (shawn_berry) * Date: 2019-02-13 01:08
Please see my GitHub page https://github.com/shawnberry/Improved_random.choices/blob/master/Improved_Py3_BIF_random_dot_choices.py
for code that reduces Py3 BIF random.choices() from O(N**2) to O(N).
This is my first suggestion to improve Python code.
Thanks,
shawnberry121@gmail.com
msg335381 - (view) Author: Ammar Askar (ammar2) * (Python committer) Date: 2019-02-13 01:27
I can't speak to the complexity of the choices function but if you're proposing an alternative implementation for choices, it would be wise to show the benefits empirically. That is, benchmarks and an explanation of why your implementation would be better than just an example of your code among an example. Additionally, you need to expand your implementation to match all the functionality of the current function and make a pull request to the main Python repository.

Adding Raymond to nosy, the original author of random.choices
msg335382 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2019-02-13 01:45
What's "BIF" mean? You use that term multiple times but I have never heard it before.

I'm sorry, I don't understand your code (and don't have time to study it in detail to decipher it). It would help if you factored out your new implementation of choices() into a separate function.

I see that your timing code it not reliable for timing individual function calls. You include your setup code that builds the input to choices() as well as the time it takes to print messages, so we can't really draw any reliable conclusions about the speed of choices() from your timings.

To give an analogy: you start the stopwatch at home. Climb into the shower and wash, get dressed, have breakfast, drive to work, park the car, buy a coffee at the shop next door, go to your office, greet your fellow co-workers, and finally stop the stopwatch. And then you say that the total time measured was the time it took you to drive to work alone.

Finally, I can't actually work out what part of your code is intended as a replacement for the choices() function. You should factor it out into a separate function, then time the two function calls:

    choices(dataset, weights)
    shawn_choices(dataset, weights)

alone (without timing the setup of the datasets, or the I/O, or any other irrelevant costs). You probably should use the timeit module for the actual timing.

As for the versions of Python, 3.6 and older are in "bug fix only" mode, so this will not apply to anything older than 3.7.
msg335389 - (view) Author: shawnberry (shawn_berry) * Date: 2019-02-13 07:41
Hi PSF,

Upon further review, random.choices() is O(N).  I was looping through the
values in random.choices() to make it O(N**2) in my code.

Next time I will thoroughly check a function on its own before raising any
issue.  Fyi, BIF is built-in function.  Please close this issue.

Best,
Shawn

On Tue, Feb 12, 2019 at 5:45 PM Steven D'Aprano <report@bugs.python.org>
wrote:

>
> Steven D'Aprano <steve+python@pearwood.info> added the comment:
>
> What's "BIF" mean? You use that term multiple times but I have never heard
> it before.
>
> I'm sorry, I don't understand your code (and don't have time to study it
> in detail to decipher it). It would help if you factored out your new
> implementation of choices() into a separate function.
>
> I see that your timing code it not reliable for timing individual function
> calls. You include your setup code that builds the input to choices() as
> well as the time it takes to print messages, so we can't really draw any
> reliable conclusions about the speed of choices() from your timings.
>
> To give an analogy: you start the stopwatch at home. Climb into the shower
> and wash, get dressed, have breakfast, drive to work, park the car, buy a
> coffee at the shop next door, go to your office, greet your fellow
> co-workers, and finally stop the stopwatch. And then you say that the total
> time measured was the time it took you to drive to work alone.
>
> Finally, I can't actually work out what part of your code is intended as a
> replacement for the choices() function. You should factor it out into a
> separate function, then time the two function calls:
>
>     choices(dataset, weights)
>     shawn_choices(dataset, weights)
>
> alone (without timing the setup of the datasets, or the I/O, or any other
> irrelevant costs). You probably should use the timeit module for the actual
> timing.
>
> As for the versions of Python, 3.6 and older are in "bug fix only" mode,
> so this will not apply to anything older than 3.7.
>
> ----------
> nosy: +steven.daprano
> versions:  -Python 3.4, Python 3.5, Python 3.6
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <https://bugs.python.org/issue35980>
> _______________________________________
>
msg335392 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2019-02-13 07:45
Closing as requested by OP.
History
Date User Action Args
2022-04-11 14:59:11adminsetgithub: 80161
2019-02-13 07:49:16SilentGhostsetresolution: not a bug
2019-02-13 07:45:19mark.dickinsonsetstatus: open -> closed

nosy: + mark.dickinson
messages: + msg335392

stage: resolved
2019-02-13 07:43:32rhettingersetassignee: rhettinger
2019-02-13 07:41:12shawn_berrysetmessages: + msg335389
2019-02-13 01:45:06steven.dapranosetnosy: + steven.daprano

messages: + msg335382
versions: - Python 3.4, Python 3.5, Python 3.6
2019-02-13 01:43:55eamanusetnosy: + eamanu
2019-02-13 01:27:22ammar2setnosy: + rhettinger, ammar2
messages: + msg335381
2019-02-13 01:08:43shawn_berrycreate