Issue38477
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2019-10-15 01:42 by Louis Huemiller, last changed 2022-04-11 14:59 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
20191005b_configuration | Louis Huemiller, 2019-10-15 01:42 | |||
20191011b_chart.svg | Louis Huemiller, 2019-10-16 04:20 |
Messages (9) | |||
---|---|---|---|
msg354682 - (view) | Author: Louis Huemiller (Louis Huemiller) | Date: 2019-10-15 01:42 | |
Have created a program that by brute force determines all reachable states of a 2-by-2 MagicCube, sometimes referred to as a 2x2 Rubick's cube. On one of my servers, which is described in the attached file 20191011b_configuration, this program takes 19520.03 seconds to execute under Python2.7.15+, while under 3.8.0rc1 it takes 21887.53 seconds. This program is taking 12.13% longer to execute with 3.8.0rc1 versus 2.7.15+. The exact versions of Python as reported by sys.version are: 2.7.15+ (default, Oct 7 2019, 17:39:04) [GCC 7.4.0] 3.8.0rc1 (default, Oct 1 2019, 21:48:24) [GCC 7.4.0] Used the following script to execute this program under Python 2.7, 3.6, 3.7, and 3.8: mkdir -p ./results echo "======== git log ========" > ./results/20191011b_configuration git log -n4 >> ./results/20191011b_configuration echo "" >> ./results/20191011b_configuration echo "======== uname ========" >> ./results/20191011b_configuration uname -a >> ./results/20191011b_configuration echo "" >> ./results/20191011b_configuration echo "======== lsb_release ========" >> ./results/20191011b_configuration lsb_release -a >> ./results/20191011b_configuration echo "" >> ./results/20191011b_configuration echo "======== lshw ========" >> ./results/20191011b_configuration sudo lshw >> ./results/20191011b_configuration make clean make ./target/permutations2x2 > \ ./results/20191011b_magiccube2x2_permutations_cxx_unordered_map 2>&1 python3.8 ./permutations2x2 > \ ./results/20191011b_magiccube2x2_permutations_python3.8.0rc1 2>&1 python3.7 ./permutations2x2 > \ ./results/20191011b_magiccube2x2_permutations_python3.7 2>&1 python3.6 ./permutations2x2 > \ ./results/20191011b_magiccube2x2_permutations_python3.6 2>&1 python2.7 ./permutations2x2 > ./results/20191011b_magiccube2x2_permutations_python2.7 2>&1 Doing egrep "^# Total_Time:" on each of the output files, shows that the various executions took the following amount of time: cxx_unordered_map:# Total_Time: 1098.25 python2.7:# Total_Time: 19520.03 python3.6:# Total_Time: 19562.28 python3.7:# Total_Time: 17012.67 python3.8.0rc1:# Total_Time: 21887.53 Under Python2.7 and Python3.6 the program ran in approximately the same amount of time, while under Python3.7 it ran 12.8% faster than execution with Python2.7. Finally, the python3.8.0rc1 execution is noticeably slower than all the other executions, at 12.1% lower than Python2.7 and 28.6% slower than Python3.7. The source for the magic_cube2x2 program is available under gist.github.com at: https://gist.github.com/lhuemill/f1273291e5f5e85e4b42e5c7614e60ef This program creates a fairly large dictionary of 88.179840M entries. Each entry has a key of a string of length 29 and a value string of with a maximum length of 22 characters. A quick back-of-the-envelope calculation shows that this dictionary would use approximately 9GB of memory, but for some reason, all of the Python executions use approximately 22.6GB of memory. Even the C++ version is using 20.2GB when the permutations are stored in an unordered_map<string, string> and 20.55GB when stored in a map<string, string>. Quite surprising that the C++ is using over twice the expected amount of memory. CAUTION: When executing this program with less than the needed amount of physical memory, be careful that swap is not backed by a flash device. Doing so will likely cause a significant amount of I/O to the flash device and likely quickly wear it out. |
|||
msg354767 - (view) | Author: Louis Huemiller (Louis Huemiller) | Date: 2019-10-16 04:20 | |
Attached is a chart, which shows the results from the runs mentioned in the initial post. This chart was produced through the use of: ../permutations2x2_chart_results \ -f ./20191011b_chart.svg \ C++unordered_map:./20191011b_magiccube2x2_permutations_cxx_unordered_map \ Python2.7:./20191011b_magiccube2x2_permutations_python2.7 \ Python3.6:./20191011b_magiccube2x2_permutations_python3.6 \ Python3.7:./20191011b_magiccube2x2_permutations_python3.7 \ Python3.8.0rc1:./20191011b_magiccube2x2_permutations_python3.8.0rc1 The permutations2x2_chart_results program is present in the gist.github repository mentioned in the initial post. |
|||
msg354768 - (view) | Author: Louis Huemiller (Louis Huemiller) | Date: 2019-10-16 04:32 | |
Each of the Python runs mentioned in the initial post took around 4 hours to execute. Although not as accurate, the issue can be demonstrated in less than 5 minutes through the use of: # If needed use the following to install Python3.8.0rc1 $ apt-get update $ sudo apt-get upgrade $ sudo add-apt-repository ppa:deadsnakes/ppa $ sudo apt update $ sudo apt install python3.8 # Use Python3.7 to produce baseline. $ python3.7 permutations2x2 --max_depth 6 | egrep "^# Total_Time" # Perform Python3.8.0rc1 execution python3.8 permutations2x2 --max_depth 6 | egrep "^# Total_Time" When I did this on my server the Python3.7 execution produced: # Total_Time: 62.46 While the Python3.8.0rc1 execution produced: # Total_Time: 79.72 By running the program with "--max_depth 6" specified, the program only traversed through permutations that are up to 6 moves from the initial position. Although much less than running through an unlimited number of moves, the above run still showed the Python3.8.0rc1 run as 26.7% ((79.72 - 62.46) / 62.46) slower than the Python3.7 execution. Which is close to the 28.6% reduction in performance seen from runs with unlimited depth of moves. |
|||
msg354772 - (view) | Author: Inada Naoki (methane) * | Date: 2019-10-16 06:34 | |
I can not confirm performance regression in 3.8.0. $ time ~/pyenv/versions/3.7.3/bin/python permutations2x2 --max_depth 6 | egrep "^# Total_Time" ; time ~/pyenv/versions/3.8.0/bin/python permutations2x2 --max_depth 6 | egrep "^# Total_Time" # Total_Time: 50.24 real 0m50.665s user 0m49.729s sys 0m4.354s # Total_Time: 47.97 real 0m48.502s user 0m47.380s sys 0m4.506s I suspect that this is a binary specific issue. Would you try "perf record python ..."? |
|||
msg354773 - (view) | Author: Inada Naoki (methane) * | Date: 2019-10-16 06:35 | |
Ah, my bad. I meant `perf stat python ...`. |
|||
msg354774 - (view) | Author: Inada Naoki (methane) * | Date: 2019-10-16 06:47 | |
Deadsneak Python 3.8: >>> sysconfig.get_config_var('CFLAGS') '-Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security' Disco /usr/bin/python3.7: >>> sysconfig.get_config_var('CFLAGS') '-Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -flto -fuse-linker-plugin -ffat-lto-objects' So the performance difference comes from optimization options. |
|||
msg354777 - (view) | Author: Gregory P. Smith (gregory.p.smith) * | Date: 2019-10-16 07:07 | |
If I compile my own 3.7.5 and 3.8.0 both with --enable-optimizations, I am unable to reproduce this. greg@zoonaut:~/sandbox/python/cpython/lh$ ../b37/python permutations2x2 --max_depth 6 | egrep "^# Total_Time" # Total_Time: 75.92 greg@zoonaut:~/sandbox/python/cpython/lh$ ../b38/python permutations2x2 --max_depth 6 | egrep "^# Total_Time" # Total_Time: 84.53 greg@zoonaut:~/sandbox/python/cpython/lh$ ../b37/python permutations2x2 --max_depth 6 | egrep "^# Total_Time" # Total_Time: 79.90 greg@zoonaut:~/sandbox/python/cpython/lh$ ../b38/python permutations2x2 --max_depth 6 | egrep "^# Total_Time" # Total_Time: 84.19 greg@zoonaut:~/sandbox/python/cpython/lh$ ../b37/python permutations2x2 --max_depth 6 | egrep "^# Total_Time" # Total_Time: 86.49 greg@zoonaut:~/sandbox/python/cpython/lh$ ../b37/python permutations2x2 --max_depth 6 | egrep "^# Total_Time" # Total_Time: 83.95 greg@zoonaut:~/sandbox/python/cpython/lh$ ../b38/python permutations2x2 --max_depth 6 | egrep "^# Total_Time" # Total_Time: 78.85 greg@zoonaut:~/sandbox/python/cpython/lh$ ../b38/python permutations2x2 --max_depth 6 | egrep "^# Total_Time" # Total_Time: 78.49 which if you add up the 3.7 and 3.8 times over their four runs each, they both come out to ~81 seconds average. So I don't see a regression. I'd say that benchmark is unstable. Also, making sure you're comparing identically prepared builds of cpython rather than something made by different distro package owners. |
|||
msg354828 - (view) | Author: Louis Huemiller (Louis Huemiller) | Date: 2019-10-17 02:29 | |
Thank you for looking into the potential issue. Instead of getting Python3.8 from ppa:deadsnakes/ppa, I downloaded the source from: https://www.python.org/ftp/python/3.8.0/Python-3.8.0rc1.tgz Then did the following to build and install it: $ cd /usr/src $ sudo tar xzf ~/Python-3.8.0rc1.tgz $ cd Python-3.8.0rc1/ $ sudo ./configure --enable-optimizations $ sudo apt-get install zlib1g-dev $ sudo -H make altinstall Then re-running the magiccube2x2 permutations with a max_depth of 6 I obtained: $ python3.7 permutations2x2 --max_depth 6 | egrep "^# Total_Time" # Total_Time: 61.90 $ python3.8 permutations2x2 --max_depth 6 | egrep "^# Total_Time" # Total_Time: 82.45 Unfortunately, this still shows the Python3.8 execution as significantly slower, in this case 33.2% slower. Not sure of the cause, but perhaps I wasn't completely successful at removing the prior installation from ppa:deadsnakes/ppa. I removed it through the use of: $ sudo apt remove python3.8 $ sudo add-apt-repository -r ppa:deadsnakes/ppa $ shutdown -r now Next step I will completly re-install this server and then only install Python3.8 from https://www.python.org/ftp/python/3.8.0/Python-3.8.0rc1.tgz. |
|||
msg354832 - (view) | Author: Louis Huemiller (Louis Huemiller) | Date: 2019-10-17 06:33 | |
Reinstalled the server I'd been using for this issue, with Ubuntu 18.04.3 LTS Server amd64. This installation already came with Python3.6.8. Downloaded and installed Python3.7.3 and 3.8.0 from: https://www.python.org/ftp/python/3.7.3/Python-3.7.3.tgz https://www.python.org/ftp/python/3.8.0/Python-3.8.0.tgz The installation process was somewhat complicated, with several to be discovered prerequisites. Used "apt install" to install the following needed dependencies: build-essential checkinstall libreadline-gplv2-dev libncursesw5-dev libssl-dev libsqlite3-dev tk-dev libgdbm-dev libc6-dev libbz2-dev zlib1g-dev libffi-dev Then was able to do the following to build and install the downloaded Python3.7.3 and 3.8.0: $ sudo ./configure --enable-optimizations $ sudo -H make altinstall Then downloaded a clone of my magiccube2x2 repository and obtained the following results: $ python3.6 ./permutations2x2 --max_depth 6 | egrep "^# Total_Time:" # Total_Time: 69.02 $ python3.7 ./permutations2x2 --max_depth 6 | egrep "^# Total_Time:" # Total_Time: 84.52 $ python3.8 ./permutations2x2 --max_depth 6 | egrep "^# Total_Time:" # Total_Time: 83.14 In this case the runtimes under both of the Python versions that I downloaded the source for and built are significantly slower than the run with python3.6, which came already as part of Ubuntu 18.04.3. Instead of using the version of python3.7.3 that I built, was able to do: $ sudo apt install python3.7 to get one that had already been built for Ubuntu. Doing this allowed the following result: $ /usr/bin/python3.7 ./permutations2x2 --max_depth 6 \ | egrep "^# Total_Time:" # Total_Time: 62.39 Which is better than even the Python3.6 result. Unfortunately, there is not currently a pre-built Ubuntu package for Python3.8 and the one from ppa:deadsnakes/ppa, I already know has the performance issue. I did some searching on wiki.python.org and I was unable to find any instructions for how to build python from the source. Perhaps the instructions are already there but I was unable to find them. Perhaps this issue should be re-opened as a documentation error. I could really use instructions on how to built a performance version of Python and I suspect several others could also use this. For example, I'd like to pass that information to the maintainers of the deadsnakes repository. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:59:21 | admin | set | github: 82658 |
2019-10-17 06:33:30 | Louis Huemiller | set | messages: + msg354832 |
2019-10-17 02:29:26 | Louis Huemiller | set | messages: + msg354828 |
2019-10-16 07:14:08 | gregory.p.smith | set | status: open -> closed resolution: not a bug stage: resolved |
2019-10-16 07:07:25 | gregory.p.smith | set | status: closed -> open nosy: + gregory.p.smith, - methane messages: + msg354777 resolution: not a bug -> (no value) stage: resolved -> (no value) |
2019-10-16 06:47:31 | methane | set | status: open -> closed resolution: not a bug messages: + msg354774 stage: resolved |
2019-10-16 06:35:21 | methane | set | messages: + msg354773 |
2019-10-16 06:34:40 | methane | set | messages: + msg354772 |
2019-10-16 05:47:03 | methane | set | nosy:
+ methane |
2019-10-16 04:32:13 | Louis Huemiller | set | messages: + msg354768 |
2019-10-16 04:20:06 | Louis Huemiller | set | files:
+ 20191011b_chart.svg messages: + msg354767 |
2019-10-15 01:42:50 | Louis Huemiller | create |