Message338317
On Windows, it seems that 32-bit builds (3.7.2/3.8.0a2) don't make much use of SSE2.
I tested on the 3.8 branch: python38.dll uses XMM registers only 28 times. The official build is the same.
After enabling the option below, python38.dll uses XMM registers 11,704 times.
--- a/PCbuild/pythoncore.vcxproj
+++ b/PCbuild/pythoncore.vcxproj
@@ -88,6 +88,7 @@
<AdditionalIncludeDirectories Condition="$(IncludeExternals)">$(zlibDir);%(AdditionalIncludeDirectories)</AdditionalIncludeDirectories>
<PreprocessorDefinitions>_USRDLL;Py_BUILD_CORE;Py_ENABLE_SHARED;MS_DLL_ID="$(SysWinVer)";%(PreprocessorDefinitions)</PreprocessorDefinitions>
<PreprocessorDefinitions Condition="$(IncludeExternals)">_Py_HAVE_ZLIB;%(PreprocessorDefinitions)</PreprocessorDefinitions>
+ <EnableEnhancedInstructionSet Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">StreamingSIMDExtensions2</EnableEnhancedInstructionSet>
</ClCompile>
<Link>
<AdditionalDependencies>version.lib;shlwapi.lib;ws2_32.lib;%(AdditionalDependencies)</AdditionalDependencies>
The x86 instruction set has only a small number of registers.
As I understand it, using XMM registers in 32-bit builds should bring a small speedup.
I'm not an expert in this area, so apologies if I'm wrong.
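
For anyone who wants to reproduce counts like the ones above, here is a minimal sketch of one way to tally XMM usage in a disassembly listing. It is an assumption about the method, not necessarily how the numbers in this message were obtained: it counts listing lines that mention an XMM register, and the function name `count_xmm_lines` and the sample listing are hypothetical.

```python
import re

# 32-bit x86 code can address only xmm0 through xmm7.
XMM_PATTERN = re.compile(r"\bxmm[0-7]\b", re.IGNORECASE)


def count_xmm_lines(disassembly: str) -> int:
    """Return the number of disassembly lines that mention an XMM register."""
    return sum(1 for line in disassembly.splitlines() if XMM_PATTERN.search(line))


# Hypothetical excerpt, made up for illustration (not taken from python38.dll).
SAMPLE = """\
  10001000: 55                 push        ebp
  10001001: 8B EC              mov         ebp,esp
  10001003: F2 0F 10 45 08     movsd       xmm0,mmword ptr [ebp+8]
  10001008: F2 0F 59 C0        mulsd       xmm0,xmm0
  1000100C: 5D                 pop         ebp
  1000100D: C3                 ret
"""

if __name__ == "__main__":
    print(count_xmm_lines(SAMPLE))
```

A real listing could be produced with a disassembler such as `dumpbin /disasm python38.dll` and passed to the function as text. Counting matching lines rather than individual register operands is a design choice; the two counts differ whenever one instruction names two XMM registers.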
Date | User | Action | Args
2019-03-19 04:36:08 | malin | set | recipients: + malin, paul.moore, tim.golden, zach.ware, steve.dower
2019-03-19 04:36:08 | malin | set | messageid: <1552970168.14.0.453401262015.issue36357@roundup.psfhosted.org>
2019-03-19 04:36:08 | malin | link | issue36357 messages
2019-03-19 04:36:07 | malin | create |