This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author malin
Recipients malin, paul.moore, steve.dower, tim.golden, zach.ware
Date 2019-03-19.04:36:07
On Windows, it seems that 32-bit builds (3.7.2/3.8.0a2) don't make sufficient use of SSE2.

I tested on the 3.8 branch: python38.dll uses XMM registers only 28 times, and the official build is the same.
After enabling the option below, python38.dll uses XMM registers 11,704 times.
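
The post doesn't say how these counts were obtained. One plausible way is to disassemble the DLL (for example with MSVC's `dumpbin /disasm python38.dll`) and count references to XMM registers in the listing. The helper below is a hypothetical sketch of just the counting step: the name `count_xmm_mentions` is made up, loading the listing into memory is left out, and its results are not guaranteed to match the figures above, since it counts register mentions rather than instructions.

```c
#include <stddef.h>
#include <ctype.h>

/* Hypothetical helper: count how many times an XMM register name
 * ("xmm0" .. "xmm15") appears in a disassembly listing held in memory.
 * Matching is case-insensitive because disassemblers differ in casing.
 * An instruction such as "addsd xmm0, xmm1" counts as two mentions. */
size_t count_xmm_mentions(const char *text)
{
    size_t count = 0;
    for (const char *p = text; *p != '\0'; p++) {
        /* Require "xmm" followed by a digit; the && chain stops at the
         * first mismatch, so we never read past the terminating '\0'. */
        if (tolower((unsigned char)p[0]) == 'x' &&
            tolower((unsigned char)p[1]) == 'm' &&
            tolower((unsigned char)p[2]) == 'm' &&
            isdigit((unsigned char)p[3])) {
            count++;
            p += 3; /* skip the matched "xmm"; the loop's p++ skips the digit */
        }
    }
    return count;
}
```

For example, `count_xmm_mentions("addsd xmm0, xmm1")` returns 2, while x87 code such as `fadd st(0), st(1)` contributes nothing.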

--- a/PCbuild/pythoncore.vcxproj
+++ b/PCbuild/pythoncore.vcxproj
@@ -88,6 +88,7 @@
       <AdditionalIncludeDirectories Condition="$(IncludeExternals)">$(zlibDir);%(AdditionalIncludeDirectories)</AdditionalIncludeDirectories>
       <PreprocessorDefinitions Condition="$(IncludeExternals)">_Py_HAVE_ZLIB;%(PreprocessorDefinitions)</PreprocessorDefinitions>
+      <EnableEnhancedInstructionSet Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">StreamingSIMDExtensions2</EnableEnhancedInstructionSet>
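
In MSBuild, `EnableEnhancedInstructionSet` set to `StreamingSIMDExtensions2` corresponds to MSVC's `/arch:SSE2` switch. One way to confirm what a given build actually targets is to check the compiler's predefined macros: MSVC defines `_M_IX86_FP` on 32-bit x86 (0 for `/arch:IA32`, 1 for `/arch:SSE`, 2 for `/arch:SSE2` and above), and GCC/Clang define `__SSE2__` when SSE2 code generation is enabled. The function below is a hypothetical sketch, not part of CPython; it only reports the setting for the translation unit it is compiled in, so it would need to be built with the same flags as pythoncore.

```c
/* Hypothetical helper: report which x86 floating-point code-generation
 * target this translation unit was compiled for.
 * Returns 2 if the compiler may emit SSE2 (XMM) code, 1 for SSE only,
 * 0 for x87-only code, and -1 if the target is not x86 at all. */
int sse_codegen_level(void)
{
#if defined(_M_IX86_FP)
    /* MSVC, 32-bit x86: 0 = /arch:IA32, 1 = /arch:SSE, 2 = /arch:SSE2 or higher. */
    return _M_IX86_FP >= 2 ? 2 : _M_IX86_FP;
#elif defined(_M_X64) || defined(__x86_64__)
    /* x86-64 always includes SSE2. */
    return 2;
#elif defined(__i386__)
    /* GCC/Clang, 32-bit x86: feature macros follow -msse / -msse2. */
#  if defined(__SSE2__)
    return 2;
#  elif defined(__SSE__)
    return 1;
#  else
    return 0;
#  endif
#else
    return -1; /* not an x86 target */
#endif
}
```

Built in a 32-bit MSVC project with the patch applied, this would be expected to return 2; without any `/arch` setting it reports whatever the compiler's default is.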

The x86 instruction set has only a small number of registers.
As I understand it, using XMM registers in a 32-bit build should bring a small speedup.
I'm not an expert in this area, so sorry if I'm wrong.
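
The patch only permits the compiler to choose SSE2 code on its own; it does not change any source. To make the idea of XMM registers concrete, the sketch below uses SSE2 intrinsics from `<emmintrin.h>` explicitly: each `__m128d` value lives in a 128-bit XMM register holding two doubles. The function name `dot_product` and the macro `DOT_USE_SSE2` are hypothetical, and the scalar fallback keeps it portable to targets without SSE2.

```c
#include <stddef.h>

/* Use SSE2 intrinsics only when the compiler targets SSE2
 * (GCC/Clang -msse2, any x86-64 build, or MSVC /arch:SSE2 on x86). */
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#  include <emmintrin.h>
#  define DOT_USE_SSE2 1
#endif

/* Hypothetical example: dot product of two double arrays. With SSE2, each
 * XMM register holds two doubles, so the main loop processes pairs. */
double dot_product(const double *a, const double *b, size_t n)
{
    size_t i = 0;
    double sum = 0.0;
#ifdef DOT_USE_SSE2
    __m128d acc = _mm_setzero_pd();          /* two partial sums */
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);    /* unaligned load of a[i], a[i+1] */
        __m128d vb = _mm_loadu_pd(b + i);
        acc = _mm_add_pd(acc, _mm_mul_pd(va, vb));
    }
    double lanes[2];
    _mm_storeu_pd(lanes, acc);               /* combine the two partial sums */
    sum = lanes[0] + lanes[1];
#endif
    /* Leftover element, or every element when SSE2 is unavailable. */
    for (; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

With `a = {1, 2, 3}` and `b = {4, 5, 6}`, `dot_product(a, b, 3)` gives 32.0 on either path. Note that summing in pairs can change floating-point rounding relative to a strictly sequential loop for general inputs.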

Date                 User   Action  Args
2019-03-19 04:36:08  malin  set     recipients: + malin, paul.moore, tim.golden, zach.ware, steve.dower
2019-03-19 04:36:08  malin  set     messageid: <>
2019-03-19 04:36:08  malin  link    issue36357 messages
2019-03-19 04:36:07  malin  create