classification
Title: Support compiling with clang-cl on Windows
Type: enhancement
Stage: patch review
Components: Build, Windows
Versions: Python 3.8
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Ethan Smith, gregory.p.smith, paul.moore, pmpp, steve.dower, tim.golden, tritium, zach.ware
Priority: normal
Keywords: patch

Created on 2018-04-25 04:36 by Ethan Smith, last changed 2018-06-13 10:29 by Ethan Smith.

Pull Requests
URL Status Linked
PR 6761 closed Ethan Smith, 2018-05-11 00:29
PR 7680 open Ethan Smith, 2018-06-13 10:29
Messages (10)
msg315721 - (view) Author: Ethan Smith (Ethan Smith) * Date: 2018-04-25 04:36
The Clang folks have been hard at work making an ABI-compatible backend for Clang on Windows. Additionally, they have created a cl-compatible driver for Clang, clang-cl, which can be used in lieu of cl itself. Clang-cl has been adopted to build Chrome on Windows (http://blog.llvm.org/2018/03/clang-is-now-used-to-build-chrome-for.html), so I think it is stable enough to be considered for use.

Clang-cl has several advantages, such as computed goto support and many other optimizations which would make Python faster on Windows.
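
As a rough illustration of what computed goto support means here, the C sketch below dispatches a made-up three-opcode bytecode with the labels-as-values extension that GCC and Clang accept, and falls back to a `switch` loop elsewhere. The opcodes and `run_bytecode` are hypothetical and are not taken from CPython's interpreter loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical three-instruction bytecode, for illustration only. */
enum { OP_INC, OP_DOUBLE, OP_HALT };

/* Runs `code` until OP_HALT and returns the accumulator. */
long run_bytecode(const unsigned char *code)
{
    long acc = 0;
    assert(code != NULL);

#if defined(__GNUC__) || defined(__clang__)
    /* Labels-as-values (GNU extension): the table is indexed by opcode,
       so each handler ends with its own indirect jump to the next one. */
    static void *const targets[] = { &&do_inc, &&do_double, &&do_halt };
#define DISPATCH() goto *targets[*code++]
    DISPATCH();
do_inc:
    acc += 1;
    DISPATCH();
do_double:
    acc *= 2;
    DISPATCH();
do_halt:
    return acc;
#undef DISPATCH
#else
    /* Portable fallback: a switch inside a loop, with a single shared
       dispatch point for every opcode. */
    for (;;) {
        switch (*code++) {
        case OP_INC:    acc += 1; break;
        case OP_DOUBLE: acc *= 2; break;
        case OP_HALT:   return acc;
        }
    }
#endif
}
```

Both branches compute the same result; the difference is only in how the next handler is reached, which is where any speedup from computed goto would come from.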

I would be happy to start contributing patches to further this goal; I already have a couple of small ones.
msg315723 - (view) Author: Alex Walters (tritium) * Date: 2018-04-25 11:06
Is this the same as the Clang/C2 that you can enable from inside Visual Studio?
msg315725 - (view) Author: Ethan Smith (Ethan Smith) * Date: 2018-04-25 11:15
No, this is provided from llvm.org. You can find it as e.g. "Clang for Windows (64-bit)" here: http://releases.llvm.org/download.html#6.0.0

The Clang/C2 in Visual Studio is very different, and deprecated anyway.
msg315726 - (view) Author: Alex Walters (tritium) * Date: 2018-04-25 12:57
When supporting platforms comes up, there's a usual list of questions, especially for Windows. I can remember two of them off the top of my head:

* Are you suggesting that CPython's build system move away from MSVC as the platform compiler for Windows?

* Are you able to provide a machine to run buildbots on?
msg315749 - (view) Author: Ethan Smith (Ethan Smith) * Date: 2018-04-25 17:29
>* Are you suggesting that CPython's build system move away from MSVC as the platform compiler for Windows?

Not immediately. I don't think we should give up on the stability that currently exists with the cl-based compilation. However, I think once CPython on clang-cl becomes stable, it will be compelling to switch. Clang-cl has the benefit of backwards compatibility with existing MSVC-compiled C extensions, while generating a faster interpreter (perhaps 30% faster or more!).

>* Are you able to provide a machine to run buildbots on?

I'm afraid not, I am just a college student :)
msg315920 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-04-29 21:58
Feel free to start creating patches so we can get an idea of what the changes would look like. Hopefully it's not that dramatic.

Be very careful making performance claims without benchmarks to back it up, and ideally against multiple sets of hardware (MSVC is designed and tested to perform well across a range of processors, often by engineers who work for the manufacturer - intuition would suggest that an open source compiler is probably not 30% better all the time). Don't focus on the number right now, but do try to collect the justification before you expect or encourage others to do the work.

Since the ABI is compatible, there should be no problem enabling extensions to be built using this compiler (assuming someone is willing to become a distutils maintainer, as there are currently none). You don't need to ask here to create a third-party library that enables this.

I haven't heard any complaints about access to the compilers being an issue recently, so the only reasons to switch the interpreter itself would be source compatibility (essentially, the clang clib is better than our custom Win32 code) or performance. But we need a positive reason to switch support, not just the ability.
msg315924 - (view) Author: Ethan Smith (Ethan Smith) * Date: 2018-04-30 02:16
> Feel free to start creating patches so we can get an idea of what the changes would look like. Hopefully it's not that dramatic.


Okay, will do. I have a few smaller patches to start with. Clang-cl tries to be as compatible as possible with cl, so I don't expect drastic changes. I'm currently trying to figure out an include issue with timeval, but so far the patches have been few and small.


> Be very careful making performance claims without benchmarks to back it up, and ideally against multiple sets of hardware (MSVC is designed and tested to perform well across a range of processors, often by engineers who work for the manufacturer - intuition would suggest that an open source compiler is probably not 30% better all the time). Don't focus on the number right now, but do try to collect the justification before you expect or encourage others to do the work.


I did not mean to say it would make Python 30% faster in all cases; I meant "up to 30% faster". This number is based on benchmarks of CPython with and without computed goto, and on my own experiments comparing CPython in WSL with native Windows CPython releases on x86_64. But your point is well taken, and I will of course benchmark Python compiled with clang-cl once I have a complete working version.


> Since the ABI is compatible, there should be no problem enabling extensions to be built using this compiler (assuming someone is willing to become a distutils maintainer, as there are currently none). You don't need to ask here to create a third-party library that enables this.


When you say "someone to become a distutils maintainer", do you mean for clang-cl specifically? If that is the case, I'm happy to add support and commit to continuing to work on clang-cl support in distutils, as I expect to use it a fair amount.


> I haven't heard any complaints about access to the compilers being an issue recently, so the only reasons to switch the interpreter itself would be source compatibility (essentially, the clang clib is better than our custom Win32 code) or performance. But we need a positive reason to switch support, not just the ability.


I agree there should be a good reason to move away from the MSVC compiler. The decision to move can be re-evaluated when there is a good argument to warrant it.
msg316250 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2018-05-07 04:34
FWIW, I would _love_ to see this.  But I don't wrangle Windows myself so I can't usefully offer anything other than being happy to volunteer to run a Clang on Windows buildbot VM once there is something to actually be run.
msg317656 - (view) Author: Ethan Smith (Ethan Smith) * Date: 2018-05-25 04:23
After wrangling with some missing compiler intrinsics, I've been able to get CPython to build with an almost vanilla clang-cl!

I plan on upstreaming the patches to the LLVM project once I clean them up a bit. After that I will clean up the CPython patches and send a PR.

I also ran the performance benchmark suite, comparing master built with MSVC against my branch built with clang-cl with computed goto enabled (I wasn't sure whether there are other things that could be turned on; computed goto seemed an obvious win). The results are decent, but some things, like json_loads, are much slower (not sure why that is). The full report:


msvc.json
=========

Performance version: 0.6.1
Report on Windows-10-10.0.17672-SP0
Number of logical CPUs: 12
Start date: 2018-05-24 03:40:09.082701
End date: 2018-05-24 04:08:57.993717

clang2goto.json
===============

Performance version: 0.6.1
Report on Windows-10-10.0.17672-SP0
Number of logical CPUs: 12
Start date: 2018-05-24 04:29:01.214005
End date: 2018-05-24 04:57:08.774299

### 2to3 ###
Mean +- std dev: 675 ms +- 31 ms -> 655 ms +- 32 ms: 1.03x faster
Significant (t=3.55)

### chameleon ###
Mean +- std dev: 19.5 ms +- 0.5 ms -> 18.1 ms +- 0.7 ms: 1.08x faster
Significant (t=13.19)

### chaos ###
Mean +- std dev: 230 ms +- 6 ms -> 209 ms +- 8 ms: 1.10x faster
Significant (t=16.39)

### crypto_pyaes ###
Mean +- std dev: 212 ms +- 8 ms -> 197 ms +- 8 ms: 1.07x faster
Significant (t=9.72)

### deltablue ###
Mean +- std dev: 15.2 ms +- 0.6 ms -> 14.2 ms +- 0.5 ms: 1.07x faster
Significant (t=10.23)

### django_template ###
Mean +- std dev: 222 ms +- 9 ms -> 210 ms +- 8 ms: 1.06x faster
Significant (t=8.10)

### dulwich_log ###
Mean +- std dev: 235 ms +- 13 ms -> 230 ms +- 12 ms: 1.02x faster
Significant (t=2.18)

### fannkuch ###
Mean +- std dev: 905 ms +- 11 ms -> 802 ms +- 15 ms: 1.13x faster
Significant (t=42.95)

### float ###
Mean +- std dev: 226 ms +- 9 ms -> 197 ms +- 8 ms: 1.15x faster
Significant (t=18.71)

### go ###
Mean +- std dev: 485 ms +- 10 ms -> 445 ms +- 8 ms: 1.09x faster
Significant (t=24.60)

### hexiom ###
Mean +- std dev: 19.9 ms +- 0.9 ms -> 18.3 ms +- 0.8 ms: 1.08x faster
Significant (t=9.51)

### html5lib ###
Mean +- std dev: 156 ms +- 9 ms -> 149 ms +- 9 ms: 1.05x faster
Significant (t=4.31)

### json_dumps ###
Mean +- std dev: 23.4 ms +- 1.2 ms -> 23.0 ms +- 1.1 ms: 1.02x faster
Not significant

### json_loads ###
Mean +- std dev: 49.3 us +- 2.2 us -> 93.2 us +- 8.7 us: 1.89x slower
Significant (t=-37.79)

### logging_format ###
Mean +- std dev: 25.3 us +- 1.3 us -> 23.4 us +- 1.2 us: 1.08x faster
Significant (t=8.48)

### logging_silent ###
Mean +- std dev: 368 ns +- 14 ns -> 340 ns +- 21 ns: 1.08x faster
Significant (t=8.69)

### logging_simple ###
Mean +- std dev: 23.1 us +- 1.4 us -> 20.6 us +- 0.9 us: 1.12x faster
Significant (t=11.66)

### mako ###
Mean +- std dev: 36.7 ms +- 1.8 ms -> 36.0 ms +- 1.7 ms: 1.02x faster
Not significant

### meteor_contest ###
Mean +- std dev: 189 ms +- 9 ms -> 175 ms +- 9 ms: 1.08x faster
Significant (t=9.09)

### nbody ###
Mean +- std dev: 274 ms +- 12 ms -> 222 ms +- 8 ms: 1.24x faster
Significant (t=28.22)

### nqueens ###
Mean +- std dev: 198 ms +- 8 ms -> 174 ms +- 8 ms: 1.14x faster
Significant (t=16.67)

### pathlib ###
Mean +- std dev: 343 ms +- 19 ms -> 338 ms +- 18 ms: 1.02x faster
Not significant

### pickle ###
Mean +- std dev: 20.9 us +- 0.8 us -> 19.9 us +- 0.5 us: 1.05x faster
Significant (t=8.91)

### pickle_dict ###
Mean +- std dev: 50.0 us +- 1.9 us -> 51.2 us +- 3.0 us: 1.02x slower
Significant (t=-2.62)

### pickle_list ###
Mean +- std dev: 7.61 us +- 0.32 us -> 7.06 us +- 0.36 us: 1.08x faster
Significant (t=8.92)

### pickle_pure_python ###
Mean +- std dev: 964 us +- 52 us -> 879 us +- 43 us: 1.10x faster
Significant (t=9.72)

### pidigits ###
Mean +- std dev: 257 ms +- 5 ms -> 254 ms +- 9 ms: 1.01x faster
Not significant

### python_startup ###
Mean +- std dev: 69.6 ms +- 8.3 ms -> 69.5 ms +- 6.3 ms: 1.00x faster
Not significant

### python_startup_no_site ###
Mean +- std dev: 57.7 ms +- 6.6 ms -> 58.2 ms +- 6.0 ms: 1.01x slower
Not significant

### raytrace ###
Mean +- std dev: 1.00 sec +- 0.02 sec -> 0.94 sec +- 0.02 sec: 1.07x faster
Significant (t=21.49)

### regex_compile ###
Mean +- std dev: 335 ms +- 5 ms -> 306 ms +- 10 ms: 1.10x faster
Significant (t=20.75)

### regex_dna ###
Mean +- std dev: 237 ms +- 7 ms -> 266 ms +- 7 ms: 1.13x slower
Significant (t=-23.71)

### regex_effbot ###
Mean +- std dev: 4.42 ms +- 0.17 ms -> 4.82 ms +- 0.20 ms: 1.09x slower
Significant (t=-12.07)

### regex_v8 ###
Mean +- std dev: 45.2 ms +- 15.5 ms -> 39.7 ms +- 2.8 ms: 1.14x faster
Significant (t=2.74)

### richards ###
Mean +- std dev: 152 ms +- 8 ms -> 142 ms +- 9 ms: 1.07x faster
Significant (t=6.19)

### scimark_fft ###
Mean +- std dev: 665 ms +- 12 ms -> 593 ms +- 12 ms: 1.12x faster
Significant (t=32.36)

### scimark_lu ###
Mean +- std dev: 327 ms +- 11 ms -> 324 ms +- 11 ms: 1.01x faster
Not significant

### scimark_monte_carlo ###
Mean +- std dev: 205 ms +- 7 ms -> 192 ms +- 8 ms: 1.07x faster
Significant (t=9.22)

### scimark_sor ###
Mean +- std dev: 386 ms +- 11 ms -> 351 ms +- 11 ms: 1.10x faster
Significant (t=18.36)

### scimark_sparse_mat_mult ###
Mean +- std dev: 8.39 ms +- 0.31 ms -> 7.19 ms +- 0.40 ms: 1.17x faster
Significant (t=18.44)

### spectral_norm ###
Mean +- std dev: 279 ms +- 8 ms -> 238 ms +- 7 ms: 1.17x faster
Significant (t=29.08)

### sqlalchemy_declarative ###
Mean +- std dev: 250 ms +- 12 ms -> 245 ms +- 10 ms: 1.02x faster
Significant (t=2.73)

### sqlalchemy_imperative ###
Mean +- std dev: 47.2 ms +- 2.4 ms -> 47.4 ms +- 2.5 ms: 1.00x slower
Not significant

### sqlite_synth ###
Mean +- std dev: 5.37 us +- 0.25 us -> 5.22 us +- 0.21 us: 1.03x faster
Significant (t=3.60)

### sympy_expand ###
Mean +- std dev: 710 ms +- 16 ms -> 671 ms +- 20 ms: 1.06x faster
Significant (t=11.92)

### sympy_integrate ###
Mean +- std dev: 31.9 ms +- 1.3 ms -> 30.5 ms +- 1.4 ms: 1.05x faster
Significant (t=5.51)

### sympy_str ###
Mean +- std dev: 313 ms +- 9 ms -> 297 ms +- 8 ms: 1.05x faster
Significant (t=10.39)

### sympy_sum ###
Mean +- std dev: 151 ms +- 6 ms -> 143 ms +- 6 ms: 1.05x faster
Significant (t=6.70)

### telco ###
Mean +- std dev: 11.2 ms +- 0.4 ms -> 10.8 ms +- 0.4 ms: 1.04x faster
Significant (t=5.45)

### unpack_sequence ###
Mean +- std dev: 87.1 ns +- 3.8 ns -> 67.3 ns +- 4.3 ns: 1.29x faster
Significant (t=26.90)

### unpickle ###
Mean +- std dev: 27.4 us +- 1.9 us -> 25.3 us +- 1.5 us: 1.08x faster
Significant (t=6.46)

### unpickle_list ###
Mean +- std dev: 6.81 us +- 0.29 us -> 6.15 us +- 0.30 us: 1.11x faster
Significant (t=12.38)

### unpickle_pure_python ###
Mean +- std dev: 740 us +- 31 us -> 696 us +- 38 us: 1.06x faster
Significant (t=6.82)

### xml_etree_generate ###
Mean +- std dev: 197 ms +- 13 ms -> 190 ms +- 11 ms: 1.04x faster
Significant (t=3.34)

### xml_etree_iterparse ###
Mean +- std dev: 177 ms +- 12 ms -> 174 ms +- 11 ms: 1.02x faster
Not significant

### xml_etree_parse ###
Mean +- std dev: 229 ms +- 13 ms -> 229 ms +- 13 ms: 1.00x faster
Not significant

### xml_etree_process ###
Mean +- std dev: 161 ms +- 11 ms -> 154 ms +- 10 ms: 1.04x faster
Significant (t=3.45)
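
For readers checking the figures, the "Nx faster" and "Nx slower" labels in the report above appear to be the ratio of the two means, inverted when the second build is slower. The C sketch below is an assumption about that formatting, not code from the benchmark tool; `speedup` is a hypothetical helper, and because the report prints rounded means, recomputed ratios can differ in the last digit.

```c
#include <assert.h>

/* Hypothetical helper: ratio of the baseline mean (MSVC) to the new mean
   (clang-cl). A value above 1.0 reads as "Nx faster"; for a value below
   1.0 the report shows the reciprocal as "Nx slower". */
double speedup(double baseline_mean, double new_mean)
{
    assert(baseline_mean > 0.0 && new_mean > 0.0);
    return baseline_mean / new_mean;
}
```

With the 2to3 means, `speedup(675.0, 655.0)` is about 1.03; with the json_loads means, `1.0 / speedup(49.3, 93.2)` is about 1.89, in line with the report.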
msg318890 - (view) Author: Ethan Smith (Ethan Smith) * Date: 2018-06-07 04:55
I sent my patches to clang-cl upstream [1]. It seems they want to implement Hardware Lock Elision (which is used by some MSVC compiler intrinsics in pyatomic.h) before implementing the needed intrinsics.

I have found temporary replacements that do not elide locks, but have effectively the same functional purpose as those intrinsics, so I should have a full PR for CPython ready soon.

[1] https://reviews.llvm.org/D47672
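
The thread does not show the temporary replacements themselves. As a hedged sketch of the idea only, the C below uses C11 atomics rather than MSVC intrinsics: an HLE-hinted intrinsic such as `_InterlockedExchange_HLEAcquire` performs an atomic exchange with an additional lock-elision hint, so an exchange with the same memory ordering and no hint returns the same value. The function names are hypothetical and this is not the code from the PR.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for an HLE-acquire exchange: same acquire
   ordering and same return value (the previous contents of *target),
   but without any lock-elision hint. */
long exchange_acquire(_Atomic long *target, long value)
{
    assert(target != NULL);
    return atomic_exchange_explicit(target, value, memory_order_acquire);
}

/* Hypothetical stand-in for the release-ordered counterpart. */
long exchange_release(_Atomic long *target, long value)
{
    assert(target != NULL);
    return atomic_exchange_explicit(target, value, memory_order_release);
}
```

The design point is that dropping the hint changes only a possible hardware optimization, not the observable result of the operation.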
History
Date User Action Args
2018-06-13 10:29:58 Ethan Smith set pull_requests: + pull_request7292
2018-06-07 04:55:45 Ethan Smith set messages: + msg318890
2018-05-25 04:23:37 Ethan Smith set messages: + msg317656
2018-05-11 00:29:12 Ethan Smith set keywords: + patch
stage: test needed -> patch review
pull_requests: + pull_request6448
2018-05-07 04:34:33 gregory.p.smith set nosy: + gregory.p.smith
messages: + msg316250
2018-04-30 02:16:02 Ethan Smith set messages: + msg315924
2018-04-29 21:58:15 steve.dower set messages: + msg315920
2018-04-27 19:01:50 terry.reedy set nosy: + paul.moore, tim.golden, zach.ware, steve.dower
components: + Windows
stage: test needed
2018-04-25 17:29:59 Ethan Smith set messages: + msg315749
2018-04-25 12:57:26 tritium set messages: + msg315726
2018-04-25 11:15:13 Ethan Smith set messages: + msg315725
2018-04-25 11:06:04 tritium set nosy: + tritium
messages: + msg315723
2018-04-25 04:57:10 pmpp set nosy: + pmpp
2018-04-25 04:36:44 Ethan Smith create